Class: Scrapetor::SAX::Tokenizer
- Inherits: Object
  - Object
  - Scrapetor::SAX::Tokenizer
- Defined in: lib/scrapetor/sax.rb
Overview
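Standalone tokenizer that yields events directly, without going through a handler. Useful when you just want to iterate over the events with a block:
  Scrapetor::SAX::Tokenizer.new(html).each_event do |type, *args|
    # ...
  end
Called without a block, #each_event returns an Enumerator over the same events.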
Constant Summary
- VOID = %w[area base br col embed hr img input link meta source track wbr].freeze
- RAW_TEXT = %w[script style].freeze
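This page does not show how the tokenizer consults these two lists. The names match the HTML notions of void elements (no closing tag, no content) and raw text elements (content is not parsed as markup), so a lookup might plausibly work as in the sketch below. The ElementKind module and its classify method are hypothetical names used only for illustration; they are not part of Scrapetor.

```ruby
# Hypothetical helper, NOT part of Scrapetor. The two lists are copied
# from the constant summary; everything else is an assumption.
module ElementKind
  VOID = %w[area base br col embed hr img input link meta source track wbr].freeze
  RAW_TEXT = %w[script style].freeze

  # Returns :void, :raw_text, or :normal for a tag name (case-insensitive).
  def self.classify(tag_name)
    name = tag_name.to_s.downcase
    return :void if VOID.include?(name)
    return :raw_text if RAW_TEXT.include?(name)

    :normal
  end
end

ElementKind.classify("BR")     # => :void
ElementKind.classify("script") # => :raw_text
ElementKind.classify("div")    # => :normal
```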
Instance Method Summary
- #each_event(&block) ⇒ Object
- #initialize(html) ⇒ Tokenizer (constructor): A new instance of Tokenizer.
Constructor Details
Instance Method Details
#each_event(&block) ⇒ Object
# File 'lib/scrapetor/sax.rb', line 86

  def each_event(&block)
    return enum_for(:each_event) unless block_given?
    block.call([:doc_start])
    while @pos < @len
      ch = byte(@pos)
      if ch == 0x3C # '<'
        handle_open(&block)
      else
        handle_text(&block)
      end
    end
    block.call([:doc_end])
    self
  end
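The method branches on a single byte: 0x3C ('<') dispatches to handle_open, anything else to handle_text. Neither helper's body appears on this page, so the following is only a minimal, self-contained sketch of that dispatch shape. MiniTokenizer, the :tag and :text event names, and the naive scan to the next '>' are all assumptions made for illustration; only :doc_start, :doc_end, the enum_for guard, and the 0x3C check come from the source above.

```ruby
# Illustrative stand-in, NOT the Scrapetor implementation.
class MiniTokenizer
  def initialize(html)
    @src = html.b # binary copy, so String#index offsets are byte offsets
    @pos = 0
    @len = @src.bytesize
  end

  def each_event(&block)
    return enum_for(:each_event) unless block_given?

    @pos = 0 # assumption: rewind so the tokenizer can be iterated again
    block.call([:doc_start])
    while @pos < @len
      if @src.getbyte(@pos) == 0x3C # '<'
        handle_open(&block)
      else
        handle_text(&block)
      end
    end
    block.call([:doc_end])
    self
  end

  private

  # Assumed behavior: emit the bytes between '<' and the next '>' as :tag.
  def handle_open
    stop = @src.index(">", @pos) || @len
    yield [:tag, text(@pos + 1, stop)]
    @pos = stop + 1
  end

  # Assumed behavior: emit everything up to the next '<' as :text.
  def handle_text
    stop = @src.index("<", @pos) || @len
    yield [:text, text(@pos, stop)]
    @pos = stop
  end

  # Bytes in [from, to) re-tagged as UTF-8.
  def text(from, to)
    @src.byteslice(from, to - from).force_encoding(Encoding::UTF_8)
  end
end

MiniTokenizer.new("<p>hi</p>").each_event.to_a
# => [[:doc_start], [:tag, "p"], [:text, "hi"], [:tag, "/p"], [:doc_end]]
```

Because every event is passed as a single array, a block written as |type, *args| destructures it automatically, while the enumerator form yields the arrays unchanged.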