Class: Markbridge::Parsers::HTML::Parser
- Inherits: Object
- Defined in: lib/markbridge/parsers/html/parser.rb
Overview
Parses HTML into an AST using Nokogiri.
Constant Summary
- IGNORED_TAGS = %w[style script head title noscript template].freeze
Tags whose contents are dropped entirely (not emitted as text). These are raw-text or metadata elements whose children are CSS, JavaScript, or document metadata that should not appear in the output.
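As a rough illustration of how a list like this is usually consulted, the sketch below checks a tag name against a local copy of the constant. The `ignored?` helper is hypothetical, and the downcasing is an assumption: this page does not show where the parser performs the check or whether it normalizes case.

```ruby
# Local copy of the documented constant so the sketch runs without the gem.
IGNORED_TAGS = %w[style script head title noscript template].freeze

# Hypothetical helper: true when an element's contents would be dropped.
# Downcasing the name is an assumption of this sketch.
def ignored?(tag_name)
  IGNORED_TAGS.include?(tag_name.to_s.downcase)
end

puts ignored?("script") # prints true
puts ignored?("p")      # prints false
```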
Instance Attribute Summary
- #unknown_tags ⇒ Object (readonly)
Returns the value of attribute unknown_tags.
Instance Method Summary
- #initialize(handlers: nil) {|HandlerRegistry| ... } ⇒ Parser (constructor)
Create a new parser with optional custom handlers.
- #parse(input) ⇒ AST::Document
Parse HTML string into an AST.
- #process_children(node, parent) ⇒ Object
Process child nodes of an element (used by handlers).
Constructor Details
#initialize(handlers: nil) {|HandlerRegistry| ... } ⇒ Parser
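Create a new parser with optional custom handlers. If a block is given, the registry is built from the defaults via HandlerRegistry.build_from_default and the handlers: argument is ignored; otherwise handlers: is used, falling back to HandlerRegistry.default when it is nil.
# File 'lib/markbridge/parsers/html/parser.rb', line 18
def initialize(handlers: nil, &block)
  @handlers =
    if block_given?
      HandlerRegistry.build_from_default(&block)
    else
      handlers || HandlerRegistry.default
    end
  @unknown_tags = Hash.new(0)
end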
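The precedence between the block and the `handlers:` keyword can be sketched in isolation. `FakeRegistry`, its `label` and `customize!` members, and `choose_handlers` are hypothetical stand-ins so the example runs without the gem; only the `if block_given? ... else ... end` shape is taken from the constructor, and the real HandlerRegistry API beyond `default` and `build_from_default` is not shown on this page.

```ruby
# Stand-in for HandlerRegistry, used only so the sketch runs without the gem.
class FakeRegistry
  attr_reader :label, :customized

  def initialize(label)
    @label = label
    @customized = false
  end

  def self.default
    new(:default)
  end

  # Assumed behaviour: start from the defaults, then let the block adjust them.
  def self.build_from_default
    registry = default
    yield registry
    registry
  end

  def customize!
    @customized = true
  end
end

# Mirrors the selection logic in #initialize.
def choose_handlers(handlers: nil, &block)
  if block_given?
    FakeRegistry.build_from_default(&block)
  else
    handlers || FakeRegistry.default
  end
end

custom = FakeRegistry.new(:custom)

puts choose_handlers.label                                       # prints default
puts choose_handlers(handlers: custom).label                     # prints custom
puts choose_handlers(handlers: custom) { |r| r.customize! }.label # prints default
```

In the last call the block wins and the `custom` registry is never used, which is worth keeping in mind when both are passed.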
Instance Attribute Details
#unknown_tags ⇒ Object (readonly)
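Returns the value of attribute unknown_tags. The constructor initializes it as Hash.new(0), so it acts as a counter that defaults to zero for unseen keys.
# File 'lib/markbridge/parsers/html/parser.rb', line 13
def unknown_tags
  @unknown_tags
end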
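Because the attribute is created with `Hash.new(0)` and cleared at the start of each `parse` call, it behaves like a per-parse tally. The sketch below shows that behaviour with plain Ruby; keying by tag name and incrementing once per occurrence is an assumption, since the code that populates the hash is not shown on this page.

```ruby
# Same construction as in the constructor: missing keys read as 0.
unknown_tags = Hash.new(0)

# Assumed usage: one increment per unrecognized tag name encountered.
%w[blink marquee blink].each { |name| unknown_tags[name] += 1 }

puts unknown_tags["blink"]   # prints 2
puts unknown_tags["marquee"] # prints 1
puts unknown_tags["font"]    # prints 0 (reading a missing key does not add it)
puts unknown_tags.size       # prints 2

# What parse does before each run: counts are dropped, the default stays.
unknown_tags.clear
puts unknown_tags.empty?     # prints true
puts unknown_tags["blink"]   # prints 0
```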
Instance Method Details
#parse(input) ⇒ AST::Document
Parse an HTML string into an AST. Each call clears unknown_tags before parsing, so the counts reflect only the most recent input.
# File 'lib/markbridge/parsers/html/parser.rb', line 31
def parse(input)
  @unknown_tags.clear

  # Parse HTML with Nokogiri. Using the generic HTML (HTML4) parser rather
  # than HTML5 because Nokogiri::HTML5 is not available on JRuby
  # (see sparklemotion/nokogiri#2227). Table support treats thead/tbody/tfoot
  # as transparent, so the parse-tree difference (HTML5 auto-inserts tbody,
  # HTML4 does not) has no effect on the AST.
  doc = Nokogiri::HTML.fragment(input)

  # Create root AST document
  document = AST::Document.new

  # Process all nodes
  doc.children.each { |node| process_node(node, document) }

  document
end
#process_children(node, parent) ⇒ Object
Process child nodes of an element (used by handlers).
# File 'lib/markbridge/parsers/html/parser.rb', line 53
def process_children(node, parent)
  node.children.each { |child| process_node(child, parent) }
end
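The recursion this method enables can be sketched with stand-in types. `HtmlNode`, `AstNode`, `MiniWalker`, and the body of `process_node` are all hypothetical: the real `process_node` and the handler interface are not shown on this page. Only `process_children` is copied from the source.

```ruby
# Stand-ins for a parsed HTML node and an AST node; both are hypothetical.
HtmlNode = Struct.new(:name, :text, :children)
AstNode  = Struct.new(:type, :value, :children)

class MiniWalker
  # Same body as the documented method.
  def process_children(node, parent)
    node.children.each { |child| process_node(child, parent) }
  end

  private

  # Assumed dispatch: text becomes a leaf; any element becomes a container
  # that is filled by calling back into process_children.
  def process_node(node, parent)
    if node.name == "text"
      parent.children << AstNode.new(:text, node.text, [])
    else
      element = AstNode.new(node.name.to_sym, nil, [])
      parent.children << element
      process_children(node, element)
    end
  end
end

# Roughly <p>Hello <b>world</b></p>
tree = HtmlNode.new("p", nil, [
  HtmlNode.new("text", "Hello ", []),
  HtmlNode.new("b", nil, [HtmlNode.new("text", "world", [])])
])

root = AstNode.new(:document, nil, [])
MiniWalker.new.process_children(tree, root)

puts root.children.map(&:type).inspect       # prints [:text, :b]
puts root.children[1].children.first.value   # prints world
```

Passing the newly created element as `parent` is what nests the children under it; a handler that passed its own `parent` through unchanged would instead flatten the element away.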