Module: MultiXML::Parsers::LibxmlSax Private
- Extended by: MultiXML::Parser
- Defined in: lib/multi_xml/parsers/libxml_sax.rb
Overview
This module is part of a private API. You should avoid using this module if possible, as it may be removed or be changed in the future.
SAX-based parser using LibXML (faster for large documents)
Defined Under Namespace
Classes: Handler
Constant Summary
- ParseError = ::LibXML::XML::Error
  This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
  Exception class raised on LibXML parse failure
Class Method Summary
- .attribute_names(tag) ⇒ Array<String>
  private
  Extract non-xmlns attribute names from a start tag.
- .dom_fallback?(source, namespaces) ⇒ Boolean
  private
  Determine whether libxml_sax must fall back to the DOM parser.
- .parse(xml, namespaces: :strip) ⇒ Hash
  private
  Parse XML from a string or IO object.
- .parse_with_dom(source, namespaces) ⇒ Hash
  private
  Parse via the DOM libxml backend.
- .parse_with_sax(source, namespaces) ⇒ Hash
  private
  Parse via libxml-ruby's SAX parser.
- .stripped_attribute_collision?(source) ⇒ Boolean
  private
  Detect whether a start tag has attributes that collide after stripping.
Methods included from MultiXML::Parser
Class Method Details
.attribute_names(tag) ⇒ Array<String>
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Extract non-xmlns attribute names from a start tag
# File 'lib/multi_xml/parsers/libxml_sax.rb', line 53
def attribute_names(tag)
  tag.scan(/\s([a-zA-Z_][\w.-]*(?::[a-zA-Z_][\w.-]*)?)\s*=/).flatten.reject do |name|
    name == "xmlns" || name.start_with?("xmlns:")
  end
end
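To make the regular expression's behavior concrete, the following minimal sketch copies the method body into a standalone method and runs it on a sample start tag. The sample tag and the standalone wrapping are assumptions made for illustration; in the library the method is a private module method.

```ruby
# Standalone copy of the method body shown on this page, for illustration only.
def attribute_names(tag)
  tag.scan(/\s([a-zA-Z_][\w.-]*(?::[a-zA-Z_][\w.-]*)?)\s*=/).flatten.reject do |name|
    name == "xmlns" || name.start_with?("xmlns:")
  end
end

# Hypothetical start tag mixing a namespace declaration with ordinary attributes.
tag = '<item xmlns:a="urn:a" a:id="1" title="x">'

names = attribute_names(tag)
p names # ["a:id", "title"]
```

The namespace declaration xmlns:a is dropped, while the prefixed attribute a:id is kept with its prefix intact.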
.dom_fallback?(source, namespaces) ⇒ Boolean
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Determine whether libxml_sax must fall back to the DOM parser
# File 'lib/multi_xml/parsers/libxml_sax.rb', line 65
def dom_fallback?(source, namespaces)
  namespaces != :strip || stripped_attribute_collision?(source)
end
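The predicate has two independent triggers: a namespace mode other than :strip, or a collision found in the source. The sketch below, built from standalone copies of the three predicates listed on this page, walks through both. The sample documents are made up, and :preserve is used only as a placeholder for "any value other than :strip"; whether the library accepts that symbol is an assumption.

```ruby
# Standalone copies of the predicates shown on this page, for illustration only.
def attribute_names(tag)
  tag.scan(/\s([a-zA-Z_][\w.-]*(?::[a-zA-Z_][\w.-]*)?)\s*=/).flatten.reject do |name|
    name == "xmlns" || name.start_with?("xmlns:")
  end
end

def stripped_attribute_collision?(source)
  source.scan(%r{<(?![!?/])[^>]*>}m).any? do |tag|
    local_names = attribute_names(tag).map { |name| name.split(":", 2).last }
    local_names.uniq.length < local_names.length
  end
end

def dom_fallback?(source, namespaces)
  namespaces != :strip || stripped_attribute_collision?(source)
end

# Hypothetical inputs.
plain     = '<root><item id="1"/></root>'
colliding = '<item xmlns:a="urn:a" xmlns:b="urn:b" a:id="1" b:id="2"/>'

sax_ok          = dom_fallback?(plain, :strip)     # false: SAX path is usable
mode_fallback   = dom_fallback?(plain, :preserve)  # true: mode is not :strip
attrs_fallback  = dom_fallback?(colliding, :strip) # true: a:id and b:id both strip to "id"

p [sax_ok, mode_fallback, attrs_fallback]
```

Read this way, the SAX path is taken only when namespaces are being stripped and no start tag would lose an attribute to a name clash.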
.parse(xml, namespaces: :strip) ⇒ Hash
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Parse XML from a string or IO object
# File 'lib/multi_xml/parsers/libxml_sax.rb', line 27
def parse(xml, namespaces: :strip)
  source = xml.respond_to?(:read) ? xml.read : xml.to_s

  return {} if source.empty?
  return parse_with_dom(source, namespaces) if dom_fallback?(source, namespaces)

  parse_with_sax(source, namespaces)
end
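Before any backend is chosen, parse normalizes its input to a String. The sketch below isolates that step in a hypothetical helper, normalize_source (a name invented here, not part of the library), to show how strings, IO-like objects, and nil are treated.

```ruby
require "stringio"

# Hypothetical helper mirroring the first line of `parse`.
def normalize_source(xml)
  xml.respond_to?(:read) ? xml.read : xml.to_s
end

from_string = normalize_source("<a/>")               # Strings pass through to_s unchanged
from_io     = normalize_source(StringIO.new("<a/>")) # IO-like objects are read in full
from_nil    = normalize_source(nil)                  # nil.to_s is the empty string

p from_string
p from_io
p from_nil.empty?
```

Because nil and empty input both normalize to an empty string, they hit the early `return {}` and never reach either backend. Reading the whole IO up front also means the source is held in memory as a single String before parsing starts.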
.parse_with_dom(source, namespaces) ⇒ Hash
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Parse via the DOM libxml backend
# File 'lib/multi_xml/parsers/libxml_sax.rb', line 75
def parse_with_dom(source, namespaces)
  Libxml.parse(StringIO.new(source), namespaces: namespaces)
end
.parse_with_sax(source, namespaces) ⇒ Hash
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Parse via libxml-ruby's SAX parser
# File 'lib/multi_xml/parsers/libxml_sax.rb', line 85
def parse_with_sax(source, namespaces)
  LibXML::XML::Error.set_handler(&LibXML::XML::Error::QUIET_HANDLER)
  handler = Handler.new(namespaces)
  parser = ::LibXML::XML::SaxParser.io(StringIO.new(source))
  parser.callbacks = handler
  parser.parse
  handler.result
end
.stripped_attribute_collision?(source) ⇒ Boolean
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Detect whether a start tag has attributes that collide after stripping
# File 'lib/multi_xml/parsers/libxml_sax.rb', line 41
def stripped_attribute_collision?(source)
  source.scan(%r{<(?![!?/])[^>]*>}m).any? do |tag|
    local_names = attribute_names(tag).map { |name| name.split(":", 2).last }
    local_names.uniq.length < local_names.length
  end
end
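A collision here means that two attributes on the same start tag share a local name once their prefixes are removed. The sketch below uses standalone copies of the two methods involved and three made-up documents (the prefixes and URNs are illustrative assumptions) to show which inputs are flagged.

```ruby
# Standalone copies of the two methods shown on this page, for illustration only.
def attribute_names(tag)
  tag.scan(/\s([a-zA-Z_][\w.-]*(?::[a-zA-Z_][\w.-]*)?)\s*=/).flatten.reject do |name|
    name == "xmlns" || name.start_with?("xmlns:")
  end
end

def stripped_attribute_collision?(source)
  source.scan(%r{<(?![!?/])[^>]*>}m).any? do |tag|
    local_names = attribute_names(tag).map { |name| name.split(":", 2).last }
    local_names.uniq.length < local_names.length
  end
end

# Two prefixed attributes with the same local name, split across lines.
prefixed = <<~XML
  <item xmlns:a="urn:a" xmlns:b="urn:b"
        a:id="1" b:id="2"/>
XML

# A prefixed and an unprefixed attribute with different local names.
distinct = '<item xmlns:a="urn:a" a:id="1" name="x"/>'

# An unprefixed attribute clashing with a prefixed one.
mixed = '<item xmlns:a="urn:a" id="1" a:id="2"/>'

prefixed_collides = stripped_attribute_collision?(prefixed)
distinct_collides = stripped_attribute_collision?(distinct)
mixed_collides    = stripped_attribute_collision?(mixed)

p [prefixed_collides, distinct_collides, mixed_collides] # [true, false, true]
```

The check is purely textual: it scans start tags with a regular expression rather than parsing the document, and it skips closing tags, comments, and processing instructions through the `(?![!?/])` lookahead.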