Module: Lutaml::Xml::DocTypeExtractor
- Included in: Adapter::NokogiriAdapter, Adapter::OgaAdapter, Adapter::OxAdapter
- Defined in: lib/lutaml/xml/doctype_extractor.rb
Overview
Extracts DOCTYPE information from raw XML strings.
This module provides a shared method to extract DOCTYPE declarations from raw XML strings when the XML library doesn’t directly expose this information (as is the case with Moxml/Oga and Ox).
Nokogiri provides native access to DOCTYPE via `parsed.internal_subset`, so it doesn’t need this extraction method.
This logic is identical in both Oga and Ox adapters and has been extracted here to maintain DRY principles.
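As a rough illustration of the mixin pattern described above, the sketch below includes a stand-in module in a made-up adapter class. `DocTypeExtractorSketch` and `SketchAdapter` are hypothetical names introduced here so the example runs without the gem; the method body copies the regex from the source listing on this page, and the real Oga and Ox adapters are not reproduced.

```ruby
# Stand-in for Lutaml::Xml::DocTypeExtractor (hypothetical name), so the
# sketch runs without the gem. The regex is copied from the source listing.
module DocTypeExtractorSketch
  def extract_doctype_from_xml(xml)
    if xml =~ /<!DOCTYPE\s+(\S+)(?:\s+(PUBLIC|SYSTEM)\s+"([^"]+)"(?:\s+"([^"]+)")?)?\s*>/
      {
        name: $1,
        public_id: ($2 == "PUBLIC" ? $3 : nil),
        system_id: ($2 == "PUBLIC" ? $4 : $3),
      }
    end
  end
end

# Hypothetical adapter, not the real OgaAdapter or OxAdapter.
class SketchAdapter
  include DocTypeExtractorSketch

  attr_reader :doctype

  def initialize(xml)
    # A real adapter would also pass `xml` to its parser; this sketch only
    # recovers the DOCTYPE from the raw string.
    @doctype = extract_doctype_from_xml(xml)
  end
end

adapter = SketchAdapter.new(%(<?xml version="1.0"?>\n<!DOCTYPE note SYSTEM "note.dtd">\n<note/>))
p adapter.doctype
```

Because the method works on the raw string alone, any adapter can mix it in regardless of which parser it wraps.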
Instance Method Summary
- #extract_doctype_from_xml(xml) ⇒ Hash?
  Extract DOCTYPE information from raw XML string.
Instance Method Details
#extract_doctype_from_xml(xml) ⇒ Hash?
Extract DOCTYPE information from a raw XML string.
Parses the DOCTYPE declaration using a regex pattern to extract:
- Document type name
- Public identifier (if PUBLIC doctype)
- System identifier (external DTD location)
Returns nil if no DOCTYPE declaration matches the pattern.
# File 'lib/lutaml/xml/doctype_extractor.rb', line 39
def extract_doctype_from_xml(xml)
  # Match DOCTYPE declaration using regex
  if xml =~ /<!DOCTYPE\s+(\S+)(?:\s+(PUBLIC|SYSTEM)\s+"([^"]+)"(?:\s+"([^"]+)")?)?\s*>/
    {
      name: $1,
      public_id: ($2 == "PUBLIC" ? $3 : nil),
      system_id: ($2 == "PUBLIC" ? $4 : $3),
    }
  end
end
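To make the return shape concrete, here is a small sketch that runs the same regex against a few inputs. `DoctypeSketch` is a hypothetical wrapper module used only so the example runs without the gem, and the sample documents are made up for illustration.

```ruby
# Hypothetical wrapper (not part of lutaml) around a copy of the method above.
module DoctypeSketch
  def self.extract_doctype_from_xml(xml)
    if xml =~ /<!DOCTYPE\s+(\S+)(?:\s+(PUBLIC|SYSTEM)\s+"([^"]+)"(?:\s+"([^"]+)")?)?\s*>/
      {
        name: $1,
        public_id: ($2 == "PUBLIC" ? $3 : nil),
        system_id: ($2 == "PUBLIC" ? $4 : $3),
      }
    end
  end
end

# Name only: both identifiers come back as nil.
bare = DoctypeSketch.extract_doctype_from_xml("<!DOCTYPE html>\n<html/>")

# SYSTEM: the single quoted literal is the system identifier.
system_dt = DoctypeSketch.extract_doctype_from_xml(
  %(<!DOCTYPE note SYSTEM "note.dtd">\n<note/>)
)

# PUBLIC: first literal is the public identifier, second the system identifier.
public_dt = DoctypeSketch.extract_doctype_from_xml(
  %(<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html/>)
)

# No DOCTYPE at all: the method returns nil.
missing = DoctypeSketch.extract_doctype_from_xml("<root/>")

# Single-quoted literals are valid XML but the pattern only accepts double quotes.
single_quoted = DoctypeSketch.extract_doctype_from_xml(
  "<!DOCTYPE note SYSTEM 'note.dtd'>\n<note/>"
)

# Greedy \S+ caveat: with no whitespace before the root element, the name
# capture runs past the first ">".
one_line = DoctypeSketch.extract_doctype_from_xml("<!DOCTYPE html><html/>")

p bare, system_dt, public_dt, missing, single_quoted, one_line
```

Two limits of the pattern as written show up here: identifiers in single quotes are not matched, and a name-only DOCTYPE followed immediately by markup on the same line yields an over-long name. Callers that need either case would have to adjust the pattern, which this sketch does not attempt.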