Class: Xmi::NamespaceDetector
- Inherits: Object
  - Object
    - Xmi::NamespaceDetector
- Defined in: lib/xmi/namespace_detector.rb
Overview
Detects namespace versions from XMI XML content
This class parses XMI files to extract namespace URIs and detect the specific versions of XMI, UML, and other OMG specifications used.
Constant Summary
- VERSION_PATTERN =
    /(\d{8})/
- NS_DECL_REGEX =
  Regex to extract xmlns declarations without parsing the entire document. Matches both the default namespace (xmlns="…") and prefixed declarations (xmlns:foo="…"). In XMI files, namespace declarations normally sit on or near the root element, so scanning the first 8 KB is usually sufficient.
    /xmlns(?::(\w+))?\s*=\s*["']([^"']+)["']/
- NS_SCAN_BYTES =
  How many bytes of the XML to scan for namespace declarations.
    8192
- NS_PATTERNS =
  Namespace URI patterns for OMG specifications.
    {
      xmi: %r{http://www\.omg\.org/spec/XMI/(\d{8})},
      uml: %r{http://www\.omg\.org/spec/UML/(\d{8})},
      umldi: %r{http://www\.omg\.org/spec/UML/(\d{8})/UMLDI},
      umldc: %r{http://www\.omg\.org/spec/UML/(\d{8})/UMLDC},
    }.freeze
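To make the two capture groups of NS_DECL_REGEX concrete, here is a minimal standalone sketch. The pattern is copied from the constant above; the sample root element and the local variable names are made up for illustration and are not part of the library.

```ruby
# Pattern copied from NS_DECL_REGEX above; the sample document is made up.
ns_decl_regex = /xmlns(?::(\w+))?\s*=\s*["']([^"']+)["']/

sample_root = <<~XML
  <xmi:XMI xmlns:xmi="http://www.omg.org/spec/XMI/20131001"
           xmlns:uml='http://www.omg.org/spec/UML/20131001'
           xmlns="http://example.com/default">
XML

# String#scan returns one [prefix, uri] pair per declaration;
# prefix is nil for the default namespace.
declarations = sample_root.scan(ns_decl_regex)

declarations.each do |prefix, uri|
  puts "#{prefix || '(default)'} -> #{uri}"
end
```

Both single- and double-quoted attribute values are picked up, and the default namespace appears with a nil prefix, which extract_namespace_uris then stores under the key "xmlns".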
Class Method Summary
- .analyze(xml_content) ⇒ Hash
  Get a summary of all namespaces and their versions in the XML.
- .build_namespace_uri(type, version) ⇒ String?
  Get the full namespace URI for a detected version.
- .detect_namespace_uris(xml_content) ⇒ Hash
  Get detected namespace URIs for all detected versions.
- .detect_version(namespaces, type) ⇒ String?
  Detect version for a specific namespace type.
- .detect_versions(xml_content) ⇒ Hash
  Detect all namespace versions from XML content.
- .extract_namespace_uris(xml_content) ⇒ Hash<String, String>
  Extract all namespace URIs from XML content using regex on the first 8KB.
- .extract_namespace_uris_full(xml_content) ⇒ Hash<String, String>
  Extract namespace URIs via Nokogiri (full parse).
- .normalization_needed?(versions) ⇒ Boolean
  Check if namespace normalization is needed.
- .uses_version?(xml_content, type, version) ⇒ Boolean
  Check if the XML uses a specific namespace version.
Class Method Details
.analyze(xml_content) ⇒ Hash
Get a summary of all namespaces and their versions in the XML
    # File 'lib/xmi/namespace_detector.rb', line 149
    def self.analyze(xml_content)
      versions = detect_versions(xml_content)
      uris = detect_namespace_uris(xml_content)
      raw_namespaces = extract_namespace_uris_full(xml_content)

      {
        versions: versions,
        uris: uris,
        raw_namespaces: raw_namespaces,
        normalized_needed: normalization_needed?(versions),
      }
    end
.build_namespace_uri(type, version) ⇒ String?
Get the full namespace URI for a detected version
    # File 'lib/xmi/namespace_detector.rb', line 101
    def self.build_namespace_uri(type, version)
      case type
      when :xmi
        "http://www.omg.org/spec/XMI/#{version}"
      when :uml
        "http://www.omg.org/spec/UML/#{version}"
      when :umldi
        "http://www.omg.org/spec/UML/#{version}/UMLDI"
      when :umldc
        "http://www.omg.org/spec/UML/#{version}/UMLDC"
      end
    end
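The `String?` return type follows from the case statement having no else branch: an unrecognised type yields nil. The sketch below illustrates this with a hypothetical table-driven equivalent. The `uri_templates` hash, the `build_uri` lambda, and the `:bpmn` type are assumptions for illustration only; the library itself uses the case statement shown above.

```ruby
# Hypothetical table-driven equivalent of the case statement above.
uri_templates = {
  xmi: "http://www.omg.org/spec/XMI/%s",
  uml: "http://www.omg.org/spec/UML/%s",
  umldi: "http://www.omg.org/spec/UML/%s/UMLDI",
  umldc: "http://www.omg.org/spec/UML/%s/UMLDC",
}

# Returns the URI for a known type, or nil when the type is not in the table.
build_uri = lambda do |type, version|
  template = uri_templates[type]
  template && format(template, version)
end

xmi_uri = build_uri.call(:xmi, "20131001")
umldi_uri = build_uri.call(:umldi, "20131001")
unknown = build_uri.call(:bpmn, "20131001") # :bpmn is a made-up, unsupported type

puts xmi_uri
puts umldi_uri
puts unknown.inspect

# Round trip: the XMI pattern from NS_PATTERNS recovers the version again.
round_trip = xmi_uri[%r{http://www\.omg\.org/spec/XMI/(\d{8})}, 1]
puts round_trip
```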
.detect_namespace_uris(xml_content) ⇒ Hash
Get detected namespace URIs for all detected versions
    # File 'lib/xmi/namespace_detector.rb', line 118
    def self.detect_namespace_uris(xml_content)
      versions = detect_versions(xml_content)

      {
        xmi: versions[:xmi] ? build_namespace_uri(:xmi, versions[:xmi]) : nil,
        uml: versions[:uml] ? build_namespace_uri(:uml, versions[:uml]) : nil,
        umldi: if versions[:umldi]
                 build_namespace_uri(:umldi, versions[:umldi])
               end,
        umldc: if versions[:umldc]
                 build_namespace_uri(:umldc, versions[:umldc])
               end,
      }
    end
.detect_version(namespaces, type) ⇒ String?
Detect version for a specific namespace type
    # File 'lib/xmi/namespace_detector.rb', line 84
    def self.detect_version(namespaces, type)
      pattern = NS_PATTERNS[type]
      return nil unless pattern

      namespaces.each_value do |uri|
        match = uri.match(pattern)
        return match[1] if match
      end

      nil
    end
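Because the loop returns on the first matching URI and the `:uml` pattern is not anchored, a UMLDI or UMLDC URI also satisfies it. In most documents every OMG namespace carries the same version, so this makes no difference, but the sketch below shows what could happen with mixed versions. The helper `first_version`, the prefix map, and the anchored pattern are hypothetical and not part of the library.

```ruby
# Mirrors the loop in detect_version, with the pattern passed in explicitly.
def first_version(namespaces, pattern)
  namespaces.each_value do |uri|
    match = uri.match(pattern)
    return match[1] if match
  end
  nil
end

# Copied from NS_PATTERNS[:uml] above.
uml_pattern = %r{http://www\.omg\.org/spec/UML/(\d{8})}
# Hypothetical anchored variant, not part of the library.
uml_anchored = %r{\Ahttp://www\.omg\.org/spec/UML/(\d{8})\z}

# Made-up prefix map in which the UMLDI declaration comes first and
# carries a different version from the UML one.
namespaces = {
  "umldi" => "http://www.omg.org/spec/UML/20110701/UMLDI",
  "uml" => "http://www.omg.org/spec/UML/20131001",
}

loose = first_version(namespaces, uml_pattern)
strict = first_version(namespaces, uml_anchored)
puts "unanchored: #{loose}, anchored: #{strict}"
```

With the unanchored pattern the UMLDI URI is matched first, so the reported UML version is 20110701; the anchored variant skips it and reports 20131001.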
.detect_versions(xml_content) ⇒ Hash
Detect all namespace versions from XML content
    # File 'lib/xmi/namespace_detector.rb', line 35
    def self.detect_versions(xml_content)
      namespaces = extract_namespace_uris(xml_content)
      {
        xmi: detect_version(namespaces, :xmi),
        uml: detect_version(namespaces, :uml),
        umldi: detect_version(namespaces, :umldi),
        umldc: detect_version(namespaces, :umldc),
      }
    end
.extract_namespace_uris(xml_content) ⇒ Hash<String, String>
Extract all namespace URIs from XML content using regex on the first 8KB.
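This avoids a full Nokogiri parse. In XMI files, namespace declarations normally sit on or near the root element, so scanning the first few KB is usually sufficient and, per the source comment, about 10x faster than parsing a 3.5 MB document. Declarations that first appear beyond the scanned window are not seen.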
    # File 'lib/xmi/namespace_detector.rb', line 53
    def self.extract_namespace_uris(xml_content)
      head = xml_content.byteslice(0, NS_SCAN_BYTES)
      unless head.valid_encoding?
        head = head.encode("UTF-8", invalid: :replace, undef: :replace)
      end

      result = {}
      head.scan(NS_DECL_REGEX) do |prefix, uri|
        key = prefix.nil? ? "xmlns" : prefix
        result[key] = uri unless result.key?(key)
      end
      result
    end
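The practical consequence of the fixed scan window can be seen in a small standalone sketch. It follows the same steps as the method above, with two assumptions: `String#scrub` stands in for the `encode` call (both are presumably there to repair a multi-byte character cut at the window edge), and the sample document is made up, with a long comment pushing a second declaration past the window.

```ruby
# Pattern and window size copied from NS_DECL_REGEX and NS_SCAN_BYTES above.
ns_decl_regex = /xmlns(?::(\w+))?\s*=\s*["']([^"']+)["']/
ns_scan_bytes = 8192

# Same steps as extract_namespace_uris; String#scrub is a stand-in for the
# encode call and replaces any invalid bytes left by the byte-level cut.
scan_head = lambda do |xml_content|
  head = xml_content.byteslice(0, ns_scan_bytes).scrub
  result = {}
  head.scan(ns_decl_regex) do |prefix, uri|
    key = prefix || "xmlns"
    result[key] ||= uri # the first declaration of a prefix wins
  end
  result
end

# Made-up document: a 9000-character comment pushes the uml declaration
# beyond the scanned window.
root = %(<xmi:XMI xmlns:xmi="http://www.omg.org/spec/XMI/20131001">)
padding = "<!-- #{'x' * 9000} -->"
late = %(<uml:Model xmlns:uml="http://www.omg.org/spec/UML/20131001"/>)
xml = root + padding + late + "</xmi:XMI>"

found = scan_head.call(xml)
puts found.keys.inspect

# A prefix redeclared on a nested element keeps its first URI.
shadowed = scan_head.call(%(<a xmlns:p="urn:one"><b xmlns:p="urn:two"/></a>))
puts shadowed.inspect
```

Only the `xmi` prefix is found in the first document, and the redeclared prefix `p` keeps "urn:one". When the complete map matters, extract_namespace_uris_full is the method that performs a full parse.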
.extract_namespace_uris_full(xml_content) ⇒ Hash<String, String>
Extract namespace URIs via Nokogiri (full parse). Used by `analyze` when the complete namespace map is needed.
    # File 'lib/xmi/namespace_detector.rb', line 72
    def self.extract_namespace_uris_full(xml_content)
      doc = Nokogiri::XML(xml_content, &:noent)
      doc.collect_namespaces
    rescue Nokogiri::XML::SyntaxError
      {}
    end
.normalization_needed?(versions) ⇒ Boolean
Check if namespace normalization is needed
    # File 'lib/xmi/namespace_detector.rb', line 166
    def self.normalization_needed?(versions)
      versions.values.any? { |v| v && v != "20131001" }
    end
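The predicate ignores nil entries (namespaces that were not detected) and flags any detected version other than "20131001". A minimal standalone sketch, using the same expression as the method above on made-up version hashes:

```ruby
# Same expression as normalization_needed? above, as a top-level method.
def normalization_needed?(versions)
  versions.values.any? { |v| v && v != "20131001" }
end

# Made-up results in the shape returned by detect_versions.
all_current = { xmi: "20131001", uml: "20131001", umldi: nil, umldc: nil }
mixed = { xmi: "20110701", uml: "20131001", umldi: nil, umldc: nil }
nothing_detected = { xmi: nil, uml: nil, umldi: nil, umldc: nil }

puts normalization_needed?(all_current)
puts normalization_needed?(mixed)
puts normalization_needed?(nothing_detected)
```

Note that a document with no recognised OMG namespaces also reports false, so a false result alone does not confirm that the document uses the 20131001 namespaces.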
.uses_version?(xml_content, type, version) ⇒ Boolean
Check if the XML uses a specific namespace version
    # File 'lib/xmi/namespace_detector.rb', line 140
    def self.uses_version?(xml_content, type, version)
      detected = detect_versions(xml_content)
      detected[type] == version
    end