Class: Xmi::NamespaceDetector

Inherits:
Object
  • Object
Defined in:
lib/xmi/namespace_detector.rb

Overview

Detects namespace versions from XMI XML content

This class parses XMI files to extract namespace URIs and detect the specific versions of XMI, UML, and other OMG specifications used.

Constant Summary

VERSION_PATTERN =
/(\d{8})/
NS_DECL_REGEX =

Regex that extracts xmlns declarations without parsing the entire document. It matches both the default namespace (xmlns="...") and prefixed declarations (xmlns:foo="..."), in single or double quotes. XMI files normally declare their namespaces on or near the root element, so scanning the first 8 KB (NS_SCAN_BYTES) is expected to be sufficient.

/xmlns(?::(\w+))?\s*=\s*["']([^"']+)["']/
NS_SCAN_BYTES =

Number of bytes, counted from the start of the XML, to scan for namespace declarations

8192
NS_PATTERNS =

Namespace URI patterns for OMG specifications. Each pattern captures the eight-digit version string (for example 20131001) from the URI.

{
  xmi: %r{http://www\.omg\.org/spec/XMI/(\d{8})},
  uml: %r{http://www\.omg\.org/spec/UML/(\d{8})},
  umldi: %r{http://www\.omg\.org/spec/UML/(\d{8})/UMLDI},
  umldc: %r{http://www\.omg\.org/spec/UML/(\d{8})/UMLDC},
}.freeze
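
As a rough illustration of how these constants fit together, the sketch below scans a root element for declarations and then pulls the version out of the XMI URI. It uses local copies of two of the regexes listed above; SAMPLE_HEAD and XMI_PATTERN are hypothetical names introduced only for this example.

```ruby
# Local copies of the regexes listed above, so the sketch runs on its own.
NS_DECL_REGEX = /xmlns(?::(\w+))?\s*=\s*["']([^"']+)["']/
XMI_PATTERN = %r{http://www\.omg\.org/spec/XMI/(\d{8})}

# Hypothetical root element, for illustration only.
SAMPLE_HEAD = <<~XML
  <xmi:XMI xmlns:xmi="http://www.omg.org/spec/XMI/20131001"
           xmlns:uml='http://www.omg.org/spec/UML/20131001'>
XML

# scan returns one [prefix, uri] pair per declaration;
# prefix is nil for a default xmlns="..." declaration.
declarations = SAMPLE_HEAD.scan(NS_DECL_REGEX)

# Look up the URI bound to the "xmi" prefix and capture its version.
xmi_uri = declarations.assoc("xmi")&.last
xmi_version = xmi_uri && xmi_uri[XMI_PATTERN, 1]

puts declarations.inspect
puts xmi_version # prints 20131001
```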

Class Method Details

.analyze(xml_content) ⇒ Hash

Get a summary of all namespaces and their versions in the XML

Parameters:

  • xml_content (String)

    The XML content to analyze

Returns:

  • (Hash)

    A hash with the keys :versions, :uris, :raw_namespaces, and :normalized_needed



# File 'lib/xmi/namespace_detector.rb', line 149

def self.analyze(xml_content)
  versions = detect_versions(xml_content)
  uris = detect_namespace_uris(xml_content)
  raw_namespaces = extract_namespace_uris_full(xml_content)

  {
    versions: versions,
    uris: uris,
    raw_namespaces: raw_namespaces,
    normalized_needed: normalization_needed?(versions),
  }
end

.build_namespace_uri(type, version) ⇒ String?

Get the full namespace URI for a detected version

Parameters:

  • type (Symbol)

    The namespace type (:xmi, :uml, etc.)

  • version (String)

    The version string (e.g., “20131001”)

Returns:

  • (String, nil)

    The full namespace URI, or nil if the type is not recognized



# File 'lib/xmi/namespace_detector.rb', line 101

def self.build_namespace_uri(type, version)
  case type
  when :xmi
    "http://www.omg.org/spec/XMI/#{version}"
  when :uml
    "http://www.omg.org/spec/UML/#{version}"
  when :umldi
    "http://www.omg.org/spec/UML/#{version}/UMLDI"
  when :umldc
    "http://www.omg.org/spec/UML/#{version}/UMLDC"
  end
end
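
A short usage sketch follows. It wraps a standalone copy of the method in a hypothetical module (NamespaceSketch is not part of the library) so that it runs without the gem, and shows both a recognized type and the nil result for an unrecognized one.

```ruby
# Standalone copy of the method above; the module name is hypothetical.
module NamespaceSketch
  def self.build_namespace_uri(type, version)
    case type
    when :xmi then "http://www.omg.org/spec/XMI/#{version}"
    when :uml then "http://www.omg.org/spec/UML/#{version}"
    when :umldi then "http://www.omg.org/spec/UML/#{version}/UMLDI"
    when :umldc then "http://www.omg.org/spec/UML/#{version}/UMLDC"
    end
  end
end

uri = NamespaceSketch.build_namespace_uri(:umldi, "20131001")
# A type with no matching branch falls through the case and yields nil.
unknown = NamespaceSketch.build_namespace_uri(:sysml, "20131001")

puts uri             # prints http://www.omg.org/spec/UML/20131001/UMLDI
puts unknown.inspect # prints nil
```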

.detect_namespace_uris(xml_content) ⇒ Hash

Get the namespace URI corresponding to each detected version

Parameters:

  • xml_content (String)

    The XML content to parse

Returns:

  • (Hash)

    A hash with namespace types as keys and URIs as values (nil for types that were not detected)



# File 'lib/xmi/namespace_detector.rb', line 118

def self.detect_namespace_uris(xml_content)
  versions = detect_versions(xml_content)
  {
    xmi: versions[:xmi] ? build_namespace_uri(:xmi, versions[:xmi]) : nil,
    uml: versions[:uml] ? build_namespace_uri(:uml, versions[:uml]) : nil,
    umldi: versions[:umldi] ? build_namespace_uri(:umldi, versions[:umldi]) : nil,
    umldc: versions[:umldc] ? build_namespace_uri(:umldc, versions[:umldc]) : nil,
  }
end

.detect_version(namespaces, type) ⇒ String?

Detect version for a specific namespace type

Parameters:

  • namespaces (Hash<String, String>)

    The namespace URIs hash

  • type (Symbol)

    The namespace type (:xmi, :uml, :umldi, :umldc)

Returns:

  • (String, nil)

    The version string (e.g., “20131001”) or nil if not found



# File 'lib/xmi/namespace_detector.rb', line 84

def self.detect_version(namespaces, type)
  pattern = NS_PATTERNS[type]
  return nil unless pattern

  namespaces.each_value do |uri|
    match = uri.match(pattern)
    return match[1] if match
  end

  nil
end
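
The sketch below exercises a standalone copy of this method on a hypothetical namespaces hash. One behavior worth noting, assuming the patterns are used exactly as listed: they are not anchored at the end, so the :uml pattern also matches a UMLDI URI, and the first matching value in hash order is the one returned.

```ruby
# Local copy of two of the patterns from NS_PATTERNS.
NS_PATTERNS = {
  uml: %r{http://www\.omg\.org/spec/UML/(\d{8})},
  umldi: %r{http://www\.omg\.org/spec/UML/(\d{8})/UMLDI},
}.freeze

# Standalone copy of the method above.
def detect_version(namespaces, type)
  pattern = NS_PATTERNS[type]
  return nil unless pattern

  namespaces.each_value do |uri|
    match = uri.match(pattern)
    return match[1] if match
  end

  nil
end

# Hypothetical input in which the UMLDI declaration comes first.
namespaces = {
  "umldi" => "http://www.omg.org/spec/UML/20161101/UMLDI",
  "uml" => "http://www.omg.org/spec/UML/20131001",
}

umldi_version = detect_version(namespaces, :umldi)
# The UMLDI URI is visited first and also satisfies the :uml pattern.
uml_version = detect_version(namespaces, :uml)
# A type without a pattern returns nil before any URI is examined.
missing = detect_version(namespaces, :sysml)

puts umldi_version   # prints 20161101
puts uml_version     # prints 20161101
puts missing.inspect # prints nil
```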

.detect_versions(xml_content) ⇒ Hash

Detect all namespace versions from XML content

Parameters:

  • xml_content (String)

    The XML content to parse

Returns:

  • (Hash)

    A hash with namespace types as keys and version strings as values. Example: { xmi: "20131001", uml: "20131001", umldi: nil, umldc: nil }



# File 'lib/xmi/namespace_detector.rb', line 35

def self.detect_versions(xml_content)
  namespaces = extract_namespace_uris(xml_content)
  {
    xmi: detect_version(namespaces, :xmi),
    uml: detect_version(namespaces, :uml),
    umldi: detect_version(namespaces, :umldi),
    umldc: detect_version(namespaces, :umldc),
  }
end

.extract_namespace_uris(xml_content) ⇒ Hash<String, String>

Extract all namespace URIs from XML content using regex on the first 8KB.

This avoids a full Nokogiri parse. Namespace declarations are normally on or near the root element, so scanning the first few KB is sufficient and roughly 10x faster than fully parsing a 3.5 MB document.

Parameters:

  • xml_content (String)

    The XML content

Returns:

  • (Hash<String, String>)

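    A hash mapping prefixes to namespace URIs. The default namespace is stored under the key "xmlns"; if a prefix is declared more than once, the first declaration wins.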



# File 'lib/xmi/namespace_detector.rb', line 53

def self.extract_namespace_uris(xml_content)
  head = xml_content.byteslice(0, NS_SCAN_BYTES)
  unless head.valid_encoding?
    head = head.encode("UTF-8", invalid: :replace,
                                undef: :replace)
  end
  result = {}
  head.scan(NS_DECL_REGEX) do |prefix, uri|
    key = prefix.nil? ? "xmlns" : prefix
    result[key] = uri unless result.key?(key)
  end
  result
end
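
To make the key naming and the first-declaration-wins rule concrete, here is a simplified standalone sketch. It omits the encoding repair step of the original for brevity, and the sample XML and the http://example.com/default URI are invented for illustration.

```ruby
NS_DECL_REGEX = /xmlns(?::(\w+))?\s*=\s*["']([^"']+)["']/
NS_SCAN_BYTES = 8192

# Simplified copy of the method above (no invalid-encoding handling).
def extract_namespace_uris(xml_content)
  head = xml_content.byteslice(0, NS_SCAN_BYTES)
  result = {}
  head.scan(NS_DECL_REGEX) do |prefix, uri|
    key = prefix.nil? ? "xmlns" : prefix
    # Keep the first URI seen for each key.
    result[key] = uri unless result.key?(key)
  end
  result
end

# Hypothetical document that redeclares the xmi prefix on a child element.
xml = <<~XML
  <?xml version="1.0"?>
  <xmi:XMI xmlns="http://example.com/default"
           xmlns:xmi="http://www.omg.org/spec/XMI/20131001">
    <child xmlns:xmi="http://www.omg.org/spec/XMI/20110701"/>
  </xmi:XMI>
XML

namespaces = extract_namespace_uris(xml)
puts namespaces.inspect
```

With this input the result has two entries: "xmlns" for the default namespace and "xmi" bound to the 20131001 URI from the root element; the later redeclaration on the child is ignored.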

.extract_namespace_uris_full(xml_content) ⇒ Hash<String, String>

Extract namespace URIs via Nokogiri (full parse). Used by .analyze when the complete namespace map is needed.

Parameters:

  • xml_content (String)

    The XML content

Returns:

  • (Hash<String, String>)

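    A hash mapping namespace declarations to URIs, as returned by Nokogiri's collect_namespaces (keys of the form "xmlns:prefix", or "xmlns" for the default namespace), or an empty hash if Nokogiri raises a syntax error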



# File 'lib/xmi/namespace_detector.rb', line 72

def self.extract_namespace_uris_full(xml_content)
  doc = Nokogiri::XML(xml_content, &:noent)
  doc.collect_namespaces
rescue Nokogiri::XML::SyntaxError
  {}
end

.normalization_needed?(versions) ⇒ Boolean

Check if namespace normalization is needed

Parameters:

  • versions (Hash)

    The detected versions hash

Returns:

  • (Boolean)

    True if any detected (non-nil) version differs from "20131001"



# File 'lib/xmi/namespace_detector.rb', line 166

def self.normalization_needed?(versions)
  versions.values.any? { |v| v && v != "20131001" }
end
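
A brief sketch of the predicate on three hypothetical version hashes, using a standalone copy of the method. Undetected types (nil) are ignored, so a document that declares none of the known namespaces does not trigger normalization.

```ruby
# Standalone copy of the method above.
def normalization_needed?(versions)
  versions.values.any? { |v| v && v != "20131001" }
end

# Hypothetical results in the shape returned by detect_versions.
current = { xmi: "20131001", uml: "20131001", umldi: nil, umldc: nil }
older = { xmi: "20110701", uml: "20110701", umldi: nil, umldc: nil }
none = { xmi: nil, uml: nil, umldi: nil, umldc: nil }

puts normalization_needed?(current) # prints false
puts normalization_needed?(older)   # prints true
puts normalization_needed?(none)    # prints false
```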

.uses_version?(xml_content, type, version) ⇒ Boolean

Check if the XML uses a specific namespace version

Parameters:

  • xml_content (String)

    The XML content to check

  • type (Symbol)

    The namespace type

  • version (String)

    The version to check for

Returns:

  • (Boolean)

    True if the XML uses the specified version



# File 'lib/xmi/namespace_detector.rb', line 140

def self.uses_version?(xml_content, type, version)
  detected = detect_versions(xml_content)
  detected[type] == version
end
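
The following end-to-end sketch shows how the check could behave on a small document. DetectorSketch is a hypothetical, condensed stand-in for the class, not its actual implementation: it folds extraction and detection into one method and skips the encoding handling.

```ruby
# Hypothetical condensed stand-in for Xmi::NamespaceDetector.
module DetectorSketch
  NS_DECL_REGEX = /xmlns(?::(\w+))?\s*=\s*["']([^"']+)["']/
  NS_PATTERNS = {
    xmi: %r{http://www\.omg\.org/spec/XMI/(\d{8})},
    uml: %r{http://www\.omg\.org/spec/UML/(\d{8})},
    umldi: %r{http://www\.omg\.org/spec/UML/(\d{8})/UMLDI},
    umldc: %r{http://www\.omg\.org/spec/UML/(\d{8})/UMLDC},
  }.freeze

  # Scan the first 8 KB for declared URIs, then take the first
  # version captured by each pattern (nil when nothing matches).
  def self.detect_versions(xml)
    uris = xml.byteslice(0, 8192).scan(NS_DECL_REGEX).map(&:last)
    NS_PATTERNS.transform_values do |pattern|
      uris.map { |uri| uri[pattern, 1] }.compact.first
    end
  end

  def self.uses_version?(xml, type, version)
    detect_versions(xml)[type] == version
  end
end

# Hypothetical document using the 20110701 namespaces.
xml = <<~XML
  <xmi:XMI xmlns:xmi="http://www.omg.org/spec/XMI/20110701"
           xmlns:uml="http://www.omg.org/spec/UML/20110701">
XML

puts DetectorSketch.uses_version?(xml, :xmi, "20110701")   # prints true
puts DetectorSketch.uses_version?(xml, :xmi, "20131001")   # prints false
puts DetectorSketch.uses_version?(xml, :umldi, "20110701") # prints false
```

The last call returns false because no UMLDI namespace is declared, so the detected value is nil and the comparison with the version string fails.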