Class: Canon::Xml::WhitespaceNormalizer
- Inherits: Object (Object → Canon::Xml::WhitespaceNormalizer)
- Defined in: lib/canon/xml/whitespace_normalizer.rb
Overview
Handles whitespace normalization for flexible XML/HTML comparison.
Provides methods for normalizing different categories of whitespace:
- Indentation whitespace
- Inter-element whitespace (between tags)
- Text content whitespace (within text nodes)
- Tag boundary whitespace (inside tags)
- Attribute formatting (handled by existing ignore_attr_order)
Instance Method Summary
- #flexible_equivalent?(text1, text2) ⇒ Boolean
  Check if two text strings are equivalent under flexible whitespace rules.
- #inter_element_whitespace?(node) ⇒ Boolean
  Check whether a node is inter-element whitespace, i.e. a whitespace-only text node between tags.
- #normalize_indentation(text) ⇒ String
  Normalize indentation by removing all leading whitespace from each line.
- #normalize_tag_boundaries(text) ⇒ String
  Normalize tag boundary whitespace. This is the same as normalizing text content for now, but kept separate for clarity.
- #normalize_text_content(text) ⇒ String
  Normalize text content by collapsing all whitespace sequences to single spaces and trimming leading/trailing whitespace.
Instance Method Details
#flexible_equivalent?(text1, text2) ⇒ Boolean
Check if two text strings are equivalent under flexible whitespace rules.

    # File 'lib/canon/xml/whitespace_normalizer.rb', line 67
    def flexible_equivalent?(text1, text2)
      normalize_text_content(text1) == normalize_text_content(text2)
    end
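To make the comparison rule concrete, here is a minimal sketch. It assumes nothing beyond the two method bodies listed on this page, which are copied as top-level methods so the example runs without the canon gem installed; the sample strings are illustrative only.

```ruby
# Standalone copies of the method bodies shown on this page (not the gem itself).
def normalize_text_content(text)
  return "" if text.nil?

  text.to_s.gsub(/\s+/, " ").strip
end

def flexible_equivalent?(text1, text2)
  normalize_text_content(text1) == normalize_text_content(text2)
end

# Differences in the amount of whitespace are ignored.
same = flexible_equivalent?("Hello   world\n", "  Hello world")

# The presence or absence of whitespace between words still matters.
different = flexible_equivalent?("Helloworld", "Hello world")

# nil and a whitespace-only string both normalize to the empty string.
nil_vs_blank = flexible_equivalent?(nil, " \t\n")

puts same          # true
puts different     # false
puts nil_vs_blank  # true
```

Note that the rule is "whitespace amount is irrelevant", not "whitespace is irrelevant": a run of whitespace collapses to one space, it is never removed from the middle of a string.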
#inter_element_whitespace?(node) ⇒ Boolean
Check whether a node is inter-element whitespace, i.e. a whitespace-only text node between tags. The method only detects such nodes; it does not remove them. It returns false for any object that does not respond to text? or is not a text node.

    # File 'lib/canon/xml/whitespace_normalizer.rb', line 45
    def inter_element_whitespace?(node)
      return false unless node.respond_to?(:text?) && node.text?

      text = node.respond_to?(:content) ? node.content.to_s : node.text.to_s
      text.strip.empty?
    end
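The predicate is duck-typed: it only needs an object that answers text? and either content or text. The sketch below illustrates this with a hypothetical FakeNode struct standing in for a real parser node (FakeNode is an assumption made for this example and is not part of canon); the method body is copied from the listing above.

```ruby
# Hypothetical stand-in for a parser node; not part of the canon gem.
FakeNode = Struct.new(:content, :is_text) do
  def text?
    is_text
  end
end

# Standalone copy of the method body shown on this page.
def inter_element_whitespace?(node)
  return false unless node.respond_to?(:text?) && node.text?

  text = node.respond_to?(:content) ? node.content.to_s : node.text.to_s
  text.strip.empty?
end

indent_node  = FakeNode.new("\n    ", true) # whitespace-only text node
text_node    = FakeNode.new(" hi ", true)   # text node with real content
element_node = FakeNode.new("", false)      # not a text node at all

results = [indent_node, text_node, element_node, "plain string"].map do |node|
  inter_element_whitespace?(node)
end

p results # [true, false, false, false]
```

A caller that wants to drop such nodes would have to filter on this predicate itself, since the method reports but does not modify anything.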
#normalize_indentation(text) ⇒ String
Normalize indentation by removing all leading whitespace from each line.

    # File 'lib/canon/xml/whitespace_normalizer.rb', line 31
    def normalize_indentation(text)
      return "" if text.nil?

      text.to_s
          .lines
          .map(&:lstrip) # Remove leading whitespace from each line
          .join
    end
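A short sketch of what this does to an indented document, using a standalone copy of the method body above (the sample XML is illustrative only). One consequence worth knowing: String#lstrip also strips a leading newline, so a line that contains only whitespace is reduced to an empty string and the blank line disappears from the result.

```ruby
# Standalone copy of the method body shown on this page.
def normalize_indentation(text)
  return "" if text.nil?

  text.to_s.lines.map(&:lstrip).join
end

indented = "<root>\n  <a>\n    text\n  </a>\n</root>\n"
flat = normalize_indentation(indented)
puts flat
# <root>
# <a>
# text
# </a>
# </root>

# Whitespace-only lines are dropped entirely, because lstrip removes "\n" too.
without_blank = normalize_indentation("a\n\n  b")
p without_blank # "a\nb"

# Trailing whitespace on a line is left alone.
trailing = normalize_indentation("  a  \n")
p trailing # "a  \n"
```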
#normalize_tag_boundaries(text) ⇒ String
Normalize tag boundary whitespace. This is the same as normalizing text content for now, but kept separate for clarity.

    # File 'lib/canon/xml/whitespace_normalizer.rb', line 58
    def normalize_tag_boundaries(text)
      normalize_text_content(text)
    end
#normalize_text_content(text) ⇒ String
Normalize text content by collapsing all whitespace sequences to single spaces and trimming leading/trailing whitespace.

    # File 'lib/canon/xml/whitespace_normalizer.rb', line 19
    def normalize_text_content(text)
      return "" if text.nil?

      text.to_s
          .gsub(/\s+/, " ") # Collapse all whitespace sequences to single space
          .strip            # Remove leading/trailing whitespace
    end
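The following sketch exercises a standalone copy of the method body above on a few illustrative inputs. It also shows one limitation that follows from Ruby's regexp semantics rather than from anything stated on this page: \s matches only ASCII whitespace, so a non-breaking space (U+00A0) is left untouched.

```ruby
# Standalone copy of the method body shown on this page.
def normalize_text_content(text)
  return "" if text.nil?

  text.to_s.gsub(/\s+/, " ").strip
end

# Mixed spaces, tabs and newlines collapse to single spaces.
collapsed = normalize_text_content("  foo \n\t bar  ")
p collapsed # "foo bar"

# nil becomes the empty string; other objects go through to_s first.
from_nil = normalize_text_content(nil)
from_int = normalize_text_content(42)
p from_nil # ""
p from_int # "42"

# A non-breaking space is not matched by \s, so it survives normalization.
with_nbsp = normalize_text_content("a\u00A0b")
p with_nbsp == "a\u00A0b" # true
```

If Unicode whitespace needs to be treated the same way, a pattern such as [[:space:]] would be one option, but that would be a change in behavior relative to the method as listed.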