Class: Canon::Xml::WhitespaceNormalizer

Inherits:
Object
Defined in:
lib/canon/xml/whitespace_normalizer.rb

Overview

Handles whitespace normalization for flexible XML/HTML comparison.

Provides methods for normalizing different categories of whitespace:

  1. Indentation whitespace

  2. Inter-element whitespace (between tags)

  3. Text content whitespace (within text nodes)

  4. Tag boundary whitespace (inside tags)

  5. Attribute formatting (handled by existing ignore_attr_order)

Instance Method Details

#flexible_equivalent?(text1, text2) ⇒ Boolean

Check whether two text strings are equivalent under flexible whitespace rules, that is, whether they are equal after both have been passed through #normalize_text_content.

Parameters:

  • text1 (String)

    First text

  • text2 (String)

    Second text

Returns:

  • (Boolean)

    true if equivalent after normalization



# File 'lib/canon/xml/whitespace_normalizer.rb', line 67

def flexible_equivalent?(text1, text2)
  normalize_text_content(text1) == normalize_text_content(text2)
end
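
As a rough illustration of the comparison above, the following sketch copies the two documented method bodies into a stand-in module so that it runs without the canon gem. The module name `WhitespaceSketch` is hypothetical and not part of the library; how the real class is instantiated is not shown in this page, so the sketch does not assume it.

```ruby
# Stand-in module (hypothetical name) copying the documented method bodies,
# so the example runs without loading the canon gem.
module WhitespaceSketch
  module_function

  def normalize_text_content(text)
    return "" if text.nil?

    text.to_s.gsub(/\s+/, " ").strip
  end

  def flexible_equivalent?(text1, text2)
    normalize_text_content(text1) == normalize_text_content(text2)
  end
end

# Differences in the amount or kind of whitespace are ignored.
puts WhitespaceSketch.flexible_equivalent?("Hello   world\n", "  Hello\tworld")

# nil and whitespace-only input both normalize to the empty string.
puts WhitespaceSketch.flexible_equivalent?(nil, " \n ")

# Whitespace is collapsed, not removed, so word boundaries still matter.
puts WhitespaceSketch.flexible_equivalent?("Hello world", "Helloworld")
```

The last call is the case to keep in mind: the comparison tolerates how much whitespace separates two words, but not whether any whitespace separates them.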

#inter_element_whitespace?(node) ⇒ Boolean

Check whether a node is inter-element whitespace, that is, a whitespace-only text node between tags. The method does not modify or remove anything; it only reports whether the node can be ignored during comparison.

Parameters:

  • node (Moxml::Node)

    Node to check

Returns:

  • (Boolean)

    true if node is whitespace-only and should be ignored



# File 'lib/canon/xml/whitespace_normalizer.rb', line 45

def inter_element_whitespace?(node)
  return false unless node.respond_to?(:text?) && node.text?

  text = node.respond_to?(:content) ? node.content.to_s : node.text.to_s
  text.strip.empty?
end
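
The predicate above relies only on duck typing (`text?`, plus `content` or `text`), so it can be exercised without a real parser. The sketch below assumes a hypothetical `StubNode` struct in place of a Moxml node and a hypothetical `InterElementSketch` module holding a copy of the documented method body; neither name exists in the library.

```ruby
module InterElementSketch
  # Hypothetical stand-in for a Moxml node: responds to #text? and #content.
  StubNode = Struct.new(:content, :text_node) do
    def text?
      text_node
    end
  end

  module_function

  # Copy of the documented method body.
  def inter_element_whitespace?(node)
    return false unless node.respond_to?(:text?) && node.text?

    text = node.respond_to?(:content) ? node.content.to_s : node.text.to_s
    text.strip.empty?
  end
end

indent  = InterElementSketch::StubNode.new("\n    ", true)     # text node, whitespace only
words   = InterElementSketch::StubNode.new("  some text ", true) # text node with content
element = InterElementSketch::StubNode.new("\n", false)        # not a text node

puts InterElementSketch.inter_element_whitespace?(indent)
puts InterElementSketch.inter_element_whitespace?(words)
puts InterElementSketch.inter_element_whitespace?(element)
# Objects that do not respond to #text? are never treated as whitespace.
puts InterElementSketch.inter_element_whitespace?(Object.new)
```

Only the first call reports true. Note that an empty text node also counts as whitespace-only, since stripping an empty string leaves it empty.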

#normalize_indentation(text) ⇒ String

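Normalize indentation by removing all leading whitespace from each line. Trailing whitespace and line breaks on non-blank lines are preserved; nil input yields an empty string.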

Parameters:

  • text (String)

    Text with indentation

Returns:

  • (String)

    Text with indentation removed



# File 'lib/canon/xml/whitespace_normalizer.rb', line 31

def normalize_indentation(text)
  return "" if text.nil?

  text.to_s
    .lines
    .map(&:lstrip) # Remove leading whitespace from each line
    .join
end
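
A small sketch of the behaviour above, again using a copy of the documented method body inside a hypothetical `IndentationSketch` module rather than the real class. One consequence worth noting: because `String#lstrip` also strips newlines, a blank or whitespace-only line is reduced to the empty string and so disappears from the result.

```ruby
module IndentationSketch
  module_function

  # Copy of the documented method body.
  def normalize_indentation(text)
    return "" if text.nil?

    text.to_s
        .lines
        .map(&:lstrip) # Remove leading whitespace from each line
        .join
  end
end

indented = "  <a>\n    <b/>\n  </a>\n"
puts IndentationSketch.normalize_indentation(indented)

# The whitespace-only middle line is dropped entirely, newline included.
with_blank = "a\n   \n  b\n"
p IndentationSketch.normalize_indentation(with_blank)
```

The first call prints the three tags flush left, one per line. If blank lines are significant for a comparison, this method alone is therefore not a neutral transformation.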

#normalize_tag_boundaries(text) ⇒ String

Normalize tag boundary whitespace. For now this simply delegates to #normalize_text_content; it is kept as a separate method for clarity.

Parameters:

  • text (String)

    Text at tag boundary

Returns:

  • (String)

    Normalized text



# File 'lib/canon/xml/whitespace_normalizer.rb', line 58

def normalize_tag_boundaries(text)
  normalize_text_content(text)
end

#normalize_text_content(text) ⇒ String

Normalize text content by collapsing every whitespace sequence to a single space and trimming leading and trailing whitespace. nil input yields an empty string.

Parameters:

  • text (String)

    Text to normalize

Returns:

  • (String)

    Normalized text



# File 'lib/canon/xml/whitespace_normalizer.rb', line 19

def normalize_text_content(text)
  return "" if text.nil?

  text.to_s
    .gsub(/\s+/, " ")  # Collapse all whitespace sequences to single space
    .strip             # Remove leading/trailing whitespace
end
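
The following sketch shows the collapse-and-trim behaviour on a few inputs, using a copy of the documented method body in a hypothetical `TextContentSketch` module. It also illustrates one limit of this approach: in Ruby, `\s` and `String#strip` cover ASCII whitespace, so a non-breaking space (U+00A0) is left untouched.

```ruby
module TextContentSketch
  module_function

  # Copy of the documented method body.
  def normalize_text_content(text)
    return "" if text.nil?

    text.to_s
        .gsub(/\s+/, " ") # Collapse all whitespace sequences to single space
        .strip            # Remove leading/trailing whitespace
  end
end

# Mixed spaces, tabs, and newlines become single spaces; the ends are trimmed.
p TextContentSketch.normalize_text_content("  foo \t\n bar  ")

# nil is mapped to the empty string, and non-strings are converted with #to_s.
p TextContentSketch.normalize_text_content(nil)
p TextContentSketch.normalize_text_content(42)

# A non-breaking space is not matched by \s, so it survives normalization.
p TextContentSketch.normalize_text_content("a\u00A0b")
```

If documents under comparison may contain Unicode whitespace, a caller would need an extra step before this method; that step is outside what this class documents.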