Module: Canon::Comparison::WhitespaceSensitivity

Defined in:
lib/canon/comparison/whitespace_sensitivity.rb

Overview

Whitespace sensitivity utilities for element-level control

This module provides three-way classification of whitespace behaviour at the element level:

  • :preserve - every whitespace character is significant: " " ≠ "\n". Configured via preserve_whitespace_elements (HTML default: pre, code, textarea, script, style; XML default: none).

  • :collapse - presence ≠ absence, but all whitespace forms are equivalent: " " == "\n ". Configured via collapse_whitespace_elements (HTML default: p, li, dt, dd, td, th, h1-h6, caption, figcaption, label, legend, summary, blockquote, address, button; XML default: none).

  • :strip - all whitespace is structural formatting noise and is dropped. Default for XML and for HTML elements not in the above lists.
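The three classes can be illustrated with a small normalisation helper. This is a minimal sketch rather than Canon's implementation: `normalize_whitespace` is a hypothetical name, and the :strip branch only handles whitespace-only text, which is the case the descriptions above cover.

```ruby
# Hypothetical helper (not part of Canon): normalises text according to its
# whitespace class so that plain string equality expresses the comparison.
def normalize_whitespace(text, whitespace_class)
  case whitespace_class
  when :preserve
    text                                   # every character is significant
  when :collapse
    text.gsub(/\s+/, " ")                  # any whitespace run becomes one space
  when :strip
    text.match?(/\A\s*\z/) ? "" : text     # whitespace-only text is dropped
  else
    raise ArgumentError, "unknown whitespace class: #{whitespace_class.inspect}"
  end
end

# :preserve distinguishes a space from a newline.
puts normalize_whitespace(" ", :preserve) == normalize_whitespace("\n", :preserve)
# :collapse treats all whitespace forms alike, but presence still differs from absence.
puts normalize_whitespace(" ", :collapse) == normalize_whitespace("\n ", :collapse)
puts normalize_whitespace("", :collapse) == normalize_whitespace(" ", :collapse)
# :strip makes whitespace-only text equal to no text at all.
puts normalize_whitespace("", :strip) == normalize_whitespace("\n  ", :strip)
```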

Classification is ancestor-based: the closest matching ancestor determines the class. The strip blacklist (strip_whitespace_elements) overrides any sensitive ancestor.
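The ancestor-based rule above can be sketched roughly as follows. `Node` and `classify_by_ancestors` are hypothetical names used only for illustration, and the sketch assumes one reading of the rule: at each step of the walk the strip set is checked first, then preserve, then collapse, and the first match decides. Canon's actual walk_ancestor_classification may differ in detail.

```ruby
require "set"

# Hypothetical stand-in for a parsed element; real nodes come from the parser.
Node = Struct.new(:name, :parent)

# Walks from the element itself up through its ancestors. The closest node
# whose name appears in a configured set decides the class; the strip set is
# consulted first at every step. Falls back to :strip when nothing matches.
def classify_by_ancestors(element, preserve_set, collapse_set, strip_set)
  node = element
  while node
    return :strip    if strip_set.include?(node.name)
    return :preserve if preserve_set.include?(node.name)
    return :collapse if collapse_set.include?(node.name)
    node = node.parent
  end
  :strip
end

preserve = Set["pre", "code"]
collapse = Set["p", "li"]

body = Node.new("body", nil)
pre  = Node.new("pre", body)
span = Node.new("span", pre)
para = Node.new("p", body)
em   = Node.new("em", para)

puts classify_by_ancestors(span, preserve, collapse, Set[])       # inherits from <pre>
puts classify_by_ancestors(span, preserve, collapse, Set["span"]) # strip list overrides <pre>
puts classify_by_ancestors(em, preserve, collapse, Set[])         # inherits from <p>
puts classify_by_ancestors(body, preserve, collapse, Set[])       # no match
```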

Priority Order

  1. respect_xml_space: false → User config only (ignore xml:space)

  2. Ancestor walk (strip blacklist wins; then preserve; then collapse)

  3. xml:space="preserve" → preserve

  4. xml:space="default" → use configured behaviour

  5. Format defaults (HTML: collapse for most elements; XML: strip)

Usage

WhitespaceSensitivity.classify_element(element, match_opts)
=> :preserve, :collapse, or :strip

WhitespaceSensitivity.element_sensitive?(node, opts)
=> true if whitespace should be preserved (preserve or collapse)

Constant Summary

HTML_COLLAPSE_ELEMENTS =

HTML mixed-content “leaf block” elements where whitespace presence matters but all forms are equivalent (CSS block whitespace collapsing).

%w[
  p li dt dd td th caption figcaption label legend summary
  h1 h2 h3 h4 h5 h6
  blockquote address button
].freeze
HTML_PRESERVE_ELEMENTS =

HTML elements where every whitespace character is significant.

%w[pre code textarea script style].freeze

Class Method Details

.classify_element(element, match_opts) ⇒ Symbol

Classify the whitespace behaviour for an element using ancestor walk.

Parameters:

  • element (Object)

    The element node to classify

  • match_opts (Hash)

    Resolved match options

Returns:

  • (Symbol)

    :preserve, :collapse, or :strip



# File 'lib/canon/comparison/whitespace_sensitivity.rb', line 59

def classify_element(element, match_opts)
  return :strip unless element
  return :strip unless element.respond_to?(:name)

  preserve_set = resolved_preserve_elements_set(match_opts)
  collapse_set = resolved_collapse_elements_set(match_opts)
  strip_set    = resolved_strip_elements_set(match_opts)

  # Ancestor walk: start at the element itself, walk up.
  # Strip blacklist wins over any sensitive ancestor.
  walk_ancestor_classification(element, preserve_set, collapse_set,
                               strip_set, match_opts)
end

.classify_text_node(node, opts) ⇒ Symbol

Return the whitespace class for a text node used during comparison.

:preserve → preserve all whitespace character-by-character
:collapse → preserve presence (normalize to single space)
:strip → drop whitespace-only text nodes

Parameters:

  • node (Object)

    Text node to classify

  • opts (Hash)

    Comparison options containing match_opts

Returns:

  • (Symbol)

    :preserve, :collapse, or :strip



# File 'lib/canon/comparison/whitespace_sensitivity.rb', line 123

def classify_text_node(node, opts)
  match_opts = opts[:match_opts]
  return :strip unless match_opts
  return :strip unless text_node_parent?(node)

  parent = node.parent

  unless respect_xml_space?(match_opts)
    return user_config_sensitive?(parent,
                                  match_opts) ? :preserve : :strip
  end

  return :preserve if xml_space_preserve?(parent)
  return :strip if xml_space_default?(parent)

  classify_element(parent, match_opts)
end

.default_sensitive_element?(element_name, match_opts) ⇒ Boolean

Check whether an element is in the default preserve list for its format (see format_default_preserve_elements). The default collapse list is not consulted.

Parameters:

  • element_name (String, Symbol)

    The element name to check

  • match_opts (Hash)

    Resolved match options

Returns:

  • (Boolean)

    true if the element is in the format's default preserve list



# File 'lib/canon/comparison/whitespace_sensitivity.rb', line 211

def default_sensitive_element?(element_name, match_opts)
  format_default_preserve_elements(match_opts)
    .include?(element_name.to_sym)
end
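Because the name is converted with to_sym before the lookup, string and symbol names behave identically. The following self-contained sketch mirrors the listing using a local copy of the HTML default list; `HTML_PRESERVE_DEFAULTS` and `sketch_default_preserve?` are hypothetical names, and the format is passed directly instead of through match_opts.

```ruby
# Local copy of the HTML preserve defaults, converted to symbols.
HTML_PRESERVE_DEFAULTS = %w[pre code textarea script style].map(&:to_sym).freeze

# Returns true when the element name is a default preserve element for the
# given format. Non-HTML formats have no defaults, so they always yield false.
def sketch_default_preserve?(element_name, format = :xml)
  defaults = %i[html html4 html5].include?(format) ? HTML_PRESERVE_DEFAULTS : []
  defaults.include?(element_name.to_sym)
end

puts sketch_default_preserve?("pre", :html)   # string name, HTML format
puts sketch_default_preserve?(:code, :html5)  # symbol name, HTML5 format
puts sketch_default_preserve?("p", :html)     # p is a collapse element, not preserve
puts sketch_default_preserve?("pre", :xml)    # XML has no defaults
```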

.element_sensitive?(node, opts) ⇒ Boolean

Check if an element is whitespace-sensitive based on configuration. Returns true for :preserve or :collapse classification.

Parameters:

  • node (Object)

    The node to check (typically a text node); its parent element is the one classified

  • opts (Hash)

    Comparison options containing match_opts

Returns:

  • (Boolean)

    true if whitespace should be preserved for this element



# File 'lib/canon/comparison/whitespace_sensitivity.rb', line 79

def element_sensitive?(node, opts)
  match_opts = opts[:match_opts]
  return false unless match_opts
  return false unless text_node_parent?(node)

  parent = node.parent

  # 1. Check if we should ignore xml:space (user override)
  unless respect_xml_space?(match_opts)
    return user_config_sensitive?(parent, match_opts)
  end

  # 2. Check xml:space="preserve" (document declaration)
  return true if xml_space_preserve?(parent)

  # 3. Check xml:space="default" (use configured behavior)
  return false if xml_space_default?(parent)

  # 4. Three-way classification (ancestor-based)
  classification = classify_element(parent, match_opts)
  %i[preserve collapse].include?(classification)
end
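The order of checks in the listing above can be exercised in isolation with a simplified sketch. Everything here is an assumption made for illustration: `Element` and `sketch_sensitive?` are hypothetical, the configured class is passed in directly instead of being computed by classify_element, only the element's own xml:space attribute is inspected, and the respect_xml_space: false branch approximates user_config_sensitive? (whose body is not shown here) by reusing the configured class.

```ruby
# Hypothetical element with an attribute hash; not a real parser node.
Element = Struct.new(:name, :attributes)

SENSITIVE_CLASSES = %i[preserve collapse].freeze

# Mirrors the order of checks in the listing: the user override first, then
# xml:space="preserve", then xml:space="default", then the configured class.
def sketch_sensitive?(parent, configured_class, respect_xml_space: true)
  unless respect_xml_space
    return SENSITIVE_CLASSES.include?(configured_class)
  end

  case parent.attributes["xml:space"]
  when "preserve" then return true
  when "default"  then return false
  end

  SENSITIVE_CLASSES.include?(configured_class)
end

plain     = Element.new("div", {})
preserved = Element.new("div", { "xml:space" => "preserve" })
defaulted = Element.new("pre", { "xml:space" => "default" })

puts sketch_sensitive?(plain, :strip)                                   # falls through to the configured class
puts sketch_sensitive?(preserved, :strip)                               # xml:space="preserve" wins
puts sketch_sensitive?(defaulted, :preserve)                            # xml:space="default" short-circuits
puts sketch_sensitive?(defaulted, :preserve, respect_xml_space: false)  # xml:space ignored
```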

.format_default_collapse_elements(match_opts) ⇒ Array<Symbol>

Get format-specific default collapse elements.

Parameters:

  • match_opts (Hash)

    Resolved match options

Returns:

  • (Array<Symbol>)

    Default collapse element names



# File 'lib/canon/comparison/whitespace_sensitivity.rb', line 196

def format_default_collapse_elements(match_opts)
  format = match_opts[:format] || :xml
  case format
  when :html, :html4, :html5
    HTML_COLLAPSE_ELEMENTS.map(&:to_sym).freeze
  else
    [].freeze
  end
end

.format_default_preserve_elements(match_opts) ⇒ Array<Symbol>

Get format-specific default preserve (exact-whitespace) elements. This is the SINGLE SOURCE OF TRUTH for default preserve-whitespace elements.

Parameters:

  • match_opts (Hash)

    Resolved match options

Returns:

  • (Array<Symbol>)

    Default preserve element names



# File 'lib/canon/comparison/whitespace_sensitivity.rb', line 182

def format_default_preserve_elements(match_opts)
  format = match_opts[:format] || :xml
  case format
  when :html, :html4, :html5
    HTML_PRESERVE_ELEMENTS.map(&:to_sym).freeze
  else
    [].freeze
  end
end

.preserve_whitespace_node?(node, opts) ⇒ Boolean

Check whether a whitespace-only text node should be kept (not filtered out) during comparison.

Parameters:

  • node (Object)

    The text node to check

  • opts (Hash)

    Comparison options

Returns:

  • (Boolean)

    true if node should be preserved (not filtered)



# File 'lib/canon/comparison/whitespace_sensitivity.rb', line 107

def preserve_whitespace_node?(node, opts)
  return false unless node.respond_to?(:parent)
  return false unless node.parent

  element_sensitive?(node, opts)
end

.resolved_collapse_elements(match_opts) ⇒ Array<String>

Get resolved list of collapse whitespace element names (strings).

Parameters:

  • match_opts (Hash)

    Resolved match options

Returns:

  • (Array<String>)

    Collapse element names



# File 'lib/canon/comparison/whitespace_sensitivity.rb', line 173

def resolved_collapse_elements(match_opts)
  resolved_collapse_elements_set(match_opts).to_a
end

.resolved_preserve_elements(match_opts) ⇒ Array<String>

Get resolved list of preserve whitespace element names (strings).

Parameters:

  • match_opts (Hash)

    Resolved match options

Returns:

  • (Array<String>)

    Preserve element names



# File 'lib/canon/comparison/whitespace_sensitivity.rb', line 165

def resolved_preserve_elements(match_opts)
  resolved_preserve_elements_set(match_opts).to_a
end

.whitespace_preserved?(element, match_opts) ⇒ Boolean

Check if structural whitespace is preserved (not stripped) for an element.

Uses the same priority chain as element_sensitive? / classify_text_node:

1. xml:space="preserve" → always preserved
2. xml:space="default"  → use configured behaviour
3. ancestor-walk classification (strip = dropped)

Parameters:

  • element (Object)

    Element node to check

  • match_opts (Hash)

    Resolved match options

Returns:

  • (Boolean)

    true if whitespace is preserved (not stripped)



# File 'lib/canon/comparison/whitespace_sensitivity.rb', line 151

def whitespace_preserved?(element, match_opts)
  if respect_xml_space?(match_opts)
    return true  if xml_space_preserve?(element)
    return false if xml_space_default?(element)
  end

  classification = classify_element(element, match_opts)
  %i[preserve collapse].include?(classification)
end