Module: Canon::Comparison::WhitespaceSensitivity
- Defined in:
- lib/canon/comparison/whitespace_sensitivity.rb
Overview
Whitespace sensitivity utilities for element-level control
This module provides three-way classification of whitespace behaviour at the element level:
- :preserve means every whitespace character is significant: " " ≠ "\n". Configured via preserve_whitespace_elements (HTML default: pre, code, textarea, script, style; XML default: none).
- :collapse means presence ≠ absence, but all whitespace forms are equivalent: " " == "\n ". Configured via collapse_whitespace_elements (HTML default: p, li, dt, dd, td, th, h1-h6, caption, figcaption, label, legend, summary, blockquote, address, button; XML default: none).
- :strip means all whitespace is structural formatting noise and is dropped. This is the default for XML and for HTML elements not in the above lists.
Classification is ancestor-based: the closest matching ancestor determines the class. The strip blacklist (strip_whitespace_elements) overrides any sensitive ancestor.
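The three classes can be read as three different equality rules for text. The following is a minimal sketch of that reading, not Canon's implementation: normalize_for_class and equivalent? are hypothetical helpers, and the :strip branch assumes that only whitespace-only text is dropped.

```ruby
# Hypothetical helper (not part of Canon): normalizes text according to a
# whitespace class so that two strings can be compared with ==.
def normalize_for_class(text, whitespace_class)
  case whitespace_class
  when :preserve
    text                           # every character is significant
  when :collapse
    text.gsub(/\s+/, " ")          # any whitespace run becomes one space
  when :strip
    text.strip.empty? ? "" : text  # assumption: whitespace-only text is dropped
  else
    raise ArgumentError, "unknown whitespace class: #{whitespace_class.inspect}"
  end
end

# Hypothetical helper: compares two strings under the given whitespace class.
def equivalent?(left, right, whitespace_class)
  normalize_for_class(left, whitespace_class) ==
    normalize_for_class(right, whitespace_class)
end

equivalent?(" ", "\n", :preserve)  # => false
equivalent?(" ", "\n ", :collapse) # => true
equivalent?(" ", "", :collapse)    # => false (presence ≠ absence)
equivalent?(" ", "", :strip)       # => true
```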
Priority Order
- respect_xml_space: false → User config only (ignore xml:space)
- Ancestor walk (strip blacklist wins; then preserve; then collapse)
- xml:space="preserve" → preserve
- xml:space="default" → use configured behaviour
- Format defaults (HTML: collapse for most elements; XML: strip)
Usage
    WhitespaceSensitivity.classify_element(element, match_opts)
    # => :preserve, :collapse, or :strip
    WhitespaceSensitivity.element_sensitive?(node, opts)
    # => true if whitespace should be preserved (preserve or collapse)
Constant Summary
- HTML_COLLAPSE_ELEMENTS =
  HTML mixed-content "leaf block" elements where whitespace presence matters but all forms are equivalent (CSS block whitespace collapsing).
      %w[p li dt dd td th caption figcaption label legend summary h1 h2 h3 h4 h5 h6 blockquote address button].freeze
- HTML_PRESERVE_ELEMENTS =
  HTML elements where every whitespace character is significant.
      %w[pre code textarea script style].freeze
Class Method Summary
- .classify_element(element, match_opts) ⇒ Symbol
  Classify the whitespace behaviour for an element using ancestor walk.
- .classify_text_node(node, opts) ⇒ Symbol
  Return the whitespace class for a text node used during comparison.
- .default_sensitive_element?(element_name, match_opts) ⇒ Boolean
  Check if an element is in the default sensitive list for its format.
- .element_sensitive?(node, opts) ⇒ Boolean
  Check if an element is whitespace-sensitive based on configuration.
- .format_default_collapse_elements(match_opts) ⇒ Array<Symbol>
  Get format-specific default collapse elements.
- .format_default_preserve_elements(match_opts) ⇒ Array<Symbol>
  Get format-specific default preserve (exact-whitespace) elements.
- .preserve_whitespace_node?(node, opts) ⇒ Boolean
  Check if a whitespace-only text node should be preserved rather than filtered out.
- .resolved_collapse_elements(match_opts) ⇒ Array<String>
  Get resolved list of collapse whitespace element names (strings).
- .resolved_preserve_elements(match_opts) ⇒ Array<String>
  Get resolved list of preserve whitespace element names (strings).
- .whitespace_preserved?(element, match_opts) ⇒ Boolean
  Check if structural whitespace is preserved (not stripped) for an element.
Class Method Details
.classify_element(element, match_opts) ⇒ Symbol
Classify the whitespace behaviour for an element using ancestor walk.
    # File 'lib/canon/comparison/whitespace_sensitivity.rb', line 59

    def classify_element(element, match_opts)
      return :strip unless element
      return :strip unless element.respond_to?(:name)

      preserve_set = resolved_preserve_elements_set(match_opts)
      collapse_set = resolved_collapse_elements_set(match_opts)
      strip_set = resolved_strip_elements_set(match_opts)

      # Ancestor walk: start at the element itself, walk up.
      # Strip blacklist wins over any sensitive ancestor.
      walk_ancestor_classification(element, preserve_set, collapse_set,
                                   strip_set, match_opts)
    end
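The body of walk_ancestor_classification is not shown on this page. As a rough illustration of what an ancestor walk of this kind could look like, the sketch below checks each node from the element upward and returns on the first match, testing the strip set before the preserve and collapse sets. SketchNode and walk_classification are hypothetical names, and the per-node check order is one plausible reading of "strip blacklist wins", not confirmed behaviour.

```ruby
require "set"

# Hypothetical stand-in for a parsed element: just a name and a parent link.
SketchNode = Struct.new(:name, :parent)

# Walks from the element up to the root and returns the class of the closest
# matching node. At each node the strip set is checked first, so a blacklisted
# element wins over a sensitive ancestor further up (assumed semantics).
def walk_classification(element, preserve_set, collapse_set, strip_set, default: :strip)
  node = element
  while node
    name = node.name.to_s
    return :strip    if strip_set.include?(name)
    return :preserve if preserve_set.include?(name)
    return :collapse if collapse_set.include?(name)

    node = node.parent
  end
  default # nothing matched anywhere in the chain
end

root = SketchNode.new("html", nil)
pre  = SketchNode.new("pre", root)
em   = SketchNode.new("em", pre)   # inherits :preserve from <pre>
span = SketchNode.new("span", pre) # blacklisted, so :strip despite <pre>

walk_classification(em, Set["pre"], Set["p"], Set["span"])   # => :preserve
walk_classification(span, Set["pre"], Set["p"], Set["span"]) # => :strip
```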
.classify_text_node(node, opts) ⇒ Symbol
Return the whitespace class for a text node used during comparison.
- :preserve → preserve all whitespace character-by-character
- :collapse → preserve presence (normalize to single space)
- :strip → drop whitespace-only text nodes

    # File 'lib/canon/comparison/whitespace_sensitivity.rb', line 123

    def classify_text_node(node, opts)
      match_opts = opts[:match_opts]
      return :strip unless match_opts
      return :strip unless text_node_parent?(node)

      parent = node.parent

      unless respect_xml_space?(match_opts)
        return user_config_sensitive?(parent, match_opts) ? :preserve : :strip
      end

      return :preserve if xml_space_preserve?(parent)
      return :strip if xml_space_default?(parent)

      classify_element(parent, match_opts)
    end
.default_sensitive_element?(element_name, match_opts) ⇒ Boolean
Check if an element is in the default sensitive list for its format.

    # File 'lib/canon/comparison/whitespace_sensitivity.rb', line 211

    def default_sensitive_element?(element_name, match_opts)
      format_default_preserve_elements(match_opts)
        .include?(element_name.to_sym)
    end
.element_sensitive?(node, opts) ⇒ Boolean
Check if an element is whitespace-sensitive based on configuration. Returns true for :preserve or :collapse classification.
    # File 'lib/canon/comparison/whitespace_sensitivity.rb', line 79

    def element_sensitive?(node, opts)
      match_opts = opts[:match_opts]
      return false unless match_opts
      return false unless text_node_parent?(node)

      parent = node.parent

      # 1. Check if we should ignore xml:space (user override)
      unless respect_xml_space?(match_opts)
        return user_config_sensitive?(parent, match_opts)
      end

      # 2. Check xml:space="preserve" (document declaration)
      return true if xml_space_preserve?(parent)

      # 3. Check xml:space="default" (use configured behavior)
      return false if xml_space_default?(parent)

      # 4. Three-way classification (ancestor-based)
      classification = classify_element(parent, match_opts)
      %i[preserve collapse].include?(classification)
    end
.format_default_collapse_elements(match_opts) ⇒ Array<Symbol>
Get format-specific default collapse elements.
    # File 'lib/canon/comparison/whitespace_sensitivity.rb', line 196

    def format_default_collapse_elements(match_opts)
      format = match_opts[:format] || :xml
      case format
      when :html, :html4, :html5
        HTML_COLLAPSE_ELEMENTS.map(&:to_sym).freeze
      else
        [].freeze
      end
    end
.format_default_preserve_elements(match_opts) ⇒ Array<Symbol>
Get format-specific default preserve (exact-whitespace) elements. This is the SINGLE SOURCE OF TRUTH for default preserve-whitespace elements.
    # File 'lib/canon/comparison/whitespace_sensitivity.rb', line 182

    def format_default_preserve_elements(match_opts)
      format = match_opts[:format] || :xml
      case format
      when :html, :html4, :html5
        HTML_PRESERVE_ELEMENTS.map(&:to_sym).freeze
      else
        [].freeze
      end
    end
.preserve_whitespace_node?(node, opts) ⇒ Boolean
Check if a whitespace-only text node should be preserved rather than filtered out. Returns false when the node has no parent.

    # File 'lib/canon/comparison/whitespace_sensitivity.rb', line 107

    def preserve_whitespace_node?(node, opts)
      return false unless node.respond_to?(:parent)
      return false unless node.parent

      element_sensitive?(node, opts)
    end
.resolved_collapse_elements(match_opts) ⇒ Array<String>
Get resolved list of collapse whitespace element names (strings).
    # File 'lib/canon/comparison/whitespace_sensitivity.rb', line 173

    def resolved_collapse_elements(match_opts)
      resolved_collapse_elements_set(match_opts).to_a
    end
.resolved_preserve_elements(match_opts) ⇒ Array<String>
Get resolved list of preserve whitespace element names (strings).
    # File 'lib/canon/comparison/whitespace_sensitivity.rb', line 165

    def resolved_preserve_elements(match_opts)
      resolved_preserve_elements_set(match_opts).to_a
    end
.whitespace_preserved?(element, match_opts) ⇒ Boolean
Check if structural whitespace is preserved (not stripped) for an element.
Uses the same priority chain as element_sensitive? / classify_text_node:
1. xml:space="preserve" → always preserved
2. xml:space="default" → use configured behaviour
3. ancestor-walk classification (strip = dropped)
    # File 'lib/canon/comparison/whitespace_sensitivity.rb', line 151

    def whitespace_preserved?(element, match_opts)
      if respect_xml_space?(match_opts)
        return true if xml_space_preserve?(element)
        return false if xml_space_default?(element)
      end

      classification = classify_element(element, match_opts)
      %i[preserve collapse].include?(classification)
    end
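The control flow above can be exercised in isolation with small stand-ins. In this sketch XsElement, xs_preserve?, xs_default? and preserved? are hypothetical names; the private xml:space helpers are not shown on this page, so the stand-ins simply read an "xml:space" key on the element itself (whether the real helpers also consult ancestors is not stated), and classify_element is replaced by a callable.

```ruby
# Hypothetical element: a name plus a plain attribute hash.
XsElement = Struct.new(:name, :attributes)

# Assumed behaviour: only the element's own xml:space attribute is inspected.
def xs_preserve?(element)
  element.attributes["xml:space"] == "preserve"
end

def xs_default?(element)
  element.attributes["xml:space"] == "default"
end

# Mirrors the branch structure of whitespace_preserved?; `classify` stands in
# for classify_element and must return :preserve, :collapse, or :strip.
def preserved?(element, respect_xml_space:, classify:)
  if respect_xml_space
    return true  if xs_preserve?(element)
    return false if xs_default?(element)
  end
  %i[preserve collapse].include?(classify.call(element))
end

note = XsElement.new("note", { "xml:space" => "preserve" })
always_strip = ->(_element) { :strip }

preserved?(note, respect_xml_space: true, classify: always_strip)  # => true
preserved?(note, respect_xml_space: false, classify: always_strip) # => false
```

With respect_xml_space disabled, the attribute is skipped entirely and the result comes from the classification alone.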