Module: Canon::Comparison::WhitespaceSensitivity

Defined in:
lib/canon/comparison/whitespace_sensitivity.rb

Overview

Whitespace sensitivity utilities for element-level control

This module provides logic to determine whether whitespace should be preserved during comparison based on:

  • Format-specific defaults (HTML has built-in sensitive elements)

  • User-configured whitelist (elements that care about whitespace)

  • User-configured blacklist (elements that don’t care about whitespace)

  • xml:space attribute in the document itself

  • respect_xml_space flag (whether to honor or override xml:space)

Priority Order

  1. respect_xml_space: false → User config only (ignore xml:space)

  2. User whitelist → Use whitelist (user explicitly declared)

  3. Format defaults → HTML: [:pre, :code, :textarea, :script, :style], XML: []

  4. User blacklist → Remove from defaults/whitelist

  5. xml:space="preserve" → Element is sensitive

  6. xml:space="default" → Use steps 1-4
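
The priority order above can be sketched as a single function over plain values. This is a minimal illustration, not Canon's implementation: `sketch_sensitive?` and its keyword names (`whitelist`, `blacklist`, `xml_space`) are hypothetical, the HTML defaults are copied from the `format_default_sensitive_elements` source listing on this page, and "user config" is assumed to mean whitelist plus format defaults minus blacklist.

```ruby
# Hypothetical stand-in for the documented priority order; not Canon's code.
HTML_DEFAULTS = %i[pre code textarea script style].freeze

# name:              element name (String or Symbol)
# xml_space:         value of the element's xml:space attribute, or nil
# format:            document format, e.g. :html or :xml
# whitelist:         user-declared sensitive elements (assumed option name)
# blacklist:         user-declared insensitive elements (assumed option name)
# respect_xml_space: whether xml:space in the document is honored
def sketch_sensitive?(name:, xml_space: nil, format: :xml,
                      whitelist: [], blacklist: [], respect_xml_space: true)
  defaults = %i[html html4 html5].include?(format) ? HTML_DEFAULTS : []

  # Steps 2-4: whitelist and format defaults, with blacklisted names removed.
  effective = defaults + whitelist.map(&:to_sym) - blacklist.map(&:to_sym)
  configured = effective.include?(name.to_sym)

  # Step 1: with respect_xml_space off, xml:space is ignored entirely.
  return configured unless respect_xml_space

  # Step 5: xml:space="preserve" marks the element as sensitive.
  return true if xml_space == "preserve"

  # Step 6: xml:space="default" (or no attribute) falls back to configuration.
  configured
end

sketch_sensitive?(name: "pre", format: :html)                       # => true
sketch_sensitive?(name: "pre", format: :html, blacklist: [:pre])    # => false
sketch_sensitive?(name: "p", xml_space: "preserve")                 # => true
sketch_sensitive?(name: "p", xml_space: "preserve",
                  respect_xml_space: false)                         # => false
```

The blacklist is applied last within the configured set, so it can remove both default and whitelisted names, which matches step 4 as listed.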

Usage

WhitespaceSensitivity.element_sensitive?(node, opts)
# => true if whitespace should be preserved for this element

Class Method Details

.default_sensitive_element?(element_name, match_opts) ⇒ Boolean

Check if an element is in the default sensitive list for its format

Convenience method for checking element sensitivity without building the full list first.

Parameters:

  • element_name (String, Symbol)

    The element name to check

  • match_opts (Hash)

    Resolved match options

Returns:

  • (Boolean)

    true if element is in default sensitive list



# File 'lib/canon/comparison/whitespace_sensitivity.rb', line 99

def default_sensitive_element?(element_name, match_opts)
  format_default_sensitive_elements(match_opts)
    .include?(element_name.to_sym)
end
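
Because the name is normalized with `to_sym`, String and Symbol element names behave the same way. The sketch below illustrates this without loading Canon: `SensitivitySketch` is a hypothetical stand-in module that restates the two methods from the listings on this page.

```ruby
# Hypothetical stand-in module; restates the methods listed on this page
# so the example runs without the Canon gem.
module SensitivitySketch
  module_function

  def format_default_sensitive_elements(match_opts)
    case match_opts[:format] || :xml
    when :html, :html4, :html5 then %i[pre code textarea script style].freeze
    else [].freeze
    end
  end

  def default_sensitive_element?(element_name, match_opts)
    format_default_sensitive_elements(match_opts)
      .include?(element_name.to_sym)
  end
end

html = { format: :html }
SensitivitySketch.default_sensitive_element?("pre", html)             # => true
SensitivitySketch.default_sensitive_element?(:pre, html)              # => true
SensitivitySketch.default_sensitive_element?("div", html)             # => false
SensitivitySketch.default_sensitive_element?("pre", { format: :xml }) # => false
```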

.element_sensitive?(node, opts) ⇒ Boolean

Check if an element is whitespace-sensitive based on configuration

Parameters:

  • node (Object)

    The element node to check

  • opts (Hash)

    Comparison options containing match_opts

Returns:

  • (Boolean)

    true if whitespace should be preserved for this element



# File 'lib/canon/comparison/whitespace_sensitivity.rb', line 35

def element_sensitive?(node, opts)
  match_opts = opts[:match_opts]
  return false unless match_opts
  return false unless text_node_parent?(node)

  parent = node.parent

  # 1. Check if we should ignore xml:space (user override)
  return user_config_sensitive?(parent, match_opts) unless respect_xml_space?(match_opts)

  # 2. Check xml:space="preserve" (document declaration)
  return true if xml_space_preserve?(parent)

  # 3. Check xml:space="default" (use configured behavior)
  return false if xml_space_default?(parent)

  # 4. Use user configuration + format defaults
  configured_sensitive?(parent, match_opts)
end

.format_default_sensitive_elements(match_opts) ⇒ Array<Symbol>

Get format-specific default sensitive elements

This is the SINGLE SOURCE OF TRUTH for default whitespace-sensitive elements. All other code should use this method to get the list.

Parameters:

  • match_opts (Hash)

    Resolved match options

Returns:

  • (Array<Symbol>)

    Default sensitive element names



# File 'lib/canon/comparison/whitespace_sensitivity.rb', line 76

def format_default_sensitive_elements(match_opts)
  format = match_opts[:format] || :xml

  case format
  when :html, :html4, :html5
    # HTML specification: these elements preserve whitespace
    %i[pre code textarea script style].freeze
  when :xml
    # XML has no default sensitive elements - purely user-controlled
    [].freeze
  else
    [].freeze
  end
end
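
Two consequences of the listing above are worth seeing in isolation: a missing or unrecognized `:format` yields an empty list, and the returned array is frozen, so callers need non-mutating operations to derive their own list. The sketch below restates the method in a hypothetical `FormatDefaultsSketch` module. The `:samp` addition, `:code` removal, and `:json` format are illustrative values only, not Canon configuration.

```ruby
# Hypothetical stand-in module; restates the method listed above.
module FormatDefaultsSketch
  module_function

  def format_default_sensitive_elements(match_opts)
    format = match_opts[:format] || :xml

    case format
    when :html, :html4, :html5 then %i[pre code textarea script style].freeze
    else [].freeze
    end
  end
end

defaults = FormatDefaultsSketch.format_default_sensitive_elements({ format: :html5 })

# Array#+ and Array#- return new arrays, so the frozen defaults stay untouched.
effective = defaults + %i[samp] - %i[code]
# effective => [:pre, :textarea, :script, :style, :samp]

# No :format key falls back to :xml; unknown formats hit the else branch.
no_format = FormatDefaultsSketch.format_default_sensitive_elements({})
unknown   = FormatDefaultsSketch.format_default_sensitive_elements({ format: :json })
# both => []
```

Mutating the result in place (for example with `<<`) raises `FrozenError`, which is one way the single-source-of-truth list is protected from accidental edits.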

.preserve_whitespace_node?(node, opts) ⇒ Boolean

Check whether a whitespace-only text node should be preserved rather than filtered out

Parameters:

  • node (Object)

    The text node to check

  • opts (Hash)

    Comparison options

Returns:

  • (Boolean)

    true if node should be preserved (not filtered)



# File 'lib/canon/comparison/whitespace_sensitivity.rb', line 62

def preserve_whitespace_node?(node, opts)
  return false unless node.respond_to?(:parent)
  return false unless node.parent

  element_sensitive?(node, opts)
end