Class: Canon::Diff::DiffClassifier

Inherits:
Object
Defined in:
lib/canon/diff/diff_classifier.rb

Overview

Classifies DiffNodes as normative (affects equivalence) or informative (does not affect equivalence) based on the match options in effect.

Classification hierarchy (three distinct kinds of differences):

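  1. Serialization formatting: XML syntax differences such as <tag/> versus <tag></tag> (non-normative regardless of match options, unless the node is inside a whitespace-sensitive element)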

  2. Content formatting: Whitespace differences in content (non-normative when normalized)

  3. Normative: Semantic content differences (affect equivalence)
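
The three kinds above can be illustrated with a minimal, self-contained sketch. It does not use Canon's classes: `ToyDiff`, `toy_kind`, the `:serialization` dimension, and the `text_behavior` keyword are hypothetical names invented for this illustration, and the whitespace test is a simplification of whatever formatting detection the library actually performs.

```ruby
# Hypothetical stand-in for a diff node; this is NOT Canon's DiffNode.
ToyDiff = Struct.new(:dimension, :before, :after, keyword_init: true)

# Returns one of the three kinds listed in the hierarchy.
# `text_behavior` is an assumed switch standing in for the match options.
def toy_kind(diff, text_behavior: :normalize)
  # 1. Serialization formatting: same element, different XML syntax.
  return :serialization_formatting if diff.dimension == :serialization

  # 2. Content formatting: only whitespace differs and whitespace is normalized.
  same_after_collapse = diff.before.split.join(" ") == diff.after.split.join(" ")
  return :content_formatting if text_behavior == :normalize && same_after_collapse

  # 3. Normative: anything else changes the meaning.
  :normative
end

spacing = ToyDiff.new(dimension: :text_content, before: "a  b", after: "a b")
toy_kind(spacing)                          # => :content_formatting
toy_kind(spacing, text_behavior: :strict)  # => :normative
```

The same whitespace difference moves from kind 2 to kind 3 once whitespace is no longer normalized, which is why the classification depends on the match options.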

Constructor Details

#initialize(match_options) ⇒ DiffClassifier

Returns a new instance of DiffClassifier.

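Parameters:

  • match_options

    The match options in effect. If this object responds to compare_profile and returns a non-nil profile, that profile is used; otherwise a base Canon::Comparison::CompareProfile is built from the options.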



# File 'lib/canon/diff/diff_classifier.rb', line 21

def initialize(match_options)
  @match_options = match_options
  # Use the compare_profile from ResolvedMatchOptions if available (e.g., HtmlCompareProfile)
  # Otherwise create a base CompareProfile
  @profile = if match_options.respond_to?(:compare_profile) && match_options.compare_profile
               match_options.compare_profile
             else
               Canon::Comparison::CompareProfile.new(match_options)
             end
end
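
The profile selection in the constructor is a duck-typing fallback. The sketch below reproduces only that pattern with hypothetical stand-ins: `FallbackProfile`, `OptionsWithProfile`, and `pick_profile` are invented for illustration and are not part of Canon.

```ruby
# Hypothetical stand-ins; neither struct exists in Canon.
FallbackProfile    = Struct.new(:options)
OptionsWithProfile = Struct.new(:compare_profile)

# Same shape as the constructor's conditional: prefer a profile supplied by
# the options object, otherwise wrap the options in a default profile.
def pick_profile(match_options)
  if match_options.respond_to?(:compare_profile) && match_options.compare_profile
    match_options.compare_profile
  else
    FallbackProfile.new(match_options)
  end
end

custom = Object.new
pick_profile(OptionsWithProfile.new(custom)).equal?(custom)  # => true
pick_profile(OptionsWithProfile.new(nil)).class              # => FallbackProfile
pick_profile({ text_content: :normalize }).class             # => FallbackProfile
```

The second half of the condition matters: an options object that responds to compare_profile but returns nil still gets the fallback.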

Instance Attribute Details

#match_options ⇒ Object (readonly)

Returns the value of attribute match_options.



# File 'lib/canon/diff/diff_classifier.rb', line 18

def match_options
  @match_options
end

#profile ⇒ Object (readonly)

Returns the value of attribute profile.



# File 'lib/canon/diff/diff_classifier.rb', line 18

def profile
  @profile
end

Instance Method Details

#classify(diff_node) ⇒ DiffNode

Classify a single DiffNode as normative or informative.

Hierarchy: formatting-only < informative < normative.

CompareProfile determines the base classification; XmlSerializationFormatter handles serialization formatting.

Parameters:

  • diff_node (DiffNode)

    The diff node to classify

Returns:

  • (DiffNode)

    The same diff node with normative/formatting attributes set



# File 'lib/canon/diff/diff_classifier.rb', line 37

def classify(diff_node)
  # FIRST: Check for XML serialization-level formatting differences
  # These are ALWAYS non-normative (formatting-only) regardless of match options
  # Examples: self-closing tags (<tag/>) vs explicit closing tags (<tag></tag>)
  #
  # EXCEPTION: If the text node is inside a whitespace-sensitive element
  # (:preserve or :collapse), don't dismiss as serialization formatting
  # because whitespace presence is meaningful in those elements.
  if !inside_whitespace_sensitive_element?(diff_node) &&
      XmlSerializationFormatter.serialization_formatting?(diff_node)
    diff_node.formatting = true
    diff_node.normative = false
    return diff_node
  end

  # SECOND: Handle content-level formatting for text_content with :normalize behavior
  # When text_content is :normalize and the difference is formatting-only,
  # it should be marked as non-normative (informative)
  # This ensures that verbose and non-verbose modes give consistent results
  #
  # EXCEPTION: If the text node is inside a PRESERVE whitespace element
  # (like <pre>, <code>, <textarea> in HTML), don't apply formatting detection
  # because whitespace should be preserved exactly in these elements.
  # Note: COLLAPSE elements like <p> DO get formatting detection because
  # their whitespace IS normalized (differences are formatting-only).
  #
  # This check must come BEFORE normative_dimension? is called,
  # because normative_dimension? returns true for text_content: :normalize
  # (since the dimension affects equivalence), which would prevent formatting
  # detection from being applied.
  if diff_node.dimension == :text_content &&
      profile.send(:behavior_for, :text_content) == :normalize &&
      !inside_preserve_element?(diff_node) &&
      formatting_only_diff?(diff_node)
    diff_node.formatting = true
    diff_node.normative = false
    return diff_node
  end

  # THIRD: Determine if this dimension is normative based on CompareProfile
  # This respects the policy settings (strict/normalize/ignore)
  is_normative = profile.normative_dimension?(diff_node.dimension)

  # FOURTH: Check if FormattingDetector should be consulted for non-normative dimensions
  # Only check for formatting-only when dimension is NOT normative
  # This ensures strict mode differences remain normative
  should_check_formatting = !is_normative &&
    profile.supports_formatting_detection?(diff_node.dimension)

  # If we should check formatting, see if it's formatting-only
  if should_check_formatting && formatting_only_diff?(diff_node)
    diff_node.formatting = true
    diff_node.normative = false
    return diff_node
  end

  # FIFTH: Apply the normative determination from CompareProfile
  diff_node.formatting = false
  diff_node.normative = is_normative

  diff_node
end
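
The comments in the method explain that the :normalize check has to run before normative_dimension? is consulted. The following sketch shows the effect of that ordering in isolation. Everything in it is hypothetical: `ToyProfile`, `whitespace_only?`, `classify_text`, and the `early_check` flag are invented here, and the rule that only :ignore makes a dimension non-normative is an assumption rather than documented CompareProfile behaviour.

```ruby
# Hypothetical toy profile; it mimics only what the source comments describe.
class ToyProfile
  def initialize(behaviors)
    @behaviors = behaviors
  end

  def behavior_for(dimension)
    @behaviors.fetch(dimension, :strict)
  end

  # Assumption: every behaviour except :ignore affects equivalence,
  # so a :normalize dimension still reports itself as normative.
  def normative_dimension?(dimension)
    behavior_for(dimension) != :ignore
  end
end

# Simplified stand-in for formatting detection: equal after collapsing whitespace.
def whitespace_only?(before, after)
  before.split.join(" ") == after.split.join(" ")
end

# `early_check: false` simulates skipping the SECOND step of classify.
def classify_text(profile, before, after, early_check: true)
  if early_check &&
     profile.behavior_for(:text_content) == :normalize &&
     whitespace_only?(before, after)
    return { formatting: true, normative: false }
  end

  { formatting: false, normative: profile.normative_dimension?(:text_content) }
end

profile = ToyProfile.new({ text_content: :normalize })
classify_text(profile, "a  b", "a b")
# => {:formatting=>true, :normative=>false}
classify_text(profile, "a  b", "a b", early_check: false)
# => {:formatting=>false, :normative=>true}
```

Without the early check, a whitespace-only difference under :normalize would be reported as normative, which is the inconsistency the ordering is meant to prevent.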

#classify_all(diff_nodes) ⇒ Array<DiffNode>

Classify multiple DiffNodes by calling #classify on each one in turn.

Parameters:

  • diff_nodes (Array<DiffNode>)

    The diff nodes to classify

Returns:

  • (Array<DiffNode>)

    The same array of diff nodes, each with its normative and formatting attributes set



# File 'lib/canon/diff/diff_classifier.rb', line 103

def classify_all(diff_nodes)
  diff_nodes.each { |node| classify(node) }
end