Class: Canon::Diff::DiffClassifier

Inherits: Object
Defined in: lib/canon/diff/diff_classifier.rb

Overview

Classifies DiffNodes as normative (affecting equivalence) or informative (not affecting equivalence), based on the match options in effect.

Classification hierarchy (three distinct kinds of differences):

Serialization formatting: XML syntax differences (always non-normative)
Content formatting: Whitespace differences in content (non-normative when normalized)
Normative: Semantic content differences (affect equivalence)

Instance Attribute Summary

#match_options ⇒ Object readonly

Returns the value of attribute match_options.
#profile ⇒ Object readonly

Returns the value of attribute profile.

Instance Method Summary

#classify(diff_node) ⇒ DiffNode

Classify a single DiffNode as normative or informative. Hierarchy: formatting-only < informative < normative. CompareProfile determines the base classification; XmlSerializationFormatter handles serialization formatting.
#classify_all(diff_nodes) ⇒ Array<DiffNode>

Classify multiple DiffNodes.
#initialize(match_options) ⇒ DiffClassifier constructor

A new instance of DiffClassifier.

Constructor Details

#initialize(match_options) ⇒ `DiffClassifier`

Returns a new instance of DiffClassifier.

Parameters:

match_options (Canon::Comparison::ResolvedMatchOptions) —

The match options

# File 'lib/canon/diff/diff_classifier.rb', line 21

def initialize(match_options)
  @match_options = match_options
  # Use the compare_profile from ResolvedMatchOptions if available (e.g., HtmlCompareProfile)
  # Otherwise create a base CompareProfile
  @profile = if match_options.respond_to?(:compare_profile) && match_options.compare_profile
               match_options.compare_profile
             else
               Canon::Comparison::CompareProfile.new(match_options)
             end
end
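
The profile selection in the constructor can be tried in isolation. The following is a minimal sketch, not the library's code: `OptionsWithProfile`, `PlainOptions`, `BaseProfile`, and `resolve_profile` are hypothetical stand-ins (with `BaseProfile` playing the role of Canon::Comparison::CompareProfile), so it runs without Canon loaded.

```ruby
# Hypothetical stand-ins; they only mimic the shapes the constructor distinguishes.
OptionsWithProfile = Struct.new(:compare_profile) # options that expose a profile
PlainOptions = Struct.new(:name)                  # options with no compare_profile method
BaseProfile = Struct.new(:match_options)          # stands in for CompareProfile

# Same branching as the constructor: reuse a supplied profile when one is
# present, otherwise wrap the options in a base profile.
def resolve_profile(match_options)
  if match_options.respond_to?(:compare_profile) && match_options.compare_profile
    match_options.compare_profile
  else
    BaseProfile.new(match_options)
  end
end

custom_profile = Object.new
with_profile = OptionsWithProfile.new(custom_profile)
nil_profile = OptionsWithProfile.new(nil)
plain = PlainOptions.new("plain")

resolve_profile(with_profile) # the supplied custom_profile object
resolve_profile(nil_profile)  # a BaseProfile, because compare_profile is nil
resolve_profile(plain)        # a BaseProfile, because the method is missing
```

The `respond_to?` guard means any options object works, and a `nil` profile is treated the same as a missing one.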

Instance Attribute Details

#match_options ⇒ `Object` (readonly)

Returns the value of attribute match_options.



# File 'lib/canon/diff/diff_classifier.rb', line 18

def match_options
  @match_options
end

#profile ⇒ `Object` (readonly)

Returns the value of attribute profile.



# File 'lib/canon/diff/diff_classifier.rb', line 18

def profile
  @profile
end

Instance Method Details

#classify(diff_node) ⇒ `DiffNode`

Classify a single DiffNode as normative or informative. Hierarchy: formatting-only < informative < normative. CompareProfile determines the base classification; XmlSerializationFormatter handles serialization formatting.

Parameters:

diff_node (DiffNode) —

The diff node to classify

Returns:

(DiffNode) —

The same diff node with normative/formatting attributes set

# File 'lib/canon/diff/diff_classifier.rb', line 37

def classify(diff_node)
  # FIRST: Check for XML serialization-level formatting differences
  # These are ALWAYS non-normative (formatting-only) regardless of match options
  # Examples: self-closing tags (<tag/>) vs explicit closing tags (<tag></tag>)
  #
  # EXCEPTION: If the text node is inside a whitespace-sensitive element
  # (:preserve or :collapse), don't dismiss as serialization formatting
  # because whitespace presence is meaningful in those elements.
  if !inside_whitespace_sensitive_element?(diff_node) &&
      XmlSerializationFormatter.serialization_formatting?(diff_node)
    diff_node.formatting = true
    diff_node.normative = false
    return diff_node
  end

  # SECOND: Handle content-level formatting for text_content with :normalize behavior
  # When text_content is :normalize and the difference is formatting-only,
  # it should be marked as non-normative (informative)
  # This ensures that verbose and non-verbose modes give consistent results
  #
  # EXCEPTION: If the text node is inside a PRESERVE whitespace element
  # (like <pre>, <code>, <textarea> in HTML), don't apply formatting detection
  # because whitespace should be preserved exactly in these elements.
  # Note: COLLAPSE elements like <p> DO get formatting detection because
  # their whitespace IS normalized (differences are formatting-only).
  #
  # This check must come BEFORE normative_dimension? is called,
  # because normative_dimension? returns true for text_content: :normalize
  # (since the dimension affects equivalence), which would prevent formatting
  # detection from being applied.
  if diff_node.dimension == :text_content &&
      profile.send(:behavior_for, :text_content) == :normalize &&
      !inside_preserve_element?(diff_node) &&
      formatting_only_diff?(diff_node)
    diff_node.formatting = true
    diff_node.normative = false
    return diff_node
  end

  # THIRD: Determine if this dimension is normative based on CompareProfile
  # This respects the policy settings (strict/normalize/ignore)
  is_normative = profile.normative_dimension?(diff_node.dimension)

  # FOURTH: Check if FormattingDetector should be consulted for non-normative dimensions
  # Only check for formatting-only when dimension is NOT normative
  # This ensures strict mode differences remain normative
  should_check_formatting = !is_normative &&
    profile.supports_formatting_detection?(diff_node.dimension)

  # If we should check formatting, see if it's formatting-only
  if should_check_formatting && formatting_only_diff?(diff_node)
    diff_node.formatting = true
    diff_node.normative = false
    return diff_node
  end

  # FIFTH: Apply the normative determination from CompareProfile
  diff_node.formatting = false
  diff_node.normative = is_normative

  diff_node
end
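
The order of the checks in the listing above matters, and it can be illustrated without Canon. The following is a simplified sketch, not the library's implementation: `Facts`, `FORMATTING_ONLY`, and `classify_sketch` are hypothetical names, and each predicate the method consults is reduced to a precomputed boolean (`text_normalized` stands for "the dimension is :text_content and its behavior is :normalize").

```ruby
# Hypothetical stand-in: one boolean per predicate consulted by #classify.
# Members left out of Facts.new default to nil, which Ruby treats as false.
Facts = Struct.new(
  :serialization_formatting, :whitespace_sensitive, :text_normalized,
  :inside_preserve, :formatting_only, :normative_dimension, :supports_detection,
  keyword_init: true
)

FORMATTING_ONLY = { formatting: true, normative: false }.freeze

def classify_sketch(facts)
  # FIRST: serialization formatting, unless inside a whitespace-sensitive element
  return FORMATTING_ONLY if !facts.whitespace_sensitive && facts.serialization_formatting

  # SECOND: whitespace-only text differences under :normalize, unless inside
  # a preserve element; this runs before the normative check on purpose
  return FORMATTING_ONLY if facts.text_normalized && !facts.inside_preserve && facts.formatting_only

  # THIRD and FOURTH: formatting detection only for non-normative dimensions
  if !facts.normative_dimension && facts.supports_detection && facts.formatting_only
    return FORMATTING_ONLY
  end

  # FIFTH: fall back to the profile's normative determination
  { formatting: false, normative: facts.normative_dimension ? true : false }
end

self_closing = Facts.new(serialization_formatting: true)
paragraph_ws = Facts.new(text_normalized: true, formatting_only: true,
                         normative_dimension: true)
pre_ws = Facts.new(whitespace_sensitive: true, text_normalized: true,
                   inside_preserve: true, formatting_only: true,
                   normative_dimension: true)

classify_sketch(self_closing) # formatting-only
classify_sketch(paragraph_ws) # formatting-only, decided at the SECOND step
classify_sketch(pre_ws)       # normative, because whitespace is preserved
```

In the `paragraph_ws` case the dimension itself is normative, so moving the SECOND check after the THIRD would wrongly report the whitespace change as normative.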

#classify_all(diff_nodes) ⇒ `Array<DiffNode>`

Classify multiple DiffNodes.

Parameters:

diff_nodes (Array<DiffNode>) —

The diff nodes to classify

Returns:

(Array<DiffNode>) —

The same diff nodes with normative and formatting attributes set



# File 'lib/canon/diff/diff_classifier.rb', line 103

def classify_all(diff_nodes)
  diff_nodes.each { |node| classify(node) }
end
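
Because `Array#each` returns its receiver, the method hands back the very array it was given, with each node updated in place. A minimal sketch of that behavior follows; `Node` and `classify_all_sketch` are hypothetical, and the rule inside the block is a placeholder rather than Canon's classification logic.

```ruby
# Hypothetical stand-in for DiffNode with just the fields this sketch needs.
Node = Struct.new(:dimension, :normative)

def classify_all_sketch(nodes)
  # Placeholder rule: treat only :text_content differences as normative.
  nodes.each { |node| node.normative = (node.dimension == :text_content) }
end

nodes = [Node.new(:text_content), Node.new(:comments)]
result = classify_all_sketch(nodes)

result.equal?(nodes)     # true: the same Array object comes back
nodes.map(&:normative)   # [true, false]: the nodes were updated in place
```

Callers that need the unclassified nodes afterwards would have to copy them before calling, since no new array or nodes are created.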
