Class: Canon::Diff::FormattingDetector

Inherits: Object

Defined in: lib/canon/diff/formatting_detector.rb

Overview

Detects whether the differences between lines are formatting-only (whitespace, line breaks), with no semantic content changes.

Constant Summary

CASE_INSENSITIVE_ATTRS =

Attribute names whose values are case-insensitive per XML/XHTML specs. Per the XML specification, the encoding declaration value is case-insensitive (e.g., “UTF-8” equals “utf-8”). The standalone declaration in XML 1.1 is also case-insensitive.

%w[encoding standalone].freeze

QUOTE_CHARS =

["\"", "'"].freeze

SKIP_CHARS =

[" ", "="].freeze

Class Method Summary

.blank?(line) ⇒ Boolean

Check if a line is blank (nil or whitespace-only).
.formatting_block?(old_parts, new_parts) ⇒ Boolean

Detect if a block of consecutive line changes is formatting-only.
.formatting_only?(line1, line2) ⇒ Boolean

Detect if two lines differ only in formatting.
.formatting_prefix(old_parts, new_parts) ⇒ Hash?

Find the largest formatting-only prefix within old/new parts.
.normalize_for_comparison(line) ⇒ String

Aggressive normalization for formatting comparison.

Class Method Details

.blank?(line) ⇒ `Boolean`

Check if a line is blank (nil or whitespace-only).

Parameters:

line (String, nil) — Line to check

Returns:

(Boolean) — true if blank



# File 'lib/canon/diff/formatting_detector.rb', line 54

def self.blank?(line)
  line.nil? || line.strip.empty?
end
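A minimal sketch of how this predicate behaves on typical inputs. The method is copied into a top-level stand-in so the snippet runs without the gem; in the library it would be called as `Canon::Diff::FormattingDetector.blank?`.

```ruby
# Stand-alone copy of the predicate shown above (illustration only).
def blank?(line)
  line.nil? || line.strip.empty?
end

puts blank?(nil)      # true: nil counts as blank
puts blank?(" \t\n")  # true: whitespace-only
puts blank?("<a/>")   # false: has content
```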

.formatting_block?(old_parts, new_parts) ⇒ `Boolean`

Detect if a block of consecutive line changes is formatting-only. Joins old and new parts with spaces and compares as a whole. Handles multi-line tag wrapping (e.g., a tag on 2 lines vs 1 line).

Parameters:

old_parts (Array<String>) — Old line contents in the block
new_parts (Array<String>) — New line contents in the block

Returns:

(Boolean) — true if the joined content differs only in formatting

# File 'lib/canon/diff/formatting_detector.rb', line 65

def self.formatting_block?(old_parts, new_parts)
  return false if old_parts.empty? || new_parts.empty?

  formatting_only?(old_parts.join(" "), new_parts.join(" "))
end
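The multi-line wrapping case can be sketched as follows. `FormattingSketch` is a hypothetical stand-in, not the library's class: its `normalize` helper only collapses whitespace and omits the entity decoding and attribute sorting that the real normalization performs.

```ruby
# Hypothetical stand-in so the example runs without the gem.
module FormattingSketch
  module_function

  def blank?(line)
    line.nil? || line.strip.empty?
  end

  # Simplified assumption: whitespace collapsing only.
  def normalize(line)
    line.to_s.strip.tr("\t\n\r\f\v", " ").squeeze(" ")
  end

  def formatting_only?(line1, line2)
    return false if blank?(line1) || blank?(line2)

    normalize(line1) == normalize(line2)
  end

  def formatting_block?(old_parts, new_parts)
    return false if old_parts.empty? || new_parts.empty?

    formatting_only?(old_parts.join(" "), new_parts.join(" "))
  end
end

# A tag wrapped over two lines versus the same tag on one line.
old_parts = ['<std-id type="dated"', '        id="foo">']
new_parts = ['<std-id type="dated" id="foo">']

puts FormattingSketch.formatting_block?(old_parts, new_parts) # true
puts FormattingSketch.formatting_block?([], new_parts)        # false: empty side
```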

.formatting_only?(line1, line2) ⇒ `Boolean`

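Detect if two lines differ only in formatting. Returns false if either line is blank (nil or whitespace-only); otherwise compares the two lines after normalize_for_comparison.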

Parameters:

line1 (String, nil) — First line to compare
line2 (String, nil) — Second line to compare

Returns:

(Boolean) — true if lines differ only in formatting

# File 'lib/canon/diff/formatting_detector.rb', line 13

def self.formatting_only?(line1, line2)
  # If both are nil or empty, not a formatting diff (no difference)
  return false if blank?(line1) && blank?(line2)

  # If only one is blank, it's not just formatting
  return false if blank?(line1) || blank?(line2)

  # Compare normalized versions
  normalize_for_comparison(line1) == normalize_for_comparison(line2)
end
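The guard clauses above give a small decision table, sketched here with a hypothetical top-level stand-in. The `squash` helper is an assumption that only collapses whitespace; the library's normalization does more.

```ruby
# Hypothetical stand-ins (illustration only, not the library's code).
def blank?(line)
  line.nil? || line.strip.empty?
end

def squash(line)
  line.strip.tr("\t\n\r\f\v", " ").squeeze(" ")
end

def formatting_only?(line1, line2)
  return false if blank?(line1) || blank?(line2)

  squash(line1) == squash(line2)
end

puts formatting_only?("<a>  text</a>", "<a> text</a>") # true: whitespace differs
puts formatting_only?("<a>text</a>", "<a>other</a>")   # false: content differs
puts formatting_only?(nil, "")                         # false: both blank
puts formatting_only?("", "<a/>")                      # false: one side blank
puts formatting_only?("same", "same")                  # true: identical lines normalize equal
```

Note the last case: with the logic as shown, two identical non-blank lines also return true, so callers that need an actual difference would have to check for equality first.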

.formatting_prefix(old_parts, new_parts) ⇒ `Hash`?

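Find the largest formatting-only prefix within old/new parts. Tries all (old_end, new_end) combinations and returns the match with the most old parts; among matches with the same number of old parts, the one with the most new parts wins. Returns nil when no combination matches. Handles mixed-element blocks where the first element is formatting but later elements are not.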

Parameters:

old_parts (Array<String>) — Old line contents
new_parts (Array<String>) — New line contents

Returns:

(Hash, nil) — { old_end:, new_end: } or nil

# File 'lib/canon/diff/formatting_detector.rb', line 79

def self.formatting_prefix(old_parts, new_parts)
  best = nil

  (1..old_parts.length).each do |old_end|
    (1..new_parts.length).each do |new_end|
      if formatting_only?(old_parts[0...old_end].join(" "),
                          new_parts[0...new_end].join(" "))
        best = { old_end: old_end, new_end: new_end }
      end
    end
  end

  best
end
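A worked example of the prefix search on a mixed block, where the first two old lines are a re-wrapped tag and the last line is a real content change. `PrefixSketch` is a hypothetical stand-in whose `squash` helper only collapses whitespace, which is an assumption made to keep the example self-contained.

```ruby
# Hypothetical stand-in so the example runs without the gem.
module PrefixSketch
  module_function

  def blank?(line)
    line.nil? || line.strip.empty?
  end

  # Simplified assumption: whitespace collapsing only.
  def squash(line)
    line.strip.tr("\t\n\r\f\v", " ").squeeze(" ")
  end

  def formatting_only?(line1, line2)
    return false if blank?(line1) || blank?(line2)

    squash(line1) == squash(line2)
  end

  # Same nested loop as the listing above: later matches overwrite earlier ones.
  def formatting_prefix(old_parts, new_parts)
    best = nil
    (1..old_parts.length).each do |old_end|
      (1..new_parts.length).each do |new_end|
        if formatting_only?(old_parts[0...old_end].join(" "),
                            new_parts[0...new_end].join(" "))
          best = { old_end: old_end, new_end: new_end }
        end
      end
    end
    best
  end
end

old_parts = ["<a", "  b='1'>", "text one"]
new_parts = ["<a b='1'>", "text two"]

result = PrefixSketch.formatting_prefix(old_parts, new_parts)
puts "old_end=#{result[:old_end]} new_end=#{result[:new_end]}" # old_end=2 new_end=1
puts PrefixSketch.formatting_prefix(["x"], ["y"]).inspect      # nil
```

The search is quadratic in the number of parts and re-joins each prefix on every iteration, which is reasonable for the short blocks of consecutive changed lines it is meant for.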

.normalize_for_comparison(line) ⇒ `String`

Aggressive normalization for formatting comparison. Collapses whitespace, decodes entities, normalizes attribute order, and strips tag-delimiter whitespace.

Parameters:

line (String, nil) — Line to normalize

Returns:

(String) — Normalized line

# File 'lib/canon/diff/formatting_detector.rb', line 30

def self.normalize_for_comparison(line)
  return "" if line.nil?

  # Decode XML entities so &#x2014; and — compare as equal
  decoded = decode_xml_entities(line)

  # Collapse all whitespace (spaces, tabs, newlines) to single space
  # Avoid regex to prevent ReDoS vulnerability - use String methods
  normalized = decoded.strip.tr("\t\n\r\f\v", " ").squeeze(" ")

  # Normalize attribute order within tags.
  # For each tag (e.g., <std-id type="dated" id="foo">), sort attributes
  # so that attribute-order-only differences are treated as formatting.
  normalized = normalize_attribute_order(normalized)

  # Normalize whitespace around tag delimiters
  # Remove spaces before > and after < (avoid regex for ReDoS safety)
  normalized.gsub(" >", ">").gsub("< ", "<")
end
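The two regex-free steps (whitespace collapsing and tag-delimiter trimming) can be tried in isolation. `collapse_and_trim` is a hypothetical helper that leaves out entity decoding and attribute sorting, since those helpers are not shown on this page.

```ruby
# Hypothetical helper covering only the String-method steps of the listing above.
def collapse_and_trim(line)
  return "" if line.nil?

  # tr pads the replacement with its last character, so every listed
  # whitespace character maps to a space; squeeze then merges runs of spaces.
  collapsed = line.strip.tr("\t\n\r\f\v", " ").squeeze(" ")

  # Drop the space before ">" and after "<".
  collapsed.gsub(" >", ">").gsub("< ", "<")
end

puts collapse_and_trim("  <p >\tHello\n  world< /p>  ") # <p> Hello world</p>
puts collapse_and_trim(nil).inspect                     # ""
```

Whitespace inside element content is collapsed to a single space rather than removed, so `<p> Hello` and `<p>Hello` would still compare as different under this sketch.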
