Class: Canon::Diff::FormattingDetector
- Inherits: Object
- Defined in: lib/canon/diff/formatting_detector.rb
Overview
Detects whether the differences between lines are formatting-only (whitespace, line breaks), with no semantic content changes.
Constant Summary
- CASE_INSENSITIVE_ATTRS = %w[encoding standalone].freeze
  Attribute names whose values are case-insensitive per XML/XHTML specs. Per the XML specification, the encoding declaration value is case-insensitive (e.g., “UTF-8” equals “utf-8”). The standalone declaration in XML 1.1 is also case-insensitive.
- QUOTE_CHARS = ["\"", "'"].freeze
- SKIP_CHARS = [" ", "="].freeze
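One way a list like CASE_INSENSITIVE_ATTRS could be applied is sketched below. The module AttrCaseSketch and the helper normalize_attr_value are hypothetical names, not part of Canon. The sketch assumes only that values of the listed attributes are downcased before comparison and that all other values are left untouched.

```ruby
# Hypothetical sketch; AttrCaseSketch and normalize_attr_value are
# illustrative names, not part of the Canon API.
module AttrCaseSketch
  CASE_INSENSITIVE_ATTRS = %w[encoding standalone].freeze

  # Downcase the value only when the attribute is listed as case-insensitive.
  def self.normalize_attr_value(name, value)
    CASE_INSENSITIVE_ATTRS.include?(name) ? value.downcase : value
  end
end

puts AttrCaseSketch.normalize_attr_value("encoding", "UTF-8") # prints utf-8
puts AttrCaseSketch.normalize_attr_value("id", "Foo")         # prints Foo
```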
Class Method Summary
- .blank?(line) ⇒ Boolean
  Check if a line is blank (nil or whitespace-only).
- .formatting_block?(old_parts, new_parts) ⇒ Boolean
  Detect if a block of consecutive line changes is formatting-only.
- .formatting_only?(line1, line2) ⇒ Boolean
  Detect if two lines differ only in formatting.
- .formatting_prefix(old_parts, new_parts) ⇒ Hash?
  Find the largest formatting-only prefix within old/new parts.
- .normalize_for_comparison(line) ⇒ String
  Aggressive normalization for formatting comparison.
Class Method Details
.blank?(line) ⇒ Boolean
Check if a line is blank (nil or whitespace-only).
# File 'lib/canon/diff/formatting_detector.rb', line 54

def self.blank?(line)
  line.nil? || line.strip.empty?
end
.formatting_block?(old_parts, new_parts) ⇒ Boolean
Detect if a block of consecutive line changes is formatting-only. Joins old and new parts with spaces and compares as a whole. Handles multi-line tag wrapping (e.g., a tag on 2 lines vs 1 line).
# File 'lib/canon/diff/formatting_detector.rb', line 65

def self.formatting_block?(old_parts, new_parts)
  return false if old_parts.empty? || new_parts.empty?

  formatting_only?(old_parts.join(" "), new_parts.join(" "))
end
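The multi-line tag wrapping case can be illustrated with a self-contained sketch. FormattingBlockSketch is a hypothetical stand-in, not the Canon class: its normalize method performs only whitespace collapsing and tag-delimiter cleanup, and omits the entity decoding and attribute sorting that the real normalize_for_comparison is documented to perform.

```ruby
# Hypothetical stand-in for illustration; not the Canon implementation.
module FormattingBlockSketch
  # Assumption: whitespace collapse and tag-delimiter cleanup only.
  def self.normalize(text)
    text.strip.tr("\t\n\r\f\v", " ").squeeze(" ").gsub(" >", ">").gsub("< ", "<")
  end

  def self.formatting_block?(old_parts, new_parts)
    return false if old_parts.empty? || new_parts.empty?

    old_text = old_parts.join(" ")
    new_text = new_parts.join(" ")
    # Mirror the documented blank guard: blank input is never "formatting-only".
    return false if old_text.strip.empty? || new_text.strip.empty?

    normalize(old_text) == normalize(new_text)
  end
end

# A tag wrapped over two lines versus the same tag on one line.
old_parts = ['<std-id type="dated"', '        id="foo">']
new_parts = ['<std-id type="dated" id="foo">']
puts FormattingBlockSketch.formatting_block?(old_parts, new_parts) # prints true
```

Joining with a single space before normalizing is what lets a line break inside a tag compare equal to a space.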
.formatting_only?(line1, line2) ⇒ Boolean
Detect if two lines differ only in formatting.
# File 'lib/canon/diff/formatting_detector.rb', line 13

def self.formatting_only?(line1, line2)
  # If both are nil or empty, not a formatting diff (no difference)
  return false if blank?(line1) && blank?(line2)

  # If only one is blank, it's not just formatting
  return false if blank?(line1) || blank?(line2)

  # Compare normalized versions
  normalize_for_comparison(line1) == normalize_for_comparison(line2)
end
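The effect of the two blank guards is easiest to see on concrete inputs. The sketch below is a simplified reimplementation under the hypothetical name FormattingOnlySketch. Its normalize method only collapses whitespace, so it does not reproduce the entity or attribute handling of the real method.

```ruby
# Hypothetical stand-in for illustration; not the Canon implementation.
module FormattingOnlySketch
  def self.blank?(line)
    line.nil? || line.strip.empty?
  end

  # Assumption: whitespace collapsing only.
  def self.normalize(line)
    line.strip.tr("\t\n\r\f\v", " ").squeeze(" ")
  end

  def self.formatting_only?(line1, line2)
    # Covers both "both blank" and "only one blank".
    return false if blank?(line1) || blank?(line2)

    normalize(line1) == normalize(line2)
  end
end

puts FormattingOnlySketch.formatting_only?("<a>  text</a>", "<a>\ttext</a>") # prints true
puts FormattingOnlySketch.formatting_only?(nil, "   ")                       # prints false
puts FormattingOnlySketch.formatting_only?("<a>text</a>", "")                # prints false
puts FormattingOnlySketch.formatting_only?("<a>old</a>", "<a>new</a>")       # prints false
```

Two blank lines return false because there is no difference to classify, and a blank line against a non-blank line returns false because content was added or removed.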
.formatting_prefix(old_parts, new_parts) ⇒ Hash?
Find the largest formatting-only prefix within old/new parts. Tries all (old_end, new_end) combinations and returns the one with the most old parts. Handles mixed-element blocks where the first element is formatting but later elements are not.
# File 'lib/canon/diff/formatting_detector.rb', line 79

def self.formatting_prefix(old_parts, new_parts)
  best = nil
  (1..old_parts.length).each do |old_end|
    (1..new_parts.length).each do |new_end|
      if formatting_only?(old_parts[0...old_end].join(" "),
                          new_parts[0...new_end].join(" "))
        best = { old_end: old_end, new_end: new_end }
      end
    end
  end
  best
end
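A mixed block, where a re-wrapped tag is followed by a genuine text change, shows what the prefix search returns. The sketch below uses the hypothetical module FormattingPrefixSketch with a whitespace-only comparison (same_after_whitespace? is an illustrative helper, not a Canon method). The loop structure follows the listing above.

```ruby
# Hypothetical stand-in for illustration; not the Canon implementation.
module FormattingPrefixSketch
  # Assumption: two non-blank strings match if they are equal once
  # whitespace is collapsed.
  def self.same_after_whitespace?(a, b)
    norm = ->(s) { s.strip.tr("\t\n\r\f\v", " ").squeeze(" ") }
    !a.strip.empty? && !b.strip.empty? && norm.call(a) == norm.call(b)
  end

  def self.formatting_prefix(old_parts, new_parts)
    best = nil
    (1..old_parts.length).each do |old_end|
      (1..new_parts.length).each do |new_end|
        old_text = old_parts[0...old_end].join(" ")
        new_text = new_parts[0...new_end].join(" ")
        # Later matches overwrite earlier ones, so the largest old_end wins.
        best = { old_end: old_end, new_end: new_end } if same_after_whitespace?(old_text, new_text)
      end
    end
    best
  end
end

old_parts = ['<a', 'id="1">', "old text"]
new_parts = ['<a id="1">', "new text"]
result = FormattingPrefixSketch.formatting_prefix(old_parts, new_parts)
puts result[:old_end] # prints 2
puts result[:new_end] # prints 1
```

Here the first two old parts and the first new part form the formatting-only prefix, and the remaining parts are left as a real change. The search tries every (old_end, new_end) pair, so the number of comparisons grows with the product of the two part counts.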
.normalize_for_comparison(line) ⇒ String
Aggressive normalization for formatting comparison. Collapses whitespace, decodes entities, normalizes attribute order, and strips tag-delimiter whitespace.
# File 'lib/canon/diff/formatting_detector.rb', line 30

def self.normalize_for_comparison(line)
  return "" if line.nil?

  # Decode XML entities so that an entity such as &#x2014; and the
  # literal character — compare as equal
  decoded = decode_xml_entities(line)

  # Collapse all whitespace (spaces, tabs, newlines) to single space
  # Avoid regex to prevent ReDoS vulnerability - use String methods
  normalized = decoded.strip.tr("\t\n\r\f\v", " ").squeeze(" ")

  # Normalize attribute order within tags.
  # For each tag (e.g., <std-id type="dated" id="foo">), sort attributes
  # so that attribute-order-only differences are treated as formatting.
  normalized = normalize_attribute_order(normalized)

  # Normalize whitespace around tag delimiters
  # Remove spaces before > and after < (avoid regex for ReDoS safety)
  normalized.gsub(" >", ">").gsub("< ", "<")
end
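The two regex-free whitespace steps can be traced on a sample line. The sketch below isolates them under the hypothetical module WhitespaceStepsSketch. It skips entity decoding and attribute sorting, because the private helpers decode_xml_entities and normalize_attribute_order are not shown in this page.

```ruby
# Hypothetical sketch isolating the whitespace steps; not the Canon class.
module WhitespaceStepsSketch
  # Step 1: trim, map tabs/newlines to spaces, squeeze runs of spaces.
  def self.collapse(line)
    line.strip.tr("\t\n\r\f\v", " ").squeeze(" ")
  end

  # Step 2: drop the space before ">" and after "<" using plain strings.
  def self.tighten(text)
    text.gsub(" >", ">").gsub("< ", "<")
  end
end

line = "  <p  class=\"x\"\t>text</p >\n"

collapsed = WhitespaceStepsSketch.collapse(line)
puts collapsed # prints <p class="x" >text</p >

puts WhitespaceStepsSketch.tighten(collapsed) # prints <p class="x">text</p>
```

Because tr, squeeze, and string-argument gsub involve no regular-expression backtracking, each runs in a single pass over the input.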