Class: Canon::DiffFormatter

Inherits:
Object
Defined in:
lib/canon/diff_formatter.rb,
lib/canon/diff_formatter/theme.rb,
lib/canon/diff_formatter/legend.rb,
lib/canon/diff_formatter/debug_output.rb,
lib/canon/diff_formatter/by_line/xml_formatter.rb,
lib/canon/diff_formatter/diff_detail_formatter.rb,
lib/canon/diff_formatter/by_line/base_formatter.rb,
lib/canon/diff_formatter/by_line/html_formatter.rb,
lib/canon/diff_formatter/by_line/json_formatter.rb,
lib/canon/diff_formatter/by_line/yaml_formatter.rb,
lib/canon/diff_formatter/by_object/xml_formatter.rb,
lib/canon/diff_formatter/by_line/simple_formatter.rb,
lib/canon/diff_formatter/by_object/base_formatter.rb,
lib/canon/diff_formatter/by_object/json_formatter.rb,
lib/canon/diff_formatter/by_object/yaml_formatter.rb,
lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb,
lib/canon/diff_formatter/diff_detail_formatter/text_utils.rb,
lib/canon/diff_formatter/diff_detail_formatter/color_helper.rb,
lib/canon/diff_formatter/diff_detail_formatter/location_extractor.rb,
lib/canon/diff_formatter/diff_detail_formatter/dimension_formatter.rb

Overview

Formatter for displaying semantic differences with color support

This is a pure orchestrator class that delegates formatting to mode-specific and format-specific formatters. It provides a unified interface for generating both by-line and by-object diffs across multiple formats (XML, HTML, JSON, YAML).

Architecture

DiffFormatter follows the orchestrator pattern with MECE (Mutually Exclusive, Collectively Exhaustive) delegation:

  1. **Mode Selection**: Chooses by-line or by-object visualization

  2. **Format Delegation**: Dispatches to format-specific formatter

  3. **Customization**: Applies color, context, and visualization options
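The two-step dispatch can be pictured as a lookup table keyed first by mode, then by format. This is a minimal sketch, not canon's implementation. `FORMATTER_TABLE`, `FALLBACKS`, and `select_formatter` are hypothetical names, the class names are plain strings standing in for the formatter classes listed under "Defined in", and both fallback rules are assumptions (no by-object HTML formatter file is listed, so the XML fallback is a guess).

```ruby
# Hypothetical sketch of the two-step dispatch: step 1 picks the mode table,
# step 2 picks the format-specific formatter inside it. Class names are
# strings standing in for the real ByLine/ByObject formatter classes.
FORMATTER_TABLE = {
  by_line: {
    xml: "ByLine::XmlFormatter",
    html: "ByLine::HtmlFormatter",
    json: "ByLine::JsonFormatter",
    yaml: "ByLine::YamlFormatter",
  },
  by_object: {
    xml: "ByObject::XmlFormatter",
    json: "ByObject::JsonFormatter",
    yaml: "ByObject::YamlFormatter",
  },
}.freeze

# Assumed fallbacks for formats without a dedicated formatter.
FALLBACKS = {
  by_line: "ByLine::SimpleFormatter",
  by_object: "ByObject::XmlFormatter",
}.freeze

def select_formatter(mode, format)
  # Step 1: mode selection (unknown modes are rejected).
  formatters = FORMATTER_TABLE.fetch(mode) do
    raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
  # Step 2: format delegation, with the assumed per-mode fallback.
  formatters.fetch(format, FALLBACKS.fetch(mode))
end
```

Keeping the two levels separate means adding a format touches only one inner table, which matches the "mutually exclusive" delegation described above.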

Diff Modes

**By-Object Mode** (default for XML/JSON/YAML):

  • Tree-based semantic diff

  • Shows only what changed in the structure

  • Visual tree with box-drawing characters

  • Best for configuration files and structured data

**By-Line Mode** (default for HTML):

  • Traditional line-by-line diff

  • Shows changes in document order with context

  • Syntax-aware token highlighting

  • Best for markup and when line context matters

Visualization Features

  • **Color support**: Red (deletions), green (additions), yellow (structure), cyan (informative)

  • **Whitespace visualization**: Makes invisible characters visible

  • **Context lines**: Shows unchanged lines around changes

  • **Diff grouping**: Groups nearby changes into blocks

  • **Character map customization**: CJK-safe Unicode symbols
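Context lines and diff grouping interact: nearby changes are merged into one block, and each block is then padded with unchanged lines. The sketch below shows one way to compute those blocks from changed line indices. `group_changes` is a hypothetical helper, and the default used when the grouping distance is `nil` (merge blocks whose context windows would touch) is an assumption, not documented canon behavior.

```ruby
# Hypothetical sketch: turn changed line indices (0-based) into display blocks.
# Changes separated by at most `grouping_lines` unchanged lines share a block;
# each block is then widened by `context_lines` on both sides.
def group_changes(changed_lines, total_lines, context_lines: 3, grouping_lines: nil)
  return [] if changed_lines.empty?

  # Assumption: with no explicit grouping distance, merge changes whose
  # context windows would touch or overlap.
  grouping_lines ||= context_lines * 2

  groups = []
  changed_lines.sort.uniq.each do |line|
    gap = groups.empty? ? nil : line - groups.last.last - 1
    if gap && gap <= grouping_lines
      groups.last << line
    else
      groups << [line]
    end
  end

  # Widen each group by the context, clamped to the document bounds.
  groups.map do |group|
    first = [group.first - context_lines, 0].max
    last = [group.last + context_lines, total_lines - 1].min
    first..last
  end
end
```

For example, `group_changes([2, 4, 20], 30, context_lines: 1, grouping_lines: 3)` keeps lines 2 and 4 together (one unchanged line between them) and puts line 20 in its own block.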

Usage

# Basic usage
formatter = Canon::DiffFormatter.new(use_color: true, mode: :by_object)
output = formatter.format(differences, :xml, doc1: xml1, doc2: xml2)

# With options
formatter = Canon::DiffFormatter.new(
  use_color: true,
  mode: :by_line,
  context_lines: 5,
  diff_grouping_lines: 10,
  show_diffs: :normative
)

Defined Under Namespace

Modules: ByLine, ByObject, DebugOutput, DiffDetailFormatter, DiffDetailFormatterHelpers, Legend, Theme

Constant Summary

Default character visualization map (loaded from YAML):
DEFAULT_VISUALIZATION_MAP = character_map_data[:visualization_map].freeze

Character category map (loaded from YAML):
CHARACTER_CATEGORY_MAP = character_map_data[:category_map].freeze

Category display names (loaded from YAML):
CHARACTER_CATEGORY_NAMES = character_map_data[:category_names].freeze

Character metadata including names (loaded from YAML):
CHARACTER_METADATA = character_map_data[:character_metadata].freeze

Map difference codes to human-readable descriptions:
DIFF_DESCRIPTIONS = {
  Comparison::EQUIVALENT => "Equivalent",
  Comparison::MISSING_ATTRIBUTE => "Missing attribute",
  Comparison::MISSING_NODE => "Missing node",
  Comparison::UNEQUAL_ATTRIBUTES => "Unequal attributes",
  Comparison::UNEQUAL_COMMENTS => "Unequal comments",
  Comparison::UNEQUAL_DOCUMENTS => "Unequal documents",
  Comparison::UNEQUAL_ELEMENTS => "Unequal elements",
  Comparison::UNEQUAL_NODES_TYPES => "Unequal node types",
  Comparison::UNEQUAL_TEXT_CONTENTS => "Unequal text contents",
  Comparison::MISSING_HASH_KEY => "Missing hash key",
  Comparison::UNEQUAL_HASH_VALUES => "Unequal hash values",
  Comparison::UNEQUAL_ARRAY_LENGTHS => "Unequal array lengths",
  Comparison::UNEQUAL_ARRAY_ELEMENTS => "Unequal array elements",
  Comparison::UNEQUAL_TYPES => "Unequal types",
  Comparison::UNEQUAL_PRIMITIVES => "Unequal primitive values",
}.freeze
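Applying a visualization map such as `DEFAULT_VISUALIZATION_MAP` can be as simple as a per-character substitution. In this sketch the map contents are illustrative placeholders (the real symbols live in `character_map.yml`), and `visualize` is a hypothetical helper; canon's formatters may apply the map differently.

```ruby
# Illustrative map; the real entries come from diff_formatter/character_map.yml
# and the symbols below are NOT taken from that file.
SAMPLE_VISUALIZATION_MAP = {
  " " => "░",
  "\t" => "⇥",
  "\u00A0" => "␣",
  "\n" => "↵\n", # keep the real line break after the marker
}.freeze

# Hypothetical helper: replace each mapped character, leave the rest as-is.
def visualize(text, map = SAMPLE_VISUALIZATION_MAP)
  text.each_char.map { |ch| map.fetch(ch, ch) }.join
end
```

Because the lookup is per character, unmapped text such as CJK ideographs passes through unchanged.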


Constructor Details

#initialize(use_color: true, mode: :by_object, context_lines: 3, diff_grouping_lines: nil, visualization_map: nil, character_map_file: nil, character_definitions: nil, show_diffs: :all, verbose_diff: false, show_raw_inputs: false, show_raw_expected: false, show_raw_received: false, show_preprocessed_inputs: false, show_preprocessed_expected: false, show_preprocessed_received: false, show_prettyprint_inputs: false, show_prettyprint_expected: false, show_prettyprint_received: false, show_line_numbered_inputs: false, character_visualization: true, display_preprocessing: :none, pretty_printer_indent: 2, pretty_printer_indent_type: :space, preserve_whitespace_elements: [], collapse_whitespace_elements: [], strip_whitespace_elements: [], pretty_printed_expected: false, pretty_printed_received: false, pretty_printer_sort_attributes: false, compact_semantic_report: false, expand_difference: false, diff_mode: :separate, legacy_terminal: false) ⇒ DiffFormatter

# File 'lib/canon/diff_formatter.rb', line 166

# rubocop:disable Metrics/ParameterLists
def initialize(use_color: true, mode: :by_object, context_lines: 3,
               diff_grouping_lines: nil, visualization_map: nil,
               character_map_file: nil, character_definitions: nil,
               show_diffs: :all, verbose_diff: false,
               show_raw_inputs: false, show_raw_expected: false,
               show_raw_received: false,
               show_preprocessed_inputs: false,
               show_preprocessed_expected: false,
               show_preprocessed_received: false,
               show_prettyprint_inputs: false,
               show_prettyprint_expected: false,
               show_prettyprint_received: false,
               show_line_numbered_inputs: false,
               character_visualization: true,
               display_preprocessing: :none,
               pretty_printer_indent: 2,
               pretty_printer_indent_type: :space,
               preserve_whitespace_elements: [],
               collapse_whitespace_elements: [],
               strip_whitespace_elements: [],
               pretty_printed_expected: false,
               pretty_printed_received: false,
               pretty_printer_sort_attributes: false,
               compact_semantic_report: false,
               expand_difference: false,
               diff_mode: :separate, legacy_terminal: false)
  # rubocop:enable Metrics/ParameterLists
  @use_color = use_color
  @mode = mode
  @context_lines = context_lines
  @diff_grouping_lines = diff_grouping_lines
  @show_diffs = show_diffs
  @verbose_diff = verbose_diff
  @show_raw_inputs = show_raw_inputs
  @show_raw_expected = show_raw_expected
  @show_raw_received = show_raw_received
  @show_preprocessed_inputs = show_preprocessed_inputs
  @show_preprocessed_expected = show_preprocessed_expected
  @show_preprocessed_received = show_preprocessed_received
  @show_prettyprint_inputs = show_prettyprint_inputs
  @show_prettyprint_expected = show_prettyprint_expected
  @show_prettyprint_received = show_prettyprint_received
  @show_line_numbered_inputs = show_line_numbered_inputs
  @character_visualization = character_visualization
  @display_preprocessing = display_preprocessing
  @pretty_printer_indent = pretty_printer_indent
  @pretty_printer_indent_type = pretty_printer_indent_type
  @preserve_whitespace_elements = Array(preserve_whitespace_elements).map(&:to_s)
  @collapse_whitespace_elements = Array(collapse_whitespace_elements).map(&:to_s)
  @strip_whitespace_elements = Array(strip_whitespace_elements).map(&:to_s)
  @pretty_printed_expected = pretty_printed_expected
  @pretty_printed_received = pretty_printed_received
  @pretty_printer_sort_attributes = pretty_printer_sort_attributes
  @compact_semantic_report = compact_semantic_report
  @expand_difference = expand_difference
  @diff_mode = legacy_terminal ? :separate : diff_mode
  @legacy_terminal = legacy_terminal
  @visualization_map = build_visualization_map(
    character_visualization: character_visualization,
    visualization_map: visualization_map,
    character_map_file: character_map_file,
    character_definitions: character_definitions,
  )
end
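Two of the constructor's normalizations are easy to miss: the whitespace element lists go through `Array(...).map(&:to_s)`, so a single symbol, an array, or `nil` are all accepted, and `legacy_terminal: true` overrides `diff_mode` with `:separate`. The helper names below are hypothetical; the logic is copied from the constructor body.

```ruby
# Hypothetical helper mirroring how #initialize normalizes
# preserve/collapse/strip_whitespace_elements.
def normalize_element_list(value)
  # Array() wraps a single value, passes arrays through, and maps nil to [].
  Array(value).map(&:to_s)
end

# Hypothetical helper mirroring `@diff_mode = legacy_terminal ? :separate : diff_mode`.
def effective_diff_mode(diff_mode, legacy_terminal)
  legacy_terminal ? :separate : diff_mode
end
```

As a consequence, `preserve_whitespace_elements: :pre` behaves the same as `preserve_whitespace_elements: ["pre"]`.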

Class Method Details

.build_character_definition(definition) ⇒ Hash

Build character definition from hash

Parameters:

  • definition (Hash)

    Character definition with keys (matching YAML format):

    • :character or :unicode (required)

    • :visualization (required)

    • :category (required)

    • :name (required)

Returns:

  • (Hash)

    Single-entry visualization map



# File 'lib/canon/diff_formatter.rb', line 269

def self.build_character_definition(definition)
  # Validate required fields
  char = if definition[:unicode]
           [definition[:unicode].to_i(16)].pack("U")
         elsif definition[:character]
           definition[:character]
         else
           raise ArgumentError,
                 "Character definition must include :character or :unicode"
         end

  unless definition[:visualization]
    raise ArgumentError, "Character definition must include :visualization"
  end

  unless definition[:category]
    raise ArgumentError, "Character definition must include :category"
  end

  unless definition[:name]
    raise ArgumentError, "Character definition must include :name"
  end

  { char => definition[:visualization] }
end
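A usage example for `.build_character_definition`. To keep it runnable without the canon gem, the method body is copied into a hypothetical `CharacterDefinitionSketch` module, with the three presence checks folded into one loop that raises the same messages. The code point, symbol, and category are illustrative.

```ruby
# Standalone copy of .build_character_definition in a hypothetical module.
module CharacterDefinitionSketch
  def self.build_character_definition(definition)
    # :unicode (hex digits) takes precedence over :character.
    char = if definition[:unicode]
             [definition[:unicode].to_i(16)].pack("U")
           elsif definition[:character]
             definition[:character]
           else
             raise ArgumentError,
                   "Character definition must include :character or :unicode"
           end

    # Same presence checks as the original, one per required key.
    %i[visualization category name].each do |key|
      unless definition[key]
        raise ArgumentError, "Character definition must include :#{key}"
      end
    end

    { char => definition[:visualization] }
  end
end

entry = CharacterDefinitionSketch.build_character_definition(
  unicode: "200B",        # zero-width space as bare hex digits
  visualization: "⌀",     # illustrative symbol, not canon's default
  category: :zero_width,  # illustrative category name
  name: "Zero-width space",
)
```

Only `:visualization` ends up in the returned hash; `:category` and `:name` are checked for presence but otherwise discarded. Pass `:unicode` as bare hex digits: `"U+200B".to_i(16)` returns `0` because parsing stops at the first non-hex character.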

.character_map_data ⇒ Object

Lazily load and cache character map data



# File 'lib/canon/diff_formatter.rb', line 130

def self.character_map_data
  @character_map_data ||= load_character_map
end
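`.character_map_data` relies on the `@ivar ||= ...` idiom on the class itself, so in the common case the YAML file is read at most once per process. The hypothetical `CachedMapSketch` class below demonstrates the pattern with a load counter in place of the file read.

```ruby
# Hypothetical class demonstrating the class-level memoization pattern.
class CachedMapSketch
  @load_count = 0

  class << self
    attr_reader :load_count

    def character_map_data
      # Runs the loader only while the cached value is nil or false.
      @character_map_data ||= load_character_map
    end

    private

    def load_character_map
      @load_count += 1
      { visualization_map: { "\t" => "⇥" } } # stand-in for the YAML load
    end
  end
end

3.times { CachedMapSketch.character_map_data }
```

`||=` re-runs the loader only when the cached value is `nil` or `false`, which a Hash never is. The idiom is not synchronized, so concurrent first calls could each trigger a load.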

.load_character_map ⇒ Hash

Load character map from YAML file

Returns:

  • (Hash)

    Hash with :visualization_map, :category_map, :category_names, and :character_metadata



# File 'lib/canon/diff_formatter.rb', line 85

def self.load_character_map
  yaml_path = File.join(__dir__, "diff_formatter", "character_map.yml")
  data = YAML.load_file(yaml_path)

  visualization_map = {}
  category_map = {}
  character_metadata = {}

  data["characters"].each do |char_data|
    # Get character from either unicode code point or character field
    char = if char_data["unicode"]
             # Convert hex string to character
             [char_data["unicode"].to_i(16)].pack("U")
           else
             # Use character field directly (handles \n, \r, \t, etc.)
             char_data["character"]
           end

    vis = char_data["visualization"]
    category = char_data["category"].to_sym
    name = char_data["name"]

    visualization_map[char] = vis
    category_map[char] = category
    character_metadata[char] = {
      visualization: vis,
      category: category,
      name: name,
    }
  end

  category_names = {}
  data["category_names"].each do |key, value|
    category_names[key.to_sym] = value
  end

  {
    visualization_map: visualization_map,
    category_map: category_map,
    category_names: category_names,
    character_metadata: character_metadata,
  }
end
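`.load_character_map` implies a particular YAML layout: a `characters` list whose entries carry `unicode` or `character`, plus `visualization`, `category`, and `name`, and a `category_names` mapping. The sample below is inferred from those keys only; its entries are illustrative and not copied from canon's `character_map.yml`. `build_character_map` is a hypothetical helper that applies the same transformation to already-parsed data.

```ruby
require "yaml"

# Assumed layout, inferred from the keys .load_character_map reads.
# The entries and symbols are illustrative.
SAMPLE_CHARACTER_MAP_YAML = <<~'YAML'
  characters:
    - unicode: "00A0"
      visualization: "␣"
      category: whitespace
      name: No-break space
    - character: "\t"
      visualization: "⇥"
      category: whitespace
      name: Tab
  category_names:
    whitespace: Whitespace
YAML

# Same transformation as .load_character_map, taking parsed data
# instead of a file path (hypothetical helper name).
def build_character_map(data)
  visualization_map = {}
  category_map = {}
  character_metadata = {}

  data["characters"].each do |char_data|
    char = if char_data["unicode"]
             [char_data["unicode"].to_i(16)].pack("U")
           else
             char_data["character"]
           end
    vis = char_data["visualization"]
    category = char_data["category"].to_sym

    visualization_map[char] = vis
    category_map[char] = category
    character_metadata[char] = { visualization: vis, category: category,
                                 name: char_data["name"] }
  end

  {
    visualization_map: visualization_map,
    category_map: category_map,
    category_names: data["category_names"].transform_keys(&:to_sym),
    character_metadata: character_metadata,
  }
end

result = build_character_map(YAML.safe_load(SAMPLE_CHARACTER_MAP_YAML))
```

Quoting `unicode` values keeps them as strings; unquoted digit-only code points might otherwise be read by the YAML parser as numbers, and `Integer#to_i` does not accept a base argument.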

.load_custom_character_map(file_path) ⇒ Hash

Load character map from custom YAML file

Parameters:

  • file_path (String)

    Path to YAML file with character definitions

Returns:

  • (Hash)

    Character visualization map



# File 'lib/canon/diff_formatter.rb', line 243

def self.load_custom_character_map(file_path)
  data = YAML.load_file(file_path)
  visualization_map = {}

  data["characters"].each do |char_data|
    # Get character from either unicode code point or character field
    char = if char_data["unicode"]
             [char_data["unicode"].to_i(16)].pack("U")
           else
             char_data["character"]
           end

    visualization_map[char] = char_data["visualization"]
  end

  visualization_map
end
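A round trip through `.load_custom_character_map`, using a temporary file so the example is self-contained. The method body is copied (condensed with `each_with_object`) rather than called from canon, and the U+3000 ideographic space mapping is an illustrative choice.

```ruby
require "tempfile"
require "yaml"

# Standalone copy of .load_custom_character_map.
def load_custom_character_map(file_path)
  data = YAML.load_file(file_path)
  data["characters"].each_with_object({}) do |char_data, map|
    char = if char_data["unicode"]
             [char_data["unicode"].to_i(16)].pack("U")
           else
             char_data["character"]
           end
    map[char] = char_data["visualization"]
  end
end

# Hypothetical custom file with a single entry.
custom_yaml = <<~'YAML'
  characters:
    - unicode: "3000"
      visualization: "□"
YAML

custom_map = Tempfile.create(["custom_map", ".yml"]) do |f|
  f.write(custom_yaml)
  f.flush # make the content visible to YAML.load_file
  load_custom_character_map(f.path)
end
```

Unlike `.build_character_definition`, this loader does not validate `category` or `name`, so a minimal entry with only `unicode` and `visualization` is accepted.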

.merge_visualization_map(custom_map) ⇒ Hash

Merge custom character visualization map with defaults

Parameters:

  • custom_map (Hash, nil)

    Custom character mappings

Returns:

  • (Hash)

    Merged character visualization map



# File 'lib/canon/diff_formatter.rb', line 235

def self.merge_visualization_map(custom_map)
  DEFAULT_VISUALIZATION_MAP.merge(custom_map || {})
end
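Merge precedence in `.merge_visualization_map` follows from it being a plain `Hash#merge`: entries in the custom map override defaults with the same key, `nil` yields the defaults, and the frozen defaults stay untouched. `DEFAULTS` below is a stand-in for `DEFAULT_VISUALIZATION_MAP` with illustrative symbols.

```ruby
# Stand-in defaults; canon's real DEFAULT_VISUALIZATION_MAP is loaded from YAML.
DEFAULTS = { " " => "░", "\t" => "⇥" }.freeze

def merge_visualization_map(custom_map)
  # Hash#merge returns a new Hash; keys from custom_map win.
  DEFAULTS.merge(custom_map || {})
end

merged = merge_visualization_map({ "\t" => "→", "\u3000" => "□" })
```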

Instance Method Details

#format(differences, format, doc1: nil, doc2: nil, html_version: nil) ⇒ String

Format differences array for display

Parameters:

  • differences (Array)

    Array of difference hashes

  • format (Symbol)

    Format type (:xml, :html, :json, :yaml)

  • doc1 (String, nil) (defaults to: nil)

    First document content (used by by-line and pretty-diff modes)

  • doc2 (String, nil) (defaults to: nil)

    Second document content (used by by-line and pretty-diff modes)

  • html_version (Symbol, nil) (defaults to: nil)

    HTML version (:html4 or :html5)

Returns:

  • (String)

    Formatted output



# File 'lib/canon/diff_formatter.rb', line 303

def format(differences, format, doc1: nil, doc2: nil, html_version: nil)
  # In by-line mode, always use by-line diff
  if @mode == :by_line && doc1 && doc2
    return by_line_diff(doc1, doc2, format: format,
                                    html_version: html_version,
                                    differences: differences)
  end

  # In pretty_diff mode, always use text-LCS diff (bypasses DiffNodeMapper).
  # pretty_diff_format handles nil doc1/doc2 itself (emits header only).
  if @mode == :pretty_diff
    return pretty_diff_format(doc1, doc2, format: format)
  end

  no_diffs = if differences.respond_to?(:equivalent?)
               differences.equivalent?
             else
               differences.empty?
             end
  return success_message if no_diffs

  case @mode
  when :by_line
    # Reached only when doc1 or doc2 is missing
    by_line_diff(doc1, doc2, format: format, html_version: html_version,
                             differences: differences)
  when :pretty_diff
    # Unreachable: :pretty_diff already returned above
    pretty_diff_format(doc1, doc2, format: format)
  else
    by_object_diff(differences, format)
  end
end
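The branch order in `#format` decides which renderer runs. The hypothetical `format_route` function mirrors that order and returns a label instead of rendering, which makes two consequences easy to check: by-line mode with both documents never shows the success message, and the `when :pretty_diff` branch is unreachable because that mode returns earlier. `no_differences?` copies the equivalence check.

```ruby
# Copy of the "no differences" check in #format (hypothetical name).
def no_differences?(differences)
  if differences.respond_to?(:equivalent?)
    differences.equivalent?
  else
    differences.empty?
  end
end

# Hypothetical pure function mirroring the branch order of #format.
def format_route(mode, doc1, doc2, no_diffs)
  return :by_line_diff if mode == :by_line && doc1 && doc2
  return :pretty_diff_format if mode == :pretty_diff
  return :success_message if no_diffs

  mode == :by_line ? :by_line_diff : :by_object_diff
end
```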

#format_comparison_result(comparison_result, expected, actual) ⇒ String

Format the result of Canon::Comparison.equivalent? for display. This is the single entry point for generating diffs from comparison results.

Parameters:

  • comparison_result (ComparisonResult, Hash, Array, Boolean)

    Result from Canon::Comparison.equivalent?

  • expected (Object)

    Expected value

  • actual (Object)

    Actual value

Returns:

  • (String)

    Formatted diff output



# File 'lib/canon/diff_formatter.rb', line 342

def format_comparison_result(comparison_result, expected, actual)
  # Detect format from expected content
  format = Canon::Comparison::FormatDetector.detect(expected)

  formatter_options = {
    use_color: @use_color,
    mode: @mode,
    context_lines: @context_lines,
    diff_grouping_lines: @diff_grouping_lines,
    show_diffs: @show_diffs,
    verbose_diff: @verbose_diff,
  }

  output = []

  # Display the algorithm being used
  if comparison_result.is_a?(Canon::Comparison::ComparisonResult)
    algorithm_name = case comparison_result.algorithm
                     when :semantic
                       "SEMANTIC TREE DIFF"
                     else
                       "DOM DIFF"
                     end
    output << colorize("Algorithm: #{algorithm_name}", :cyan, :bold)
    output << "" # Blank line for spacing
  end

  # 1. CANON VERBOSE tables (ONLY if CANON_VERBOSE=1)
  verbose_tables = DebugOutput.verbose_tables_only(
    comparison_result,
    formatter_options,
  )
  output << verbose_tables unless verbose_tables.empty?

  # 2. Semantic Diff Report (ALWAYS if diffs exist)
  if comparison_result.is_a?(Canon::Comparison::ComparisonResult) &&
      comparison_result.differences.any?
    require_relative "diff_formatter/diff_detail_formatter"
    output << DiffDetailFormatter.format_report(
      comparison_result.differences,
      use_color: @use_color,
      show_diffs: @show_diffs,
      compact_semantic_report: @compact_semantic_report,
      expand_difference: @expand_difference,
    )
  end

  # verbose_diff / show_raw_inputs shows both sides as a convenience shorthand.
  # show_raw_expected / show_raw_received give per-side control.
  combined_raw = @verbose_diff || @show_raw_inputs
  show_raw_exp = combined_raw || @show_raw_expected
  show_raw_rec = combined_raw || @show_raw_received
  verbose      = show_raw_exp || show_raw_rec
  # verbose_diff / show_preprocessed_inputs shows both sides as a shorthand.
  # show_preprocessed_expected / show_preprocessed_received give per-side control.
  combined_prep = @verbose_diff || @show_preprocessed_inputs
  show_prep_exp = combined_prep || @show_preprocessed_expected
  show_prep_rec = combined_prep || @show_preprocessed_received
  show_prep = show_prep_exp || show_prep_rec
  show_line = @verbose_diff || @show_line_numbered_inputs

  # 3. Raw/Original Input Display (when show_raw_inputs/show_raw_expected/show_raw_received enabled)
  if verbose && comparison_result.is_a?(Canon::Comparison::ComparisonResult)
    original1, original2 = comparison_result.original_strings
    if original1 && original2
      output << format_raw_inputs(original1, original2,
                                  show_expected: show_raw_exp,
                                  show_received: show_raw_rec)
    end
  end

  # 4. Preprocessed Input Display (when show_preprocessed_inputs/expected/received enabled)
  if show_prep && comparison_result.is_a?(Canon::Comparison::ComparisonResult)
    preprocessed1, preprocessed2 = comparison_result.preprocessed_strings
    if preprocessed1 && preprocessed2
      preprocessing_info = comparison_result.match_options&.dig(:match,
                                                                :preprocessing)
      output << format_preprocessed_inputs(preprocessed1, preprocessed2,
                                           preprocessing_info,
                                           show_expected: show_prep_exp,
                                           show_received: show_prep_rec)
    end
  end

  # 4.5. Pretty-printed Input Display (when show_prettyprint_inputs/expected/received enabled)
  # Pretty-prints the ORIGINAL strings (not preprocessed) through PrettyPrinter::Xml/Html
  # with NO character visualization — output is plain ASCII suitable for copy-pasting
  # into RSpec fixture heredocs.  verbose_diff does NOT enable these options.
  show_pp_inp = @show_prettyprint_inputs
  show_pp_exp = show_pp_inp || @show_prettyprint_expected
  show_pp_rec = show_pp_inp || @show_prettyprint_received
  show_pp = show_pp_exp || show_pp_rec

  if show_pp && comparison_result.is_a?(Canon::Comparison::ComparisonResult)
    orig1, orig2 = comparison_result.original_strings
    if orig1 && orig2
      pp1, pp2 = prettyprint_for_display(orig1, orig2, format)
      output << format_prettyprint_inputs(pp1, pp2,
                                          show_expected: show_pp_exp,
                                          show_received: show_pp_rec)
    end
  end

  # 5. Line-Numbered Input Display (when show_line_numbered_inputs is enabled)
  if show_line && comparison_result.is_a?(Canon::Comparison::ComparisonResult)
    original1, original2 = comparison_result.original_strings
    if original1 && original2
      output << format_line_numbered_inputs(original1, original2)
    end
  end

  # 6. Main diff output (by-line or by-object) - ALWAYS

  # Check if comparison result is a ComparisonResult object
  if comparison_result.is_a?(Canon::Comparison::ComparisonResult)
    # Use original strings for line diff to show actual formatting/namespace differences
    # Use preprocessed strings for semantic comparison only
    doc1, doc2 = comparison_result.original_strings
    differences = comparison_result.differences
    html_version = comparison_result.html_version
  elsif comparison_result.is_a?(Hash) && comparison_result[:preprocessed]
    # Legacy Hash format - Use preprocessed strings from comparison
    doc1, doc2 = comparison_result[:preprocessed]
    differences = comparison_result[:differences]
    html_version = comparison_result[:html_version]
  else
    # Legacy path: normalize content for display
    doc1, doc2 = normalize_content_for_display(expected, actual, format)
    # comparison_result is an array of differences when verbose: true
    differences = comparison_result.is_a?(Array) ? comparison_result : []
    html_version = nil
  end

  # Generate diff using existing format method
  output << format(differences, format, doc1: doc1, doc2: doc2,
                                        html_version: html_version)

  output.compact.join("\n")
end
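`#format_comparison_result` combines shorthand flags with per-side flags before deciding which input sections to print. The hypothetical `resolve_display_flags` helper reproduces that logic as a single Hash, which makes the asymmetry visible: `verbose_diff` enables raw, preprocessed, and line-numbered output, but not the pretty-printed sections.

```ruby
# Hypothetical helper reproducing the flag resolution in
# #format_comparison_result. Keys of `opts` mirror the constructor options.
def resolve_display_flags(opts)
  combined_raw = opts[:verbose_diff] || opts[:show_raw_inputs]
  combined_prep = opts[:verbose_diff] || opts[:show_preprocessed_inputs]
  # verbose_diff deliberately does not enable the pretty-print sections.
  combined_pp = opts[:show_prettyprint_inputs]

  {
    raw_expected: !!(combined_raw || opts[:show_raw_expected]),
    raw_received: !!(combined_raw || opts[:show_raw_received]),
    prep_expected: !!(combined_prep || opts[:show_preprocessed_expected]),
    prep_received: !!(combined_prep || opts[:show_preprocessed_received]),
    pp_expected: !!(combined_pp || opts[:show_prettyprint_expected]),
    pp_received: !!(combined_pp || opts[:show_prettyprint_received]),
    line_numbered: !!(opts[:verbose_diff] || opts[:show_line_numbered_inputs]),
  }
end
```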