Class: Canon::DiffFormatter

Inherits:
Object
Defined in:
lib/canon/diff_formatter.rb,
lib/canon/diff_formatter/legend.rb,
lib/canon/diff_formatter/debug_output.rb,
lib/canon/diff_formatter/by_line/xml_formatter.rb,
lib/canon/diff_formatter/diff_detail_formatter.rb,
lib/canon/diff_formatter/by_line/base_formatter.rb,
lib/canon/diff_formatter/by_line/html_formatter.rb,
lib/canon/diff_formatter/by_line/json_formatter.rb,
lib/canon/diff_formatter/by_line/yaml_formatter.rb,
lib/canon/diff_formatter/by_object/xml_formatter.rb,
lib/canon/diff_formatter/by_line/simple_formatter.rb,
lib/canon/diff_formatter/by_object/base_formatter.rb,
lib/canon/diff_formatter/by_object/json_formatter.rb,
lib/canon/diff_formatter/by_object/yaml_formatter.rb,
lib/canon/diff_formatter/diff_detail_formatter/node_utils.rb,
lib/canon/diff_formatter/diff_detail_formatter/text_utils.rb,
lib/canon/diff_formatter/diff_detail_formatter/color_helper.rb,
lib/canon/diff_formatter/diff_detail_formatter/location_extractor.rb,
lib/canon/diff_formatter/diff_detail_formatter/dimension_formatter.rb

Overview

Formatter for displaying semantic differences with color support

This is a pure orchestrator class that delegates formatting to mode-specific and format-specific formatters. It provides a unified interface for generating both by-line and by-object diffs across multiple formats (XML, HTML, JSON, YAML).

Architecture

DiffFormatter follows the orchestrator pattern with MECE (Mutually Exclusive, Collectively Exhaustive) delegation:

  1. **Mode Selection**: Chooses by-line or by-object visualization

  2. **Format Delegation**: Dispatches to format-specific formatter

  3. **Customization**: Applies color, context, and visualization options
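
The delegation steps above can be pictured with a small, self-contained sketch. The class name `SketchOrchestrator` and its lambda handlers are hypothetical stand-ins, not Canon's actual formatter classes; the sketch only illustrates the two-level lookup (mode first, then format) with no formatting logic in the orchestrator itself.

```ruby
# Hypothetical stand-in for an orchestrator: it holds no formatting logic
# and only routes to a handler chosen by (mode, format).
class SketchOrchestrator
  HANDLERS = {
    by_line: {
      xml:  ->(diffs) { "by-line/xml: #{diffs.size} change(s)" },
      json: ->(diffs) { "by-line/json: #{diffs.size} change(s)" },
    },
    by_object: {
      xml:  ->(diffs) { "by-object/xml: #{diffs.size} change(s)" },
      json: ->(diffs) { "by-object/json: #{diffs.size} change(s)" },
    },
  }.freeze

  def initialize(mode: :by_object)
    @mode = mode
  end

  # Step 1 picks the mode table, step 2 picks the format handler.
  def format(diffs, format)
    HANDLERS.fetch(@mode).fetch(format).call(diffs)
  end
end

puts SketchOrchestrator.new(mode: :by_line).format([:a, :b], :xml)
```

Because every (mode, format) pair maps to exactly one handler, an unknown combination fails loudly with a `KeyError` rather than falling through to a default.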

Diff Modes

**By-Object Mode** (default for XML/JSON/YAML):

  • Tree-based semantic diff

  • Shows only what changed in the structure

  • Visual tree with box-drawing characters

  • Best for configuration files and structured data

**By-Line Mode** (default for HTML):

  • Traditional line-by-line diff

  • Shows changes in document order with context

  • Syntax-aware token highlighting

  • Best for markup and when line context matters

Visualization Features

  • **Color support**: Red (deletions), green (additions), yellow (structure), cyan (informative)

  • **Whitespace visualization**: Makes invisible characters visible

  • **Context lines**: Shows unchanged lines around changes

  • **Diff grouping**: Groups nearby changes into blocks

  • **Character map customization**: CJK-safe Unicode symbols
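
As a rough illustration of whitespace visualization, the sketch below substitutes characters through a lookup table. The map contents and the helper name `visualize_whitespace` are assumptions made for this example; Canon's real defaults are loaded from its YAML character map.

```ruby
# Hypothetical map; the symbols are chosen for illustration only.
SKETCH_VISUALIZATION_MAP = {
  " "  => "░",
  "\t" => "⇥",
  "\n" => "↵",
}.freeze

# Replace each mapped character with its visible symbol and leave
# every other character untouched.
def visualize_whitespace(text, map = SKETCH_VISUALIZATION_MAP)
  text.each_char.map { |ch| map.fetch(ch, ch) }.join
end

puts visualize_whitespace("a b\tc")
```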

Usage

# Basic usage
formatter = Canon::DiffFormatter.new(use_color: true, mode: :by_object)
output = formatter.format(differences, :xml, doc1: xml1, doc2: xml2)

# With options
formatter = Canon::DiffFormatter.new(
  use_color: true,
  mode: :by_line,
  context_lines: 5,
  diff_grouping_lines: 10,
  show_diffs: :normative
)

Defined Under Namespace

Modules: ByLine, ByObject, DebugOutput, DiffDetailFormatter, DiffDetailFormatterHelpers, Legend

Constant Summary

DEFAULT_VISUALIZATION_MAP =

Default character visualization map (loaded from YAML)

character_map_data[:visualization_map].freeze

CHARACTER_CATEGORY_MAP =

Character category map (loaded from YAML)

character_map_data[:category_map].freeze

CHARACTER_CATEGORY_NAMES =

Category display names (loaded from YAML)

character_map_data[:category_names].freeze

CHARACTER_METADATA =

Character metadata including names (loaded from YAML)

character_map_data[:character_metadata].freeze

DIFF_DESCRIPTIONS =

Map difference codes to human-readable descriptions

{
  Comparison::EQUIVALENT => "Equivalent",
  Comparison::MISSING_ATTRIBUTE => "Missing attribute",
  Comparison::MISSING_NODE => "Missing node",
  Comparison::UNEQUAL_ATTRIBUTES => "Unequal attributes",
  Comparison::UNEQUAL_COMMENTS => "Unequal comments",
  Comparison::UNEQUAL_DOCUMENTS => "Unequal documents",
  Comparison::UNEQUAL_ELEMENTS => "Unequal elements",
  Comparison::UNEQUAL_NODES_TYPES => "Unequal node types",
  Comparison::UNEQUAL_TEXT_CONTENTS => "Unequal text contents",
  Comparison::MISSING_HASH_KEY => "Missing hash key",
  Comparison::UNEQUAL_HASH_VALUES => "Unequal hash values",
  Comparison::UNEQUAL_ARRAY_LENGTHS => "Unequal array lengths",
  Comparison::UNEQUAL_ARRAY_ELEMENTS => "Unequal array elements",
  Comparison::UNEQUAL_TYPES => "Unequal types",
  Comparison::UNEQUAL_PRIMITIVES => "Unequal primitive values",
}.freeze

Class Method Summary

Instance Method Summary

Constructor Details

#initialize(use_color: true, mode: :by_object, context_lines: 3, diff_grouping_lines: nil, visualization_map: nil, character_map_file: nil, character_definitions: nil, show_diffs: :all, verbose_diff: false, show_raw_inputs: false, show_preprocessed_inputs: false, show_line_numbered_inputs: false) ⇒ DiffFormatter




# File 'lib/canon/diff_formatter.rb', line 166

def initialize(use_color: true, mode: :by_object, context_lines: 3,
               diff_grouping_lines: nil, visualization_map: nil,
               character_map_file: nil, character_definitions: nil,
               show_diffs: :all, verbose_diff: false,
               show_raw_inputs: false, show_preprocessed_inputs: false,
               show_line_numbered_inputs: false)
  # rubocop:enable Metrics/ParameterLists
  @use_color = use_color
  @mode = mode
  @context_lines = context_lines
  @diff_grouping_lines = diff_grouping_lines
  @show_diffs = show_diffs
  @verbose_diff = verbose_diff
  @show_raw_inputs = show_raw_inputs
  @show_preprocessed_inputs = show_preprocessed_inputs
  @show_line_numbered_inputs = show_line_numbered_inputs
  @visualization_map = build_visualization_map(
    visualization_map: visualization_map,
    character_map_file: character_map_file,
    character_definitions: character_definitions,
  )
end

Class Method Details

.build_character_definition(definition) ⇒ Hash

Build character definition from hash

Parameters:

  • definition (Hash)

    Character definition with keys (matching YAML format):

    • :character or :unicode (required)

    • :visualization (required)

    • :category (required)

    • :name (required)

Returns:

  • (Hash)

    Single-entry visualization map



# File 'lib/canon/diff_formatter.rb', line 227

def self.build_character_definition(definition)
  # Validate required fields
  char = if definition[:unicode]
           [definition[:unicode].to_i(16)].pack("U")
         elsif definition[:character]
           definition[:character]
         else
           raise ArgumentError,
                 "Character definition must include :character or :unicode"
         end

  unless definition[:visualization]
    raise ArgumentError, "Character definition must include :visualization"
  end

  unless definition[:category]
    raise ArgumentError, "Character definition must include :category"
  end

  unless definition[:name]
    raise ArgumentError, "Character definition must include :name"
  end

  { char => definition[:visualization] }
end
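
The `:unicode` branch converts a hexadecimal code point string into a character. The sketch below shows that conversion in isolation; the code point `00A0` and the visualization symbol are illustrative values, not entries taken from Canon.

```ruby
# "00A0" is the hex code point of NO-BREAK SPACE.
code_point = "00A0".to_i(16)   # hex string to Integer
char = [code_point].pack("U")  # Integer to a one-character UTF-8 string

# Shape of the returned value: a single-entry map from the character
# to its visualization (the symbol here is an assumed example).
entry = { char => "␣" }
```

Note that `:category` and `:name` are validated for presence but do not appear in the returned hash, which contains only the character and its visualization.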

.character_map_data ⇒ Object

Lazily load and cache character map data



# File 'lib/canon/diff_formatter.rb', line 130

def self.character_map_data
  @character_map_data ||= load_character_map
end
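
The `||=` idiom used here loads the data on first access and reuses it afterwards. A minimal sketch of the same pattern, using a hypothetical `SketchLoader` class with a counter to make the single load observable:

```ruby
# Hypothetical class; only the memoization pattern mirrors the method above.
class SketchLoader
  class << self
    attr_reader :load_count

    # Runs expensive_load only while @data is nil (or false).
    def data
      @data ||= expensive_load
    end

    private

    def expensive_load
      @load_count = (@load_count || 0) + 1
      { loaded: true }
    end
  end
end

SketchLoader.data
SketchLoader.data
puts SketchLoader.load_count
```

Since `||=` re-evaluates its right-hand side whenever the cached value is `nil` or `false`, the pattern only caches truthy results, which holds here because the loader returns a Hash.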

.load_character_map ⇒ Hash

Load character map from YAML file

Returns:

  • (Hash)

    Hash with :visualization_map, :category_map, :category_names, :character_metadata



# File 'lib/canon/diff_formatter.rb', line 85

def self.load_character_map
  yaml_path = File.join(__dir__, "diff_formatter", "character_map.yml")
  data = YAML.load_file(yaml_path)

  visualization_map = {}
  category_map = {}
  character_metadata = {}

  data["characters"].each do |char_data|
    # Get character from either unicode code point or character field
    char = if char_data["unicode"]
             # Convert hex string to character
             [char_data["unicode"].to_i(16)].pack("U")
           else
             # Use character field directly (handles \n, \r, \t, etc.)
             char_data["character"]
           end

    vis = char_data["visualization"]
    category = char_data["category"].to_sym
    name = char_data["name"]

    visualization_map[char] = vis
    category_map[char] = category
    character_metadata[char] = {
      visualization: vis,
      category: category,
      name: name,
    }
  end

  category_names = {}
  data["category_names"].each do |key, value|
    category_names[key.to_sym] = value
  end

  {
    visualization_map: visualization_map,
    category_map: category_map,
    category_names: category_names,
    character_metadata: character_metadata,
  }
  }
end
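
The keys read by this loader imply a YAML layout with a `characters` list and a `category_names` mapping. The sketch below parses an illustrative document of that shape; the entries are assumptions for the example and are not copied from Canon's shipped character_map.yml.

```ruby
require "yaml"

# Illustrative YAML only. The non-interpolating heredoc keeps "\t" as a
# YAML escape, which the YAML parser turns into a tab character.
yaml_text = <<~'YAML'
  characters:
    - unicode: "00A0"
      visualization: "␣"
      category: whitespace
      name: No-break space
    - character: "\t"
      visualization: "⇥"
      category: whitespace
      name: Tab
  category_names:
    whitespace: Whitespace
YAML

data = YAML.safe_load(yaml_text)

# Same character resolution as the loader: prefer the hex code point,
# otherwise use the literal character field.
visualization_map = data["characters"].to_h do |entry|
  char = if entry["unicode"]
           [entry["unicode"].to_i(16)].pack("U")
         else
           entry["character"]
         end
  [char, entry["visualization"]]
end
```

Quoting the code point keeps it a String, which is what the `to_i(16)` call expects.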

.load_custom_character_map(file_path) ⇒ Hash

Load character map from custom YAML file

Parameters:

  • file_path (String)

    Path to YAML file with character definitions

Returns:

  • (Hash)

    Character visualization map



# File 'lib/canon/diff_formatter.rb', line 201

def self.load_custom_character_map(file_path)
  data = YAML.load_file(file_path)
  visualization_map = {}

  data["characters"].each do |char_data|
    # Get character from either unicode code point or character field
    char = if char_data["unicode"]
             [char_data["unicode"].to_i(16)].pack("U")
           else
             char_data["character"]
           end

    visualization_map[char] = char_data["visualization"]
  end

  visualization_map
end

.merge_visualization_map(custom_map) ⇒ Hash

Merge custom character visualization map with defaults

Parameters:

  • custom_map (Hash, nil)

    Custom character mappings

Returns:

  • (Hash)

    Merged character visualization map



# File 'lib/canon/diff_formatter.rb', line 193

def self.merge_visualization_map(custom_map)
  DEFAULT_VISUALIZATION_MAP.merge(custom_map || {})
end
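
`Hash#merge` returns a new hash in which entries from the argument win over entries in the receiver, so custom mappings override defaults without mutating them. A small sketch with made-up symbols:

```ruby
# Stand-in for the frozen default map; the symbols are illustrative.
defaults = { " " => "░", "\t" => "⇥" }.freeze
custom   = { " " => "·" }

merged = defaults.merge(custom)
# merged is { " " => "·", "\t" => "⇥" }; defaults is unchanged

# A nil custom map falls back to an empty hash, leaving the defaults as-is.
unchanged = defaults.merge(nil || {})
```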

Instance Method Details

#format(differences, format, doc1: nil, doc2: nil, html_version: nil) ⇒ String

Format differences array for display

Parameters:

  • differences (Array, ComparisonResult)

    Array of difference hashes, or a ComparisonResult object (any object that responds to equivalent?)

  • format (Symbol)

    Format type (:xml, :html, :json, :yaml)

  • doc1 (String, nil) (defaults to: nil)

    First document content (for by-line mode)

  • doc2 (String, nil) (defaults to: nil)

    Second document content (for by-line mode)

  • html_version (Symbol, nil) (defaults to: nil)

    HTML version (:html4 or :html5)

Returns:

  • (String)

    Formatted output



# File 'lib/canon/diff_formatter.rb', line 261

def format(differences, format, doc1: nil, doc2: nil, html_version: nil)
  # In by-line mode with doc1/doc2, always perform diff regardless of differences
  if @mode == :by_line && doc1 && doc2
    return (doc1, doc2, format: format,
                                    html_version: html_version,
                                    differences: differences)
  end

  # Check if no differences (handle both ComparisonResult and legacy Array)
  no_diffs = if differences.respond_to?(:equivalent?)
               # ComparisonResult object (production path)
               differences.equivalent?
             else
               # Legacy Array (for low-level tests)
               differences.empty?
             end
  return success_message if no_diffs

  case @mode
  when :by_line
    (doc1, doc2, format: format, html_version: html_version,
                             differences: differences)
  else
    by_object_diff(differences, format)
  end
end
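
The "no differences" check relies on duck typing: objects that respond to `equivalent?` are asked directly, and anything else is treated as a plain collection. A self-contained sketch of that check, where `SketchResult` is a hypothetical stand-in for a ComparisonResult-like object:

```ruby
# Hypothetical stand-in; only the equivalent? method matters here.
SketchResult = Struct.new(:differences) do
  def equivalent?
    differences.empty?
  end
end

# Mirrors the branch in the method above.
def no_diffs?(differences)
  if differences.respond_to?(:equivalent?)
    differences.equivalent?
  else
    differences.empty?
  end
end

puts no_diffs?([])
puts no_diffs?(SketchResult.new([:changed]))
```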

#format_comparison_result(comparison_result, expected, actual) ⇒ String

Format a comparison result returned by Canon::Comparison.equivalent?

This is the single entry point for generating diffs from comparison results.

Parameters:

  • comparison_result (ComparisonResult, Hash, Array, Boolean)

    Result from Canon::Comparison.equivalent?

  • expected (Object)

    Expected value

  • actual (Object)

    Actual value

Returns:

  • (String)

    Formatted diff output



# File 'lib/canon/diff_formatter.rb', line 295

def format_comparison_result(comparison_result, expected, actual)
  # Detect format from expected content
  format = Canon::Comparison::FormatDetector.detect(expected)

  formatter_options = {
    use_color: @use_color,
    mode: @mode,
    context_lines: @context_lines,
    diff_grouping_lines: @diff_grouping_lines,
    show_diffs: @show_diffs,
    verbose_diff: @verbose_diff,
  }

  output = []

  # Display the algorithm being used
  if comparison_result.is_a?(Canon::Comparison::ComparisonResult)
    algorithm_name = case comparison_result.algorithm
                     when :semantic
                       "SEMANTIC TREE DIFF"
                     else
                       "DOM DIFF"
                     end
    output << colorize("Algorithm: #{algorithm_name}", :cyan, :bold)
    output << "" # Blank line for spacing
  end

  # 1. CANON VERBOSE tables (ONLY if CANON_VERBOSE=1)
  verbose_tables = DebugOutput.verbose_tables_only(
    comparison_result,
    formatter_options,
  )
  output << verbose_tables unless verbose_tables.empty?

  # 2. Semantic Diff Report (ALWAYS if diffs exist)
  if comparison_result.is_a?(Canon::Comparison::ComparisonResult) &&
      comparison_result.differences.any?
    require_relative "diff_formatter/diff_detail_formatter"
    output << DiffDetailFormatter.format_report(
      comparison_result.differences,
      use_color: @use_color,
    )
  end

  # verbose_diff enables all three input displays as a convenience
  verbose = @verbose_diff || @show_raw_inputs
  show_prep = @verbose_diff || @show_preprocessed_inputs
  show_line = @verbose_diff || @show_line_numbered_inputs

  # 3. Raw/Original Input Display (when show_raw_inputs is enabled)
  if verbose && comparison_result.is_a?(Canon::Comparison::ComparisonResult)
    original1, original2 = comparison_result.original_strings
    if original1 && original2
      output << format_raw_inputs(original1, original2)
    end
  end

  # 4. Preprocessed Input Display (when show_preprocessed_inputs is enabled)
  if show_prep && comparison_result.is_a?(Canon::Comparison::ComparisonResult)
    preprocessed1, preprocessed2 = comparison_result.preprocessed_strings
    if preprocessed1 && preprocessed2
      preprocessing_info = comparison_result.match_options&.dig(:match,
                                                                :preprocessing)
      output << format_preprocessed_inputs(preprocessed1, preprocessed2,
                                           preprocessing_info)
    end
  end

  # 5. Line-Numbered Input Display (when show_line_numbered_inputs is enabled)
  if show_line && comparison_result.is_a?(Canon::Comparison::ComparisonResult)
    original1, original2 = comparison_result.original_strings
    if original1 && original2
      output << format_line_numbered_inputs(original1, original2)
    end
  end

  # 6. Main diff output (by-line or by-object) - ALWAYS

  # Check if comparison result is a ComparisonResult object
  if comparison_result.is_a?(Canon::Comparison::ComparisonResult)
    # Use original strings for line diff to show actual formatting/namespace differences
    # Use preprocessed strings for semantic comparison only
    doc1, doc2 = comparison_result.original_strings
    differences = comparison_result.differences
    html_version = comparison_result.html_version
  elsif comparison_result.is_a?(Hash) && comparison_result[:preprocessed]
    # Legacy Hash format - Use preprocessed strings from comparison
    doc1, doc2 = comparison_result[:preprocessed]
    differences = comparison_result[:differences]
    html_version = comparison_result[:html_version]
  else
    # Legacy path: normalize content for display
    doc1, doc2 = normalize_content_for_display(expected, actual, format)
    # comparison_result is an array of differences when verbose: true
    differences = comparison_result.is_a?(Array) ? comparison_result : []
    html_version = nil
  end

  # Generate diff using existing format method
  output << format(differences, format, doc1: doc1, doc2: doc2,
                                        html_version: html_version)

  output.compact.join("\n")
end
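
Two details of the method above are easy to miss: `verbose_diff` switches on all three input displays, and `nil` sections are dropped before the report is joined. The sketch below isolates both; the helper names `resolve_input_displays` and `assemble` are hypothetical and exist only for this illustration.

```ruby
# Hypothetical helper mirroring the three "||" lines in the method body.
def resolve_input_displays(verbose_diff: false, show_raw_inputs: false,
                           show_preprocessed_inputs: false,
                           show_line_numbered_inputs: false)
  {
    raw: verbose_diff || show_raw_inputs,
    preprocessed: verbose_diff || show_preprocessed_inputs,
    line_numbered: verbose_diff || show_line_numbered_inputs,
  }
end

# Sections are collected in order; nil entries are removed, while empty
# strings survive and act as blank spacer lines.
def assemble(sections)
  sections.compact.join("\n")
end

flags = resolve_input_displays(verbose_diff: true)
report = assemble(["Algorithm: DOM DIFF", "", nil, "(diff body)"])
```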