Class: Uniword::Diff::PackageDiffer

Inherits:
Object
Defined in:
lib/uniword/diff/package_differ.rb

Overview

Compares two DOCX files at the ZIP/XML/OPC structural level.

Detects differences in:

  • ZIP entries (added/removed parts)

  • ZIP entry metadata (compression, text/binary flag, timestamps)

  • XML content (semantic equivalence via Canon, element structure)

  • OPC validation (content types, relationships, required parts)

Unlike DocumentDiffer (which compares loaded DocumentRoot models), PackageDiffer works on raw DOCX ZIP contents, detecting what Word or other applications changed during repair.
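The first kind of difference listed above, added and removed parts, comes down to set arithmetic on entry names. The following is a minimal sketch only, assuming each package has already been read into a Hash of entry name to raw bytes. `classify_parts` is a hypothetical helper written for illustration; it is not part of Uniword's API and may differ from what PackageDiffer does internally.

```ruby
# Hypothetical helper (not part of Uniword): classify the parts of two
# packages, each given as a Hash of entry name => raw bytes.
def classify_parts(old_parts, new_parts)
  # Names present on both sides are split by byte-for-byte equality.
  common = old_parts.keys & new_parts.keys
  modified, unchanged = common.partition do |name|
    old_parts[name] != new_parts[name]
  end

  {
    added: (new_parts.keys - old_parts.keys).sort,
    removed: (old_parts.keys - new_parts.keys).sort,
    modified: modified.sort,
    unchanged: unchanged.sort,
  }
end

# Illustrative data only; real parts would be read from the two archives.
old_parts = {
  "[Content_Types].xml" => "<Types/>",
  "word/document.xml" => "<w:document>old</w:document>",
  "word/comments.xml" => "<w:comments/>",
}
new_parts = {
  "[Content_Types].xml" => "<Types/>",
  "word/document.xml" => "<w:document>new</w:document>",
  "word/styles.xml" => "<w:styles/>",
}

result = classify_parts(old_parts, new_parts)
result.each { |kind, names| puts "#{kind}: #{names.join(', ')}" }
```

A byte-for-byte check like this reports a part as modified even when only its serialization changed (attribute order, whitespace), which is the case the canon: option is meant to address.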

Examples:

Basic comparison

differ = PackageDiffer.new("bad.docx", "repaired.docx")
result = differ.diff
puts result.summary

With Canon semantic comparison

result = PackageDiffer.new("bad.docx", "repaired.docx",
  canon: true).diff
result.modified_parts.each do |p|
  puts "#{p.name}: canon_equivalent=#{p.canon_equivalent}"
end

Constant Summary

REQUIRED_PARTS =

Required parts for a valid OOXML DOCX package.

%w[
  [Content_Types].xml
  _rels/.rels
  word/document.xml
].freeze
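
A required-parts check against a list like this can be as small as an array difference. The sketch below is illustrative only: `missing_required_parts` is a hypothetical helper, not a documented Uniword method, and the list is repeated locally so the snippet runs on its own.

```ruby
# Local copy of the list above so the sketch is self-contained.
REQUIRED_PARTS = %w[
  [Content_Types].xml
  _rels/.rels
  word/document.xml
].freeze

# Hypothetical helper (not part of Uniword): return the required parts
# that do not appear among the archive's entry names.
def missing_required_parts(entry_names, required = REQUIRED_PARTS)
  required - entry_names
end

# Illustrative entry list for a package that lacks its root relationships.
entries = ["[Content_Types].xml", "word/document.xml", "word/styles.xml"]
missing = missing_required_parts(entries)
puts "Missing required parts: #{missing.join(', ')}" unless missing.empty?
```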
STANDARD_CONTENT_TYPES =

Standard DOCX parts and their expected content types.

{
  "word/document.xml" => "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
  "word/styles.xml" => "application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml",
  "word/settings.xml" => "application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml",
  "word/fontTable.xml" => "application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml",
  "word/webSettings.xml" => "application/vnd.openxmlformats-officedocument.wordprocessingml.webSettings+xml",
  "word/theme/theme1.xml" => "application/vnd.openxmlformats-officedocument.theme+xml",
  "docProps/core.xml" => "application/vnd.openxmlformats-package.core-properties+xml",
  "docProps/app.xml" => "application/vnd.openxmlformats-officedocument.extended-properties+xml",
}.freeze
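
One way such a table can be used is to compare the content types a package declares against the expected values for the parts it actually contains. This is a sketch under stated assumptions: `content_type_issues` is a hypothetical helper, the declared types are passed in as a plain Hash rather than parsed from [Content_Types].xml, and part names are assumed to be normalized without the leading slash that OPC part names carry.

```ruby
# A two-entry excerpt of the table above, repeated so the sketch runs alone.
EXPECTED_TYPES = {
  "word/document.xml" => "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
  "word/styles.xml" => "application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml",
}.freeze

# Hypothetical helper (not part of Uniword).
# present:  Array of part names found in the archive
# declared: Hash of part name => content type declared by the package
def content_type_issues(present, declared, expected = EXPECTED_TYPES)
  expected.each_with_object([]) do |(part, type), issues|
    next unless present.include?(part) # optional parts may be absent

    actual = declared[part]
    if actual.nil?
      issues << "#{part}: no content type declared"
    elsif actual != type
      issues << "#{part}: expected #{type}, got #{actual}"
    end
  end
end

# Illustrative input: styles.xml is present but declared as generic XML.
present = ["word/document.xml", "word/styles.xml"]
declared = {
  "word/document.xml" => EXPECTED_TYPES["word/document.xml"],
  "word/styles.xml" => "application/xml",
}

issues = content_type_issues(present, declared)
issues.each { |issue| puts issue }
```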

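Instance Method Summary

  • #diff ⇒ PackageDiffResult

    Perform structural diff and return a PackageDiffResult.

  • #initialize(old_path, new_path, canon: false, canon_profile: :spec_friendly) ⇒ PackageDiffer (constructor)

    Initialize with two DOCX file paths.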

Constructor Details

#initialize(old_path, new_path, canon: false, canon_profile: :spec_friendly) ⇒ PackageDiffer

Initialize with two DOCX file paths.

Parameters:

  • old_path (String)

    Path to original DOCX

  • new_path (String)

    Path to modified/repaired DOCX

  • canon (Boolean) (defaults to: false)

    Whether to use Canon for semantic XML comparison

  • canon_profile (Symbol) (defaults to: :spec_friendly)

    Canon match profile to use



# File 'lib/uniword/diff/package_differ.rb', line 59

def initialize(old_path, new_path, canon: false,
               canon_profile: :spec_friendly)
  @old_path = old_path
  @new_path = new_path
  @canon = canon
  @canon_profile = canon_profile
end

Instance Method Details

#diff ⇒ PackageDiffResult

Perform structural diff and return a PackageDiffResult.

Returns:

  • (PackageDiffResult)



# File 'lib/uniword/diff/package_differ.rb', line 70

def diff
  old_zip = Zip::File.open(@old_path)
  new_zip = Zip::File.open(@new_path)

  begin
    part_diff = diff_parts(old_zip, new_zip)
    content_diff = diff_xml_content(old_zip, new_zip, part_diff)
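    # The identifiers on the next line were lost in extraction; `metadata_diff`
    # and `diff_zip_metadata` are assumed placeholders, not confirmed names.
    metadata_diff = diff_zip_metadata(old_zip, new_zip)
    opc = validate_opc(old_zip, new_zip)
  ensure
    old_zip.close
    new_zip.close
  end

  PackageDiffResult.new(
    old_path: @old_path,
    new_path: @new_path,
    added_parts: part_diff[:added],
    removed_parts: part_diff[:removed],
    modified_parts: part_diff[:modified],
    unchanged_parts: part_diff[:unchanged],
    xml_changes: content_diff,
    zip_metadata_changes: metadata_diff,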
    opc_issues: opc,
  )
end