
Uniword is a Ruby library for reading and writing Microsoft Word documents in DOCX (Word 2007+) and MHTML (Word 2003+) formats, with 100% OOXML specification coverage (760 elements, 22 namespaces) and perfect round-trip fidelity.

Installation

Add to your Gemfile:

gem 'uniword'

Or install directly:

gem install uniword

Quick Start

require 'uniword'

# Create a document
doc = Uniword::Builder.new
  .add_heading('My Document', level: 1)
  .add_paragraph('Hello World', bold: true)
  .build
doc.save('output.docx')

# Read and modify
doc = Uniword::DocumentFactory.from_file('input.docx')
puts doc.text
doc.save('modified.docx')

# Apply theme and styleset
doc = Uniword::Wordprocessingml::DocumentRoot.new
doc.apply_theme('meridian')
doc.apply_styleset('signature')
doc.save('styled.docx')

# Compare two documents (content-level)
old_doc = Uniword.load("v1.docx")
new_doc = Uniword.load("v2.docx")
result = old_doc.diff(new_doc)
puts result.summary

# Compare two DOCX packages (structural, with Canon semantic diff)
differ = Uniword::Diff::PackageDiffer.new("old.docx", "new.docx", canon: true)
pkg_result = differ.diff
pkg_result.modified_parts.each do |part|
  status = part.canon_equivalent ? "equivalent" : "DIFFERENT"
  puts "#{part.name}: canon #{status}"
end
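
The idea behind a package-level comparison can be sketched with the standard library alone. This is a minimal illustration, assuming the parts of each DOCX have already been extracted into a hash of part name to raw content. `classify_parts` and the sample part contents are hypothetical and not part of Uniword's API. The Canon semantic comparison is not modeled here, so parts that differ only in serialization would still be reported as modified.

```ruby
require 'digest'

# Hypothetical helper, not part of Uniword: classifies package parts as
# added, removed, or modified by comparing SHA-256 digests of their content.
def classify_parts(old_parts, new_parts)
  added   = (new_parts.keys - old_parts.keys).sort
  removed = (old_parts.keys - new_parts.keys).sort
  common  = old_parts.keys & new_parts.keys

  # A part counts as modified when it exists in both packages
  # but its digest differs.
  modified = common.reject do |name|
    Digest::SHA256.hexdigest(old_parts[name]) ==
      Digest::SHA256.hexdigest(new_parts[name])
  end.sort

  { added: added, removed: removed, modified: modified }
end

# Made-up sample data standing in for two extracted packages.
old_parts = {
  'word/document.xml' => '<w:document>v1</w:document>',
  'word/styles.xml'   => '<w:styles/>',
  'word/comments.xml' => '<w:comments/>'
}
new_parts = {
  'word/document.xml'  => '<w:document>v2</w:document>',
  'word/styles.xml'    => '<w:styles/>',
  'word/footnotes.xml' => '<w:footnotes/>'
}

classify_parts(old_parts, new_parts).each do |kind, names|
  puts "#{kind}: #{names.join(', ')}"
end
```

A byte-level digest is the strictest possible comparison. A semantic comparison would instead treat two XML parts as equivalent when they differ only in details such as attribute order or whitespace.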

CLI

uniword convert input.docx output.doc              # Convert DOCX to MHTML (.doc)
uniword info document.docx                         # Show document info
uniword validate document.docx                     # Schema validation
uniword verify document.docx                       # Full 3-layer verification
uniword diff compare old.docx new.docx             # Document-level diff (content, styles)
uniword diff package old.docx new.docx --canon     # Package-level diff (ZIP, XML, OPC)
uniword theme apply doc.docx out.docx -n meridian  # Apply a theme
uniword theme auto ms.docx uniword.docx            # Auto MS-to-Uniword theme transition

Features

  • Full DOCX and MHTML read/write with format conversion

  • 29 bundled themes, 12 stylesets, 23 color schemes, 25 font schemes

  • 30-locale document elements (240 templates)

  • MS font to open-source substitution (Calibri→Carlito, Arial→Liberation Sans)

  • Auto theme transition: detect MS themes by color fingerprint

  • DOCX diffing: document-level (content/formatting) and package-level (ZIP/XML/OPC with Canon semantic comparison)

  • DOCX validation: 3-layer pipeline (OPC + XSD + semantic rules)

  • Prevention layer: auto-fix footnote/endnote cross-part invariants

  • Tables, lists, images, headers/footers, footnotes, bookmarks, math

  • Fluent Builder API and CLI interface
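
The font substitution listed above can be pictured as a simple lookup table. The sketch below is illustrative only: `FONT_SUBSTITUTIONS` and `substitute_font` are hypothetical names rather than Uniword's API, and the table holds only the two pairs named in the feature list.

```ruby
# Hypothetical substitution table, not Uniword's actual data:
# maps a Microsoft font name to an open-source replacement.
FONT_SUBSTITUTIONS = {
  'Calibri' => 'Carlito',
  'Arial'   => 'Liberation Sans'
}.freeze

# Returns the open-source substitute for a font name,
# or the name unchanged when no substitute is known.
def substitute_font(name)
  FONT_SUBSTITUTIONS.fetch(name, name)
end

%w[Calibri Arial Georgia].each do |font|
  puts "#{font} -> #{substitute_font(font)}"
end
```

Falling back to the original name keeps fonts without a known substitute untouched, which seems the safest default for round-tripping a document.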

Documentation

Full documentation is at https://metanorma.github.io/uniword.

Contributing

See CONTRIBUTING.md for development guidelines. Bug reports and pull requests are welcome at https://github.com/metanorma/uniword.

License

The gem is available as open source under the terms of the BSD 2-Clause License.

Copyright (c) 2024 Ribose Inc.