Uniword is a Ruby library for reading and writing Microsoft Word documents in DOCX (Word 2007+) and MHTML (Word 2003+) formats, with 100% OOXML specification coverage (760 elements, 22 namespaces) and perfect round-trip fidelity.
- Documentation: Documentation Site
- API Reference: RubyDoc.info
Installation
Add to your Gemfile:
gem 'uniword'
Or install directly:
gem install uniword
Quick Start
require 'uniword'
# Create a document
doc = Uniword::Builder.new
  .add_heading('My Document', level: 1)
  .add_paragraph('Hello World', bold: true)
  .build
doc.save('output.docx')
# Read and modify
doc = Uniword::DocumentFactory.from_file('input.docx')
puts doc.text
doc.save('modified.docx')
# Apply theme and styleset
doc = Uniword::Wordprocessingml::DocumentRoot.new
doc.apply_theme('meridian')
doc.apply_styleset('signature')
doc.save('styled.docx')
# Compare two documents (content-level)
old_doc = Uniword.load("v1.docx")
new_doc = Uniword.load("v2.docx")
result = old_doc.diff(new_doc)
puts result.summary
# Compare two DOCX packages (structural, with Canon semantic diff)
differ = Uniword::Diff::PackageDiffer.new("old.docx", "new.docx", canon: true)
pkg_result = differ.diff
pkg_result.modified_parts.each do |part|
  status = part.canon_equivalent ? "equivalent" : "DIFFERENT"
  puts "#{part.name}: canon #{status}"
end
CLI
uniword convert input.docx output.doc # DOCX to MHTML
uniword info document.docx # Document info
uniword validate document.docx # Schema validation
uniword verify document.docx # Full 3-layer verification
uniword diff compare old.docx new.docx # Document-level diff (content, styles)
uniword diff package old.docx new.docx --canon # Package-level diff (ZIP, XML, OPC)
uniword theme apply doc.docx out.docx -n meridian # Apply theme
uniword theme auto ms.docx uniword.docx # Auto MS->Uniword transition
Features
- Full DOCX and MHTML read/write with format conversion
- 29 bundled themes, 12 stylesets, 23 color schemes, 25 font schemes
- 30-locale document elements (240 templates)
- MS font to open-source substitution (Calibri→Carlito, Arial→Liberation Sans)
- Auto theme transition: detect MS themes by color fingerprint
- DOCX diffing: document-level (content/formatting) and package-level (ZIP/XML/OPC with Canon semantic comparison)
- DOCX validation: 3-layer pipeline (OPC + XSD + semantic rules)
- Prevention layer: auto-fix footnote/endnote cross-part invariants
- Tables, lists, images, headers/footers, footnotes, bookmarks, math
- Fluent Builder API and CLI
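The font substitution listed above can be pictured as a simple name lookup. The following is a minimal sketch in plain Ruby, not Uniword's implementation: the `SUBSTITUTIONS` constant and the `substitute_font` helper are hypothetical names chosen for illustration, and only the two pairs named in the feature list are included.

```ruby
# Hypothetical illustration only; these names are not part of the Uniword API.
# Maps a proprietary font name to the open-source substitute named in the feature list.
SUBSTITUTIONS = {
  'Calibri' => 'Carlito',
  'Arial' => 'Liberation Sans'
}.freeze

# Returns the substitute font name, or the original name when no mapping is known.
def substitute_font(name)
  SUBSTITUTIONS.fetch(name, name)
end

puts substitute_font('Calibri')
puts substitute_font('Garamond')
```

Falling back to the original name keeps unmapped fonts untouched, which seems the safest default for a sketch like this; how Uniword itself handles unmapped fonts is not stated here.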
Documentation
Full documentation is at metanorma.github.io/uniword.
- Getting Started: Installation and first document
- Interfaces: Ruby API and CLI reference
- Guides: Step-by-step guides for every feature
- Understanding: Architecture and design
- Features: Theme and styleset catalogs
- Verification: DOCX validation and verification
- Development: Contributing and code patterns
Contributing
See CONTRIBUTING.md for development guidelines. Bug reports and pull requests are welcome at https://github.com/metanorma/uniword.
License
The gem is available as open source under the terms of the BSD 2-Clause License.
Copyright (c) 2024 Ribose Inc.