Class: Uniword::DocumentFactory

Inherits:
Object
Defined in:
lib/uniword/document_factory.rb

Overview

Factory for creating Document instances.

Responsibility: Handle document creation from various sources. Follows the Single Responsibility Principle: creation logic is kept separate from the Document class itself.

Examples:

Create document from file

document = Uniword::DocumentFactory.from_file("document.docx")

Create empty document

document = Uniword::DocumentFactory.create

Create with specific format

document = Uniword::DocumentFactory.from_file("doc.mhtml", format: :mhtml)

Constant Summary

PACKAGE_PART_MAPPINGS =

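Mapping from Docx::Package attribute names to DocumentRoot attribute names. Only styles and numbering map to differently named attributes (styles_configuration and numbering_configuration); every other key maps to an attribute of the same name.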

{
  styles: :styles_configuration,
  numbering: :numbering_configuration,
  settings: :settings,
  font_table: :font_table,
  web_settings: :web_settings,
  theme: :theme,
  core_properties: :core_properties,
  app_properties: :app_properties,
  document_rels: :document_rels,
  theme_rels: :theme_rels,
  package_rels: :package_rels,
  content_types: :content_types,
  custom_properties: :custom_properties,
  custom_xml_items: :custom_xml_items,
  footnotes: :footnotes,
  endnotes: :endnotes,
}.freeze

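Class Method Summary

  • .copy_package_parts_to_document(package, document) ⇒ void

    Copy package parts to document for round-trip preservation.

  • .create ⇒ Wordprocessingml::DocumentRoot

    Create a new empty document.

  • .detect_format(path) ⇒ Symbol

    Detect the format of a file.

  • .from_file(path, format: :auto) ⇒ Document

    Create a document from a file.

  • .from_file_data(stream, format: :auto) ⇒ Document

    Create a document from binary data (IO/StringIO stream or binary string).

  • .from_theme_file(path, format: :auto) ⇒ Theme

    Create a Theme from a theme file (.thmx).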

Class Method Details

.copy_package_parts_to_document(package, document) ⇒ void

This method returns an undefined value.

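Copy package parts to the document so that they are preserved on a round trip. Parts whose value is nil are skipped, and nothing is copied unless the document is a Uniword::Wordprocessingml::DocumentRoot.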

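Parameters:

  • package

    The source package whose parts are read (a Docx::Package, per PACKAGE_PART_MAPPINGS)

  • document

    The target document; ignored unless it is a Uniword::Wordprocessingml::DocumentRoot
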
# File 'lib/uniword/document_factory.rb', line 198

def copy_package_parts_to_document(package, document)
  return unless document.is_a?(Uniword::Wordprocessingml::DocumentRoot)

  PACKAGE_PART_MAPPINGS.each do |pkg_attr, doc_attr|
    value = package.send(pkg_attr)
    document.send(:"#{doc_attr}=", value) if value
  end
end
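
The mapping-driven copy above can be tried in isolation. The following is a minimal sketch, not the library's code: DemoPackage, DemoDocument, DEMO_MAPPINGS and demo_copy_parts are hypothetical stand-ins, and the mapping is reduced to three entries. It shows that a renamed attribute is written through its mapped setter and that a nil package value leaves the document's existing value untouched.

```ruby
# Hypothetical stand-ins for Docx::Package and DocumentRoot (illustration only).
DemoPackage  = Struct.new(:styles, :numbering, :settings, keyword_init: true)
DemoDocument = Struct.new(:styles_configuration, :numbering_configuration, :settings,
                          keyword_init: true)

# Reduced copy of the mapping: package attribute => document attribute.
DEMO_MAPPINGS = {
  styles: :styles_configuration,
  numbering: :numbering_configuration,
  settings: :settings
}.freeze

# Copy every non-nil package part onto the document via its mapped setter.
def demo_copy_parts(package, document)
  DEMO_MAPPINGS.each do |pkg_attr, doc_attr|
    value = package.public_send(pkg_attr)
    document.public_send(:"#{doc_attr}=", value) if value
  end
  document
end

package  = DemoPackage.new(styles: "styles part", numbering: nil, settings: "settings part")
document = DemoDocument.new(numbering_configuration: "existing numbering")
demo_copy_parts(package, document)

puts document.styles_configuration    # copied under the renamed attribute
puts document.numbering_configuration # unchanged, because the package value was nil
puts document.settings                # copied under the same name
```

Keeping the name pairs in one frozen hash means a new package part needs only a new entry, not a new assignment line.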

.create ⇒ Wordprocessingml::DocumentRoot

Create a new empty document.

Examples:

Create empty document

document = DocumentFactory.create

Returns:

  • (Wordprocessingml::DocumentRoot)

    A new empty document

# File 'lib/uniword/document_factory.rb', line 29

def create
  Wordprocessingml::DocumentRoot.new
end

.detect_format(path) ⇒ Symbol

Detect the format of a file.

Uses FormatDetector for signature-based detection with extension fallback.

Examples:

Detect format

format = DocumentFactory.detect_format("document.docx")
# => :docx

Parameters:

  • path (String)

    The file path

Returns:

  • (Symbol)

    The detected format (:docx, :mhtml)

Raises:

  • (ArgumentError)

    if format cannot be detected



# File 'lib/uniword/document_factory.rb', line 167

def detect_format(path)
  detector = FormatDetector.new
  detector.detect(path)
end
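
FormatDetector itself is not shown on this page, so the following is only a sketch of what "signature-based detection with extension fallback" can look like. DemoFormatDetector, its constants, and the rule that any ZIP container counts as :docx are assumptions made for illustration; the real class may behave differently.

```ruby
require "tempfile"

# Hypothetical stand-in for FormatDetector; not the library's implementation.
class DemoFormatDetector
  # Local file header magic that opens ZIP containers such as .docx files.
  ZIP_SIGNATURE = "PK\x03\x04".b
  # Fallback table used only when the signature is not recognized.
  EXTENSION_FORMATS = { ".docx" => :docx, ".mhtml" => :mhtml, ".mht" => :mhtml }.freeze

  def detect(path)
    header = File.binread(path, 4)
    return :docx if header == ZIP_SIGNATURE # assumption: treat any ZIP as DOCX

    EXTENSION_FORMATS.fetch(File.extname(path).downcase) do
      raise ArgumentError, "Cannot detect format of #{path}"
    end
  end
end

detector = DemoFormatDetector.new

# ZIP signature with an unknown extension: the signature decides.
by_signature = Tempfile.create(["sample", ".bin"]) do |file|
  file.binmode
  file.write("PK\x03\x04 rest of archive")
  file.flush
  detector.detect(file.path)
end

# No ZIP signature: the extension decides.
by_extension = Tempfile.create(["sample", ".mht"]) do |file|
  file.write("MIME-Version: 1.0")
  file.flush
  detector.detect(file.path)
end

puts by_signature # prints docx
puts by_extension # prints mhtml
```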

.from_file(path, format: :auto) ⇒ Document

Create a document from a file.

Examples:

Load DOCX file

document = DocumentFactory.from_file("document.docx")

Load with explicit format

document = DocumentFactory.from_file("doc.mht", format: :mhtml)

Parameters:

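  • path (String, IO, StringIO)

    The file path, or an IO/StringIO stream. A binary string (ASCII-8BIT encoding or containing null bytes) is treated as file contents and wrapped in a StringIO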

  • format (Symbol) (defaults to: :auto)

    The format (:auto, :docx, :docm, :dotx, :dotm, :mhtml, :html)

Returns:

  • (Document)

    The loaded document

Raises:

  • (ArgumentError)

    if path is invalid

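  • (ArgumentError)

    if format is not supported

  • (CorruptedFileError)

    if the file has an invalid ZIP structure or invalid XML, or if loading fails with any other error that is not a Uniword::Error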



# File 'lib/uniword/document_factory.rb', line 46

def from_file(path, format: :auto)
  # Handle binary strings (for docx gem compatibility)
  # Convert to StringIO if it's a binary string (contains null bytes or has binary encoding)
  path = StringIO.new(path) if path.is_a?(String) && (path.encoding == Encoding::ASCII_8BIT || path.include?("\x00"))

  validate_path(path)

  format = detect_format(path) if format == :auto

  case format
  when :docx, :docm
    package = Docx::Package.from_file(path)
    doc = package.document
    copy_package_parts_to_document(package, doc)
    doc
  when :dotx, :dotm
    Ooxml::DotxPackage.from_file(path)
  when :mhtml
    mhtml_doc = Mhtml::MhtmlPackage.from_file(path)
    # Convert Mhtml::Document to DocumentRoot for uniform API
    if mhtml_doc.is_a?(Mhtml::Document)
      Transformation::Transformer.new.mhtml_to_docx(mhtml_doc)
    else
      mhtml_doc
    end
  when :html
    html_content = read_file_content(path)
    Uniword.from_html(html_content)
  else
    raise ArgumentError, "Unsupported format: #{format}"
  end
rescue ArgumentError
  # Re-raise validation errors as-is
  raise
rescue Zip::Error => e
  raise CorruptedFileError.new(path.to_s,
                               "Invalid ZIP structure: #{e.message}")
rescue Nokogiri::XML::SyntaxError => e
  raise CorruptedFileError.new(path.to_s, "Invalid XML: #{e.message}")
rescue StandardError => e
  # Re-raise our custom errors
  raise if e.is_a?(Uniword::Error)

  # Wrap other errors
  raise CorruptedFileError.new(path.to_s, e.message)
end
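
The first line of from_file decides whether a String argument is a path or raw file contents. The sketch below isolates that predicate so it can be run without the library; the helper name binary_payload? is hypothetical, while the condition itself is copied from the method above.

```ruby
require "stringio"

# Same condition as the first line of from_file: a String counts as raw file
# contents (not a path) when it is binary-encoded or contains a null byte.
def binary_payload?(value)
  value.is_a?(String) &&
    (value.encoding == Encoding::ASCII_8BIT || value.include?("\x00"))
end

path_like = "document.docx"        # ordinary UTF-8 literal, no null bytes
raw_bytes = "PK\x03\x04\x00\x00".b # binary-encoded data, as File.binread returns

puts binary_payload?(path_like) # prints false
puts binary_payload?(raw_bytes) # prints true

# Wrap raw contents in a StringIO, as from_file does before validating the path.
source = binary_payload?(raw_bytes) ? StringIO.new(raw_bytes) : raw_bytes
puts source.class # prints StringIO
```

One consequence of this check is that a path read from a binary source (and therefore tagged ASCII-8BIT) would be treated as file contents, so paths should be passed as ordinary text strings.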

.from_file_data(stream, format: :auto) ⇒ Document

Create a document from binary data (an IO/StringIO stream or a binary string). Compatible with the docx gem API.

Examples:

Load from stream

stream = StringIO.new(File.binread("doc.docx"))
document = DocumentFactory.from_file_data(stream)

Load from binary string

data = File.binread("doc.docx")
document = DocumentFactory.from_file_data(data)

Parameters:

  • stream (IO, StringIO, String)

    The binary stream or data

  • format (Symbol) (defaults to: :auto)

    The format, passed through to from_file (:auto, :docx, :docm, :dotx, :dotm, :mhtml, :html)

Returns:

  • (Document)

    The loaded document (Generated::Wordprocessingml::DocumentRoot)



# File 'lib/uniword/document_factory.rb', line 107

def from_file_data(stream, format: :auto)
  # Convert binary string to StringIO if needed
  stream = StringIO.new(stream) if stream.is_a?(String)

  # Use from_file which already supports IO/StringIO
  from_file(stream, format: format)
end

.from_theme_file(path, format: :auto) ⇒ Theme

Create a Theme from a theme file (.thmx).

IMPORTANT: This returns a Theme object, NOT a Document! Theme files (.thmx) are standalone packages containing only theme data.

Examples:

Load theme file

theme = DocumentFactory.from_theme_file("celestial.thmx")
theme.name # => "Celestial"

Parameters:

  • path (String)

    The file path to .thmx file

  • format (Symbol) (defaults to: :auto)

    The format (:auto, :thmx)

Returns:

  • (Theme)

    The loaded theme

Raises:

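  • (ArgumentError)

    if the path is invalid or the file is not in a theme format

  • (CorruptedFileError)

    if the file has an invalid ZIP structure or invalid XML, or if loading fails with any other error that is not a Uniword::Error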



# File 'lib/uniword/document_factory.rb', line 128

def from_theme_file(path, format: :auto)
  validate_path(path)

  format = detect_format(path) if format == :auto

  case format
  when :thmx
    Ooxml::ThmxPackage.from_file(path)
  else
    raise ArgumentError,
          "Not a theme format: #{format}. Use from_file() for documents."
  end
rescue ArgumentError
  # Re-raise validation errors as-is
  raise
rescue Zip::Error => e
  raise CorruptedFileError.new(path.to_s,
                               "Invalid ZIP structure: #{e.message}")
rescue Nokogiri::XML::SyntaxError => e
  raise CorruptedFileError.new(path.to_s, "Invalid XML: #{e.message}")
rescue StandardError => e
  # Re-raise our custom errors
  raise if e.is_a?(Uniword::Error)

  # Wrap other errors
  raise CorruptedFileError.new(path.to_s, e.message)
end
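
Both from_file and from_theme_file end with the same rescue chain. The sketch below reproduces only its ordering so the three outcomes can be observed without the library. DemoError, DemoCorruptedFileError, DemoNotFoundError, load_with_wrapping and outcome are hypothetical stand-ins, and the Zip::Error and Nokogiri::XML::SyntaxError branches are left out because they need external gems.

```ruby
# Hypothetical stand-ins for Uniword::Error and CorruptedFileError.
class DemoError < StandardError; end

class DemoCorruptedFileError < DemoError
  def initialize(path, reason)
    super("Corrupted file #{path}: #{reason}")
  end
end

class DemoNotFoundError < DemoError; end

# Runs a loader block with the same rescue ordering as the factory methods.
def load_with_wrapping(path)
  yield
rescue ArgumentError
  raise                                             # validation errors pass through
rescue StandardError => e
  raise if e.is_a?(DemoError)                       # library errors pass through
  raise DemoCorruptedFileError.new(path, e.message) # everything else is wrapped
end

# Helper that reports which error class finally reaches the caller.
def outcome(path, &loader)
  load_with_wrapping(path, &loader)
rescue StandardError => e
  e.class
end

puts outcome("a.thmx") { raise ArgumentError, "Not a theme format" } # prints ArgumentError
puts outcome("a.thmx") { raise DemoNotFoundError, "missing" }        # prints DemoNotFoundError
puts outcome("a.thmx") { raise IOError, "truncated read" }           # prints DemoCorruptedFileError
```

Callers therefore only need to handle ArgumentError and the library's own error hierarchy; low-level failures arrive wrapped in a corrupted-file error that carries the path.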