Class: Uniword::Transformation::OoxmlToMhtmlConverter

Inherits:
Object
Defined in:
lib/uniword/transformation/ooxml_to_mhtml_converter.rb

Overview

Converts OOXML DocumentRoot to Mhtml::Document for full-fidelity MHT output.

This converter is completely separate from OoxmlToHtmlConverter, which produces HTML5. It produces Word HTML4 wrapped in a proper MIME multipart structure.
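
As a rough illustration of what "MIME multipart structure" means here, the following sketch assembles a minimal multipart/related message with a quoted-printable HTML part. It is not the library's MimePackager: `SketchPart`, `quoted_printable`, and `build_multipart_related` are hypothetical names, and the header order and sample location are assumptions. Only `Array#pack("M")` is standard Ruby.

```ruby
# Hypothetical part model; the real Mhtml part classes may differ.
SketchPart = Struct.new(:content_type, :location, :body)

# Quoted-printable encoding via Array#pack("M"); line endings are
# normalized to CRLF to match the rest of the message.
def quoted_printable(text)
  [text].pack("M").gsub("\n", "\r\n")
end

# Assemble a minimal multipart/related message (assumed layout).
def build_multipart_related(parts, boundary)
  out = +""
  out << "MIME-Version: 1.0\r\n"
  out << %(Content-Type: multipart/related; boundary="#{boundary}"\r\n\r\n)
  parts.each do |part|
    out << "--#{boundary}\r\n"
    out << "Content-Location: #{part.location}\r\n"
    out << "Content-Transfer-Encoding: quoted-printable\r\n"
    out << "Content-Type: #{part.content_type}\r\n\r\n"
    out << quoted_printable(part.body)
    out << "\r\n"
  end
  # The closing delimiter is the boundary with "--" on both sides.
  out << "--#{boundary}--\r\n"
end

sample = [
  SketchPart.new("text/html", "file:///C:/example/doc.htm",
                 "<p class=MsoNormal>Hi</p>")
]
puts build_multipart_related(sample, "----=_NextPart_SKETCH")
```

In the encoded part every literal `=` becomes `=3D`, which is why Word-generated MHT files show attributes such as `class=3DMsoNormal`.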

Delegates to:

  • MhtmlStyleBuilder for static style templates

  • MhtmlElementRenderer for element-to-HTML conversion

  • MhtmlMetadataBuilder for metadata, properties, and file parts

Examples:

Transform DOCX to MHT

docx_doc = Uniword::Docx::Package.from_file("document.docx")
mhtml_doc = Uniword::Transformation::OoxmlToMhtmlConverter.document_to_mht(docx_doc)
output = Uniword::Infrastructure::MimePackager.new(mhtml_doc).build_mime_content

Constant Summary

MSO_NORMAL_TABLE_STYLE =

Static MsoNormalTable CSS (used in wrap_html_document head)

<<~CSS
  <!--[if gte mso 10]>
  <style>
   /* Style Definitions */
   table.MsoNormalTable
  	{mso-style-name:"Table Normal";
  	mso-tstyle-rowband-size:0;
  	mso-tstyle-colband-size:0;
  	mso-style-noshow:yes;
  	mso-style-priority:99;
  	mso-style-parent:"";
  	mso-padding-alt:0in 5.4pt 0in 5.4pt;
  	mso-para-margin-top:0in;
  	mso-para-margin-right:0in;
  	mso-para-margin-bottom:8.0pt;
  	mso-para-margin-left:0in;
  	line-height:115%;
  	mso-pagination:widow-orphan;
  	font-size:12.0pt;
  	font-family:"Aptos",sans-serif;
  	mso-ascii-font-family:Aptos;
  	mso-ascii-theme-font:minor-latin;
  	mso-hansi-font-family:Aptos;
  	mso-hansi-theme-font:minor-latin;
  	mso-font-kerning:1.0pt;
  	mso-ligatures:standardcontextual;}
  </style>
  <![endif]-->
CSS
VML_BEHAVIOR_STYLE =

Static VML behavior style block

<<~CSS
  <!--[if !mso]>
  <style>
  v:* {behavior:url(#default#VML);}
  o:* {behavior:url(#default#VML);}
  w:* {behavior:url(#default#VML);}
  .shape {behavior:url(#default#VML);}
  </style>
  <![endif]-->
CSS
WORD_DOCUMENT_XML =

Static WordDocument XML block (compatibility settings + MathPr)

<<~XML
  <!--[if gte mso 9]><xml>
   <w:WordDocument xmlns:w="urn:schemas-microsoft-com:office:word">
    <w:TrackMoves>false</w:TrackMoves>
    <w:TrackFormatting/>
    <w:PunctuationKerning/>
    <w:ValidateAgainstSchemas/>
    <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
    <w:IgnoreMixedContent>false</w:IgnoreMixedContent>
    <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
    <w:DoNotPromoteQF/>
    <w:LidThemeOther>en-US</w:LidThemeOther>
    <w:LidThemeAsian>ZH-CN</w:LidThemeAsian>
    <w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript>
    <w:Compatibility>
     <w:BreakWrappedTables/>
     <w:SnapToGridInCell/>
     <w:WrapTextWithPunct/>
     <w:UseAsianBreakRules/>
     <w:DontGrowAutofit/>
     <w:SplitPgBreakAndParaMark/>
     <w:EnableOpenTypeKerning/>
     <w:DontFlipMirrorIndents/>
     <w:OverrideTableStyleHps/>
     <w:UseFELayout/>
    </w:Compatibility>
    <w:MathPr>
     <w:MathFont w:val="Cambria Math"/>
     <w:brkBin w:val="before"/>
     <w:brkBinSub w:val="&#45;-"/>
     <w:smallFrac w:val="off"/>
     <w:dispDef/>
     <w:lMargin w:val="0"/>
     <w:rMargin w:val="0"/>
     <w:defJc w:val="centerGroup"/>
     <w:wrapIndent w:val="1440"/>
     <w:intLim w:val="subSup"/>
     <w:naryLim w:val="undOvr"/>
    </w:MathPr>
   </w:WordDocument>
  </xml><![endif]-->
XML
OFFICE_SETTINGS_XML =

Static OfficeDocumentSettings XML

<<~XML
  <o:OfficeDocumentSettings xmlns:o="urn:schemas-microsoft-com:office:office">
   <o:AllowPNG/>
  </o:OfficeDocumentSettings>
XML
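
The static blocks above are meant to be placed in the document head. A minimal sketch of that assembly follows, assuming a helper shape that is not taken from the library: `mso_conditional` and `build_head_sketch` are hypothetical, and the constants are shortened stand-ins for the ones documented above. Conditional comments such as `<!--[if gte mso 9]>` are read by Word and ignored as ordinary comments by other HTML consumers.

```ruby
# Shortened stand-ins for the documented constants (illustration only).
VML_STYLE_SKETCH = <<~CSS
  <!--[if !mso]>
  <style>
  v:* {behavior:url(#default#VML);}
  </style>
  <![endif]-->
CSS

SETTINGS_XML_SKETCH = <<~XML
  <o:OfficeDocumentSettings xmlns:o="urn:schemas-microsoft-com:office:office">
   <o:AllowPNG/>
  </o:OfficeDocumentSettings>
XML

# Hypothetical helper: wrap an XML island in an Office conditional comment.
def mso_conditional(xml, min_version)
  "<!--[if gte mso #{min_version}]><xml>\n#{xml.strip}\n</xml><![endif]-->"
end

# Hypothetical helper: join pre-built blocks into a <head> element.
def build_head_sketch(title, blocks)
  escaped = title.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
  <<~HTML
    <head>
    <meta http-equiv=Content-Type content="text/html; charset=utf-8">
    <title>#{escaped}</title>
    #{blocks.map(&:strip).join("\n")}
    </head>
  HTML
end

head = build_head_sketch(
  "A & B",
  [mso_conditional(SETTINGS_XML_SKETCH, 9), VML_STYLE_SKETCH]
)
puts head
```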

Constructor Details

#initialize(document, core_properties = nil, relationships = nil, document_name = nil) ⇒ OoxmlToMhtmlConverter

Returns a new instance of OoxmlToMhtmlConverter.



# File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 140

def initialize(document, core_properties = nil, relationships = nil,
               document_name = nil)
  @document = document
  @relationships = relationships
  @core_properties = core_properties

  @metadata_builder = MhtmlMetadataBuilder.new(
    document, core_properties, relationships, document_name
  )
  @element_renderer = MhtmlElementRenderer.new(relationships,
                                               document.image_parts)
end

Class Method Details

.document_to_html_body(document, core_properties = nil, relationships = nil) ⇒ String

Convert OOXML DocumentRoot to HTML body content (for Mhtml::HtmlPart)

Parameters:

Returns:

  • (String)

    Word HTML4 body content



# File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 134

def self.document_to_html_body(document, core_properties = nil,
                               relationships = nil)
  converter = new(document, core_properties, relationships)
  converter.build_html_body
end

.document_to_mht(document, core_properties = nil, relationships = nil, document_name = nil) ⇒ Uniword::Mhtml::Document

Convert OOXML DocumentRoot to Mhtml::Document

Parameters:

Returns:

  • (Uniword::Mhtml::Document)



# File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 122

def self.document_to_mht(document, core_properties = nil, relationships = nil,
                         document_name = nil)
  converter = new(document, core_properties, relationships, document_name)
  converter.build_mhtml_document
end

Instance Method Details

#build_html_body ⇒ Object

Build the HTML body content



# File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 198

def build_html_body
  body = @document.body
  return "" unless body

  # Split body elements into sections based on paragraph section_properties
  sections = split_into_sections(body.elements)

  wrap_html_document(sections)
end
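
The source of split_into_sections is not shown on this page. In OOXML, a paragraph that carries section properties marks the end of a section, so one plausible shape for the split is sketched below. `SketchParagraph` and `split_into_sections_sketch` are hypothetical names, and the real method may treat tables or the final body-level section properties differently.

```ruby
# Hypothetical element model: only the attribute the split needs.
SketchParagraph = Struct.new(:text, :section_properties)

# Group elements into sections; a paragraph carrying section
# properties is the last element of its section.
def split_into_sections_sketch(elements)
  sections = [[]]
  elements.each do |element|
    sections.last << element
    closes_section = element.respond_to?(:section_properties) &&
                     element.section_properties
    sections << [] if closes_section
  end
  # Drop the trailing empty group left behind by a final section break.
  sections.pop if sections.size > 1 && sections.last.empty?
  sections
end

elements = [
  SketchParagraph.new("intro", nil),
  SketchParagraph.new("end of section 1", { columns: 2 }),
  SketchParagraph.new("section 2 text", nil)
]
split_into_sections_sketch(elements).each_with_index do |section, index|
  puts "Section #{index + 1}: #{section.map(&:text).join(' | ')}"
end
```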

#build_mhtml_document ⇒ Object

Build the complete Mhtml::Document



# File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 164

def build_mhtml_document
  mhtml_doc = Uniword::Mhtml::Document.new

  # Build HTML content
  html_content = build_html_body
  html_part = Uniword::Mhtml::HtmlPart.new
  html_part.content_type = "text/html"
  html_part.content_transfer_encoding = "quoted-printable"
  html_part.raw_content = html_content
  html_part.content_location = "file:///C:/D057922B/#{document_name}.htm"

  mhtml_doc.html_part = html_part
  mhtml_doc.parts << html_part

  # Build metadata
  mhtml_doc.document_properties = @metadata_builder.build_document_properties

  # Build filelist.xml
  filelist_part = @metadata_builder.build_filelist_part
  mhtml_doc.parts << filelist_part if filelist_part

  # Build image parts from document.image_parts
  @metadata_builder.build_image_parts.each do |image_part|
    mhtml_doc.parts << image_part
  end

  # Generate deterministic boundary based on document name
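  # String#[] with a range returns "" (not nil) for an empty string,
  # so the fallback needs an explicit empty check.
  hash = document_name.to_s.gsub(/[^a-zA-Z0-9]/, "").upcase[0..7]
  hash = "DOC" if hash.empty?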
  mhtml_doc.boundary = "----=_NextPart_01DC60F8.#{hash}"

  mhtml_doc
end
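
The boundary derivation can be exercised in isolation. The sketch below keeps the same steps (strip non-alphanumerics, upcase, keep the first eight characters) in hypothetical helpers named `boundary_token` and `boundary_for`. In Ruby, `""[0..7]` is `""` rather than `nil`, so the fallback is written as an explicit empty check instead of `||`.

```ruby
# Hypothetical helper: reduce a document name to at most eight
# uppercase alphanumeric characters, with a fallback for empty results.
def boundary_token(document_name, fallback: "DOC")
  token = document_name.to_s.gsub(/[^a-zA-Z0-9]/, "").upcase[0, 8]
  token.empty? ? fallback : token
end

# Hypothetical helper: the fixed prefix is copied from the listing above.
def boundary_for(document_name)
  "----=_NextPart_01DC60F8.#{boundary_token(document_name)}"
end

puts boundary_for("my-report 2024") # same name always gives the same boundary
puts boundary_for(nil)              # falls back to the "DOC" token
```

Because the token depends only on the document name, two conversions of the same document produce identical boundaries, which keeps output diffable.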

#core_properties ⇒ Object

Get the core properties to use (provided or from document)



# File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 154

def core_properties
  @core_properties || @document.core_properties
end
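
The "provided or from document" rule is a plain `||` fallback. A self-contained sketch with stand-in objects shows which value wins in each case; `SketchProps`, `SketchDocument`, and `PropsResolverSketch` are hypothetical and exist only for this illustration.

```ruby
# Hypothetical stand-ins for core properties and the document root.
SketchProps = Struct.new(:title)
SketchDocument = Struct.new(:core_properties)

class PropsResolverSketch
  def initialize(document, core_properties = nil)
    @document = document
    @core_properties = core_properties
  end

  # Explicitly provided properties take precedence over the document's own.
  def core_properties
    @core_properties || @document.core_properties
  end
end

document = SketchDocument.new(SketchProps.new("From document"))
override = SketchProps.new("Provided")

puts PropsResolverSketch.new(document).core_properties.title
puts PropsResolverSketch.new(document, override).core_properties.title
```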

#document_name ⇒ Object

Get document name via metadata builder



# File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 159

def document_name
  @metadata_builder.document_name
end