Class: Uniword::Transformation::OoxmlToMhtmlConverter
- Inherits: Object
  - Object
    - Uniword::Transformation::OoxmlToMhtmlConverter
- Defined in: lib/uniword/transformation/ooxml_to_mhtml_converter.rb
Overview
Converts an OOXML DocumentRoot into an Mhtml::Document for full-fidelity MHT output.
This converter is completely separate from OoxmlToHtmlConverter, which produces HTML5. It produces Word HTML4 wrapped in a proper MIME multipart structure.
Delegates to:
- MhtmlStyleBuilder for static style templates
- MhtmlElementRenderer for element-to-HTML conversion
- MhtmlMetadataBuilder for metadata, properties, and file parts
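To make "MIME multipart structure" concrete, here is a minimal sketch of the kind of envelope an MHT file uses: a multipart/related message whose first part is the quoted-printable HTML. It is an illustration only, not the Uniword serializer. The helper name build_mht_skeleton, the example boundary, and the example Content-Location are all assumptions made for this sketch.

```ruby
# Hypothetical helper (not part of Uniword): assembles a minimal
# multipart/related message with a single quoted-printable HTML part.
def build_mht_skeleton(html, boundary:, location:)
  # String#pack("M") produces quoted-printable text with "\n" line ends;
  # convert them to CRLF, which MIME messages use.
  encoded = [html].pack("M").gsub("\n", "\r\n")

  [
    "MIME-Version: 1.0",
    %(Content-Type: multipart/related; boundary="#{boundary}"),
    "",
    "--#{boundary}",                      # opening delimiter of the HTML part
    "Content-Location: #{location}",
    "Content-Transfer-Encoding: quoted-printable",
    %(Content-Type: text/html; charset="utf-8"),
    "",
    encoded,
    "--#{boundary}--",                    # closing delimiter of the message
    ""
  ].join("\r\n")
end

puts build_mht_skeleton(
  "<p>caf\u00E9 = coffee</p>",
  boundary: "----=_NextPart_EXAMPLE",     # illustrative value
  location: "file:///C:/example/document.htm"
)
```

In the converter itself, the equivalent pieces are the Mhtml::HtmlPart (content type, transfer encoding, content location) and the boundary set on the Mhtml::Document; further parts such as filelist.xml and images would each get their own delimiter-separated section.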
Constant Summary
- MSO_NORMAL_TABLE_STYLE =
Static MsoNormalTable CSS (used in wrap_html_document head). Only used when MhtmlStyleBuilder does not provide custom CSS.
<<~CSS
  <!--[if gte mso 10]>
  <style>
   /* Style Definitions */
   table.MsoNormalTable
    {mso-style-name:"Table Normal";
    mso-tstyle-rowband-size:0;
    mso-tstyle-colband-size:0;
    mso-style-noshow:yes;
    mso-style-priority:99;
    mso-style-parent:"";
    mso-padding-alt:0cm 5.4pt 0cm 5.4pt;
    mso-para-margin:0cm;
    mso-para-margin-bottom:.0001pt;
    mso-pagination:widow-orphan;
    font-size:10pt;
    font-family:"Cambria",serif;}
  </style>
  <![endif]-->
CSS
- VML_BEHAVIOR_STYLE =
Static VML behavior style block
<<~CSS
  <!--[if !mso]>
  <style>
  v:* {behavior:url(#default#VML);}
  o:* {behavior:url(#default#VML);}
  w:* {behavior:url(#default#VML);}
  .shape {behavior:url(#default#VML);}
  </style>
  <![endif]-->
CSS
- WORD_DOCUMENT_XML =
Static WordDocument XML block (compatibility settings + MathPr)
<<~XML
  <!--[if gte mso 9]><xml>
   <w:WordDocument xmlns:w="urn:schemas-microsoft-com:office:word">
    <w:TrackMoves>false</w:TrackMoves>
    <w:TrackFormatting/>
    <w:PunctuationKerning/>
    <w:ValidateAgainstSchemas/>
    <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
    <w:IgnoreMixedContent>false</w:IgnoreMixedContent>
    <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
    <w:DoNotPromoteQF/>
    <w:LidThemeOther>en-US</w:LidThemeOther>
    <w:LidThemeAsian>ZH-CN</w:LidThemeAsian>
    <w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript>
    <w:Compatibility>
     <w:BreakWrappedTables/>
     <w:SnapToGridInCell/>
     <w:WrapTextWithPunct/>
     <w:UseAsianBreakRules/>
     <w:DontGrowAutofit/>
     <w:SplitPgBreakAndParaMark/>
     <w:EnableOpenTypeKerning/>
     <w:DontFlipMirrorIndents/>
     <w:OverrideTableStyleHps/>
     <w:UseFELayout/>
    </w:Compatibility>
    <w:MathPr>
     <w:MathFont w:val="Cambria Math"/>
     <w:brkBin w:val="before"/>
     <w:brkBinSub w:val="--"/>
     <w:smallFrac w:val="off"/>
     <w:dispDef/>
     <w:lMargin w:val="0"/>
     <w:rMargin w:val="0"/>
     <w:defJc w:val="centerGroup"/>
     <w:wrapIndent w:val="1440"/>
     <w:intLim w:val="subSup"/>
     <w:naryLim w:val="undOvr"/>
    </w:MathPr>
   </w:WordDocument>
  </xml><![endif]-->
XML
- OFFICE_SETTINGS_XML =
Static OfficeDocumentSettings XML
<<~XML
  <o:OfficeDocumentSettings xmlns:o="urn:schemas-microsoft-com:office:office">
   <o:AllowPNG/>
  </o:OfficeDocumentSettings>
XML
Class Method Summary
- .document_to_html_body(document, core_properties = nil, relationships = nil) ⇒ String
  Convert OOXML DocumentRoot to HTML body content (for Mhtml::HtmlPart).
- .document_to_mht(document, core_properties = nil, relationships = nil, document_name = nil) ⇒ Uniword::Mhtml::Document
  Convert OOXML DocumentRoot to Mhtml::Document.
Instance Method Summary
- #build_html_body ⇒ Object
  Build the HTML body content.
- #build_mhtml_document ⇒ Object
  Build the complete Mhtml::Document.
- #core_properties ⇒ Object
  Get the core properties to use (provided or from document).
- #document_name ⇒ Object
  Get document name via metadata builder.
- #initialize(document, core_properties = nil, relationships = nil, document_name = nil) ⇒ OoxmlToMhtmlConverter (constructor)
  A new instance of OoxmlToMhtmlConverter.
Constructor Details
#initialize(document, core_properties = nil, relationships = nil, document_name = nil) ⇒ OoxmlToMhtmlConverter
Returns a new instance of OoxmlToMhtmlConverter.
# File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 132

def initialize(document, core_properties = nil, relationships = nil,
               document_name = nil)
  @document = document
  @relationships = relationships
  @core_properties = core_properties
  @metadata_builder = MhtmlMetadataBuilder.new(
    document, core_properties, relationships, document_name
  )
  @element_renderer = MhtmlElementRenderer.new(relationships,
                                               document.image_parts)
end
Class Method Details
.document_to_html_body(document, core_properties = nil, relationships = nil) ⇒ String
Convert OOXML DocumentRoot to HTML body content (for Mhtml::HtmlPart)
# File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 126

def self.document_to_html_body(document, core_properties = nil,
                               relationships = nil)
  converter = new(document, core_properties, relationships)
  converter.build_html_body
end
.document_to_mht(document, core_properties = nil, relationships = nil, document_name = nil) ⇒ Uniword::Mhtml::Document
Convert OOXML DocumentRoot to Mhtml::Document
# File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 114

def self.document_to_mht(document, core_properties = nil,
                         relationships = nil, document_name = nil)
  converter = new(document, core_properties, relationships, document_name)
  converter.build_mhtml_document
end
Instance Method Details
#build_html_body ⇒ Object
Build the HTML body content
# File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 190

def build_html_body
  body = @document.body
  return "" unless body

  # Split body elements into sections based on paragraph section_properties
  sections = split_into_sections(body.elements)

  wrap_html_document(sections)
end
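The body of split_into_sections is not shown on this page. As a rough sketch of the idea described in the comment above, the example below groups body elements so that each paragraph carrying section_properties closes the current section. The Para struct is a hypothetical stand-in for Uniword's element classes, and the real method may differ in how it represents sections.

```ruby
# Hypothetical stand-in for a body element; not a Uniword class.
Para = Struct.new(:text, :section_properties)

# Sketch only: a paragraph with section_properties ends its section,
# so everything up to and including it forms one group.
def split_into_sections(elements)
  elements.slice_after do |el|
    el.respond_to?(:section_properties) && el.section_properties
  end.to_a
end

elements = [
  Para.new("Intro", nil),
  Para.new("End of section one", { orientation: "portrait" }),
  Para.new("Start of section two", nil)
]

split_into_sections(elements).each_with_index do |section, index|
  puts "Section #{index + 1}: #{section.map(&:text).join(' | ')}"
end
```

Enumerable#slice_after keeps the closing paragraph inside the section it terminates; any trailing elements without section properties land in a final group.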
#build_mhtml_document ⇒ Object
Build the complete Mhtml::Document
# File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 156

def build_mhtml_document
  mhtml_doc = Uniword::Mhtml::Document.new

  # Build HTML content
  html_content = build_html_body

  html_part = Uniword::Mhtml::HtmlPart.new
  html_part.content_type = "text/html"
  html_part.content_transfer_encoding = "quoted-printable"
  html_part.raw_content = html_content
  html_part.content_location = "file:///C:/D057922B/#{document_name}.htm"

  mhtml_doc.html_part = html_part
  mhtml_doc.parts << html_part

  # Build metadata
  mhtml_doc.document_properties = @metadata_builder.build_document_properties

  # Build filelist.xml
  filelist_part = @metadata_builder.build_filelist_part
  mhtml_doc.parts << filelist_part if filelist_part

  # Build image parts from document.image_parts
  @metadata_builder.build_image_parts.each do |image_part|
    mhtml_doc.parts << image_part
  end

  # Generate deterministic boundary based on document name
  hash = document_name.gsub(/[^a-zA-Z0-9]/, "").upcase[0..7] || "DOC"
  mhtml_doc.boundary = "----=_NextPart_01DC60F8.#{hash}"

  mhtml_doc
end
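The boundary derivation at the end of build_mhtml_document can be tried in isolation. The sketch below mirrors it in a standalone helper; the name boundary_for is an assumption for this example. One deliberate difference: slicing a String with [0..7] returns an empty string rather than nil when nothing is left after stripping, so the sketch checks empty? explicitly to reach the "DOC" fallback instead of relying on ||.

```ruby
# Hypothetical helper mirroring the boundary derivation shown above.
def boundary_for(document_name)
  # Keep ASCII letters and digits only, upcase, take at most 8 characters.
  token = document_name.gsub(/[^a-zA-Z0-9]/, "").upcase[0..7]
  # Assumption of this sketch: fall back to "DOC" when nothing is left.
  token = "DOC" if token.empty?
  "----=_NextPart_01DC60F8.#{token}"
end

puts boundary_for("Quarterly Report 2024") # prints ----=_NextPart_01DC60F8.QUARTERL
puts boundary_for("???")                   # prints ----=_NextPart_01DC60F8.DOC
```

Because the result depends only on the document name, converting the same document twice yields the same boundary, which keeps the output stable for diffing and tests.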
#core_properties ⇒ Object
Get the core properties to use (provided or from document)
# File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 146

def core_properties
  @core_properties || @document.core_properties
end
#document_name ⇒ Object
Get document name via metadata builder
# File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 151

def document_name
  @metadata_builder.document_name
end