Class: Uniword::Transformation::OoxmlToMhtmlConverter
- Inherits: Object
  - Object
  - Uniword::Transformation::OoxmlToMhtmlConverter
- Defined in: lib/uniword/transformation/ooxml_to_mhtml_converter.rb
Overview
Converts OOXML DocumentRoot to Mhtml::Document for full-fidelity MHT output.
This converter is completely separate from OoxmlToHtmlConverter, which produces HTML5. It produces Word HTML4 with a proper MIME multipart structure.
Delegates to:
- MhtmlStyleBuilder for static style templates
- MhtmlElementRenderer for element-to-HTML conversion
- MhtmlMetadataBuilder for metadata, properties, and file parts
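To make the "MIME multipart structure" mentioned above more concrete, the following is a minimal sketch of how an MHT-style multipart/related message can be laid out in plain Ruby. It does not use Uniword's Mhtml classes: the helper names `quoted_printable` and `build_multipart_sketch`, the part hash keys, and the sample boundary and location are all assumptions made for illustration.

```ruby
# Minimal sketch of a multipart/related (MHT-style) message.
# Plain Ruby only; this is NOT Uniword's Mhtml::Document serializer.

# Encode text as quoted-printable using Array#pack("M") from core Ruby.
def quoted_printable(text)
  [text].pack("M")
end

# parts: array of hashes with :location, :type, and :body keys
# (a hypothetical shape chosen for this sketch).
def build_multipart_sketch(boundary, parts)
  lines = []
  lines << "MIME-Version: 1.0"
  lines << %(Content-Type: multipart/related; boundary="#{boundary}")
  lines << ""
  parts.each do |part|
    lines << "--#{boundary}"               # each part starts with "--" + boundary
    lines << "Content-Location: #{part[:location]}"
    lines << "Content-Transfer-Encoding: quoted-printable"
    lines << "Content-Type: #{part[:type]}"
    lines << ""
    lines << quoted_printable(part[:body]) # line endings are simplified here
  end
  lines << "--#{boundary}--"               # closing delimiter ends the message
  lines.join("\r\n")
end

message = build_multipart_sketch(
  "----=_NextPart_SKETCH",
  [{ location: "file:///C:/example/doc.htm", type: "text/html", body: "<p>a=b</p>\n" }]
)
puts message
```

In the encoded body, the `=` character appears as `=3D`, which is why the HTML part is labelled with a quoted-printable transfer encoding rather than sent verbatim.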
Constant Summary
- MSO_NORMAL_TABLE_STYLE =
Static MsoNormalTable CSS (used in wrap_html_document head)
    <<~CSS
      <!--[if gte mso 10]>
      <style>
       /* Style Definitions */
       table.MsoNormalTable
        {mso-style-name:"Table Normal";
        mso-tstyle-rowband-size:0;
        mso-tstyle-colband-size:0;
        mso-style-noshow:yes;
        mso-style-priority:99;
        mso-style-parent:"";
        mso-padding-alt:0in 5.4pt 0in 5.4pt;
        mso-para-margin-top:0in;
        mso-para-margin-right:0in;
        mso-para-margin-bottom:8.0pt;
        mso-para-margin-left:0in;
        line-height:115%;
        mso-pagination:widow-orphan;
        font-size:12.0pt;
        font-family:"Aptos",sans-serif;
        mso-ascii-font-family:Aptos;
        mso-ascii-theme-font:minor-latin;
        mso-hansi-font-family:Aptos;
        mso-hansi-theme-font:minor-latin;
        mso-font-kerning:1.0pt;
        mso-ligatures:standardcontextual;}
      </style>
      <![endif]-->
    CSS
- VML_BEHAVIOR_STYLE =
Static VML behavior style block
    <<~CSS
      <!--[if !mso]>
      <style>
      v:* {behavior:url(#default#VML);}
      o:* {behavior:url(#default#VML);}
      w:* {behavior:url(#default#VML);}
      .shape {behavior:url(#default#VML);}
      </style>
      <![endif]-->
    CSS
- WORD_DOCUMENT_XML =
Static WordDocument XML block (compatibility settings + MathPr)
    <<~XML
      <!--[if gte mso 9]><xml>
       <w:WordDocument xmlns:w="urn:schemas-microsoft-com:office:word">
        <w:TrackMoves>false</w:TrackMoves>
        <w:TrackFormatting/>
        <w:PunctuationKerning/>
        <w:ValidateAgainstSchemas/>
        <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
        <w:IgnoreMixedContent>false</w:IgnoreMixedContent>
        <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
        <w:DoNotPromoteQF/>
        <w:LidThemeOther>en-US</w:LidThemeOther>
        <w:LidThemeAsian>ZH-CN</w:LidThemeAsian>
        <w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript>
        <w:Compatibility>
         <w:BreakWrappedTables/>
         <w:SnapToGridInCell/>
         <w:WrapTextWithPunct/>
         <w:UseAsianBreakRules/>
         <w:DontGrowAutofit/>
         <w:SplitPgBreakAndParaMark/>
         <w:EnableOpenTypeKerning/>
         <w:DontFlipMirrorIndents/>
         <w:OverrideTableStyleHps/>
         <w:UseFELayout/>
        </w:Compatibility>
        <w:MathPr>
         <w:MathFont w:val="Cambria Math"/>
         <w:brkBin w:val="before"/>
         <w:brkBinSub w:val="--"/>
         <w:smallFrac w:val="off"/>
         <w:dispDef/>
         <w:lMargin w:val="0"/>
         <w:rMargin w:val="0"/>
         <w:defJc w:val="centerGroup"/>
         <w:wrapIndent w:val="1440"/>
         <w:intLim w:val="subSup"/>
         <w:naryLim w:val="undOvr"/>
        </w:MathPr>
       </w:WordDocument>
      </xml><![endif]-->
    XML
- OFFICE_SETTINGS_XML =
Static OfficeDocumentSettings XML
    <<~XML
      <o:OfficeDocumentSettings xmlns:o="urn:schemas-microsoft-com:office:office">
       <o:AllowPNG/>
      </o:OfficeDocumentSettings>
    XML
Class Method Summary
- .document_to_html_body(document, core_properties = nil, relationships = nil) ⇒ String
  Convert OOXML DocumentRoot to HTML body content (for Mhtml::HtmlPart).
- .document_to_mht(document, core_properties = nil, relationships = nil, document_name = nil) ⇒ Uniword::Mhtml::Document
  Convert OOXML DocumentRoot to Mhtml::Document.
Instance Method Summary
- #build_html_body ⇒ Object
  Build the HTML body content.
- #build_mhtml_document ⇒ Object
  Build the complete Mhtml::Document.
- #core_properties ⇒ Object
  Get the core properties to use (provided or from document).
- #document_name ⇒ Object
  Get document name via metadata builder.
- #initialize(document, core_properties = nil, relationships = nil, document_name = nil) ⇒ OoxmlToMhtmlConverter (constructor)
  A new instance of OoxmlToMhtmlConverter.
Constructor Details
#initialize(document, core_properties = nil, relationships = nil, document_name = nil) ⇒ OoxmlToMhtmlConverter
Returns a new instance of OoxmlToMhtmlConverter.
    # File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 140

    def initialize(document, core_properties = nil, relationships = nil,
                   document_name = nil)
      @document = document
      @relationships = relationships
      @core_properties = core_properties
      @metadata_builder = MhtmlMetadataBuilder.new(
        document, core_properties, relationships, document_name
      )
      @element_renderer = MhtmlElementRenderer.new(relationships,
                                                   document.image_parts)
    end
Class Method Details
.document_to_html_body(document, core_properties = nil, relationships = nil) ⇒ String
Convert OOXML DocumentRoot to HTML body content (for Mhtml::HtmlPart)
    # File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 134

    def self.document_to_html_body(document, core_properties = nil,
                                   relationships = nil)
      converter = new(document, core_properties, relationships)
      converter.build_html_body
    end
.document_to_mht(document, core_properties = nil, relationships = nil, document_name = nil) ⇒ Uniword::Mhtml::Document
Convert OOXML DocumentRoot to Mhtml::Document
    # File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 122

    def self.document_to_mht(document, core_properties = nil,
                             relationships = nil, document_name = nil)
      converter = new(document, core_properties, relationships, document_name)
      converter.build_mhtml_document
    end
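Both class methods are thin wrappers: they build a converter and call one instance method on it. The sketch below mirrors that shape with a stand-in class so the pattern can be run without Uniword installed. `SketchConverter`, its placeholder return values, and the sample arguments are hypothetical and are not part of the library.

```ruby
# Stand-in class mirroring the shape of the documented class methods.
# SketchConverter and its return values are hypothetical, not Uniword code.
class SketchConverter
  # Same argument order as .document_to_html_body (no document_name).
  def self.document_to_html_body(document, core_properties = nil, relationships = nil)
    new(document, core_properties, relationships).build_html_body
  end

  # Same argument order as .document_to_mht (document_name comes last).
  def self.document_to_mht(document, core_properties = nil, relationships = nil,
                           document_name = nil)
    new(document, core_properties, relationships, document_name).build_mhtml_document
  end

  def initialize(document, core_properties = nil, relationships = nil, document_name = nil)
    @document = document
    @core_properties = core_properties
    @relationships = relationships
    @document_name = document_name
  end

  # Placeholder: the real converter renders Word HTML4 here.
  def build_html_body
    "<html><body>#{@document}</body></html>"
  end

  # Placeholder: the real converter returns a Uniword::Mhtml::Document.
  def build_mhtml_document
    { name: @document_name, html: build_html_body }
  end
end

html = SketchConverter.document_to_html_body("hello")
mht  = SketchConverter.document_to_mht("hello", nil, nil, "report")
puts html
puts mht[:name]
```

Because `document_name` is the fourth positional argument, callers who only want to set the name still have to pass `nil` for `core_properties` and `relationships`.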
Instance Method Details
#build_html_body ⇒ Object
Build the HTML body content
    # File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 198

    def build_html_body
      body = @document.body
      return "" unless body

      # Split body elements into sections based on paragraph section_properties
      sections = split_into_sections(body.elements)

      wrap_html_document(sections)
    end
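The source of `split_into_sections` is not shown on this page. As a rough illustration of what "splitting body elements into sections based on paragraph section_properties" could look like, here is a sketch that assumes an element carrying section properties closes the section it belongs to. `SketchElement`, `split_into_sections_sketch`, and the sample data are hypothetical stand-ins, and the real method may behave differently.

```ruby
# Hypothetical stand-in for a body element; Uniword's real element classes differ.
SketchElement = Struct.new(:text, :section_properties)

# Group elements into sections. Assumed semantics: an element that carries
# section_properties is the last element of its section.
def split_into_sections_sketch(elements)
  elements.slice_after { |el| !el.section_properties.nil? }.to_a
end

elements = [
  SketchElement.new("intro", nil),
  SketchElement.new("end of section 1", { orientation: "portrait" }),
  SketchElement.new("appendix", nil)
]

sections = split_into_sections_sketch(elements)
sections.each_with_index do |section, i|
  puts "section #{i + 1}: #{section.map(&:text).join(', ')}"
end
```

With the sample data, the first two elements land in one section and the trailing element forms a second section on its own.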
#build_mhtml_document ⇒ Object
Build the complete Mhtml::Document
    # File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 164

    def build_mhtml_document
      mhtml_doc = Uniword::Mhtml::Document.new

      # Build HTML content
      html_content = build_html_body
      html_part = Uniword::Mhtml::HtmlPart.new
      html_part.content_type = "text/html"
      html_part.content_transfer_encoding = "quoted-printable"
      html_part.raw_content = html_content
      html_part.content_location = "file:///C:/D057922B/#{document_name}.htm"
      mhtml_doc.html_part = html_part
      mhtml_doc.parts << html_part

      # Build metadata
      mhtml_doc.document_properties = @metadata_builder.build_document_properties

      # Build filelist.xml
      filelist_part = @metadata_builder.build_filelist_part
      mhtml_doc.parts << filelist_part if filelist_part

      # Build image parts from document.image_parts
      @metadata_builder.build_image_parts.each do |image_part|
        mhtml_doc.parts << image_part
      end

      # Generate deterministic boundary based on document name
      hash = document_name.gsub(/[^a-zA-Z0-9]/, "").upcase[0..7] || "DOC"
      mhtml_doc.boundary = "----=_NextPart_01DC60F8.#{hash}"

      mhtml_doc
    end
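The boundary suffix in `build_mhtml_document` is derived from the document name: non-alphanumeric characters are stripped, the result is upcased, and the first eight characters are kept. The sketch below copies that expression into a standalone helper so its behavior can be checked in isolation. The helper names are hypothetical. Note that slicing an empty string with `[0..7]` returns `""` rather than `nil`, so the `|| "DOC"` fallback does not apply when the name contains no alphanumeric characters; the second helper shows one possible adjustment, which is an assumption and not the library's behavior.

```ruby
# Copies the suffix expression from build_mhtml_document for experimentation.
# boundary_suffix_sketch is a hypothetical helper name, not a Uniword method.
def boundary_suffix_sketch(document_name)
  document_name.gsub(/[^a-zA-Z0-9]/, "").upcase[0..7] || "DOC"
end

# Possible variant (an assumption, not library code): treat an empty
# result as missing so that the "DOC" fallback is actually used.
def boundary_suffix_with_fallback(document_name)
  cleaned = document_name.to_s.gsub(/[^a-zA-Z0-9]/, "").upcase[0..7]
  cleaned.empty? ? "DOC" : cleaned
end

p boundary_suffix_sketch("Annual Report 2024")  # => "ANNUALRE"
p boundary_suffix_sketch("q1")                  # => "Q1"
p boundary_suffix_sketch("???")                 # => ""
p boundary_suffix_with_fallback("???")          # => "DOC"
```

Because the suffix depends only on the document name, two conversions of documents with the same name produce the same boundary string.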
#core_properties ⇒ Object
Get the core properties to use (provided or from document)
    # File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 154

    def core_properties
      @core_properties || @document.core_properties
    end
#document_name ⇒ Object
Get document name via metadata builder
    # File 'lib/uniword/transformation/ooxml_to_mhtml_converter.rb', line 159

    def document_name
      @metadata_builder.document_name
    end