Class: Uniword::Transformation::MhtmlElementRenderer Private

Inherits:
Object
Defined in:
lib/uniword/transformation/mhtml_element_renderer.rb

Overview

This class is part of a private API. You should avoid using this class if possible, as it may be removed or be changed in the future.

Converts OOXML elements (Paragraph, Table, Run, etc.) to Word HTML for MHTML output.

Extracted from OoxmlToMhtmlConverter for separation of responsibilities.

Constant Summary

RUN_STYLE_CLASS_MAP =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Maps OOXML character style IDs (lowercase) to HTML class names. Note: the "Hyperlink" style is NOT mapped here; it applies to runs inside hyperlinks and should not produce visible span wrappers. The MsoHyperlink class is only used on the wrapper span around <a> elements in TOC entries.

{
  "zzmovetofollowing" => "zzMoveToFollowing",
  "stem" => "stem",
  "msofootnotereference" => "MsoFootnoteReference",
  "msotoctextspan1" => "MsoTocTextSpan",
}.freeze
PARAGRAPH_STYLE_CLASS_MAP =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

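Maps OOXML paragraph style IDs to Word HTML CSS classes. Lookups are case-sensitive. Style IDs with no entry here fall through to #style_to_css_class, which special-cases Title, Title2, and Subtitle and otherwise uses the OOXML style ID directly as the CSS class name.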

{
  "Heading1" => "MsoHeading1",
  "Heading2" => "MsoHeading2",
  "Heading3" => "MsoHeading3",
  "Heading4" => "MsoHeading4",
  "Heading5" => "MsoHeading5",
  "Heading6" => "MsoHeading6",
  "TOC1" => "MsoToc1",
  "TOC2" => "MsoToc2",
  "TOC3" => "MsoToc3",
  "TOC4" => "MsoToc4",
  "TOC5" => "MsoToc5",
  "TOC6" => "MsoToc6",
  "TOC7" => "MsoToc7",
  "TOC8" => "MsoToc8",
  "TOC9" => "MsoToc9",
  "zzSTDTitle" => "zzSTDTitle1",
  "FootnoteText" => "MsoFootnoteText",
  "h2annex" => "h2Annex",
  "h3annex" => "h3Annex",
  "h4annex" => "h4Annex",
  "h5annex" => "h5Annex",
  "note" => "Note",
  "figuretitle" => "FigureTitle",
  "tabletitle" => "TableTitle",
  "biblio" => "Biblio",
  "Normaali" => "MsoNormal",
}.freeze


Constructor Details

#initialize(relationships = nil, image_parts = nil) ⇒ MhtmlElementRenderer

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Returns a new instance of MhtmlElementRenderer.



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 14

def initialize(relationships = nil, image_parts = nil)
  @relationships = relationships
  @image_parts = image_parts
end

Instance Method Details

#apply_run_formatting(text, props) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Apply run formatting (text must already be HTML-escaped). Uses HTML4 tags (<b>, <i>, <u>) for Word compatibility.



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 227

def apply_run_formatting(text, props)
  result = text

  # Bold/Italic: use <b>/<i> (HTML4) instead of <strong>/<em> (HTML5)
  result = "<b>#{result}</b>" if props.bold && props.bold.value != false
  result = "<i>#{result}</i>" if props.italic && props.italic.value != false
  result = "<u>#{result}</u>" if props.underline&.value

  result = %(<span style="color:#{props.color.value}">#{result}</span>) if props.color&.value

  if props.size&.value
    size_pt = props.size.value.to_f / 2
    result = %(<span style="font-size:#{size_pt}pt">#{result}</span>)
  end

  if props.font.respond_to?(:ascii) && props.font.ascii
    result = %(<span style="font-family:'#{props.font.ascii}'">#{result}</span>)
  elsif props.font.is_a?(String) && !props.font.empty?
    result = %(<span style="font-family:'#{props.font}'">#{result}</span>)
  end

  result
end
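
The nesting order of the wrappers can be illustrated with a minimal sketch. `Val`, `Props`, and `format_run` are hypothetical stand-ins written for this example, not Uniword classes or methods; `format_run` is a trimmed copy of the logic above with font handling left out. It shows that sizes are halved (24 half-points become 12.0pt) and that a bold element whose value is nil still counts as bold, because only an explicit false switches it off.

```ruby
# Hypothetical stand-ins for the run-properties model (not Uniword classes).
Val   = Struct.new(:value)
Props = Struct.new(:bold, :italic, :underline, :color, :size, keyword_init: true)

# Trimmed copy of the formatting logic above; font handling is omitted.
def format_run(text, props)
  result = text
  result = "<b>#{result}</b>" if props.bold && props.bold.value != false
  result = "<i>#{result}</i>" if props.italic && props.italic.value != false
  result = "<u>#{result}</u>" if props.underline&.value
  result = %(<span style="color:#{props.color.value}">#{result}</span>) if props.color&.value
  if props.size&.value
    size_pt = props.size.value.to_f / 2 # the size value is in half-points
    result = %(<span style="font-size:#{size_pt}pt">#{result}</span>)
  end
  result
end

props = Props.new(bold: Val.new(true), italic: Val.new(false), size: Val.new(24))
puts format_run("Scope", props)
puts format_run("x", Props.new(bold: Val.new(nil)))
```

Because each step wraps the previous result, the size span ends up outermost and the <b> tag innermost.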

#break_to_html(brk) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Convert OOXML Break to HTML



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 146

def break_to_html(brk)
  if brk.type == "page"
    %(<br clear="all" style="mso-special-character:line-break;page-break-before:always" />)
  else
    %(<br />)
  end
end

#build_vmerge_map(rows) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Build a map of vMerge states for each cell position. Returns { [row_idx, col_idx] => :start | :continue }. A cell with vMerge whose cell above does NOT have vMerge is :start; a cell with vMerge whose cell above also has vMerge is :continue. Cells without vMerge have no entry in the map.



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 320

def build_vmerge_map(rows)
  merge_map = {}
  prev_col_has_vmerge = {} # col_idx => true/false from previous row

  rows.each_with_index do |row, row_idx|
    cells = row.cells || []
    cells.each_with_index do |cell, col_idx|
      has_vmerge = cell.properties&.v_merge ? true : false

      if has_vmerge
        merge_map[[row_idx, col_idx]] = if prev_col_has_vmerge[col_idx]
                                          :continue
                                        else
                                          :start
                                        end
      end

      prev_col_has_vmerge[col_idx] = has_vmerge
    end
  end

  merge_map
end
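
A small worked example may make the map shape concrete. The `CellProps`, `Cell`, and `Row` structs and the `vmerge_map` helper are hypothetical stand-ins for this sketch; the helper condenses the algorithm above and is not the library method itself.

```ruby
# Hypothetical stand-ins for table rows and cells (not Uniword classes).
CellProps = Struct.new(:v_merge)
Cell      = Struct.new(:properties)
Row       = Struct.new(:cells)

# Condensed copy of the algorithm above.
def vmerge_map(rows)
  map = {}
  prev = {} # col_idx => whether the cell above had vMerge
  rows.each_with_index do |row, r|
    (row.cells || []).each_with_index do |cell, c|
      has = cell.properties&.v_merge ? true : false
      map[[r, c]] = (prev[c] ? :continue : :start) if has
      prev[c] = has
    end
  end
  map
end

merged = -> { Cell.new(CellProps.new(true)) }
plain  = -> { Cell.new(nil) }

rows = [
  Row.new([merged.call, plain.call]), # column 0 starts a merge
  Row.new([merged.call, plain.call]), # column 0 continues it
  Row.new([plain.call,  plain.call]), # merge has ended
]
p vmerge_map(rows)
```

Note that col_idx is the cell's position within its own row. As far as the code above shows, a row containing a gridSpan cell would shift the positions of the cells after it, so this sketch assumes rows with one cell per grid column.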

#compute_rowspan(rows, start_row_idx, col_idx, merge_map) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Compute rowspan for a vMerge start cell by counting continuation cells below.



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 345

def compute_rowspan(rows, start_row_idx, col_idx, merge_map)
  return nil unless merge_map[[start_row_idx, col_idx]] == :start

  rowspan = 1
  ((start_row_idx + 1)...rows.size).each do |row_idx|
    break unless merge_map[[row_idx, col_idx]] == :continue

    rowspan += 1
  end

  rowspan > 1 ? rowspan : nil
end
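
The counting rule can be checked against a hand-written merge map. The helper below, `rowspan_for`, is a hypothetical standalone copy made for illustration; it takes a row count instead of the rows array so that it runs without any Uniword objects.

```ruby
# Standalone copy of the rowspan count, driven by a hand-written merge map.
def rowspan_for(row_count, start_row, col, merge_map)
  return nil unless merge_map[[start_row, col]] == :start

  span = 1
  ((start_row + 1)...row_count).each do |r|
    break unless merge_map[[r, col]] == :continue

    span += 1
  end
  span > 1 ? span : nil
end

merge_map = {
  [0, 0] => :start, [1, 0] => :continue, [2, 0] => :continue,
  [0, 1] => :start, # a start cell with nothing below it
}

p rowspan_for(3, 0, 0, merge_map) # three rows merged
p rowspan_for(3, 0, 1, merge_map) # nil: a span of 1 needs no attribute
p rowspan_for(3, 1, 0, merge_map) # nil: not a start cell
```

Returning nil rather than 1 lets the caller skip the rowspan attribute entirely for unmerged cells.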

#drawing_to_html(drawing) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Convert an OOXML Drawing to HTML <img> tag



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 181

def drawing_to_html(drawing)
  inline = drawing.inline
  return "" unless inline

  graphic_data = inline.graphic&.graphic_data
  return "" unless graphic_data

  pic = graphic_data.picture
  return "" unless pic

  embed_id = pic.blip_fill&.blip&.embed
  return "" unless embed_id

  # Resolve image target from image_parts
  image_target = resolve_image_target(embed_id)
  return "" unless image_target

  # Get dimensions from extent (EMU to px: 1px = 9525 EMU)
  width_px = nil
  height_px = nil
  if inline.extent
    width_px = (inline.extent.cx.to_i / 9525.0).round if inline.extent.cx
    height_px = (inline.extent.cy.to_i / 9525.0).round if inline.extent.cy
  end

  style_attrs = []
  style_attrs << "width:#{width_px}px" if width_px
  style_attrs << "height:#{height_px}px" if height_px
  style = style_attrs.empty? ? "" : " style='#{style_attrs.join(';')}'"

  %(<img src="#{image_target}"#{style}>)
end
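
The unit conversion is the part of this method most easily gotten wrong, so here is a minimal sketch of it in isolation. `EMU_PER_PIXEL`, `emu_to_px`, and `img_tag` are names invented for this example; the factor 9525 follows from 914,400 EMU per inch divided by 96 pixels per inch, which is the assumption the code above bakes in.

```ruby
EMU_PER_PIXEL = 9525.0 # 914,400 EMU per inch / 96 px per inch

# Convert an EMU extent (Integer or numeric String) to whole pixels.
def emu_to_px(emu)
  (emu.to_i / EMU_PER_PIXEL).round
end

# Build the <img> tag the same way the method above does, given a resolved src.
def img_tag(src, cx: nil, cy: nil)
  parts = []
  parts << "width:#{emu_to_px(cx)}px" if cx
  parts << "height:#{emu_to_px(cy)}px" if cy
  style = parts.empty? ? "" : " style='#{parts.join(';')}'"
  %(<img src="#{src}"#{style}>)
end

puts img_tag("image001.png", cx: 1_905_000, cy: "952500")
puts img_tag("image002.png")
```

The file names are sample values. When no extent is available the style attribute is omitted, leaving the consumer to use the image's intrinsic size.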

#element_to_html(element) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Convert an OOXML element to HTML



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 20

def element_to_html(element)
  case element
  when Uniword::Wordprocessingml::Paragraph
    paragraph_to_html(element)
  when Uniword::Wordprocessingml::Table
    table_to_html(element)
  else
    ""
  end
end

#endnote_reference_to_html(_run) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Convert OOXML EndnoteReference to HTML



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 141

def endnote_reference_to_html(_run)
  %(<span class="MsoEndnoteReference"><span style="mso-special-character:endnote"></span></span>)
end

#field_char_to_html(run) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Convert OOXML FieldChar to HTML span



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 155

def field_char_to_html(run)
  type = run.field_char.fldCharType
  style_attr = case type
               when "begin" then "mso-element:field-begin"
               when "separate" then "mso-element:field-separator"
               when "end" then "mso-element:field-end"
               else return ""
               end
  %(<span style="#{style_attr}"></span>)
end
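
As an illustration of how the three markers bracket a field, the sketch below restructures the case statement as a lookup table and joins a sample field sequence. `FIELD_CHAR_STYLES` and `field_char_span` are hypothetical names, and the instruction and result strings are made-up sample values rather than output captured from the library.

```ruby
# Table-driven variant of the case statement above (an illustrative
# restructuring, not the library's code).
FIELD_CHAR_STYLES = {
  "begin"    => "mso-element:field-begin",
  "separate" => "mso-element:field-separator",
  "end"      => "mso-element:field-end",
}.freeze

def field_char_span(type)
  style = FIELD_CHAR_STYLES[type]
  style ? %(<span style="#{style}"></span>) : ""
end

# A field is a run sequence: begin, instruction text, separate, result, end.
html = [
  field_char_span("begin"),
  " PAGEREF _Toc123 \\h ",
  field_char_span("separate"),
  "7",
  field_char_span("end"),
].join
puts html
```

Unknown field character types produce an empty string, matching the early return in the method above.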

#footnote_reference_to_html(_run) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Convert OOXML FootnoteReference to HTML



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 136

def footnote_reference_to_html(_run)
  %(<span class="MsoFootnoteReference"><span style="mso-special-character:footnote"></span></span>)
end

#heading_tag_for_style(style) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Map heading styles to HTML heading tags



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 48

def heading_tag_for_style(style)
  return "p" unless style

  style_value = style.is_a?(String) ? style : style.to_s

  case style_value
  when "Heading1", "Heading2", "Heading3",
       "Heading4", "Heading5", "Heading6"
    "h#{style_value[-1].to_i}"
  when "Title", "Title2"
    "h1"
  when "Subtitle"
    "h2"
  # h1-level ISO/IEC document styles
  when "ANNEX", "ForewordTitle", "IntroTitle", "section3"
    "h1"
  else
    "p"
  end
end
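
The mapping is case-sensitive and falls back to "p" for anything it does not recognise. The sketch below reproduces that behaviour with a regular expression in place of the explicit Heading1..Heading6 list; `heading_tag` is a hypothetical helper for this example, and the regex is a swapped-in technique rather than the library's approach.

```ruby
# Equivalent mapping written with a regex for the HeadingN styles.
def heading_tag(style)
  return "p" unless style

  case style.to_s
  when /\AHeading([1-6])\z/ then "h#{Regexp.last_match(1)}"
  when "Title", "Title2", "ANNEX", "ForewordTitle", "IntroTitle", "section3" then "h1"
  when "Subtitle" then "h2"
  else "p"
  end
end

%w[Heading3 ANNEX Subtitle Heading7 heading1 Normal].each do |id|
  puts "#{id} -> #{heading_tag(id)}"
end
```

Note that "Heading7" and the lowercase "heading1" both fall through to "p", as they would in the method above.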

#hyperlink_to_html(hyperlink) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Convert OOXML Hyperlink to HTML



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 252

def hyperlink_to_html(hyperlink)
  url = resolve_hyperlink_url(hyperlink)
  return "" unless url

  content = hyperlink.runs.map { |r| run_to_html(r) }.join
  link_html = %(<a href="#{escape_html(url)}">#{content}</a>)

  # TOC hyperlinks (containing msotoctextspan1 runs) are wrapped in
  # MsoHyperlink span with mso-no-proof inner span
  if toc_hyperlink?(hyperlink)
    link_html = %(<span class="MsoHyperlink"><span style="mso-no-proof:yes">#{link_html}</span></span>)
  end

  link_html
end

#instr_text_to_html(run) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Convert OOXML InstrText to HTML



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 167

def instr_text_to_html(run)
  text = run.instr_text.text.to_s
  return "" if text.empty?

  escape_html(text)
end

#omath_to_html(o_math) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Convert OOXML OMath to HTML (wrapped in stem span)



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 276

def omath_to_html(o_math)
  xml = o_math.to_xml
  # Strip the XML declaration for clean inline HTML
  xml = xml.gsub(/<\?[^>]+>\s*/, "")
  # Remove namespace declarations that Word HTML doesn't use
  xml = xml.gsub(/ xmlns(:[^=]+)?="[^"]+"/, "")
  # Element prefixes (such as m:) are left exactly as serialized by to_xml
  %(<span class="stem">#{xml}</span>)
end
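
To see what the two substitutions do, the sketch below applies them to a hand-written sample string. The input XML is an assumption about what to_xml might return for a trivial equation; the actual serializer output may differ.

```ruby
# Sample input: an XML declaration followed by a namespaced oMath element.
xml = %(<?xml version="1.0" encoding="UTF-8"?>\n) +
      %(<m:oMath xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">) +
      %(<m:r><m:t>x</m:t></m:r></m:oMath>)

xml = xml.gsub(/<\?[^>]+>\s*/, "")            # drop the XML declaration
xml = xml.gsub(/ xmlns(:[^=]+)?="[^"]+"/, "") # drop xmlns declarations only
html = %(<span class="stem">#{xml}</span>)

puts html
```

Only the xmlns attributes are removed; the m: prefixes on the element names survive unchanged.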

#paragraph_to_html(paragraph) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Convert OOXML Paragraph to HTML



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 32

def paragraph_to_html(paragraph)
  style_class = style_to_css_class(paragraph.style)

  content = paragraph_content_to_html(paragraph)

  # Use semantic heading tags for heading styles
  tag = heading_tag_for_style(paragraph.style)

  if content.strip.empty?
    %(<#{tag}#{style_class}><o:p>&nbsp;</o:p></#{tag}>)
  else
    %(<#{tag}#{style_class}>#{content}</#{tag}>)
  end
end

#render_run_content(run) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Render the core content of a run based on its type



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 120

def render_run_content(run)
  if run.field_char
    field_char_to_html(run)
  elsif run.instr_text
    instr_text_to_html(run)
  elsif run.tab
    tab_to_html(run)
  else
    text = run.text.to_s
    return "" if text.empty?

    escape_html(text)
  end
end

#resolve_image_target(embed_id) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Resolve image target path from image_parts



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 215

def resolve_image_target(embed_id)
  return nil unless @image_parts

  entry = @image_parts.find { |pair| pair[0] == embed_id }
  return nil unless entry

  image_data = entry[1]
  image_data[:target]
end
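
The lookup only requires that image_parts enumerates [id, data] pairs where data responds to [:target]. The sketch below assumes two shapes that satisfy this, a Hash keyed by embed ID and an Array of pairs; neither is confirmed by this page as the shape the converter actually passes in, and `image_target` and the sample IDs and paths are hypothetical.

```ruby
# Standalone copy of the lookup. Hash#find yields [key, value] pairs, so a
# Hash and an Array of pairs behave the same here.
def image_target(image_parts, embed_id)
  return nil unless image_parts

  entry = image_parts.find { |pair| pair[0] == embed_id }
  entry && entry[1][:target]
end

as_hash  = { "rId5" => { target: "media/image1.png" } }
as_pairs = [["rId5", { target: "media/image1.png" }]]

p image_target(as_hash, "rId5")
p image_target(as_pairs, "rId5")
p image_target(as_hash, "rId9") # nil: unknown relationship ID
```

A nil result makes #drawing_to_html return an empty string, so an unresolved image is dropped silently rather than raising.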

#run_style_to_class(style_id) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Convert OOXML run style ID to HTML class name



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 438

def run_style_to_class(style_id)
  RUN_STYLE_CLASS_MAP.fetch(style_id.downcase, style_id)
end
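
A short sketch of the lookup, copying the constant from the summary above so that it runs standalone. The key is downcased before the lookup, but the fallback returns the style ID with its original casing.

```ruby
# Copy of RUN_STYLE_CLASS_MAP from the constant summary, for a standalone run.
RUN_STYLE_CLASS_MAP = {
  "zzmovetofollowing" => "zzMoveToFollowing",
  "stem" => "stem",
  "msofootnotereference" => "MsoFootnoteReference",
  "msotoctextspan1" => "MsoTocTextSpan",
}.freeze

def run_style_to_class(style_id)
  RUN_STYLE_CLASS_MAP.fetch(style_id.downcase, style_id)
end

p run_style_to_class("MSOTOCTEXTSPAN1") # mapped, case-insensitively
p run_style_to_class("Emphasis")        # unmapped: returned unchanged
```

Since the method calls downcase on its argument directly, a nil style_id would raise NoMethodError; #run_to_html guards against this by checking props.style&.value first.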

#run_to_html(run) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Convert OOXML Run to HTML



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 70

def run_to_html(run)
  # Handle breaks and drawings first (these never get style wrapping)
  return break_to_html(run.break) if run.break
  if run.drawings && !run.drawings.empty?
    return drawing_to_html(run.drawings.first)
  end

  # Handle footnote/endnote references (rendered as special markers)
  return footnote_reference_to_html(run) if run.footnote_reference
  return endnote_reference_to_html(run) if run.endnote_reference

  # Render content based on run type
  content = render_run_content(run)

  # Skip truly empty runs without formatting
  if content.nil? || content.empty?
    props = run.properties
    # Emit formatting tags for empty runs with bold/italic formatting
    if props && (props.bold || props.italic)
      content = ""
    else
      return ""
    end
  end

  # Apply character-level formatting (bold, italic, underline)
  # and style class wrapping
  props = run.properties
  if props
    unless run.field_char || run.tab
      content = apply_run_formatting(content,
                                     props)
    end
    if props.style&.value
      style_id = props.style.value.downcase
      # Skip wrapping for internal OOXML styles that shouldn't produce
      # visible HTML class wrappers
      is_internal_style = %w[stem hyperlink].include?(style_id)
      is_stem_spacer = (style_id == "stem") && run.text.to_s.strip.empty?
      unless is_internal_style || is_stem_spacer
        style_val = run_style_to_class(props.style.value)
        content = %(<span class="#{style_val}">#{content}</span>)
      end
    end
  end

  content
end

#sdt_to_inline_html(sdt) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Convert OOXML SDT block to inline MHT SDT



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 409

def sdt_to_inline_html(sdt)
  return "" unless sdt.content

  sdt_props = sdt.properties
  sdt_content = sdt.content

  text = extract_sdt_text(sdt_content)

  sdt_attrs = build_sdt_attrs(sdt_props)

  if text.empty?
    %(<w:sdt#{sdt_attrs}><w:sdtPr></w:sdtPr></w:sdt>)
  else
    %(<w:sdt#{sdt_attrs}><w:sdtPr></w:sdtPr><w:sdtContent><span>#{escape_html(text)}</span></w:sdtContent></w:sdt>)
  end
end

#simple_cell_paragraph?(para) ⇒ Boolean

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Check if a paragraph is simple enough to render without <p> wrapping in a table cell. Single paragraphs with inline content (runs, hyperlinks) are considered simple. Only block-level structures (SDTs, oMathPara) make it complex.

Returns:

  • (Boolean)


# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 398

def simple_cell_paragraph?(para)
  return true if para.runs.empty? && para.hyperlinks.empty?
  # oMathPara is block-level and needs wrapping
  return false if para.o_math_paras && !para.o_math_paras.empty?
  # SDTs need wrapping
  return false if para.sdts && !para.sdts.empty?

  true
end

#style_to_css_class(style) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Map OOXML style to CSS class



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 474

def style_to_css_class(style)
  return " class=MsoNormal" unless style

  style_value = style.is_a?(String) ? style : style.to_s

  css_class = PARAGRAPH_STYLE_CLASS_MAP[style_value]
  return " class=#{css_class}" if css_class

  # Direct mapping for known ISO paragraph styles
  case style_value
  when "Title", "Title2" then " class=MsoTitle"
  when "Subtitle" then " class=MsoSubtitle"
  else
    # Use the OOXML style ID directly as CSS class name
    " class=#{style_value}"
  end
end
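
The three lookup tiers (mapped style, Title and Subtitle special cases, pass-through) can be sketched as follows. `STYLE_MAP` is an abbreviated copy of PARAGRAPH_STYLE_CLASS_MAP and `css_class_attr` is a hypothetical standalone copy of the method, both written only for this example.

```ruby
# Abbreviated copy of PARAGRAPH_STYLE_CLASS_MAP for illustration.
STYLE_MAP = {
  "Heading1" => "MsoHeading1",
  "note"     => "Note",
  "Normaali" => "MsoNormal",
}.freeze

def css_class_attr(style)
  return " class=MsoNormal" unless style

  id = style.to_s
  mapped = STYLE_MAP[id]
  return " class=#{mapped}" if mapped

  case id
  when "Title", "Title2" then " class=MsoTitle"
  when "Subtitle" then " class=MsoSubtitle"
  else " class=#{id}" # the OOXML style ID is used as the class name
  end
end

[nil, "Heading1", "Title2", "ForewordTitle", "heading1"].each do |id|
  puts "<p#{css_class_attr(id)}>"
end
```

The return value includes a leading space and an unquoted attribute value, so it can be appended directly after the tag name. Unquoted values only parse correctly when the style ID contains no whitespace or quote characters, which this sketch assumes.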

#tab_to_html(_run) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Convert OOXML Tab to HTML



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 175

def tab_to_html(_run)
  # TOC tab leader: rendered as dotted leader
  %(<span style="mso-tab-count:1 dotted">. </span>)
end

#table_cell_to_html(cell, rowspan: nil) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

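Convert OOXML TableCell to HTML. For simple cells (a single paragraph that passes #simple_cell_paragraph?), render the paragraph content directly without a wrapper. For complex cells (multiple paragraphs, or a paragraph containing SDTs or oMathPara), render each paragraph through #paragraph_to_html, which normally wraps it in <p>. Header cells use the <th> tag instead of <td>.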

Parameters:

  • cell (TableCell)

    The cell to render

  • rowspan (Integer, nil) (defaults to: nil)

    Rowspan from vMerge computation



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 365

def table_cell_to_html(cell, rowspan: nil)
  paragraphs = cell.paragraphs || []
  tag = cell.header ? "th" : "td"

  # Build merge attributes
  attrs = []
  if cell.properties&.grid_span&.value && cell.properties.grid_span.value.to_i > 1
    attrs << "colspan=#{cell.properties.grid_span.value}"
  end
  attrs << "rowspan=#{rowspan}" if rowspan
  attr_str = attrs.empty? ? "" : " #{attrs.join(' ')}"

  if paragraphs.size == 1 && simple_cell_paragraph?(paragraphs.first)
    # Simple cell: render text content directly (no <p> wrapping)
    content = paragraph_content_to_html(paragraphs.first)
    %(<#{tag}#{attr_str}>#{content}</#{tag}>)
  else
    content = paragraphs.map do |para|
      paragraph_to_html(para)
    end.join("\n")

    <<~HTML
      <#{tag}#{attr_str}>
      #{content}
      </#{tag}>
    HTML
  end
end

#table_to_html(table) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Convert OOXML Table to HTML



# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 287

def table_to_html(table)
  rows = table.rows || []

  # Pre-compute vertical merge map: { [row_idx, col_idx] => :start | :continue }
  merge_map = build_vmerge_map(rows)

  rows_html = rows.each_with_index.map do |row, row_idx|
    cells = row.cells || []
    cells_html = cells.each_with_index.filter_map do |cell, col_idx|
      # Skip vMerge continuation cells — they're absorbed by the start cell's rowspan
      merge_state = merge_map[[row_idx, col_idx]]
      next if merge_state == :continue

      # Compute rowspan from merge map
      rowspan = compute_rowspan(rows, row_idx, col_idx, merge_map)

      table_cell_to_html(cell, rowspan: rowspan)
    end.join

    %(<tr>#{cells_html}</tr>)
  end.join("\n")

  <<~HTML
    <table>
    #{rows_html}
    </table>
  HTML
end

#toc_hyperlink?(hyperlink) ⇒ Boolean

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Check if a hyperlink is a TOC entry by inspecting its runs

Returns:

  • (Boolean)


# File 'lib/uniword/transformation/mhtml_element_renderer.rb', line 269

def toc_hyperlink?(hyperlink)
  hyperlink.runs.any? do |r|
    r.properties&.style&.value&.downcase == "msotoctextspan1"
  end
end
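
The safe-navigation chain means runs with no properties or no style are simply skipped. The sketch below shows this together with the wrapper that #hyperlink_to_html applies to TOC links. `Val`, `Props`, `Run`, `Link`, `toc_link?`, and `wrap_link` are hypothetical stand-ins for this example, not Uniword classes or methods.

```ruby
# Hypothetical stand-ins for hyperlink and run objects (not Uniword classes).
Val   = Struct.new(:value)
Props = Struct.new(:style)
Run   = Struct.new(:properties)
Link  = Struct.new(:runs)

def toc_link?(link)
  link.runs.any? { |r| r.properties&.style&.value&.downcase == "msotoctextspan1" }
end

# Apply the MsoHyperlink wrapper only to TOC links, as described above.
def wrap_link(anchor_html, link)
  return anchor_html unless toc_link?(link)

  %(<span class="MsoHyperlink"><span style="mso-no-proof:yes">#{anchor_html}</span></span>)
end

toc   = Link.new([Run.new(Props.new(Val.new("MsoTocTextSpan1")))])
plain = Link.new([Run.new(nil), Run.new(Props.new(nil))])

puts wrap_link(%(<a href="#_Toc1">1 Scope</a>), toc)
puts wrap_link(%(<a href="#_Toc1">1 Scope</a>), plain)
```

A single matching run is enough to mark the whole hyperlink as a TOC entry.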