Class: Uniword::Transformation::OoxmlToHtmlConverter

Inherits:
Object
Defined in:
lib/uniword/transformation/ooxml_to_html_converter.rb

Overview

Service for converting OOXML elements to HTML.

All methods are pure class-level functions: they hold no state and have no side effects. The Transformer uses them when target_format is :mhtml.

Examples:

Convert OOXML document to HTML

html = Uniword::Transformation::OoxmlToHtmlConverter.document_to_html(doc)
# => "<html>..."

Class Method Details

.document_to_html(document) ⇒ String

Convert OOXML Document to HTML string

Parameters:

Returns:

  • (String)

    Full HTML document, or an empty string when the document has no body



# File 'lib/uniword/transformation/ooxml_to_html_converter.rb', line 19

def self.document_to_html(document)
  body = document.body
  return "" unless body

  elements_html = body.elements.map { |e| element_to_html(e) }.join("\n")
  wrap_html(elements_html, document)
end

.element_to_html(element) ⇒ String

Convert a single OOXML element to HTML

Parameters:

  • element (Object)

    OOXML element

Returns:

  • (String)

    HTML string, or an empty string for any element that is neither a Paragraph nor a Table



# File 'lib/uniword/transformation/ooxml_to_html_converter.rb', line 31

def self.element_to_html(element)
  case element
  when Uniword::Wordprocessingml::Paragraph
    paragraph_to_html(element)
  when Uniword::Wordprocessingml::Table
    table_to_html(element)
  else
    ""
  end
end
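
The dispatch above relies on Ruby's `case`/`when`, which tests each candidate class with `===`. The following is a minimal sketch of that pattern only; `FakeParagraph`, `FakeTable`, and `FakeBookmark` are hypothetical stand-ins rather than Uniword classes, and the rendering is deliberately simplified.

```ruby
# Hypothetical stand-ins for OOXML element classes (not part of Uniword).
FakeParagraph = Struct.new(:text)
FakeTable     = Struct.new(:rows)
FakeBookmark  = Struct.new(:name) # an element type the dispatcher does not handle

# Simplified dispatcher: `when SomeClass` matches when SomeClass === element.
def element_to_html(element)
  case element
  when FakeParagraph
    "<p>#{element.text}</p>"
  when FakeTable
    "<table></table>"
  else
    "" # unsupported elements produce an empty string, not an error
  end
end

elements = [FakeParagraph.new("one"), FakeBookmark.new("b1"), FakeParagraph.new("two")]
html = elements.map { |e| element_to_html(e) }.join("\n")
puts html.inspect
```

Because unsupported elements map to an empty string instead of being filtered out, joining with "\n" (as document_to_html does) leaves an empty line in their place.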

.escape_html(text) ⇒ String

Escape HTML special characters

Parameters:

  • text (String)

    Raw text

Returns:

  • (String)

    Escaped text



# File 'lib/uniword/transformation/ooxml_to_html_converter.rb', line 162

def self.escape_html(text)
  text.to_s
    .gsub("&", "&amp;")
    .gsub("<", "&lt;")
    .gsub(">", "&gt;")
    .gsub('"', "&quot;")
    .gsub("'", "&#39;")
end
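
The order of the substitutions matters: the ampersand has to be replaced first, otherwise the `&` in the entities produced by later steps would be escaped a second time. The sketch below is a standalone copy of the logic shown above so that it runs without Uniword.

```ruby
# Standalone copy of the escaping logic, for illustration only.
def escape_html(text)
  text.to_s
      .gsub("&", "&amp;") # first, so later entities are not double-escaped
      .gsub("<", "&lt;")
      .gsub(">", "&gt;")
      .gsub('"', "&quot;")
      .gsub("'", "&#39;")
end

puts escape_html(%(<a href="x">Tom & Jerry</a>))
# prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

# nil is tolerated because of the to_s call.
puts escape_html(nil).inspect
# prints: ""
```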

.font_size_to_html(size_value) ⇒ String

Convert an OOXML font size (half-points) to a CSS font size in points

Parameters:

  • size_value (String, nil)

    Font size in half-points

Returns:

  • (String, nil)

    Font size in points with a pt unit suffix, or nil when size_value is nil



# File 'lib/uniword/transformation/ooxml_to_html_converter.rb', line 150

def self.font_size_to_html(size_value)
  return nil unless size_value

  # Convert half-points to points
  size_pts = size_value.to_f / 2
  "#{size_pts}pt"
end
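
As a worked example of the half-point conversion, the sketch below repeats the method standalone and adds `compact_pt`, a hypothetical helper (not part of Uniword) that shows one way to drop the trailing `.0` that Float interpolation produces.

```ruby
# Standalone copy of the conversion shown above.
def font_size_to_html(size_value)
  return nil unless size_value

  "#{size_value.to_f / 2}pt" # half-points -> points
end

# Hypothetical variant: "%g" formatting omits a trailing ".0".
def compact_pt(size_value)
  return nil unless size_value

  format("%gpt", size_value.to_f / 2)
end

puts font_size_to_html("24") # prints: 12.0pt
puts font_size_to_html("21") # prints: 10.5pt
puts compact_pt("24")        # prints: 12pt
p font_size_to_html(nil)     # prints: nil
```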

.paragraph_style(paragraph) ⇒ String

Extract paragraph style attribute

Parameters:

  • paragraph (Uniword::Wordprocessingml::Paragraph)

    OOXML paragraph

Returns:

  • (String)

    HTML class attribute (with a leading space), or an empty string when the paragraph has no properties or no style



# File 'lib/uniword/transformation/ooxml_to_html_converter.rb', line 137

def self.paragraph_style(paragraph)
  return "" unless paragraph.properties

  style = paragraph.properties.style
  return "" unless style

  " class=\"#{escape_html(style)}\""
end

.paragraph_to_html(paragraph) ⇒ String

Convert OOXML Paragraph to HTML

Parameters:

  • paragraph (Uniword::Wordprocessingml::Paragraph)

    OOXML paragraph

Returns:

  • (String)

    HTML <p> element



# File 'lib/uniword/transformation/ooxml_to_html_converter.rb', line 46

def self.paragraph_to_html(paragraph)
  runs_html = paragraph.runs.map { |r| run_to_html(r) }.join
  style = paragraph_style(paragraph)
  "<p#{style}>#{runs_html}</p>"
end
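
To illustrate how paragraph_style and paragraph_to_html combine, here is a minimal sketch under simplifying assumptions: `DemoPara` and `DemoParaProps` are hypothetical stand-ins for the Uniword paragraph objects, and runs are plain strings that are only escaped instead of being passed through run_to_html.

```ruby
# Hypothetical stand-ins (not Uniword classes).
DemoParaProps = Struct.new(:style)
DemoPara      = Struct.new(:runs, :properties)

def escape_html(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
      .gsub('"', "&quot;").gsub("'", "&#39;")
end

# Same logic as the documented method: note the leading space in the result.
def paragraph_style(paragraph)
  return "" unless paragraph.properties

  style = paragraph.properties.style
  return "" unless style

  " class=\"#{escape_html(style)}\""
end

# Simplification: each run is a plain String here.
def paragraph_to_html(paragraph)
  runs_html = paragraph.runs.map { |r| escape_html(r) }.join
  "<p#{paragraph_style(paragraph)}>#{runs_html}</p>"
end

styled = DemoPara.new(["Hello, ", "world"], DemoParaProps.new("Heading1"))
plain  = DemoPara.new(["plain"], nil)

puts paragraph_to_html(styled) # prints: <p class="Heading1">Hello, world</p>
puts paragraph_to_html(plain)  # prints: <p>plain</p>
```

The leading space inside the returned attribute string is what lets the caller interpolate it directly after `<p` without producing `<p >` for unstyled paragraphs.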

.run_to_html(run) ⇒ String

Convert OOXML Run to HTML

Parameters:

Returns:

  • (String)

    HTML text content with inline formatting



# File 'lib/uniword/transformation/ooxml_to_html_converter.rb', line 56

def self.run_to_html(run)
  text = escape_html(run.text || "")
  return text if text.empty?

  props = run.properties
  return text unless props

  # Apply inline formatting
  text = "<strong>#{text}</strong>" if props.bold
  text = "<em>#{text}</em>" if props.italic
  text = "<u>#{text}</u>" if props.underline&.value
  text = "<span style=\"color:#{props.color&.value}\">#{text}</span>" if props.color&.value
  text = "<span style=\"font-size:#{font_size_to_html(props.size&.value)}\">#{text}</span>" if props.size&.value
  text = "<span style=\"font-family:'#{props.font&.ascii}'\">#{text}</span>" if props.font&.ascii

  text
end
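
Each formatting step wraps the result of the previous one, so the first property checked ends up innermost. The sketch below reproduces that wrapping order with hypothetical stand-in structs (`DemoRun`, `DemoRunProps`, `DemoValue` are not Uniword classes); the size and font branches are omitted for brevity, and "FF0000" is an arbitrary illustrative value.

```ruby
# Hypothetical stand-ins (not Uniword classes).
DemoValue    = Struct.new(:value)
DemoRunProps = Struct.new(:bold, :italic, :underline, :color, keyword_init: true)
DemoRun      = Struct.new(:text, :properties)

def escape_html(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
      .gsub('"', "&quot;").gsub("'", "&#39;")
end

# Trimmed copy of the documented method (no size or font handling).
def run_to_html(run)
  text = escape_html(run.text || "")
  return text if text.empty?

  props = run.properties
  return text unless props

  text = "<strong>#{text}</strong>" if props.bold # innermost wrapper
  text = "<em>#{text}</em>" if props.italic
  text = "<u>#{text}</u>" if props.underline&.value
  text = "<span style=\"color:#{props.color&.value}\">#{text}</span>" if props.color&.value
  text
end

props = DemoRunProps.new(bold: true, italic: true, color: DemoValue.new("FF0000"))
puts run_to_html(DemoRun.new("A & B", props))
# prints: <span style="color:FF0000"><em><strong>A &amp; B</strong></em></span>

puts run_to_html(DemoRun.new("x < y", nil))
# prints: x &lt; y
```

Note that only the run text passes through escape_html; the color value is interpolated as-is, with no prefix added, so callers of a sketch like this would need to make sure it is already a valid CSS color.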

.table_cell_to_html(cell) ⇒ String

Convert OOXML TableCell to HTML

Parameters:

Returns:

  • (String)

    HTML <td> element



# File 'lib/uniword/transformation/ooxml_to_html_converter.rb', line 98

def self.table_cell_to_html(cell)
  paragraphs_html = cell.paragraphs.map do |p|
    paragraph_to_html(p)
  end.join("\n")
  "<td>\n#{paragraphs_html}\n</td>"
end

.table_row_to_html(row) ⇒ String

Convert OOXML TableRow to HTML

Parameters:

Returns:

  • (String)

    HTML <tr> element



# File 'lib/uniword/transformation/ooxml_to_html_converter.rb', line 87

def self.table_row_to_html(row)
  cells_html = row.cells.map do |cell|
    table_cell_to_html(cell)
  end.join("\n")
  "<tr>\n#{cells_html}\n</tr>"
end

.table_to_html(table) ⇒ String

Convert OOXML Table to HTML

Parameters:

  • table (Uniword::Wordprocessingml::Table)

    OOXML table

Returns:

  • (String)

    HTML <table> element



# File 'lib/uniword/transformation/ooxml_to_html_converter.rb', line 78

def self.table_to_html(table)
  rows_html = table.rows.map { |row| table_row_to_html(row) }.join("\n")
  "<table>\n#{rows_html}\n</table>"
end
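
The three table methods nest in the same way: each level joins its children with a newline and wraps them in one tag. A minimal sketch of the resulting markup follows, using hypothetical stand-ins (`DemoTable`, `DemoRow`, `DemoCell`) and assuming cell paragraphs are already rendered HTML strings.

```ruby
# Hypothetical stand-ins (not Uniword classes).
DemoCell  = Struct.new(:paragraphs) # pre-rendered paragraph HTML, for brevity
DemoRow   = Struct.new(:cells)
DemoTable = Struct.new(:rows)

def table_cell_to_html(cell)
  "<td>\n#{cell.paragraphs.join("\n")}\n</td>"
end

def table_row_to_html(row)
  cells_html = row.cells.map { |cell| table_cell_to_html(cell) }.join("\n")
  "<tr>\n#{cells_html}\n</tr>"
end

def table_to_html(table)
  rows_html = table.rows.map { |row| table_row_to_html(row) }.join("\n")
  "<table>\n#{rows_html}\n</table>"
end

# A table with one row and two cells.
table = DemoTable.new([DemoRow.new([DemoCell.new(["<p>A</p>"]), DemoCell.new(["<p>B</p>"])])])
puts table_to_html(table)
```

As in the source above, the generated markup contains only `<table>`, `<tr>`, and `<td>` tags, each on its own line.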

.wrap_html(body_html, document) ⇒ String

Wrap HTML content in full HTML document

Parameters:

Returns:

  • (String)

    Full HTML document



# File 'lib/uniword/transformation/ooxml_to_html_converter.rb', line 110

def self.wrap_html(body_html, document)
  title = document.title ? escape_html(document.title) : "Document"
  core_props = document.core_properties
  author = core_props.respond_to?(:creator) ? core_props.creator : nil

  meta_tags = []
  meta_tags << "<meta name=\"author\" content=\"#{escape_html(author)}\">" if author

  <<~HTML
    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="utf-8">
      <title>#{title}</title>
      #{meta_tags.join("\n")}
    </head>
    <body>
    #{body_html}
    </body>
    </html>
  HTML
end
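
The fallbacks in wrap_html can be exercised without Uniword. The sketch below repeats the method body against hypothetical stand-ins (`DemoDoc` and `DemoCoreProps` are assumptions, not Uniword classes) to show the default title, the optional author tag, and the indentation stripping done by the squiggly heredoc (`<<~`).

```ruby
# Hypothetical stand-ins (not Uniword classes).
DemoCoreProps = Struct.new(:creator)
DemoDoc       = Struct.new(:title, :core_properties)

def escape_html(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
      .gsub('"', "&quot;").gsub("'", "&#39;")
end

def wrap_html(body_html, document)
  title = document.title ? escape_html(document.title) : "Document"
  core_props = document.core_properties
  # nil.respond_to?(:creator) is false, so missing core properties are tolerated.
  author = core_props.respond_to?(:creator) ? core_props.creator : nil

  meta_tags = []
  meta_tags << "<meta name=\"author\" content=\"#{escape_html(author)}\">" if author

  # <<~ removes the common leading indentation of the literal lines.
  <<~HTML
    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="utf-8">
      <title>#{title}</title>
      #{meta_tags.join("\n")}
    </head>
    <body>
    #{body_html}
    </body>
    </html>
  HTML
end

bare = wrap_html("<p>x</p>", DemoDoc.new(nil, nil))
full = wrap_html("<p>x</p>", DemoDoc.new("Q&A", DemoCoreProps.new("O'Neil")))
puts full
```

The title and author are escaped, while body_html is inserted verbatim, so it is expected to be HTML that has already been escaped by the element converters.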