Class: Uniword::Transformation::HtmlToOoxmlConverter

Inherits:
Object
Defined in:
lib/uniword/transformation/html_to_ooxml_converter.rb

Overview

Service for converting HTML to OOXML elements.

Public API coordinator — delegates to HtmlElementBuilder for OOXML construction and HtmlFormattingMapper for CSS/style handling.

Pure functions — no state, no side effects. Used by Transformer when source_format is :mhtml.
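The coordinator arrangement described above can be sketched with stand-in modules. The names `SketchBuilder`, `SketchMapper`, and `SketchConverter` are hypothetical, and the hash return value is a placeholder for a real OOXML object. This only illustrates stateless class methods forwarding to collaborators; it is not Uniword's actual implementation.

```ruby
# Hypothetical stand-ins for illustration; these are not Uniword classes.
module SketchBuilder
  # Builds a plain-hash "run" in place of a real OOXML run object.
  def self.create_run(text)
    { type: :run, text: text }
  end
end

module SketchMapper
  # Deliberately trivial: handles only &amp; so the example stays short.
  def self.decode_entities(text)
    text.gsub("&amp;", "&")
  end
end

# Public entry point: holds no instance state and only forwards calls,
# so the same input always produces the same output.
class SketchConverter
  def self.create_run(text)
    SketchBuilder.create_run(text)
  end

  def self.decode_html_entities(text)
    SketchMapper.decode_entities(text)
  end
end

run = SketchConverter.create_run("Hello")
decoded = SketchConverter.decode_html_entities("A &amp; B")
```

Because nothing is stored between calls, the facade can be invoked from any caller without setup or teardown.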

Class Method Details

.create_run(text) ⇒ Uniword::Wordprocessingml::Run

Create a simple run without properties.

Parameters:

  • text (String)

    Run text

Returns:

  • (Uniword::Wordprocessingml::Run)

# File 'lib/uniword/transformation/html_to_ooxml_converter.rb', line 111

def self.create_run(text)
  HtmlElementBuilder.create_run(text)
end

.create_run_from_element(element) ⇒ Uniword::Wordprocessingml::Run?

Create a run from HTML element with inline formatting.

Parameters:

  • element (Nokogiri::XML::Element)

    HTML element

Returns:

  • (Uniword::Wordprocessingml::Run, nil)

# File 'lib/uniword/transformation/html_to_ooxml_converter.rb', line 103

def self.create_run_from_element(element)
  HtmlElementBuilder.create_run_from_element(element)
end

.decode_html_entities(text) ⇒ String

Decode HTML entities.

Parameters:

  • text (String)

Returns:

  • (String)


# File 'lib/uniword/transformation/html_to_ooxml_converter.rb', line 119

def self.decode_html_entities(text)
  HtmlFormattingMapper.decode_entities(text)
end
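
What `HtmlFormattingMapper.decode_entities` does internally is not shown on this page. As a point of comparison, Ruby's standard library provides `CGI.unescapeHTML`, which decodes the basic named entities and numeric character references in a single pass. The sketch below assumes nothing about Uniword; `decode_basic_entities` is a hypothetical helper name.

```ruby
require "cgi"

# Hypothetical helper, not part of Uniword. Decodes &amp;, &lt;, &gt;,
# &quot; and numeric references using the standard library.
def decode_basic_entities(text)
  return "" if text.nil?

  CGI.unescapeHTML(text)
end

decode_basic_entities("Tom &amp; Jerry &lt;3") # => "Tom & Jerry <3"
decode_basic_entities("&amp;lt;")              # => "&lt;" (decoded once, not twice)
```

`CGI.unescapeHTML` does not cover the full HTML named-entity set (for example `&nbsp;`), so a converter that handles Word-generated HTML would likely need a wider table.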

.extract_body(html) ⇒ String

Extract body content from HTML document.

Parameters:

  • html (String)

Returns:

  • (String)


# File 'lib/uniword/transformation/html_to_ooxml_converter.rb', line 127

def self.extract_body(html)
  HtmlFormattingMapper.extract_body(html)
end
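
The page does not show how `HtmlFormattingMapper.extract_body` locates the body. One minimal way to get the same kind of result, assuming the input is either a full document or a bare fragment, is a regular expression with a fallback. `extract_body_sketch` is a hypothetical name, and a regex is a simplification that a real parser would handle more robustly.

```ruby
# Hypothetical stand-in for body extraction; not Uniword's implementation.
# Returns the markup between <body ...> and </body>, or the input unchanged
# when no body element is present (for example, an HTML fragment).
def extract_body_sketch(html)
  match = html.match(%r{<body\b[^>]*>(.*?)</body\s*>}mi)
  match ? match[1] : html
end

full = "<html><head><title>t</title></head><body class=\"x\"><p>Hi</p></body></html>"
extract_body_sketch(full)          # => "<p>Hi</p>"
extract_body_sketch("<p>Only</p>") # => "<p>Only</p>"
```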

.html_cell_to_cell(html_cell) ⇒ Uniword::Wordprocessingml::TableCell?

Convert a single HTML cell to OOXML table cell.

Parameters:

  • html_cell (Nokogiri::XML::Element)

    HTML td/th element

Returns:

  • (Uniword::Wordprocessingml::TableCell, nil)

# File 'lib/uniword/transformation/html_to_ooxml_converter.rb', line 79

def self.html_cell_to_cell(html_cell)
  HtmlElementBuilder.build_cell(html_cell)
end

.html_element_to_paragraph(element) ⇒ Uniword::Wordprocessingml::Paragraph?

Convert a single HTML element to OOXML paragraph.

Parameters:

  • element (Nokogiri::XML::Element)

    HTML element

Returns:

  • (Uniword::Wordprocessingml::Paragraph, nil)

# File 'lib/uniword/transformation/html_to_ooxml_converter.rb', line 87

def self.html_element_to_paragraph(element)
  HtmlElementBuilder.build_paragraph(element)
end

.html_row_to_row(html_row) ⇒ Uniword::Wordprocessingml::TableRow?

Convert a single HTML row to OOXML table row.

Parameters:

  • html_row (Nokogiri::XML::Element)

    HTML tr element

Returns:

  • (Uniword::Wordprocessingml::TableRow, nil)

# File 'lib/uniword/transformation/html_to_ooxml_converter.rb', line 71

def self.html_row_to_row(html_row)
  HtmlElementBuilder.build_row(html_row)
end

.html_table_to_table(html_table) ⇒ Uniword::Wordprocessingml::Table?

Convert a single HTML table to OOXML table.

Parameters:

  • html_table (Nokogiri::XML::Element)

    HTML table element

Returns:

  • (Uniword::Wordprocessingml::Table, nil)

# File 'lib/uniword/transformation/html_to_ooxml_converter.rb', line 63

def self.html_table_to_table(html_table)
  HtmlElementBuilder.build_table(html_table)
end

.html_to_paragraphs(html_content) ⇒ Array<Uniword::Wordprocessingml::Paragraph>

Convert HTML content to OOXML paragraphs.

Parameters:

  • html_content (String)

    HTML content (may include full HTML document)

Returns:

  • (Array<Uniword::Wordprocessingml::Paragraph>)

# File 'lib/uniword/transformation/html_to_ooxml_converter.rb', line 17

def self.html_to_paragraphs(html_content)
  return [] if html_content.nil? || html_content.empty?

  body = HtmlFormattingMapper.extract_body(html_content)
  doc = Nokogiri::HTML(body)
  paragraphs = []

  doc.css("p, h1, h2, h3, h4, h5, h6, li, div, tr").each do |element|
    next if element.ancestors("td, th").any?
    next if %w[tr td].include?(element.name)

    if %w[div li].include?(element.name) &&
        element.css("p, h1, h2, h3, h4, h5, h6, li, div, tr").any?
      next
    end

    paragraph = HtmlElementBuilder.build_paragraph(element)
    paragraphs << paragraph if paragraph
  end

  paragraphs
end
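
The selection rules in the loop above can be exercised without Nokogiri. The sketch below reimplements only the filtering step on a tiny hand-built tree: skip anything inside a table cell, skip `tr`, and skip `div`/`li` containers that wrap other block elements so their content is not emitted twice. `SketchNode` and `paragraph_sources` are hypothetical names introduced for the example.

```ruby
BLOCK_TAGS = %w[p h1 h2 h3 h4 h5 h6 li div tr].freeze

# Hypothetical minimal element: a tag name, children, and a parent link.
class SketchNode
  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name, children = [])
    @name = name
    @children = children
    children.each { |child| child.parent = self }
  end

  # All enclosing elements, nearest first.
  def ancestors
    list = []
    node = parent
    while node
      list << node
      node = node.parent
    end
    list
  end

  # All nested elements in document (pre-order) order.
  def descendants
    children.flat_map { |child| [child, *child.descendants] }
  end
end

# Returns the elements that would each become one paragraph.
def paragraph_sources(root)
  root.descendants.select do |el|
    next false unless BLOCK_TAGS.include?(el.name)
    next false if el.ancestors.any? { |a| %w[td th].include?(a.name) }
    next false if el.name == "tr"

    wraps_blocks = el.descendants.any? { |d| BLOCK_TAGS.include?(d.name) }
    next false if %w[div li].include?(el.name) && wraps_blocks

    true
  end
end

body = SketchNode.new("body", [
  SketchNode.new("div", [SketchNode.new("p"), SketchNode.new("h1")]),
  SketchNode.new("div"),
  SketchNode.new("table", [
    SketchNode.new("tr", [SketchNode.new("td", [SketchNode.new("p")])])
  ]),
  SketchNode.new("li")
])

paragraph_sources(body).map(&:name) # => ["p", "h1", "div", "li"]
```

The wrapping `div` is dropped because its `p` and `h1` children are emitted on their own, while the leaf `div` and `li` are kept. The `p` inside the table cell is left for table conversion.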

.html_to_tables(html_content) ⇒ Array<Uniword::Wordprocessingml::Table>

Convert HTML content to OOXML tables.

Parameters:

  • html_content (String)

    HTML content (may include full HTML document)

Returns:

  • (Array<Uniword::Wordprocessingml::Table>)

# File 'lib/uniword/transformation/html_to_ooxml_converter.rb', line 44

def self.html_to_tables(html_content)
  return [] if html_content.nil? || html_content.empty?

  body = HtmlFormattingMapper.extract_body(html_content)
  doc = Nokogiri::HTML(body)
  tables = []

  doc.css("table").each do |html_table|
    table = HtmlElementBuilder.build_table(html_table)
    tables << table if table
  end

  tables
end
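
The nilable return types on the table, row, and cell builders suggest that each level may decline to produce an object, and `html_to_tables` drops `nil` results before returning. A sketch of that collect-and-skip pattern follows, using nested arrays in place of parsed HTML and hashes in place of OOXML objects. All method names are hypothetical, and the rule that an empty row or table becomes `nil` is an assumption made for the example, not documented behavior.

```ruby
# Hypothetical stand-ins: nested arrays play the role of parsed HTML and
# plain hashes play the role of OOXML table objects.
def build_cell_sketch(text)
  return nil if text.nil?

  { cell: text.strip }
end

def build_row_sketch(cells)
  built = cells.filter_map { |cell| build_cell_sketch(cell) }
  built.empty? ? nil : { row: built }
end

def build_table_sketch(rows)
  built = rows.filter_map { |row| build_row_sketch(row) }
  built.empty? ? nil : { table: built }
end

# Mirrors the guard clause and nil-skipping loop of html_to_tables.
def tables_sketch(html_tables)
  return [] if html_tables.nil? || html_tables.empty?

  html_tables.filter_map { |table| build_table_sketch(table) }
end

input = [
  [["a ", nil], [nil]], # one usable cell; second row is empty
  [[nil]]               # nothing usable, so the table is skipped
]
tables_sketch(input) # => [{ table: [{ row: [{ cell: "a" }] }] }]
```

`filter_map` (Ruby 2.7+) is equivalent to the explicit `tables << table if table` loop in the source listing.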

.map_css_class_to_style(css_class) ⇒ String?

Map MHT CSS class to OOXML style name.

Parameters:

  • css_class (String)

    CSS class string

Returns:

  • (String, nil)


# File 'lib/uniword/transformation/html_to_ooxml_converter.rb', line 95

def self.map_css_class_to_style(css_class)
  HtmlFormattingMapper.map_css_class_to_style(css_class)
end
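
The actual class-to-style table lives in `HtmlFormattingMapper` and is not reproduced on this page. A minimal sketch of a lookup with the same `String?` contract might look like the following. The mapping entries, the constant `SKETCH_STYLE_MAP`, and the method `map_css_class_sketch` are all hypothetical, as is the choice to accept a space-separated class list and return the first match.

```ruby
# Hypothetical mapping for illustration only; not Uniword's real table.
SKETCH_STYLE_MAP = {
  "MsoNormal" => "Normal",
  "MsoTitle" => "Title",
  "MsoListParagraph" => "ListParagraph"
}.freeze

# Returns the style name for the first recognized class, or nil.
def map_css_class_sketch(css_class)
  return nil if css_class.nil?

  css_class.split.each do |name|
    style = SKETCH_STYLE_MAP[name]
    return style if style
  end
  nil
end

map_css_class_sketch("MsoTitle")         # => "Title"
map_css_class_sketch("intro MsoNormal")  # => "Normal"
map_css_class_sketch("unknown")          # => nil
```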