Class: LlmDocsBuilder::HtmlToMarkdownConverter

Inherits:
Object
Defined in:
lib/llm_docs_builder/html_to_markdown_converter.rb

Overview

A lightweight HTML → Markdown converter using only Nokogiri’s public API.

Design goals:

  • Traverse with Nokogiri and keep logic small, readable, and predictable

  • Preserve the existing public behavior covered by specs

  • Convert tables into Markdown while preserving inline formatting

Constant Summary

HEADING_LEVEL =

Mapping of HTML heading tags to their numeric levels

{
  'h1' => 1,
  'h2' => 2,
  'h3' => 3,
  'h4' => 4,
  'h5' => 5,
  'h6' => 6
}.freeze
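
To illustrate how a tag-to-level mapping like this can drive heading output, here is a minimal sketch. The `heading_line` helper is hypothetical and is not part of the documented API; the converter's real heading rendering may differ.

```ruby
# Copy of the HEADING_LEVEL constant documented above.
HEADING_LEVEL = {
  'h1' => 1,
  'h2' => 2,
  'h3' => 3,
  'h4' => 4,
  'h5' => 5,
  'h6' => 6
}.freeze

# Hypothetical helper (an assumption, not the library's method):
# builds an ATX-style Markdown heading from a tag name and its text.
def heading_line(tag, text)
  level = HEADING_LEVEL.fetch(tag.downcase) # raises KeyError for non-heading tags
  "#{'#' * level} #{text.strip}"
end

puts heading_line('h2', ' Installation ')
```

Using `fetch` rather than `[]` makes an unexpected tag fail loudly instead of producing a heading with zero `#` characters.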
BLOCK_CONTAINERS =

HTML tags treated as transparent block containers

%w[div aside figure article section main header footer nav body html].freeze
INLINE_STRONG_TAGS =

HTML tags rendered as bold/strong in Markdown

%w[strong b].freeze
INLINE_EM_TAGS =

HTML tags rendered as italic/emphasis in Markdown

%w[em i].freeze
LIST_TAGS =

HTML list container tags

%w[ul ol].freeze
IGNORE_TAGS =

HTML tags that should be completely ignored during conversion

%w[script style head noscript iframe svg canvas].freeze
MARKDOWN_LABEL_ESCAPE_PATTERN =

Pattern matching the Markdown special characters that must be escaped in link labels

/[\\\[\]()*_`!]/
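
The pattern above matches a backslash, square brackets, parentheses, `*`, `_`, a backtick, and `!`. A minimal sketch of how such a pattern could be applied follows; `escape_label` is a hypothetical helper introduced for illustration, and the converter's own escaping routine is not shown in this page.

```ruby
# Copy of the pattern documented above.
MARKDOWN_LABEL_ESCAPE_PATTERN = /[\\\[\]()*_`!]/

# Hypothetical helper (an assumption): prefixes every matched
# character with a backslash so it is read literally in a link label.
def escape_label(label)
  label.gsub(MARKDOWN_LABEL_ESCAPE_PATTERN) { |char| "\\#{char}" }
end

puts escape_label('array[0] *must* exist!')
```

The block form of `gsub` is used so the replacement backslash is not interpreted as a back-reference.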
SAFE_URI_SCHEMES =

URL schemes considered safe for link destinations

%w[http https mailto ftp tel].freeze
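
As a rough illustration of how a scheme allowlist like this can be used, the sketch below checks a link destination with Ruby's standard `URI` library. The `safe_destination?` helper is hypothetical, and treating scheme-less (relative) references as safe is an assumption of this sketch, not documented behavior of the converter.

```ruby
require 'uri'

# Copy of the allowlist documented above.
SAFE_URI_SCHEMES = %w[http https mailto ftp tel].freeze

# Hypothetical helper (an assumption): returns true when the href has
# an allowlisted scheme or no scheme at all (a relative reference).
# Unparseable input is rejected.
def safe_destination?(href)
  scheme = URI.parse(href.strip).scheme
  scheme.nil? || SAFE_URI_SCHEMES.include?(scheme.downcase)
rescue URI::InvalidURIError
  false
end

puts safe_destination?('https://example.com/docs')
puts safe_destination?('javascript:alert(1)')
```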

Instance Method Details

#convert(html) ⇒ String

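Entry point for HTML to Markdown conversion. Returns an empty string when html is nil or blank; otherwise parses the input as a Nokogiri HTML document fragment, renders the fragment's child nodes as blocks, and cleans up the rendered output.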

Parameters:

  • html (String)

    HTML content to convert

Returns:

  • (String)

    converted markdown content



# File 'lib/llm_docs_builder/html_to_markdown_converter.rb', line 46

def convert(html)
  return '' if html.nil? || html.strip.empty?

  fragment = Nokogiri::HTML::DocumentFragment.parse(html)
  rendered = render_blocks(fragment.children, depth: 0)
  clean_output(rendered)
end

#table_renderer ⇒ Object

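Returns the table renderer, building it on the first call and memoizing it in @table_renderer. The renderer receives this converter's collapsed_inline_for and render_blocks methods as callables.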



# File 'lib/llm_docs_builder/html_to_markdown_converter.rb', line 55

def table_renderer
  @table_renderer ||= HtmlToMarkdown::TableMarkupRenderer.new(
    inline_collapser: method(:collapsed_inline_for),
    block_renderer: method(:render_blocks)
  )
end
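
The pattern in `table_renderer` combines `||=` memoization with `Method` objects passed as callbacks. The sketch below reproduces that pattern with stand-in classes. `StubTableRenderer`, `StubConverter`, and `render_cell` are hypothetical names, and the bodies of `collapsed_inline_for` and `render_blocks` are placeholders, since their real implementations are not shown in this page.

```ruby
# Hypothetical stand-in for HtmlToMarkdown::TableMarkupRenderer.
class StubTableRenderer
  def initialize(inline_collapser:, block_renderer:)
    @inline_collapser = inline_collapser
    @block_renderer = block_renderer
  end

  # Hypothetical method: delegates back to the converter via the callable.
  def render_cell(text)
    @inline_collapser.call(text)
  end
end

# Hypothetical stand-in for the converter.
class StubConverter
  def table_renderer
    # Built once, then reused on every later call.
    @table_renderer ||= StubTableRenderer.new(
      inline_collapser: method(:collapsed_inline_for),
      block_renderer: method(:render_blocks)
    )
  end

  private

  # Placeholder body (an assumption): collapse runs of whitespace.
  def collapsed_inline_for(text)
    text.split.join(' ')
  end

  # Placeholder body (an assumption): join nodes as separate blocks.
  def render_blocks(nodes, depth: 0)
    nodes.map(&:to_s).join("\n\n")
  end
end

converter = StubConverter.new
puts converter.table_renderer.render_cell("bold   \n text")
puts converter.table_renderer.equal?(converter.table_renderer)
```

Passing `method(:name)` lets the renderer call back into the converter's private helpers without holding a reference to the converter's full interface.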