Class: LlmDocsBuilder::HtmlToMarkdownConverter
- Inherits: Object
- Defined in: lib/llm_docs_builder/html_to_markdown_converter.rb
Overview
A lightweight HTML → Markdown converter using only Nokogiri’s public API.
Design goals:
- Traverse with Nokogiri and keep logic small, readable, and predictable
- Preserve the existing public behavior covered by specs
- Convert tables into Markdown while preserving inline formatting
Constant Summary
- HEADING_LEVEL =
Mapping of HTML heading tags to their numeric levels
{ 'h1' => 1, 'h2' => 2, 'h3' => 3, 'h4' => 4, 'h5' => 5, 'h6' => 6 }.freeze
- BLOCK_CONTAINERS =
HTML tags treated as transparent block containers
%w[div aside figure article section main header footer nav body html].freeze
- INLINE_STRONG_TAGS =
HTML tags rendered as bold/strong in markdown
%w[strong b].freeze
- INLINE_EM_TAGS =
HTML tags rendered as italic/emphasis in markdown
%w[em i].freeze
- LIST_TAGS =
HTML list container tags
%w[ul ol].freeze
- IGNORE_TAGS =
HTML tags that should be completely ignored during conversion
%w[script style head noscript iframe svg canvas].freeze
- MARKDOWN_LABEL_ESCAPE_PATTERN =
Pattern for escaping markdown special characters in link labels
/[\\\[\]()*_`!]/
- SAFE_URI_SCHEMES =
URL schemes considered safe for link destinations
%w[http https mailto ftp tel].freeze
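The constants above might be applied along the following lines. This is a minimal sketch, not the converter's actual code: the helpers `heading_prefix`, `escape_label`, and `safe_destination?` are hypothetical names introduced for illustration, the constants are copied locally so the snippet runs on its own, and treating scheme-less (relative) URLs as safe is an assumption.

```ruby
require 'uri'

# Local copies of the documented constants so this sketch is self-contained.
HEADING_LEVEL = { 'h1' => 1, 'h2' => 2, 'h3' => 3, 'h4' => 4, 'h5' => 5, 'h6' => 6 }.freeze
MARKDOWN_LABEL_ESCAPE_PATTERN = /[\\\[\]()*_`!]/
SAFE_URI_SCHEMES = %w[http https mailto ftp tel].freeze

# Hypothetical helper: one '#' per heading level.
def heading_prefix(tag)
  '#' * HEADING_LEVEL.fetch(tag)
end

# Hypothetical helper: backslash-escape characters that would otherwise
# be interpreted as Markdown inside a link label.
def escape_label(text)
  text.gsub(MARKDOWN_LABEL_ESCAPE_PATTERN) { |char| "\\#{char}" }
end

# Hypothetical helper: allow listed schemes; relative URLs (no scheme)
# are accepted here by assumption.
def safe_destination?(url)
  scheme = URI.parse(url).scheme
  scheme.nil? || SAFE_URI_SCHEMES.include?(scheme.downcase)
rescue URI::InvalidURIError
  false
end

puts "#{heading_prefix('h2')} Title"
puts "[#{escape_label('see [docs]')}](https://example.com)"
puts safe_destination?('javascript:alert(1)')
```

Keeping the scheme list as an allowlist means unknown schemes such as `javascript:` are rejected by default rather than needing to be enumerated.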
Instance Method Summary
- #convert(html) ⇒ String
  Entry point for HTML to Markdown conversion.
- #table_renderer ⇒ Object
  Returns the table renderer, building it on first use.
Instance Method Details
#convert(html) ⇒ String
Entry point for HTML to Markdown conversion. Returns an empty string when the input is nil or blank.
# File 'lib/llm_docs_builder/html_to_markdown_converter.rb', line 46

def convert(html)
  return '' if html.nil? || html.strip.empty?

  fragment = Nokogiri::HTML::DocumentFragment.parse(html)
  rendered = render_blocks(fragment.children, depth: 0)
  clean_output(rendered)
end
#table_renderer ⇒ Object
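Returns the table renderer, building and memoizing it on first call. The renderer receives the converter's inline collapsing and block rendering methods as callables.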
# File 'lib/llm_docs_builder/html_to_markdown_converter.rb', line 55

def table_renderer
  @table_renderer ||= HtmlToMarkdown::TableMarkupRenderer.new(
    inline_collapser: method(:collapsed_inline_for),
    block_renderer: method(:render_blocks)
  )
end
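The pattern used by `#table_renderer` (memoizing with `||=` and handing over the converter's own methods as `Method` objects) can be illustrated with a small stand-alone sketch. `SketchTableRenderer` and `SketchConverter` are hypothetical stand-ins, and the bodies of `collapsed_inline_for` and `render_blocks` are simplified assumptions, not the library's behavior.

```ruby
# Hypothetical stand-in for a renderer that receives its collaborators as callables.
class SketchTableRenderer
  def initialize(inline_collapser:, block_renderer:)
    @inline_collapser = inline_collapser
    @block_renderer = block_renderer
  end

  # Render a list of cell texts as one Markdown table row.
  def row(cells)
    "| #{cells.map { |cell| @inline_collapser.call(cell) }.join(' | ')} |"
  end
end

# Hypothetical stand-in for the converter.
class SketchConverter
  # Built on the first call, then reused on every later call.
  def table_renderer
    @table_renderer ||= SketchTableRenderer.new(
      inline_collapser: method(:collapsed_inline_for),
      block_renderer: method(:render_blocks)
    )
  end

  private

  # Assumed behavior: collapse whitespace runs into single spaces.
  def collapsed_inline_for(text)
    text.gsub(/\s+/, ' ').strip
  end

  # Assumed behavior: join already-rendered blocks with blank lines.
  def render_blocks(nodes, depth: 0)
    nodes.map(&:to_s).join("\n\n")
  end
end

converter = SketchConverter.new
puts converter.table_renderer.row(['  Name ', "Value\n here"])
```

Passing `method(:name)` lets the renderer call back into private converter methods without holding a reference to the converter's full interface, which keeps the two classes loosely coupled.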