Class: LlmDocsBuilder::MarkdownTransformer

Inherits:
Object
Defined in:
lib/llm_docs_builder/markdown_transformer.rb

Overview

Transforms markdown files to be AI-friendly

Orchestrates a pipeline of specialized transformers to process markdown content. Each transformer is responsible for a specific aspect of the transformation.

Examples:

Transform with base URL

transformer = LlmDocsBuilder::MarkdownTransformer.new('README.md',
  base_url: 'https://myproject.io'
)
content = transformer.transform


Constructor Details

#initialize(file_path, options = {}) ⇒ MarkdownTransformer

Initialize a new markdown transformer

Parameters:

  • file_path (String)

    path to markdown file to transform

  • options (Hash) (defaults to: {})

    transformation options

Options Hash (options):

  • :base_url (String)

    base URL for expanding relative links

  • :convert_urls (Boolean)

    convert HTML URLs to markdown format

  • :remove_comments (Boolean)

    remove HTML comments from markdown

  • :normalize_whitespace (Boolean)

    normalize excessive whitespace

  • :remove_badges (Boolean)

    remove badge/shield images

  • :remove_frontmatter (Boolean)

    remove YAML/TOML frontmatter

  • :remove_code_examples (Boolean)

    remove code blocks and inline code

  • :remove_images (Boolean)

    remove image syntax

  • :simplify_links (Boolean)

    shorten verbose link text

  • :remove_blockquotes (Boolean)

    remove blockquote formatting

  • :generate_toc (Boolean)

    generate table of contents at the top

  • :custom_instruction (String)

    custom instruction text to inject at top

  • :remove_stopwords (Boolean)

    remove common stopwords (aggressive)

  • :remove_duplicates (Boolean)

    remove duplicate paragraphs



# File 'lib/llm_docs_builder/markdown_transformer.rb', line 41

def initialize(file_path, options = {})
  @file_path = file_path
  @options = options
end
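
The options listed above can be combined freely in one hash. The following is a minimal sketch of one way to keep a reusable set of defaults; `CLEANUP_PRESET` and `transformer_options` are hypothetical names introduced here for illustration and are not part of the gem. Only option keys documented above are used.

```ruby
# Hypothetical preset, not part of the gem: a conservative clean-up profile
# built only from option keys documented for #initialize.
CLEANUP_PRESET = {
  remove_comments: true,
  remove_badges: true,
  remove_frontmatter: true,
  normalize_whitespace: true
}.freeze

# Hypothetical helper: caller-supplied options win over the preset.
def transformer_options(overrides = {})
  CLEANUP_PRESET.merge(overrides)
end

opts = transformer_options(base_url: 'https://myproject.io', remove_badges: false)

# With the gem loaded, the hash would be passed straight to the constructor:
#   transformer = LlmDocsBuilder::MarkdownTransformer.new('README.md', opts)
#   content = transformer.transform
```

Because the constructor shown above only stores `file_path` and `options`, the hash can be assembled and inspected before any transformation runs.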

Instance Attribute Details

#file_path ⇒ String (readonly)

Returns path to markdown file.

Returns:

  • (String)

    path to markdown file



# File 'lib/llm_docs_builder/markdown_transformer.rb', line 18

def file_path
  @file_path
end

#options ⇒ Hash (readonly)

Returns transformation options.

Returns:

  • (Hash)

    transformation options



# File 'lib/llm_docs_builder/markdown_transformer.rb', line 21

def options
  @options
end

Instance Method Details

#transform ⇒ String

Transform markdown content using a pipeline of transformers

Processes content through specialized transformers in order:

  1. ContentCleanupTransformer - Removes unwanted elements

  2. LinkTransformer - Processes links

  3. HeadingTransformer - Normalizes heading hierarchy (if enabled)

  4. TextCompressor - Advanced compression (if enabled)

  5. EnhancementTransformer - Adds TOC and instructions

  6. WhitespaceTransformer - Normalizes whitespace

Returns:

  • (String)

    transformed markdown content



# File 'lib/llm_docs_builder/markdown_transformer.rb', line 57

def transform
  content = load_content

  # Build and execute transformation pipeline
  content = cleanup_transformer.transform(content, options)
  content = link_transformer.transform(content, options)
  content = heading_transformer.transform(content, options)
  content = compress_content(content) if should_compress?
  content = enhancement_transformer.transform(content, options)
  whitespace_transformer.transform(content, options)
end
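
The pipeline pattern used by `#transform` can be illustrated in isolation. The sketch below is not the gem's implementation: the three `*Sketch` classes, `run_pipeline`, and `SKETCH_STAGES` are hypothetical stand-ins, and their regular expressions are deliberately simplistic. The only things taken from the source above are the shape of the call, `transform(content, options)`, and the idea that each stage receives the previous stage's output.

```ruby
# Hypothetical stand-in stages. Each responds to transform(content, options),
# the same call shape used in MarkdownTransformer#transform.
class StripCommentsSketch
  def transform(content, options)
    return content unless options[:remove_comments]

    # Remove HTML comments, including ones that span several lines.
    content.gsub(/<!--.*?-->/m, '')
  end
end

class ExpandLinksSketch
  def transform(content, options)
    base = options[:base_url]
    return content unless base

    # Prefix root-relative link targets such as (/docs/setup) with the base URL.
    content.gsub(%r{\]\((/[^)]*)\)}) { "](#{base}#{Regexp.last_match(1)})" }
  end
end

class SqueezeBlankLinesSketch
  def transform(content, options)
    return content unless options[:normalize_whitespace]

    # Collapse runs of three or more newlines and trim the ends.
    content.gsub(/\n{3,}/, "\n\n").strip
  end
end

# Feed the content through every stage in order, passing the same options along.
def run_pipeline(content, options, stages)
  stages.reduce(content) { |text, stage| stage.transform(text, options) }
end

SKETCH_STAGES = [
  StripCommentsSketch.new,
  ExpandLinksSketch.new,
  SqueezeBlankLinesSketch.new
].freeze

input = "# Title\n<!-- internal note -->\n\n\n\nSee [setup](/docs/setup).\n"
options = {
  base_url: 'https://myproject.io',
  remove_comments: true,
  normalize_whitespace: true
}

output = run_pipeline(input, options, SKETCH_STAGES)
puts output
```

Order matters in this sketch for the same reason it does in the documented pipeline: running whitespace normalization last cleans up the blank lines that earlier removals leave behind.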