Class: Phronomy::Loader::MarkdownLoader

Inherits:
Base
  • Object
Defined in:
lib/phronomy/loader/markdown_loader.rb

Overview

Loads a Markdown file, optionally splitting on top-level headings.

When split_on_headings: is true (the default), each H1/H2 section becomes a separate document, so that embeddings capture the semantics of individual sections rather than of the whole file at once.

Examples:

Single document (heading split disabled)

loader = Phronomy::Loader::MarkdownLoader.new(split_on_headings: false)
docs   = loader.load("README.md")
# => [{ text: "# Title\n...", metadata: { source: "README.md" } }]

Split per heading (default)

loader = Phronomy::Loader::MarkdownLoader.new
docs   = loader.load("guide.md")
# => [
#   { text: "# Section 1\n...", metadata: { source: "guide.md", section: "Section 1" } },
#   { text: "## Sub-section\n...", metadata: { source: "guide.md", section: "Sub-section" } },
# ]

Constant Summary

HEADING_RE =
/^(\#{1,6})\s+(.+)$/
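
To make the pattern concrete, the sketch below shows what HEADING_RE captures for a single line. The constant is copied from above; the parse_heading helper is hypothetical and is not part of MarkdownLoader. The backslash before # is needed because an unescaped #{ would start string interpolation inside a Ruby regex literal.

```ruby
# Copy of the documented constant, repeated so the snippet runs on its own.
HEADING_RE = /^(\#{1,6})\s+(.+)$/

# Hypothetical helper (not part of the library): returns the heading level
# and title for an ATX-style heading line, or nil for any other line.
def parse_heading(line)
  m = HEADING_RE.match(line)
  return nil unless m

  # Group 1 is the run of "#" characters, group 2 is the heading text.
  { level: m[1].length, title: m[2] }
end

p parse_heading("# Section 1")
p parse_heading("## Sub-section")
p parse_heading("#NoSpace")        # no whitespace after "#", so no match
p parse_heading("####### seven")   # more than six "#" characters, so no match
```

Because the pattern requires whitespace after the # run, lines such as "#tag" are not treated as headings.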

Instance Method Summary

Constructor Details

#initialize(split_on_headings: true) ⇒ MarkdownLoader

Returns a new instance of MarkdownLoader.

Parameters:

  • split_on_headings (Boolean) (defaults to: true)

    whether to split the file into separate documents at H1–H6 heading boundaries



# File 'lib/phronomy/loader/markdown_loader.rb', line 27

def initialize(split_on_headings: true)
  @split_on_headings = split_on_headings
end

Instance Method Details

#load(source) ⇒ Array<Hash>

Parameters:

  • source (String)

    path to a Markdown file

Returns:

  • (Array<Hash>)

    one Hash per document, each with :text and :metadata keys

Raises:

  • (Errno::ENOENT)

    if the file does not exist
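
Callers that process many paths may prefer to skip missing files rather than abort. The following is a minimal sketch of one way to do that; load_or_empty is a hypothetical helper, not part of Phronomy, and it assumes only that the loader responds to #load and lets Errno::ENOENT propagate as documented above.

```ruby
# Hypothetical wrapper (not part of the library): returns an empty array
# instead of raising when the source file does not exist.
def load_or_empty(loader, path)
  loader.load(path)
rescue Errno::ENOENT => e
  warn "skipping #{path}: #{e.message}"
  []
end

# Usage, assuming the phronomy gem is installed and required:
#   loader = Phronomy::Loader::MarkdownLoader.new
#   docs   = load_or_empty(loader, "missing.md")
```

Other errors, such as Errno::EACCES for an unreadable file, are deliberately not rescued here and still reach the caller.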



# File 'lib/phronomy/loader/markdown_loader.rb', line 34

def load(source)
  content = File.read(source, encoding: "UTF-8")
  return [{text: content, metadata: {source: source}}] unless @split_on_headings

  split_by_headings(content, source)
end
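
The split_by_headings helper called above is not shown on this page. The sketch below is one plausible way such a helper could work, written only from the documented examples and HEADING_RE; the name split_by_headings_sketch and all of its details are assumptions, and the real implementation may differ (for instance in which heading levels it splits on or how it treats text before the first heading).

```ruby
# Copy of the documented constant, repeated so the snippet runs on its own.
HEADING_RE = /^(\#{1,6})\s+(.+)$/

# Hypothetical stand-in for the private helper; NOT the library's code.
# Starts a new document at every heading line and records the heading
# text under metadata[:section].
def split_by_headings_sketch(content, source)
  docs = []
  current_lines = []
  current_section = nil

  # Emits the accumulated lines as one document, skipping blank chunks.
  flush = lambda do
    text = current_lines.join
    unless text.strip.empty?
      metadata = { source: source }
      metadata[:section] = current_section if current_section
      docs << { text: text, metadata: metadata }
    end
  end

  content.each_line do |line|
    if (m = HEADING_RE.match(line))
      flush.call
      current_lines = []
      current_section = m[2].strip
    end
    current_lines << line
  end
  flush.call

  docs
end

docs = split_by_headings_sketch("intro\n# A\nbody a\n## B\nbody b\n", "guide.md")
docs.each { |d| p d }
```

This sketch treats any line matching HEADING_RE as a heading, including lines that start with # inside fenced code blocks; a production splitter would likely need to track fences to avoid false splits.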