Class: HTM::Loaders::MarkdownLoader
- Inherits: Object
- Defined in: lib/htm/loaders/markdown_loader.rb
Overview
Markdown file loader
Loads markdown files into HTM long-term memory with support for:
- YAML frontmatter parsing (stored as metadata on the first chunk)
- Paragraph-based chunking
- Re-sync on file changes (via mtime comparison)
- Duplicate detection via content_hash
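Paragraph-based chunking can be pictured with the minimal sketch below. It is not the MarkdownChunker implementation, which this page does not show: the helper name `chunk_paragraphs` and the `max_chars` parameter are hypothetical, and overlap between chunks is left out.

```ruby
# Hypothetical helper, for illustration only: splits text on blank lines and
# greedily packs whole paragraphs into chunks of at most max_chars characters.
# A single paragraph longer than max_chars becomes its own oversized chunk.
def chunk_paragraphs(text, max_chars: 200)
  paragraphs = text.split(/\n\s*\n/).map(&:strip).reject(&:empty?)
  chunks = []
  current = ""

  paragraphs.each do |para|
    candidate = current.empty? ? para : "#{current}\n\n#{para}"
    if current.empty? || candidate.length <= max_chars
      current = candidate        # paragraph still fits in the open chunk
    else
      chunks << current          # close the open chunk
      current = para             # start a new chunk with this paragraph
    end
  end

  chunks << current unless current.empty?
  chunks
end

chunks = chunk_paragraphs("aaaa\n\nbbbb\n\ncccc", max_chars: 10)
```

Packing whole paragraphs keeps each chunk readable on its own, at the cost of uneven chunk sizes.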
Constant Summary
- FRONTMATTER_REGEX = /\A---\s*\n(.*?)\n---\s*\n/m
- MAX_FILE_SIZE = 10 * 1024 * 1024
  10 MB maximum file size
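To show what FRONTMATTER_REGEX captures, here is a small sketch that applies the same pattern to a string and parses the captured block as YAML. The helper `split_frontmatter` is hypothetical and is not the loader's private `extract_frontmatter` method, whose body is not shown on this page.

```ruby
require "yaml"

# Same pattern as the FRONTMATTER_REGEX constant listed above.
FRONTMATTER_REGEX = /\A---\s*\n(.*?)\n---\s*\n/m

# Hypothetical helper, for illustration only: returns [frontmatter_hash, body].
# Content without a leading frontmatter block is returned unchanged.
def split_frontmatter(content)
  match = FRONTMATTER_REGEX.match(content)
  return [{}, content] unless match

  parsed = YAML.safe_load(match[1]) || {}
  parsed = {} unless parsed.is_a?(Hash)   # ignore frontmatter that is not a mapping
  [parsed, match.post_match]
end

doc = "---\ntitle: Notes\ntags:\n  - ruby\n---\n# Heading\n\nBody text.\n"
frontmatter, body = split_frontmatter(doc)
```

Because the pattern is anchored with `\A`, only a block at the very start of the file counts as frontmatter; a `---` rule further down the document is left in the body.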
Instance Method Summary
- #initialize(htm_instance, chunk_size: nil, chunk_overlap: nil) ⇒ MarkdownLoader (constructor)
  A new instance of MarkdownLoader.
- #load_directory(path, pattern: '**/*.md', force: false) ⇒ Array<Hash>
  Load all matching files from a directory.
- #load_file(path, force: false) ⇒ Hash
  Load a single markdown file into long-term memory.
Constructor Details
#initialize(htm_instance, chunk_size: nil, chunk_overlap: nil) ⇒ MarkdownLoader
Returns a new instance of MarkdownLoader.
# File 'lib/htm/loaders/markdown_loader.rb', line 31
def initialize(htm_instance, chunk_size: nil, chunk_overlap: nil)
  @htm = htm_instance
  @chunker = MarkdownChunker.new(
    chunk_size: chunk_size,
    chunk_overlap: chunk_overlap
  )
end
Instance Method Details
#load_directory(path, pattern: '**/*.md', force: false) ⇒ Array<Hash>
Load all matching files from a directory. A failure on one file is captured as an error entry in the result array instead of aborting the whole run.
# File 'lib/htm/loaders/markdown_loader.rb', line 82
def load_directory(path, pattern: '**/*.md', force: false)
  expanded_path = File.expand_path(path)

  unless File.exist?(expanded_path)
    raise ArgumentError, "Directory not found: #{path}"
  end

  unless File.directory?(expanded_path)
    raise ArgumentError, "Not a directory: #{path}"
  end

  files = Dir.glob(File.join(expanded_path, pattern))

  files.map do |file_path|
    load_file(file_path, force: force)
  rescue StandardError => e
    { file_path: file_path, error: e.message, skipped: false }
  end
end
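The per-file error handling in load_directory can be tried in isolation with the sketch below. It needs no HTM instance: `process_file` is a hypothetical stand-in for load_file that rejects empty files, and `process_directory` mirrors only the glob-and-rescue shape of the method above.

```ruby
require "tmpdir"

# Hypothetical stand-in for load_file: fails on empty files so that the
# rescue path is exercised.
def process_file(file_path)
  raise ArgumentError, "File is empty: #{file_path}" if File.zero?(file_path)

  { file_path: file_path, skipped: false }
end

# Mirrors the structure of load_directory: expand the path, glob, and turn
# each per-file failure into an error hash instead of aborting the run.
def process_directory(path, pattern: "**/*.md")
  root = File.expand_path(path)
  raise ArgumentError, "Not a directory: #{path}" unless File.directory?(root)

  Dir.glob(File.join(root, pattern)).sort.map do |file_path|
    process_file(file_path)
  rescue StandardError => e
    { file_path: file_path, error: e.message, skipped: false }
  end
end

results = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "ok.md"), "# Title\n")
  File.write(File.join(dir, "empty.md"), "")
  Dir.mkdir(File.join(dir, "nested"))
  File.write(File.join(dir, "nested", "deep.md"), "text\n")
  File.write(File.join(dir, "skip.txt"), "not markdown\n")
  process_directory(dir)
end

failed, loaded = results.partition { |r| r.key?(:error) }
```

Callers can tell failures apart by the presence of the `:error` key, since failed entries also carry `skipped: false`.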
#load_file(path, force: false) ⇒ Hash
Load a single markdown file into long-term memory. The file is skipped unless it is new, its mtime indicates a change, or force is true.
# File 'lib/htm/loaders/markdown_loader.rb', line 50
def load_file(path, force: false)
  expanded_path = validate_file_path!(path)
  content = read_file_content(expanded_path, path)
  stat = File.stat(expanded_path)
  file_hash = Digest::SHA256.hexdigest(content)

  source = HTM::Models::FileSource.first(file_path: expanded_path)
  is_new = source.nil?
  source ||= HTM::Models::FileSource.new(file_path: expanded_path)

  # Skip unchanged files unless a reload is forced
  unless force || is_new || source.needs_sync?(stat.mtime)
    return { file_path: expanded_path, chunks_created: 0, chunks_updated: 0, chunks_deleted: 0, skipped: true }
  end

  frontmatter, body = extract_frontmatter(content)
  # Chunker method name assumed to be #chunk; the listing omits it
  chunks = @chunker.chunk(body)
  prepend_frontmatter_to_chunk(frontmatter, chunks)

  source.save if is_new
  result = sync_chunks(source, chunks)

  source.update(file_hash: file_hash, mtime: stat.mtime, file_size: stat.size, frontmatter: frontmatter, last_synced_at: Time.now)

  result.merge(file_path: expanded_path, file_source_id: source.id, skipped: false)
end
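The skip decision in load_file reduces to one boolean expression. The sketch below reproduces it with a hypothetical `SourceRecord` struct standing in for HTM::Models::FileSource. The real `needs_sync?` is not shown on this page, so the whole-second mtime comparison here is an assumption.

```ruby
require "digest"

# Hypothetical stand-in for HTM::Models::FileSource. The comparison rule
# (whole-second mtime equality) is an assumption, not the model's code.
SourceRecord = Struct.new(:mtime) do
  def needs_sync?(current_mtime)
    mtime.nil? || current_mtime.to_i != mtime.to_i
  end
end

# Same condition load_file uses before doing any work:
# a forced load, a never-seen file, or a changed mtime triggers a sync.
def sync_required?(source, current_mtime, force: false)
  force || source.nil? || source.needs_sync?(current_mtime)
end

stored_at = Time.at(1_700_000_000)
known     = SourceRecord.new(stored_at)

new_file  = sync_required?(nil, stored_at)                 # no record yet
unchanged = sync_required?(known, stored_at)               # same mtime
modified  = sync_required?(known, stored_at + 60)          # file touched later
forced    = sync_required?(known, stored_at, force: true)  # caller override

# The file hash is a SHA-256 hex digest of the full file content.
file_hash = Digest::SHA256.hexdigest("# Title\n\nBody text.\n")
```

Only the `unchanged` case returns false, which corresponds to the early return with `skipped: true` and all chunk counts at zero.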