Class: HTM::Loaders::MarkdownChunker
- Inherits: Object
  - Object
    - HTM::Loaders::MarkdownChunker
- Defined in: lib/htm/loaders/markdown_chunker.rb
Overview
Markdown-aware text chunker using Baran.
Wraps Baran::MarkdownSplitter to split text into chunks that respect markdown structure (headers, code blocks, etc.).
Instance Attribute Summary
- #chunk_overlap ⇒ Object (readonly)
  Returns the value of attribute chunk_overlap.
- #chunk_size ⇒ Object (readonly)
  Returns the value of attribute chunk_size.
Instance Method Summary
- #chunk(text) ⇒ Array<String>
  Split text into markdown-aware chunks (text only).
- #chunk_with_metadata(text) ⇒ Array<Hash>
  Split text and return full chunk data (with cursor positions).
- #initialize(chunk_size: nil, chunk_overlap: nil) ⇒ MarkdownChunker (constructor)
  A new instance of MarkdownChunker.
Constructor Details
#initialize(chunk_size: nil, chunk_overlap: nil) ⇒ MarkdownChunker
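Returns a new instance of MarkdownChunker. When chunk_size or chunk_overlap is nil, the corresponding value from HTM.configuration is used; the resolved values are passed to Baran::MarkdownSplitter.new.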
# File 'lib/htm/loaders/markdown_chunker.rb', line 29
def initialize(chunk_size: nil, chunk_overlap: nil)
  # Fall back to the global HTM configuration when an argument is nil
  @chunk_size = chunk_size || HTM.configuration.chunk_size
  @chunk_overlap = chunk_overlap || HTM.configuration.chunk_overlap

  @splitter = Baran::MarkdownSplitter.new(
    chunk_size: @chunk_size,
    chunk_overlap: @chunk_overlap
  )
end
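The constructor resolves each setting with ||, so an argument falls back to the global configuration only when it is nil (or false). The sketch below isolates that rule. Config, resolve_settings, and the values 1024 and 64 are hypothetical stand-ins, not HTM's actual configuration object or its defaults.

```ruby
# Hypothetical stand-in for HTM.configuration; the real defaults are not shown here.
Config = Struct.new(:chunk_size, :chunk_overlap)

# Mirrors the constructor's `||` fallback: only nil (or false) falls back.
def resolve_settings(config, chunk_size: nil, chunk_overlap: nil)
  [chunk_size || config.chunk_size, chunk_overlap || config.chunk_overlap]
end

config = Config.new(1024, 64)
p resolve_settings(config)                    # => [1024, 64]
p resolve_settings(config, chunk_size: 500)   # => [500, 64]
p resolve_settings(config, chunk_overlap: 0)  # => [1024, 0]
```

Because 0 is truthy in Ruby, passing chunk_overlap: 0 requests zero overlap rather than falling back to the configured value.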
Instance Attribute Details
#chunk_overlap ⇒ Object (readonly)
Returns the chunk overlap passed to the splitter: the constructor argument, or HTM.configuration.chunk_overlap if none was given.
# File 'lib/htm/loaders/markdown_chunker.rb', line 76
def chunk_overlap
  @chunk_overlap
end
#chunk_size ⇒ Object (readonly)
Returns the chunk size passed to the splitter: the constructor argument, or HTM.configuration.chunk_size if none was given.
# File 'lib/htm/loaders/markdown_chunker.rb', line 76
def chunk_size
  @chunk_size
end
Instance Method Details
#chunk(text) ⇒ Array<String>
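Split text into markdown-aware chunks (text only). Returns an empty array for nil or whitespace-only input. Line endings (\r\n and lone \r) are normalized to \n before splitting, each chunk is stripped of leading and trailing whitespace, and chunks that are empty after stripping are dropped.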
# File 'lib/htm/loaders/markdown_chunker.rb', line 44
def chunk(text)
  return [] if text.nil? || text.strip.empty?

  # Normalize line endings
  normalized = text.gsub(/\r\n?/, "\n")

  # Use Baran's MarkdownSplitter
  result = @splitter.chunks(normalized)

  # Extract text from chunk hashes, filter empty
  result.map { |chunk| chunk[:text].strip }.reject(&:empty?)
end
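The post-processing in #chunk can be exercised without Baran. In this sketch FakeSplitter and chunk_text are hypothetical: FakeSplitter only splits after blank lines and returns hashes in the { text:, cursor: } shape, so it stands in for Baran::MarkdownSplitter without reproducing its markdown-aware splitting. chunk_text repeats the method's own steps: the blank-input guard, line-ending normalization, and strip-and-reject.

```ruby
# Hypothetical stand-in for Baran::MarkdownSplitter: splits after each blank
# line and returns { text:, cursor: } hashes. Not markdown-aware.
class FakeSplitter
  def chunks(text)
    cursor = 0
    text.split(/(?<=\n\n)/).map do |piece|
      chunk = { text: piece, cursor: cursor }
      cursor += piece.length
      chunk
    end
  end
end

# Same steps as #chunk, with the splitter passed in explicitly.
def chunk_text(splitter, text)
  return [] if text.nil? || text.strip.empty?

  # CRLF and lone CR become LF
  normalized = text.gsub(/\r\n?/, "\n")
  chunks = splitter.chunks(normalized)

  # Keep only the text, trimmed, and drop chunks that are empty after trimming
  chunks.map { |c| c[:text].strip }.reject(&:empty?)
end

input = "# Title\r\n\r\nFirst paragraph.\r\n\r\n   \r\n\r\nSecond paragraph.\r\n"
p chunk_text(FakeSplitter.new, input)
# => ["# Title", "First paragraph.", "Second paragraph."]
```

The whitespace-only piece between the two paragraphs survives splitting but is removed by the final reject, which is why #chunk never returns empty strings.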
#chunk_with_metadata(text) ⇒ Array<Hash>
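Split text and return full chunk data (with cursor positions).
Returns an empty array for nil or whitespace-only input. Otherwise returns Baran’s output as-is, an array of hashes including:
- :text [String] The chunk content. Unlike #chunk, chunks are not stripped or filtered.
- :cursor [Integer] Character offset where the chunk starts in the line-ending-normalized text. This matches the original text only if it contains no \r\n sequences.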
# File 'lib/htm/loaders/markdown_chunker.rb', line 66
def chunk_with_metadata(text)
  return [] if text.nil? || text.strip.empty?

  # Normalize line endings
  normalized = text.gsub(/\r\n?/, "\n")

  # Use Baran's MarkdownSplitter - returns [{text:, cursor:}, ...]
  @splitter.chunks(normalized)
end
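Because the cursor is measured on the normalized text, each \r\n that precedes a chunk shifts its position in the original input by one character. This sketch builds a chunk by hand in the same { text:, cursor: } shape (no Baran call) and uses a hypothetical helper, original_offset, to translate a normalized offset back into the original string.

```ruby
original   = "Intro\r\n\r\n## Usage\r\nRun it.\r\n"
normalized = original.gsub(/\r\n?/, "\n") # same normalization chunk_with_metadata applies

# Hand-built chunk (hypothetical; stands in for one element of the method's output)
chunk = { text: "## Usage\nRun it.\n", cursor: normalized.index("## Usage") }

# Hypothetical helper: walk the original string, counting "\r\n" as the single
# "\n" it became during normalization, until the normalized offset is reached.
def original_offset(original, normalized_offset)
  orig_i = 0
  norm_i = 0
  while norm_i < normalized_offset
    orig_i += original[orig_i, 2] == "\r\n" ? 2 : 1
    norm_i += 1
  end
  orig_i
end

p chunk[:cursor]                                                   # => 7
p normalized[chunk[:cursor], chunk[:text].length] == chunk[:text]  # => true
p original_offset(original, chunk[:cursor])                        # => 9
p original.index("## Usage")                                       # => 9
```

A lone \r also normalizes to a single \n, so it advances both counters by one and needs no special case.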