Class: Phronomy::Splitter::FixedSizeSplitter
Defined in: lib/phronomy/splitter/fixed_size_splitter.rb
Overview
Splits text into fixed-size character chunks with optional overlap.
Instance Method Summary
- #initialize(chunk_size: 1000, chunk_overlap: 200) ⇒ FixedSizeSplitter (constructor)
  A new instance of FixedSizeSplitter.
- #split(document) ⇒ Array<Hash>
Methods inherited from Base
Constructor Details
#initialize(chunk_size: 1000, chunk_overlap: 200) ⇒ FixedSizeSplitter
Returns a new instance of FixedSizeSplitter.
    # File 'lib/phronomy/splitter/fixed_size_splitter.rb', line 18

    def initialize(chunk_size: 1000, chunk_overlap: 200)
      raise ArgumentError, "chunk_overlap must be less than chunk_size" if chunk_overlap >= chunk_size
      @chunk_size = chunk_size
      @chunk_overlap = chunk_overlap
    end
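The constructor requires chunk_overlap to be smaller than chunk_size because each new chunk starts chunk_size - chunk_overlap characters after the previous one; a zero or negative step would never advance through the text. As a rough illustration of that arithmetic, the sketch below estimates how many chunks a text of a given length yields. The helper name expected_chunk_count is hypothetical and not part of Phronomy; the formula is derived from the loop shown in #split and assumes that loop's stopping rule.

```ruby
# Hypothetical helper (not part of Phronomy): estimates the number of chunks
# produced for a text of `text_length` characters, assuming each chunk starts
# `chunk_size - chunk_overlap` characters after the previous one.
def expected_chunk_count(text_length, chunk_size: 1000, chunk_overlap: 200)
  # Same guard as the constructor: the step between chunks must be positive.
  raise ArgumentError, "chunk_overlap must be less than chunk_size" if chunk_overlap >= chunk_size

  return 0 if text_length.zero?          # empty text yields no chunks
  return 1 if text_length <= chunk_size  # the first window already covers everything

  stride = chunk_size - chunk_overlap
  # One chunk for the first window, plus one per stride needed to cover the rest.
  (text_length - chunk_size).fdiv(stride).ceil + 1
end

count_exact = expected_chunk_count(2600) # windows start at 0, 800, 1600
count_spill = expected_chunk_count(2601) # one extra, short chunk starting at 2400
```

With the default settings the step is 800 characters, so a 2600-character text gives 3 chunks and a 2601-character text gives 4, the last of which is shorter than chunk_size.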
Instance Method Details
#split(document) ⇒ Array<Hash>
    # File 'lib/phronomy/splitter/fixed_size_splitter.rb', line 27

    def split(document)
      doc = normalise(document)
      text = doc[:text]
      metadata = doc[:metadata]
      chunks = []
      start = 0
      index = 0
      while start < text.length
        # Take up to @chunk_size characters; the final chunk may be shorter.
        chunk_text = text[start, @chunk_size]
        chunks << {text: chunk_text, metadata: metadata.merge(chunk: index)}
        # Stop once the current window reaches the end of the text.
        break if start + @chunk_size >= text.length
        # Advance by the stride so consecutive chunks share @chunk_overlap characters.
        start += @chunk_size - @chunk_overlap
        index += 1
      end
      chunks
    end
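To make the windowing concrete, here is a minimal standalone sketch of the same loop applied to a short string. It is an illustration only: fixed_size_chunks is a hypothetical helper rather than a Phronomy method, and it takes the text and metadata directly instead of going through normalise, whose behaviour is not shown in this page.

```ruby
# Hypothetical standalone helper mirroring the loop in #split; not part of Phronomy.
def fixed_size_chunks(text, metadata: {}, chunk_size: 1000, chunk_overlap: 200)
  raise ArgumentError, "chunk_overlap must be less than chunk_size" if chunk_overlap >= chunk_size

  stride = chunk_size - chunk_overlap # distance between consecutive chunk starts
  chunks = []
  start = 0
  index = 0
  while start < text.length
    # String#[start, length] returns a shorter string when the text runs out.
    chunks << { text: text[start, chunk_size], metadata: metadata.merge(chunk: index) }
    break if start + chunk_size >= text.length # this window reached the end

    start += stride
    index += 1
  end
  chunks
end

chunks = fixed_size_chunks("abcdefghij", metadata: { source: "demo" },
                           chunk_size: 4, chunk_overlap: 1)
texts = chunks.map { |c| c[:text] }               # ["abcd", "defg", "ghij"]
indexes = chunks.map { |c| c[:metadata][:chunk] } # [0, 1, 2]
```

Each chunk repeats the last character of the previous one (the overlap of 1), and every chunk's metadata carries the original keys plus its zero-based chunk index.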