Class: Parse::Retrieval::Chunker::Base
- Inherits:
-
Object
- Object
- Parse::Retrieval::Chunker::Base
- Defined in:
- lib/parse/retrieval/chunker.rb
Overview
Abstract base. Subclasses MUST implement #chunk.
Subclasses get one free behavior from Base: #chunk_with_meta, which wraps #chunk and reports whether the result was capped. Parse::Retrieval.retrieve calls #chunk_with_meta so it can stamp a truncation signal onto each emitted chunk's metadata.
Direct Known Subclasses
Instance Method Summary collapse
-
#chunk(text) ⇒ Array<String>
Zero or more chunks.
-
#chunk_with_meta(text) ⇒ Hash
Wrap #chunk with truncation metadata.
Instance Method Details
#chunk(text) ⇒ Array<String>
Returns zero or more chunks. MUST return []
for blank/nil input.
51 52 53 |
# File 'lib/parse/retrieval/chunker.rb', line 51 def chunk(text) raise NotImplementedError, "#{self.class}#chunk must return Array<String>." end |
#chunk_with_meta(text) ⇒ Hash
Wrap #chunk with truncation metadata. The default
implementation here does NOT cap — it reports the chunk list as
produced. FixedSizeOverlap overrides this to enforce its
max_chunks_per_document cap and report the pre-cap count.
63 64 65 66 |
# File 'lib/parse/retrieval/chunker.rb', line 63 def (text) chunks = Array(chunk(text)) { chunks: chunks, truncated: false, total_before_truncation: chunks.length } end |