Module: Parse::Retrieval::Chunker
Defined in: lib/parse/retrieval/chunker.rb
Overview
Pluggable text-chunking strategies for the retrieval layer.
A chunker splits a source document's text into smaller, overlapping
windows for presentation. Parse::Retrieval.retrieve fetches the
top-k whole records via Atlas $vectorSearch, then runs each
record's text field through a chunker, so callers get focused,
citable passages rather than whole documents.
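The windowing idea can be sketched in a few lines of plain Ruby. This is a minimal illustration, not the code behind FixedSizeOverlap: the helper name fixed_size_overlap and its size: and overlap: keywords are hypothetical, and it counts characters where a real strategy might count tokens.

```ruby
# Hypothetical helper, not part of Parse::Retrieval. Splits text into
# windows of `size` characters, each sharing `overlap` characters with
# the window before it.
def fixed_size_overlap(text, size:, overlap:)
  raise ArgumentError, "size must be positive" unless size.positive?
  raise ArgumentError, "overlap must be in 0...size" unless (0...size).cover?(overlap)

  step = size - overlap # how far each window advances
  chunks = []
  start = 0
  while start < text.length
    chunks << text[start, size]
    # Stop once the current window reaches the end of the text, so no
    # trailing window is fully contained in the previous one.
    break if start + size >= text.length
    start += step
  end
  chunks
end

windows = fixed_size_overlap("abcdefghij", size: 4, overlap: 2)
# windows => ["abcd", "cdef", "efgh", "ghij"]
```

Each window repeats the last two characters of the previous one, so a phrase that straddles a boundary still appears whole in at least one window.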
== Presentation chunking, not embedding chunking
Embedding remains one-vector-per-record (see Core::EmbedManaged). Chunking here is purely a presentation step applied after retrieval: every chunk produced from a document inherits that document's single vector-search score. The chunker never calls an embedding provider.
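The score inheritance described above might look like the following sketch. Everything in it is an assumption made for illustration: the Hit and Passage structs, the to_passages helper, the stand-in ParagraphChunker, and the score values are invented here and do not reflect the library's actual result objects.

```ruby
# Hypothetical shapes for illustration; the real result objects may differ.
Hit = Struct.new(:id, :text, :score)
Passage = Struct.new(:record_id, :text, :score)

# Stand-in chunker: any object that responds to #chunk(text) and returns
# an array of strings. It never touches an embedding provider.
class ParagraphChunker
  def chunk(text)
    text.split(/\n{2,}/).map(&:strip).reject(&:empty?)
  end
end

# Expands each retrieved record into passages. Every passage copies the
# parent record's score; nothing is re-embedded or re-scored.
def to_passages(hits, chunker)
  hits.flat_map do |hit|
    chunker.chunk(hit.text).map { |piece| Passage.new(hit.id, piece, hit.score) }
  end
end

# Made-up records and scores, standing in for vector-search results.
hits = [
  Hit.new("a1", "First paragraph.\n\nSecond paragraph.", 0.91),
  Hit.new("b2", "Only paragraph.", 0.74),
]
passages = to_passages(hits, ParagraphChunker.new)
```

Because both passages from record a1 carry the same score, ordering among chunks of one document has to come from something other than the vector-search score, such as their position in the text.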
== Extending
FixedSizeOverlap is the default and the only strategy shipped. Subclass Base for semantic, sentence-aware, or true token-aware chunking:
  class SentenceChunker < Parse::Retrieval::Chunker::Base
    def chunk(text)
      normalize(text).split(/(?<=[.!?])\s+/)
    end
  end

  Parse::Retrieval.retrieve(
    query: "onboarding steps",
    klass: KnowledgeArticle,
    chunker: SentenceChunker.new,
  )
Defined Under Namespace
Classes: Base, FixedSizeOverlap