Module: Parse::Retrieval::Chunker

Defined in: lib/parse/retrieval/chunker.rb

Overview

Pluggable text-chunking strategies for the retrieval layer.

A chunker splits a source document's text into smaller, overlapping windows for presentation. Parse::Retrieval.retrieve fetches the top-k whole records via Atlas $vectorSearch, then runs each record's text field through a chunker, so callers get focused, citable passages rather than whole documents.

== Presentation chunking, not embedding chunking

Embedding remains one-vector-per-record (see Core::EmbedManaged). Chunking here is purely a presentation step applied after retrieval: every chunk produced from a document inherits that document's single vector-search score. The chunker never calls an embedding provider.
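
The score inheritance described above can be illustrated with a minimal, standalone sketch. Hit, Passage, and to_passages are hypothetical names invented for this example and are not part of the library; the naive fixed-width split only stands in for whatever chunker is configured.

```ruby
# Hypothetical stand-ins for a retrieved record and a presented passage.
# These are NOT the library's classes; they exist only for illustration.
Hit = Struct.new(:id, :text, :score)
Passage = Struct.new(:record_id, :text, :score)

# Split each hit's text into pieces of at most `window` characters and
# copy the parent hit's score onto every piece. No embedding call is made.
def to_passages(hits, window: 40)
  hits.flat_map do |hit|
    hit.text.scan(/.{1,#{window}}/m).map do |piece|
      Passage.new(hit.id, piece, hit.score)
    end
  end
end

hits = [
  Hit.new("a1", "x" * 100, 0.92),
  Hit.new("b2", "y" * 30, 0.75),
]
passages = to_passages(hits)
```

Because every passage from one record carries the same score, ordering between passages of the same record reflects document position only, not relevance.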

== Extending

FixedSizeOverlap is the default and the only strategy shipped. Subclass Base for semantic, sentence-aware, or true token-aware chunking:

  class SentenceChunker < Parse::Retrieval::Chunker::Base
    def chunk(text)
      normalize(text).split(/(?<=[.!?])\s+/)
    end
  end

  Parse::Retrieval.retrieve(
    query: "onboarding steps",
    klass: KnowledgeArticle,
    chunker: SentenceChunker.new,
  )
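
For contrast with the sentence-based subclass, the idea of fixed-size overlapping windows can be sketched as a plain method. This is an illustration of the general technique, not the shipped FixedSizeOverlap implementation; the method name fixed_size_windows and the size and overlap parameters (measured in characters) are assumptions made for this example.

```ruby
# Hypothetical helper, not the library's FixedSizeOverlap class.
# Returns consecutive windows of up to `size` characters, where each
# window starts `size - overlap` characters after the previous one.
def fixed_size_windows(text, size: 200, overlap: 50)
  raise ArgumentError, "size must be positive" unless size.positive?
  raise ArgumentError, "overlap must be in 0...size" unless (0...size).cover?(overlap)

  step = size - overlap
  windows = []
  start = 0
  while start < text.length
    windows << text[start, size]
    # Stop once the current window reaches the end of the text, so no
    # trailing window is emitted that only repeats already-covered text.
    break if start + size >= text.length
    start += step
  end
  windows
end

windows = fixed_size_windows("abcdefghij", size: 4, overlap: 2)
```

Requiring overlap to be strictly smaller than size keeps the step positive, which guarantees the loop terminates.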

Defined Under Namespace

Classes: Base, FixedSizeOverlap