Module: Parse::Retrieval
- Defined in:
- lib/parse/retrieval/retriever.rb,
lib/parse/retrieval/chunk.rb,
lib/parse/retrieval/chunker.rb,
lib/parse/retrieval/reranker.rb,
lib/parse/retrieval/agent_tool.rb,
lib/parse/retrieval/reranker/cohere.rb
Overview
Retrieval-augmented-generation (RAG) helpers. Parse::RAG is a
discoverability alias for this module.
Retrieval.retrieve is the agent-agnostic core: it embeds a natural-language
query, runs Atlas $vectorSearch through the existing
Class.find_similar (which enforces ACL/CLP mongo-direct), then
splits each retrieved document's text field into scored
Chunks for presentation.
The agent-facing semantic_search tool (see
lib/parse/retrieval/agent_tool.rb) wraps Retrieval.retrieve with the
agent security envelope (tenant scope, field_allowlist projection,
score quantization).
== ACL model
Retrieval.retrieve does NOT implement a REST "two-stage" re-query. The
vector path is mongo-direct only (Parse Server's REST /aggregate
is master-key-only and bypasses ACL — see the project notes), and
acl_user: / acl_role: scopes have no REST equivalent. ACL is
enforced inside find_similar via a post-$vectorSearch _rperm
$match. Scope kwargs (session_token: / acl_user: /
acl_role: / master:) pass straight through **scope_opts.
Defined Under Namespace
Modules: AgentTool, Chunker, Reranker Classes: AmbiguousTextField, Chunk, TenantScopeConflict
Class Method Summary collapse
-
.assert_no_underscore_keys!(obj, path = []) ⇒ Object
Recursively refuse any underscore-prefixed key, at any depth, in a caller-supplied filter.
-
.retrieve(query:, klass: nil, field: nil, text_field: nil, k: 10, filter: nil, vector_filter: nil, chunker: nil, tenant_scope: nil, score_quantize: false, source_transform: nil, hybrid: nil, rerank: nil, rerank_top_n: nil, **scope_opts) ⇒ Array<Parse::Retrieval::Chunk>
Retrieve and chunk documents semantically similar to
query.
Class Method Details
.assert_no_underscore_keys!(obj, path = []) ⇒ Object
Recursively refuse any underscore-prefixed key, at any depth, in a
caller-supplied filter. This is distinct from (and stricter than)
the agent layer's flat validate_keys!: a Mongo-style filter is a
nested structure, and an underscore key buried inside $or /
$elemMatch / a hash value could clobber tenant scope or reach a
reserved column (_rperm, _p_*, _auth_data_*). The walk is
unconditional — it does not special-case operators.
56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 |
# File 'lib/parse/retrieval/retriever.rb', line 56 def assert_no_underscore_keys!(obj, path = []) case obj when Hash obj.each do |k, v| ks = k.to_s if ks.start_with?("_") raise ArgumentError, "filter key '#{(path + [ks]).join(".")}' is reserved (underscore-prefixed)." end assert_no_underscore_keys!(v, path + [ks]) end when Array obj.each_with_index { |v, i| assert_no_underscore_keys!(v, path + ["[#{i}]"]) } end obj end |
.retrieve(query:, klass: nil, field: nil, text_field: nil, k: 10, filter: nil, vector_filter: nil, chunker: nil, tenant_scope: nil, score_quantize: false, source_transform: nil, hybrid: nil, rerank: nil, rerank_top_n: nil, **scope_opts) ⇒ Array<Parse::Retrieval::Chunk>
Retrieve and chunk documents semantically similar to query.
119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 |
# File 'lib/parse/retrieval/retriever.rb', line 119 def retrieve(query:, klass: nil, field: nil, text_field: nil, k: 10, filter: nil, vector_filter: nil, chunker: nil, tenant_scope: nil, score_quantize: false, source_transform: nil, hybrid: nil, rerank: nil, rerank_top_n: nil, **scope_opts) if rerank && !rerank.respond_to?(:rerank) raise ArgumentError, "Parse::Retrieval.retrieve: `rerank:` must respond to #rerank " \ "(a Parse::Retrieval::Reranker::Base); got #{rerank.class}." end # `class:` alias (reserved word — arrives via **scope_opts). klass ||= scope_opts.delete(:class) klass = resolve_class!(klass) unless query.is_a?(String) && !query.strip.empty? raise ArgumentError, "Parse::Retrieval.retrieve: `query:` must be a non-empty String." end resolved_text_field = (text_field || infer_text_field!(klass)).to_sym merged_vector_filter = fold_tenant_scope(klass, vector_filter, tenant_scope) chunker ||= default_chunker text_wire = wire_name(klass, resolved_text_field) raw_hits = if hybrid fetch_hybrid_hits(klass, query, k, field, filter, merged_vector_filter, tenant_scope, hybrid, scope_opts) else klass.find_similar( text: query, k: k, field: field, filter: filter, vector_filter: merged_vector_filter, raw: true, **scope_opts, ) end return [] if raw_hits.nil? || raw_hits.empty? raw_hits = apply_rerank(rerank, query, raw_hits, text_wire, rerank_top_n) if rerank raw_hits.flat_map do |doc| build_chunks_for(doc, klass, text_wire, score_quantize, source_transform, chunker) end end |