Module: Parse::Retrieval
Defined in:
  lib/parse/retrieval/retriever.rb,
  lib/parse/retrieval/chunk.rb,
  lib/parse/retrieval/chunker.rb,
  lib/parse/retrieval/reranker.rb,
  lib/parse/retrieval/agent_tool.rb,
  lib/parse/retrieval/reranker/cohere.rb
Overview
Retrieval-augmented-generation (RAG) helpers. Parse::RAG is a
discoverability alias for this module.
Retrieval.retrieve is the agent-agnostic core: it embeds a natural-language
query, runs Atlas $vectorSearch through the existing
Class.find_similar (which enforces ACL/CLP mongo-direct), then
splits each retrieved document's text field into scored
Chunks for presentation.
The agent-facing semantic_search tool (see
lib/parse/retrieval/agent_tool.rb) wraps Retrieval.retrieve with the
agent security envelope (tenant scope, field_allowlist projection,
score quantization).
ACL model
Retrieval.retrieve does NOT implement a REST "two-stage" re-query. The
vector path is mongo-direct only (Parse Server's REST /aggregate
is master-key-only and bypasses ACL — see the project notes), and
acl_user: / acl_role: scopes have no REST equivalent. ACL is
enforced inside find_similar via a post-$vectorSearch _rperm
$match. Scope kwargs (session_token: / acl_user: /
acl_role: / master:) pass straight through to find_similar via **scope_opts.
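To make the post-$vectorSearch _rperm $match concrete, here is a minimal sketch that builds such a two-stage pipeline as plain Ruby hashes. The helper names (readable_principals, acl_pipeline), the index name, the vector path, and the exact shape of the $match are assumptions for illustration only; the pipeline that find_similar actually builds may differ.

```ruby
# Hypothetical helper (not part of the library): the principals a caller may
# match against _rperm. Assumes the usual Parse storage convention of "*" for
# public read, a bare user objectId, and "role:<name>" entries.
def readable_principals(user_id: nil, roles: [])
  ["*", user_id, *roles.map { |r| "role:#{r}" }].compact
end

# Hypothetical helper: a vector stage followed by an ACL $match.
# "embedding_index" and "embedding" are placeholder names.
def acl_pipeline(query_vector:, k:, user_id: nil, roles: [])
  principals = readable_principals(user_id: user_id, roles: roles)
  [
    { "$vectorSearch" => {
      "index" => "embedding_index",
      "path" => "embedding",
      "queryVector" => query_vector,
      "numCandidates" => k * 10,
      "limit" => k,
    } },
    # The ACL check runs AFTER the vector stage, on the candidate documents.
    { "$match" => { "_rperm" => { "$in" => principals } } },
  ]
end

pipeline = acl_pipeline(query_vector: [0.1, 0.2], k: 5, user_id: "abc", roles: ["admin"])
puts pipeline.last.inspect
```

Because the $match in this sketch runs after the vector stage, a pipeline shaped like this can return fewer than k documents when some candidates are not readable by the caller.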
Defined Under Namespace
Modules: AgentTool, Chunker, Reranker
Classes: AmbiguousTextField, Chunk, TenantScopeConflict
Class Method Summary
- .assert_no_underscore_keys!(obj, path = []) ⇒ Object
  Recursively refuse any underscore-prefixed key, at any depth, in a caller-supplied filter.
- .retrieve(query:, klass: nil, field: nil, text_field: nil, k: 10, filter: nil, vector_filter: nil, chunker: nil, tenant_scope: nil, score_quantize: false, source_transform: nil, hybrid: nil, rerank: nil, rerank_top_n: nil, **scope_opts) ⇒ Array<Parse::Retrieval::Chunk>
  Retrieve and chunk documents semantically similar to query.
- .translate_pointer_filter_values(klass, filter) ⇒ Hash?
  Translate Parse pointer VALUES in a caller-supplied filter into their MongoDB storage form so they actually match raw documents.
Class Method Details
.assert_no_underscore_keys!(obj, path = []) ⇒ Object
Recursively refuse any underscore-prefixed key, at any depth, in a
caller-supplied filter. This is distinct from (and stricter than)
the agent layer's flat validate_keys!: a Mongo-style filter is a
nested structure, and an underscore key buried inside $or /
$elemMatch / a hash value could clobber tenant scope or reach a
reserved column (_rperm, _p_*, _auth_data_*). The walk is
unconditional — it does not special-case operators.
# File 'lib/parse/retrieval/retriever.rb', line 56

def assert_no_underscore_keys!(obj, path = [])
  case obj
  when Hash
    obj.each do |k, v|
      ks = k.to_s
      if ks.start_with?("_")
        raise ArgumentError,
              "filter key '#{(path + [ks]).join(".")}' is reserved (underscore-prefixed)."
      end
      assert_no_underscore_keys!(v, path + [ks])
    end
  when Array
    obj.each_with_index { |v, i| assert_no_underscore_keys!(v, path + ["[#{i}]"]) }
  end
  obj
end
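A small usage sketch of the recursive walk. So that it runs without the gem loaded, the method body is copied into a stand-in module named FilterGuardSketch (a hypothetical name); the filters are made-up examples.

```ruby
# Stand-in module for illustration; the real method lives on Parse::Retrieval.
module FilterGuardSketch
  module_function

  def assert_no_underscore_keys!(obj, path = [])
    case obj
    when Hash
      obj.each do |k, v|
        ks = k.to_s
        if ks.start_with?("_")
          raise ArgumentError,
                "filter key '#{(path + [ks]).join(".")}' is reserved (underscore-prefixed)."
        end
        assert_no_underscore_keys!(v, path + [ks])
      end
    when Array
      obj.each_with_index { |v, i| assert_no_underscore_keys!(v, path + ["[#{i}]"]) }
    end
    obj
  end
end

# A clean nested filter is returned unchanged.
safe = { "status" => "open", "$or" => [{ "priority" => { "$gte" => 3 } }] }
FilterGuardSketch.assert_no_underscore_keys!(safe)

# An underscore key buried inside $or is refused, and the message names its path.
nested = { "$or" => [{ "title" => "x" }, { "_rperm" => "*" }] }
begin
  FilterGuardSketch.assert_no_underscore_keys!(nested)
rescue ArgumentError => e
  puts e.message
  # prints: filter key '$or.[1]._rperm' is reserved (underscore-prefixed).
end
```

Operator keys such as $or are not special-cased: they pass only because they start with "$" rather than "_", and the walk continues into their values.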
.retrieve(query:, klass: nil, field: nil, text_field: nil, k: 10, filter: nil, vector_filter: nil, chunker: nil, tenant_scope: nil, score_quantize: false, source_transform: nil, hybrid: nil, rerank: nil, rerank_top_n: nil, **scope_opts) ⇒ Array<Parse::Retrieval::Chunk>
Retrieve and chunk documents semantically similar to query.
# File 'lib/parse/retrieval/retriever.rb', line 197

def retrieve(query:, klass: nil, field: nil, text_field: nil,
             k: 10, filter: nil, vector_filter: nil, chunker: nil,
             tenant_scope: nil, score_quantize: false, source_transform: nil,
             hybrid: nil, rerank: nil, rerank_top_n: nil, **scope_opts)
  if rerank && !rerank.respond_to?(:rerank)
    raise ArgumentError,
          "Parse::Retrieval.retrieve: `rerank:` must respond to #rerank " \
          "(a Parse::Retrieval::Reranker::Base); got #{rerank.class}."
  end
  # `class:` alias (reserved word — arrives via **scope_opts).
  klass ||= scope_opts.delete(:class)
  klass = resolve_class!(klass)
  unless query.is_a?(String) && !query.strip.empty?
    raise ArgumentError, "Parse::Retrieval.retrieve: `query:` must be a non-empty String."
  end
  resolved_text_field = (text_field || infer_text_field!(klass)).to_sym

  # Pointer-value translation runs BEFORE the tenant-scope fold (the
  # fold's conflict check must see final storage-form keys) and after
  # any caller-side underscore-key gate (the agent tool walks the raw
  # filter before calling retrieve).
  filter = translate_pointer_filter_values(klass, filter)
  vector_filter = translate_pointer_filter_values(klass, vector_filter)
  merged_vector_filter = fold_tenant_scope(klass, vector_filter, tenant_scope)
  chunker ||= default_chunker
  text_wire = wire_name(klass, resolved_text_field)

  raw_hits = if hybrid
      fetch_hybrid_hits(klass, query, k, field, filter, merged_vector_filter,
                        tenant_scope, hybrid, scope_opts)
    else
      klass.find_similar(
        text: query, k: k, field: field,
        filter: filter, vector_filter: merged_vector_filter,
        raw: true, **scope_opts,
      )
    end
  return [] if raw_hits.nil? || raw_hits.empty?

  raw_hits = apply_rerank(rerank, query, raw_hits, text_wire, rerank_top_n) if rerank

  raw_hits.flat_map do |doc|
    build_chunks_for(doc, klass, text_wire, score_quantize, source_transform, chunker)
  end
end
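The up-front argument handling in the listing above (the rerank: duck-type check, the class: alias arriving through **scope_opts, and the non-empty query: check) can be exercised in isolation. The sketch below copies only those three steps into a hypothetical helper, normalize_retrieve_args; it skips resolve_class! and everything after it, and the session token value is a placeholder.

```ruby
# Hypothetical helper mirroring only the first guards of retrieve.
def normalize_retrieve_args(query:, klass: nil, rerank: nil, **scope_opts)
  if rerank && !rerank.respond_to?(:rerank)
    raise ArgumentError, "`rerank:` must respond to #rerank; got #{rerank.class}."
  end
  # `class` is a reserved word, so it cannot be a named keyword parameter;
  # it lands in **scope_opts and is pulled out here.
  klass ||= scope_opts.delete(:class)
  unless query.is_a?(String) && !query.strip.empty?
    raise ArgumentError, "`query:` must be a non-empty String."
  end
  { klass: klass, scope_opts: scope_opts }
end

args = normalize_retrieve_args(query: "refund policy", class: "Article",
                               session_token: "r:placeholder")
puts args.inspect

begin
  normalize_retrieve_args(query: "   ")
rescue ArgumentError => e
  puts e.message
end
```

After the alias is removed, whatever remains in scope_opts is what the real method forwards to find_similar.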
.translate_pointer_filter_values(klass, filter) ⇒ Hash?
Translate Parse pointer VALUES in a caller-supplied filter into
their MongoDB storage form so they actually match raw documents.
{ owner: <Parse::Pointer User/abc> } becomes
{ "_p_owner" => "_User$abc" } — pointer columns are stored under
a _p_ prefix with "<className>$<objectId>" string values, so a
Parse-side pointer (a {__type: "Pointer", ...} hash on the wire,
or a Parse::Pointer / Parse::Object instance from Ruby
callers) in a $match / $vectorSearch.filter would otherwise
never match anything.
Recognized pointer values:
- Parse::Pointer / Parse::Object instances,
- { "__type" => "Pointer", "className" => ..., "objectId" => ... } hashes (symbol or string keys).
Translation applies to direct values, and to pointer values inside
one level of operator hashes ({ owner: { "$in" => [ptr, ptr] } },
$eq / $ne / $nin). Non-pointer values and unrecognized keys
pass through untouched, so the call is idempotent.
SECURITY ORDERING: run this AFTER assert_no_underscore_keys! /
the agent filter-field allowlist (callers may not name _p_*
columns directly) and BEFORE the tenant-scope fold.
# File 'lib/parse/retrieval/retriever.rb', line 102

def translate_pointer_filter_values(klass, filter)
  return filter unless filter.is_a?(Hash)
  out = {}
  filter.each do |key, value|
    if (storage = pointer_storage_value(value))
      out["_p_#{wire_name(klass, key)}"] = storage
    elsif value.is_a?(Hash) && value.keys.any? { |op| op.to_s.start_with?("$") }
      translated = value.transform_values do |opval|
        if (s = pointer_storage_value(opval))
          s
        elsif opval.is_a?(Array)
          opval.map { |el| pointer_storage_value(el) || el }
        else
          opval
        end
      end
      if translated == value
        out[key] = value
      else
        out["_p_#{wire_name(klass, key)}"] = translated
      end
    else
      out[key] = value
    end
  end
  out
end
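A runnable sketch of the translation on a sample filter. The two private helpers the method relies on are not shown on this page, so the versions here are simplified stand-ins and assumptions: pointer_storage_value recognizes only the wire-format hash (not Parse::Pointer / Parse::Object instances), and wire_name just stringifies the key. The module name PointerFilterSketch and the sample class names are hypothetical.

```ruby
module PointerFilterSketch
  module_function

  # Simplified stand-in: returns "<className>$<objectId>" for a
  # {__type: "Pointer", ...} hash (symbol or string keys), else nil.
  def pointer_storage_value(value)
    return nil unless value.is_a?(Hash)
    h = value.transform_keys(&:to_s)
    return nil unless h["__type"] == "Pointer" && h["className"] && h["objectId"]
    [h["className"], h["objectId"]].join("$")
  end

  # Simplified stand-in: the real helper maps a field to its wire name.
  def wire_name(_klass, key)
    key.to_s
  end

  # Same logic as the listing above.
  def translate_pointer_filter_values(klass, filter)
    return filter unless filter.is_a?(Hash)
    out = {}
    filter.each do |key, value|
      if (storage = pointer_storage_value(value))
        out["_p_#{wire_name(klass, key)}"] = storage
      elsif value.is_a?(Hash) && value.keys.any? { |op| op.to_s.start_with?("$") }
        translated = value.transform_values do |opval|
          if (s = pointer_storage_value(opval))
            s
          elsif opval.is_a?(Array)
            opval.map { |el| pointer_storage_value(el) || el }
          else
            opval
          end
        end
        if translated == value
          out[key] = value
        else
          out["_p_#{wire_name(klass, key)}"] = translated
        end
      else
        out[key] = value
      end
    end
    out
  end
end

ptr = ->(cls, id) { { "__type" => "Pointer", "className" => cls, "objectId" => id } }
filter = {
  owner: ptr.call("_User", "abc"),                              # direct pointer value
  status: "open",                                               # non-pointer, untouched
  team: { "$in" => [ptr.call("Team", "t1"), ptr.call("Team", "t2")] }, # inside an operator
}

once = PointerFilterSketch.translate_pointer_filter_values(nil, filter)
twice = PointerFilterSketch.translate_pointer_filter_values(nil, once)
puts once.inspect
puts(once == twice) # a second pass leaves the translated filter unchanged
```

Note that only keys whose values contain a pointer are renamed to the _p_ form; status keeps its original key, which is why the method must run after the underscore-key gate rather than before it.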