Module: Parse::Retrieval
Defined in:
- lib/parse/retrieval/retriever.rb
- lib/parse/retrieval/chunk.rb
- lib/parse/retrieval/chunker.rb
- lib/parse/retrieval/agent_tool.rb
Overview
Retrieval-augmented generation (RAG) helpers. Parse::RAG is an alias for
this module, provided for discoverability.
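A discoverability alias of this kind is usually a second constant bound to the same module object. The sketch below assumes that is how Parse::RAG is defined (the gem's actual definition is not shown on this page) and gives Retrieval a stand-in retrieve body so the alias can be exercised without the gem.

```ruby
# Minimal sketch: Parse::RAG as a second constant pointing at the same module.
# The stub retrieve body is illustrative only; it is not the gem's implementation.
module Parse
  module Retrieval
    def self.retrieve(query:, **_opts)
      "retrieve(#{query})"
    end
  end

  # Assumed alias definition: both constants reference one module object.
  RAG = Retrieval
end

puts Parse::RAG.equal?(Parse::Retrieval)         # prints "true"
puts Parse::RAG.retrieve(query: "refund policy") # prints "retrieve(refund policy)"
```

Because the alias is just another constant, Parse::RAG.name still reports "Parse::Retrieval", the name the module received when it was first defined.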
Retrieval.retrieve is the agent-agnostic core. It embeds a natural-language
query, runs an Atlas $vectorSearch through the existing
Class.find_similar (which enforces ACLs and class-level permissions (CLPs)
directly against MongoDB), and then splits each retrieved document's text
field into scored Chunks for presentation.
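The embed, rank, and chunk flow can be illustrated with a toy, dependency-free Ruby sketch. Every component here is an assumption made for illustration: a bag-of-words count vector stands in for the embedding model, an in-memory cosine ranking stands in for Atlas $vectorSearch via find_similar, a fixed word-count splitter stands in for the Chunker, and each chunk simply inherits its document's score. toy_retrieve, toy_embed, toy_chunk, and ToyChunk are hypothetical names, not part of the gem.

```ruby
# Toy sketch of embed -> top-k similarity -> chunk. Not the gem's implementation.
ToyChunk = Struct.new(:doc_id, :index, :text, :score, keyword_init: true)

# Stand-in "embedding": term counts over a shared vocabulary.
def toy_embed(text, vocab)
  words = text.downcase.scan(/[a-z]+/)
  vocab.map { |term| words.count(term).to_f }
end

def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Stand-in chunker: fixed-size runs of whitespace-separated words.
def toy_chunk(text, words_per_chunk)
  text.split.each_slice(words_per_chunk).map { |ws| ws.join(" ") }
end

def toy_retrieve(query, docs, k: 2, words_per_chunk: 4)
  vocab = (docs.values.join(" ") + " " + query).downcase.scan(/[a-z]+/).uniq
  q = toy_embed(query, vocab)
  hits = docs.map { |id, text| [id, text, cosine(q, toy_embed(text, vocab))] }
             .sort_by { |_id, _text, score| -score }
             .first(k)
  # Each chunk inherits its parent document's similarity score (an assumption).
  hits.flat_map do |id, text, score|
    toy_chunk(text, words_per_chunk).each_with_index.map do |piece, i|
      ToyChunk.new(doc_id: id, index: i, text: piece, score: score)
    end
  end
end
```

With documents about refunds and shipping, a query such as "refund policy" ranks the refund documents first and returns their chunks in rank order.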
The agent-facing semantic_search tool (see
lib/parse/retrieval/agent_tool.rb) wraps Retrieval.retrieve with the
agent security envelope (tenant scope, field_allowlist projection,
score quantization).
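Two of those envelope steps, field_allowlist projection and score quantization, can be sketched in plain Ruby. The helper names, the string/symbol key handling, and the 20-bucket (0.05-wide) floor quantization are hypothetical choices; the real semantics live in agent_tool.rb and may differ. One plausible motivation for quantizing is that coarse scores expose less ranking detail to the agent.

```ruby
# Hypothetical envelope helpers; not the gem's API.

# Keep only allowlisted fields (string or symbol keys accepted) so that
# columns such as _rperm never reach the agent.
def project_allowlisted(doc, allowlist)
  allowed = allowlist.map(&:to_s)
  doc.select { |key, _value| allowed.include?(key.to_s) }
end

# Floor a similarity score into one of `buckets` equal-width bins on [0, 1].
def quantize_score(score, buckets: 20)
  clamped = score.clamp(0.0, 1.0)
  (clamped * buckets).floor / buckets.to_f
end

doc = { "objectId" => "a1", "title" => "Refunds", "body" => "Full refund within 30 days.", "_rperm" => ["*"] }
project_allowlisted(doc, [:title, :body]) # keeps only "title" and "body"
quantize_score(0.8734)                    # 0.85
```

Flooring rather than rounding means a quantized score never overstates the raw similarity.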
== ACL model
Retrieval.retrieve does NOT implement a REST "two-stage" re-query. The
vector path is mongo-direct only: Parse Server's REST /aggregate endpoint
is master-key-only and bypasses ACLs (see the project notes), and the
acl_user: / acl_role: scopes have no REST equivalent. Instead, ACLs are
enforced inside find_similar by a _rperm $match stage placed after
$vectorSearch. The scope kwargs (session_token: / acl_user: /
acl_role: / master:) are passed straight through to find_similar via
**scope_opts.
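The post-$vectorSearch ACL filter can be pictured as an aggregation pipeline built from Ruby hashes. This is a sketch under stated assumptions, not find_similar's actual code: it assumes Parse's usual _rperm encoding ("*" for public read, a user objectId, "role:<name>" for roles, with a missing _rperm treated as public), an arbitrary 10x numCandidates oversampling, and that master: simply omits the $match. build_acl_pipeline and read_permission_set are hypothetical names.

```ruby
# Sketch only: pipeline shape and permission set are assumptions modelled on
# Parse's _rperm convention. These helpers are hypothetical.

# nil lets $in also match documents with no _rperm field (treated as public);
# "*" is public read; a user objectId and "role:<name>" entries grant access.
def read_permission_set(user_id: nil, roles: [])
  [nil, "*"] + (user_id ? [user_id] : []) + roles.map { |r| "role:#{r}" }
end

def build_acl_pipeline(query_vector:, path:, index:, k:, user_id: nil, roles: [], master: false)
  stages = [
    { "$vectorSearch" => {
      "index" => index,
      "path" => path,
      "queryVector" => query_vector,
      "numCandidates" => k * 10, # oversampling factor is an assumption
      "limit" => k,
    } },
  ]
  # Assumption: a master-key scope skips the ACL stage entirely.
  return stages if master

  # The ACL filter runs after $vectorSearch, so fewer than k docs may survive.
  stages << { "$match" => { "_rperm" => { "$in" => read_permission_set(user_id: user_id, roles: roles) } } }
end
```

Placing the $match after $vectorSearch keeps the vector query independent of the caller, at the cost that a restrictive ACL can leave fewer than k hits.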
Defined Under Namespace
Modules: AgentTool, Chunker
Classes: AmbiguousTextField, Chunk, TenantScopeConflict
Class Method Summary
- .assert_no_underscore_keys!(obj, path = []) ⇒ Object
  Recursively refuse any underscore-prefixed key, at any depth, in a caller-supplied filter.
- .retrieve(query:, klass: nil, field: nil, text_field: nil, k: 10, filter: nil, vector_filter: nil, chunker: nil, tenant_scope: nil, score_quantize: false, source_transform: nil, hybrid: nil, rerank: nil, **scope_opts) ⇒ Array<Parse::Retrieval::Chunk>
  Retrieve and chunk documents semantically similar to query.
Class Method Details
.assert_no_underscore_keys!(obj, path = []) ⇒ Object
Recursively refuse any underscore-prefixed key, at any depth, in a
caller-supplied filter. This is distinct from (and stricter than)
the agent layer's flat validate_keys!: a Mongo-style filter is a
nested structure, and an underscore key buried inside $or, $elemMatch,
or a nested hash value could clobber the tenant scope or reach a
reserved column (_rperm, _p_*, _auth_data_*). The walk is
unconditional: it does not special-case operators, and it descends
into every hash value and array element.
# File 'lib/parse/retrieval/retriever.rb', line 55
def assert_no_underscore_keys!(obj, path = [])
  case obj
  when Hash
    obj.each do |k, v|
      ks = k.to_s
      if ks.start_with?("_")
        raise ArgumentError, "filter key '#{(path + [ks]).join(".")}' is reserved (underscore-prefixed)."
      end
      assert_no_underscore_keys!(v, path + [ks])
    end
  when Array
    obj.each_with_index { |v, i| assert_no_underscore_keys!(v, path + ["[#{i}]"]) }
  end
  obj
end
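To see how the recursive walk reports a buried key, the method body can be exercised standalone. The copy below is taken from the source listing for assert_no_underscore_keys!; only the wrapper module name FilterGuardDemo is hypothetical, added so the method runs without the gem.

```ruby
# Standalone copy of assert_no_underscore_keys!, wrapped in a hypothetical
# module so it can run without the gem.
module FilterGuardDemo
  module_function

  def assert_no_underscore_keys!(obj, path = [])
    case obj
    when Hash
      obj.each do |k, v|
        ks = k.to_s
        if ks.start_with?("_")
          raise ArgumentError, "filter key '#{(path + [ks]).join(".")}' is reserved (underscore-prefixed)."
        end
        assert_no_underscore_keys!(v, path + [ks])
      end
    when Array
      obj.each_with_index { |v, i| assert_no_underscore_keys!(v, path + ["[#{i}]"]) }
    end
    obj
  end
end

# A clean filter is returned unchanged; $or is walked, not exempted.
FilterGuardDemo.assert_no_underscore_keys!({ "status" => "published", "$or" => [{ "lang" => "en" }] })

begin
  FilterGuardDemo.assert_no_underscore_keys!({ "$or" => [{ "status" => "draft" }, { "_rperm" => ["*"] }] })
rescue ArgumentError => e
  puts e.message # prints "filter key '$or.[1]._rperm' is reserved (underscore-prefixed)."
end
```

Array positions appear in the reported path as "[i]" segments, which makes it clear which branch of an $or carried the reserved key. Symbol keys are checked too, since each key is converted with to_s before the prefix test.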
.retrieve(query:, klass: nil, field: nil, text_field: nil, k: 10, filter: nil, vector_filter: nil, chunker: nil, tenant_scope: nil, score_quantize: false, source_transform: nil, hybrid: nil, rerank: nil, **scope_opts) ⇒ Array<Parse::Retrieval::Chunk>
Retrieve and chunk documents semantically similar to query.
# File 'lib/parse/retrieval/retriever.rb', line 111
def retrieve(query:, klass: nil, field: nil, text_field: nil, k: 10, filter: nil,
             vector_filter: nil, chunker: nil, tenant_scope: nil, score_quantize: false,
             source_transform: nil, hybrid: nil, rerank: nil, **scope_opts)
  raise NotImplementedError, "Parse::Retrieval.retrieve: `hybrid:` is reserved for a future release." if hybrid
  raise NotImplementedError, "Parse::Retrieval.retrieve: `rerank:` is reserved for a future release." if rerank

  # `class:` alias (reserved word; arrives via **scope_opts).
  klass ||= scope_opts.delete(:class)
  klass = resolve_class!(klass)

  unless query.is_a?(String) && !query.strip.empty?
    raise ArgumentError, "Parse::Retrieval.retrieve: `query:` must be a non-empty String."
  end

  resolved_text_field = (text_field || infer_text_field!(klass)).to_sym
  merged_vector_filter = fold_tenant_scope(klass, vector_filter, tenant_scope)
  chunker ||= default_chunker

  raw_hits = klass.find_similar(
    text: query,
    k: k,
    field: field,
    filter: filter,
    vector_filter: merged_vector_filter,
    raw: true,
    **scope_opts,
  )
  return [] if raw_hits.nil? || raw_hits.empty?

  text_wire = wire_name(klass, resolved_text_field)
  raw_hits.flat_map do |doc|
    build_chunks_for(doc, klass, text_wire, score_quantize, source_transform, chunker)
  end
end
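The `class:` handling at the top of retrieve relies on a Ruby idiom: class is a reserved word, so instead of declaring it as a keyword parameter, the method lets it arrive in **scope_opts and pulls it out. The sketch below isolates that idiom together with the query check; demo_retrieve is a hypothetical stand-in that returns what it would forward rather than calling find_similar.

```ruby
# Hypothetical stand-in for the argument handling in retrieve; not the gem's API.
def demo_retrieve(query:, klass: nil, k: 10, **scope_opts)
  # `class:` cannot be a readable local, so it is taken from the splat instead.
  klass ||= scope_opts.delete(:class)
  raise ArgumentError, "a class is required" if klass.nil?

  unless query.is_a?(String) && !query.strip.empty?
    raise ArgumentError, "`query:` must be a non-empty String."
  end

  # Whatever remains (e.g. session_token:, master:) would be forwarded as-is.
  { klass: klass, k: k, forwarded: scope_opts }
end

demo_retrieve(query: "refund policy", class: "Article", session_token: "r:abc")
```

Because `klass ||= scope_opts.delete(:class)` only evaluates the delete when klass is nil, passing both klass: and class: leaves :class in scope_opts, where it is forwarded along with the other scope kwargs.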