Module: Parse::Retrieval

Defined in:
lib/parse/retrieval/retriever.rb,
lib/parse/retrieval/chunk.rb,
lib/parse/retrieval/chunker.rb,
lib/parse/retrieval/reranker.rb,
lib/parse/retrieval/agent_tool.rb,
lib/parse/retrieval/reranker/cohere.rb

Overview

Retrieval-augmented-generation (RAG) helpers. Parse::RAG is a discoverability alias for this module.

Retrieval.retrieve is the agent-agnostic core: it embeds a natural-language query, runs Atlas $vectorSearch through the existing Class.find_similar (which enforces ACL/CLP mongo-direct), then splits each retrieved document's text field into scored Chunks for presentation.

The agent-facing semantic_search tool (see lib/parse/retrieval/agent_tool.rb) wraps Retrieval.retrieve with the agent security envelope (tenant scope, field_allowlist projection, score quantization).

== ACL model

Retrieval.retrieve does NOT implement a REST "two-stage" re-query. The vector path is mongo-direct only (Parse Server's REST /aggregate is master-key-only and bypasses ACL — see the project notes), and acl_user: / acl_role: scopes have no REST equivalent. ACL is enforced inside find_similar via a post-$vectorSearch _rperm $match. Scope kwargs (session_token: / acl_user: / acl_role: / master:) pass straight through **scope_opts.

Defined Under Namespace

Modules: AgentTool, Chunker, Reranker Classes: AmbiguousTextField, Chunk, TenantScopeConflict

Class Method Summary collapse

Class Method Details

.assert_no_underscore_keys!(obj, path = []) ⇒ Object

Recursively refuse any underscore-prefixed key, at any depth, in a caller-supplied filter. This is distinct from (and stricter than) the agent layer's flat validate_keys!: a Mongo-style filter is a nested structure, and an underscore key buried inside $or / $elemMatch / a hash value could clobber tenant scope or reach a reserved column (_rperm, _p_*, _auth_data_*). The walk is unconditional — it does not special-case operators.

Parameters:

  • obj (Object)

    a filter Hash/Array (or anything; scalars pass).

  • path (Array<String>) (defaults to: [])

    internal — accumulates the key path for the error message.

Raises:

  • (ArgumentError)

    on any _-prefixed key.



56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
# File 'lib/parse/retrieval/retriever.rb', line 56

def assert_no_underscore_keys!(obj, path = [])
  case obj
  when Hash
    obj.each do |k, v|
      ks = k.to_s
      if ks.start_with?("_")
        raise ArgumentError,
              "filter key '#{(path + [ks]).join(".")}' is reserved (underscore-prefixed)."
      end
      assert_no_underscore_keys!(v, path + [ks])
    end
  when Array
    obj.each_with_index { |v, i| assert_no_underscore_keys!(v, path + ["[#{i}]"]) }
  end
  obj
end

.retrieve(query:, klass: nil, field: nil, text_field: nil, k: 10, filter: nil, vector_filter: nil, chunker: nil, tenant_scope: nil, score_quantize: false, source_transform: nil, hybrid: nil, rerank: nil, rerank_top_n: nil, **scope_opts) ⇒ Array<Parse::Retrieval::Chunk>

Retrieve and chunk documents semantically similar to query.

Parameters:

  • query (String)

    natural-language query.

  • klass (Class, String) (defaults to: nil)

    a Parse::Object subclass (or its class name) declaring a :vector property. class: is accepted as an alias.

  • field (Symbol, nil) (defaults to: nil)

    the :vector property to search. Auto-resolved by find_similar when the class has exactly one.

  • text_field (Symbol, nil) (defaults to: nil)

    the text property to chunk for presentation. Defaults to the sole text source of the class's embed declaration; raises AmbiguousTextField when it can't be inferred.

  • k (Integer) (defaults to: 10)

    number of documents to retrieve. Default 10.

  • filter (Hash, nil) (defaults to: nil)

    post-$vectorSearch $match filter.

  • vector_filter (Hash, nil) (defaults to: nil)

    Atlas-native pre-search filter.

  • chunker (#chunk_with_meta, #chunk, nil) (defaults to: nil)

    chunking strategy. Defaults to Parse::Retrieval::Chunker::FixedSizeOverlap.

  • tenant_scope (Hash, nil) (defaults to: nil)

    { field:, value: } merged into vector_filter (closing the cross-tenant existence side channel) — not just a post-stage match.

  • score_quantize (Boolean) (defaults to: false)

    round scores to 1 decimal (limits membership-inference probing in non-admin contexts).

  • source_transform (#call, nil) (defaults to: nil)

    optional callable applied to each raw source record before it is stored on a Chunk. The agent tool injects tenant-scope assertion + field_allowlist projection here; a StandardError raised by the callable propagates and aborts the whole call (fail-closed). Kept as an injection point so this model-layer method stays free of any agent-layer dependency.

  • hybrid (Boolean, Hash, nil) (defaults to: nil)

    when truthy, fuse a lexical Atlas Search branch with the $vectorSearch branch via reciprocal-rank fusion (see Core::VectorSearchable#hybrid_search). true uses defaults (lexical query = query); a Hash may carry :lexical, :vector, and :fusion sub-configs.

  • rerank (#rerank, nil) (defaults to: nil)

    a Parse::Retrieval::Reranker::Base (or any object answering #rerank(query:, documents:, top_n:)). When present, retrieved documents are reordered by the cross-encoder relevance score BEFORE chunking, and the chunk score becomes the rerank relevance score.

  • rerank_top_n (Integer, nil) (defaults to: nil)

    keep only the top-N documents after reranking (defaults to all retrieved documents).

  • scope_opts (Hash)

    ACL/CLP scope kwargs forwarded verbatim to find_similar / hybrid_search: session_token: / acl_user: / acl_role: / master:.

Returns:



197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
# File 'lib/parse/retrieval/retriever.rb', line 197

def retrieve(query:, klass: nil, field: nil, text_field: nil, k: 10,
             filter: nil, vector_filter: nil, chunker: nil,
             tenant_scope: nil, score_quantize: false,
             source_transform: nil, hybrid: nil, rerank: nil,
             rerank_top_n: nil, **scope_opts)
  if rerank && !rerank.respond_to?(:rerank)
    raise ArgumentError,
          "Parse::Retrieval.retrieve: `rerank:` must respond to #rerank " \
          "(a Parse::Retrieval::Reranker::Base); got #{rerank.class}."
  end

  # `class:` alias (reserved word — arrives via **scope_opts).
  klass ||= scope_opts.delete(:class)
  klass = resolve_class!(klass)

  unless query.is_a?(String) && !query.strip.empty?
    raise ArgumentError, "Parse::Retrieval.retrieve: `query:` must be a non-empty String."
  end

  resolved_text_field = (text_field || infer_text_field!(klass)).to_sym
  # Pointer-value translation runs BEFORE the tenant-scope fold (the
  # fold's conflict check must see final storage-form keys) and after
  # any caller-side underscore-key gate (the agent tool walks the raw
  # filter before calling retrieve).
  filter = translate_pointer_filter_values(klass, filter)
  vector_filter = translate_pointer_filter_values(klass, vector_filter)
  merged_vector_filter = fold_tenant_scope(klass, vector_filter, tenant_scope)
  chunker ||= default_chunker
  text_wire = wire_name(klass, resolved_text_field)

  raw_hits =
    if hybrid
      fetch_hybrid_hits(klass, query, k, field, filter, merged_vector_filter,
                        tenant_scope, hybrid, scope_opts)
    else
      klass.find_similar(
        text: query, k: k, field: field, filter: filter,
        vector_filter: merged_vector_filter, raw: true, **scope_opts,
      )
    end
  return [] if raw_hits.nil? || raw_hits.empty?

  raw_hits = apply_rerank(rerank, query, raw_hits, text_wire, rerank_top_n) if rerank

  raw_hits.flat_map do |doc|
    build_chunks_for(doc, klass, text_wire, score_quantize, source_transform, chunker)
  end
end

.translate_pointer_filter_values(klass, filter) ⇒ Hash?

Translate Parse pointer VALUES in a caller-supplied filter into their MongoDB storage form so they actually match raw documents. { owner: <Parse::Pointer User/abc> } becomes { "_p_owner" => "_User$abc" } — pointer columns are stored under a _p_ prefix with "<className>$<objectId>" string values, so a Parse-side pointer (a {__type: "Pointer", ...} hash on the wire, or a Parse::Pointer / Parse::Object instance from Ruby callers) in a $match / $vectorSearch.filter would otherwise never match anything.

Recognized pointer values:

  • Parse::Pointer / Parse::Object instances,
  • { "__type" => "Pointer", "className" => ..., "objectId" => ... } hashes (symbol or string keys).

Translation applies to direct values, and to pointer values inside one level of operator hashes ({ owner: { "$in" => [ptr, ptr] } }, $eq / $ne / $nin). Non-pointer values and unrecognized keys pass through untouched, so the call is idempotent.

SECURITY ORDERING: run this AFTER assert_no_underscore_keys! / the agent filter-field allowlist (callers may not name _p_* columns directly) and BEFORE the tenant-scope fold.

Parameters:

  • klass (Class)

    the Parse::Object subclass (for field_map wire-name resolution).

  • filter (Hash, nil)

    caller filter.

Returns:

  • (Hash, nil)

    translated copy (or the input when nothing needed translation / input was nil).



102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
# File 'lib/parse/retrieval/retriever.rb', line 102

def translate_pointer_filter_values(klass, filter)
  return filter unless filter.is_a?(Hash)
  out = {}
  filter.each do |key, value|
    if (storage = pointer_storage_value(value))
      out["_p_#{wire_name(klass, key)}"] = storage
    elsif value.is_a?(Hash) && value.keys.any? { |op| op.to_s.start_with?("$") }
      translated = value.transform_values do |opval|
        if (s = pointer_storage_value(opval))
          s
        elsif opval.is_a?(Array)
          opval.map { |el| pointer_storage_value(el) || el }
        else
          opval
        end
      end
      if translated == value
        out[key] = value
      else
        out["_p_#{wire_name(klass, key)}"] = translated
      end
    else
      out[key] = value
    end
  end
  out
end