Module: Parse::Retrieval

Defined in:
lib/parse/retrieval/retriever.rb,
lib/parse/retrieval/chunk.rb,
lib/parse/retrieval/chunker.rb,
lib/parse/retrieval/agent_tool.rb

Overview

Retrieval-augmented-generation (RAG) helpers. Parse::RAG is a discoverability alias for this module.

Retrieval.retrieve is the agent-agnostic core: it embeds a natural-language query, runs Atlas $vectorSearch through the existing Class.find_similar (which enforces ACL/CLP mongo-direct), then splits each retrieved document's text field into scored Chunks for presentation.

The agent-facing semantic_search tool (see lib/parse/retrieval/agent_tool.rb) wraps Retrieval.retrieve with the agent security envelope (tenant scope, field_allowlist projection, score quantization).

== ACL model

Retrieval.retrieve does NOT implement a REST "two-stage" re-query. The vector path is mongo-direct only (Parse Server's REST /aggregate is master-key-only and bypasses ACL — see the project notes), and acl_user: / acl_role: scopes have no REST equivalent. ACL is enforced inside find_similar via a post-$vectorSearch _rperm $match. Scope kwargs (session_token: / acl_user: / acl_role: / master:) pass straight through **scope_opts.

Defined Under Namespace

Modules: AgentTool, Chunker Classes: AmbiguousTextField, Chunk, TenantScopeConflict

Class Method Summary collapse

Class Method Details

.assert_no_underscore_keys!(obj, path = []) ⇒ Object

Recursively refuse any underscore-prefixed key, at any depth, in a caller-supplied filter. This is distinct from (and stricter than) the agent layer's flat validate_keys!: a Mongo-style filter is a nested structure, and an underscore key buried inside $or / $elemMatch / a hash value could clobber tenant scope or reach a reserved column (_rperm, _p_*, _auth_data_*). The walk is unconditional — it does not special-case operators.

Parameters:

  • obj (Object)

    a filter Hash/Array (or anything; scalars pass).

  • path (Array<String>) (defaults to: [])

    internal — accumulates the key path for the error message.

Raises:

  • (ArgumentError)

    on any _-prefixed key.



55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
# File 'lib/parse/retrieval/retriever.rb', line 55

def assert_no_underscore_keys!(obj, path = [])
  case obj
  when Hash
    obj.each do |k, v|
      ks = k.to_s
      if ks.start_with?("_")
        raise ArgumentError,
              "filter key '#{(path + [ks]).join(".")}' is reserved (underscore-prefixed)."
      end
      assert_no_underscore_keys!(v, path + [ks])
    end
  when Array
    obj.each_with_index { |v, i| assert_no_underscore_keys!(v, path + ["[#{i}]"]) }
  end
  obj
end

.retrieve(query:, klass: nil, field: nil, text_field: nil, k: 10, filter: nil, vector_filter: nil, chunker: nil, tenant_scope: nil, score_quantize: false, source_transform: nil, hybrid: nil, rerank: nil, **scope_opts) ⇒ Array<Parse::Retrieval::Chunk>

Retrieve and chunk documents semantically similar to query.

Parameters:

  • query (String)

    natural-language query.

  • klass (Class, String) (defaults to: nil)

    a Parse::Object subclass (or its class name) declaring a :vector property. class: is accepted as an alias.

  • field (Symbol, nil) (defaults to: nil)

    the :vector property to search. Auto-resolved by find_similar when the class has exactly one.

  • text_field (Symbol, nil) (defaults to: nil)

    the text property to chunk for presentation. Defaults to the sole text source of the class's embed declaration; raises AmbiguousTextField when it can't be inferred.

  • k (Integer) (defaults to: 10)

    number of documents to retrieve. Default 10.

  • filter (Hash, nil) (defaults to: nil)

    post-$vectorSearch $match filter.

  • vector_filter (Hash, nil) (defaults to: nil)

    Atlas-native pre-search filter.

  • chunker (#chunk_with_meta, #chunk, nil) (defaults to: nil)

    chunking strategy. Defaults to Parse::Retrieval::Chunker::FixedSizeOverlap.

  • tenant_scope (Hash, nil) (defaults to: nil)

    { field:, value: } merged into vector_filter (closing the cross-tenant existence side channel) — not just a post-stage match.

  • score_quantize (Boolean) (defaults to: false)

    round scores to 1 decimal (limits membership-inference probing in non-admin contexts).

  • source_transform (#call, nil) (defaults to: nil)

    optional callable applied to each raw source record before it is stored on a Chunk. The agent tool injects tenant-scope assertion + field_allowlist projection here; a StandardError raised by the callable propagates and aborts the whole call (fail-closed). Kept as an injection point so this model-layer method stays free of any agent-layer dependency.

  • hybrid (Object, nil) (defaults to: nil)

    reserved — raises NotImplementedError if truthy. Hybrid (vector + lexical) retrieval lands in a later release; the kwarg locks the API shape now.

  • rerank (Object, nil) (defaults to: nil)

    reserved — raises NotImplementedError if non-nil. Cross-encoder rerank lands in a later release.

  • scope_opts (Hash)

    ACL/CLP scope kwargs forwarded verbatim to find_similar: session_token: / acl_user: / acl_role: / master:.

Returns:

Raises:

  • (NotImplementedError)


111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
# File 'lib/parse/retrieval/retriever.rb', line 111

def retrieve(query:, klass: nil, field: nil, text_field: nil, k: 10,
             filter: nil, vector_filter: nil, chunker: nil,
             tenant_scope: nil, score_quantize: false,
             source_transform: nil, hybrid: nil, rerank: nil,
             **scope_opts)
  raise NotImplementedError,
        "Parse::Retrieval.retrieve: `hybrid:` is reserved for a future release." if hybrid
  raise NotImplementedError,
        "Parse::Retrieval.retrieve: `rerank:` is reserved for a future release." if rerank

  # `class:` alias (reserved word — arrives via **scope_opts).
  klass ||= scope_opts.delete(:class)
  klass = resolve_class!(klass)

  unless query.is_a?(String) && !query.strip.empty?
    raise ArgumentError, "Parse::Retrieval.retrieve: `query:` must be a non-empty String."
  end

  resolved_text_field = (text_field || infer_text_field!(klass)).to_sym
  merged_vector_filter = fold_tenant_scope(klass, vector_filter, tenant_scope)
  chunker ||= default_chunker

  raw_hits = klass.find_similar(
    text: query,
    k: k,
    field: field,
    filter: filter,
    vector_filter: merged_vector_filter,
    raw: true,
    **scope_opts,
  )
  return [] if raw_hits.nil? || raw_hits.empty?

  text_wire = wire_name(klass, resolved_text_field)

  raw_hits.flat_map do |doc|
    build_chunks_for(doc, klass, text_wire, score_quantize, source_transform, chunker)
  end
end