Module: Parse::VectorSearch

Defined in:
lib/parse/vector_search.rb

Overview

Atlas Vector Search entry point. Routes through `Parse::MongoDB` rather than Parse Server's REST aggregate (REST aggregate is master-key-only and bypasses ACL/CLP; see CLAUDE.md).

v5.0 ships the low-level surface only:

Parse::VectorSearch.search(
  "WikiArticle",
  field: :embedding,
  query_vector: vec,
  k: 10,
  index: "WikiArticle_embedding_voyage_multimodal_3_1024_idx",
  session_token: token,
)

The high-level `Class.find_similar(text: …)` wrapper and the `:vector` property type land later in the v5.0 cycle. This module is callable today against any collection that has a queryable `vectorSearch` index, including the `vector_prototype.Movie` fixture in `scripts/vector_prototype/`.

Stage 0 invariant

Atlas refuses any pipeline whose stage 0 is not `$vectorSearch`, `$search`, or `$searchMeta`. The module therefore bypasses `Parse::MongoDB.aggregate` (which prepends an ACL `$match` at stage 0) and reproduces the SDK-side enforcement chain inline: the ACL `$match` is appended AFTER `$vectorSearch`, mirroring `Parse::AtlasSearch.search`.
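
The stage ordering described above can be sketched with plain hashes. This is a minimal illustration, not the module's code: `build_pipeline`, the index name, and the `_rperm` predicate shape are assumptions made for the example (the real ACL stage comes from `Parse::ACLScope.match_stage_for`).

```ruby
# Hypothetical helper: assembles stages in the order the module describes.
def build_pipeline(vs_stage:, acl_match: nil, filter: nil)
  # Stage 0 must be the search stage, or Atlas rejects the pipeline.
  pipeline = [{ "$vectorSearch" => vs_stage }]
  # Project the similarity score under `_vscore`.
  pipeline << { "$addFields" => { "_vscore" => { "$meta" => "vectorSearchScore" } } }
  # ACL enforcement is appended after the search stage, never before it.
  pipeline << acl_match if acl_match
  # The caller-supplied filter runs last.
  pipeline << { "$match" => filter } if filter
  pipeline
end

vs_stage = {
  "index"         => "example_idx", # assumed index name
  "path"          => "embedding",
  "queryVector"   => [0.1, 0.2, 0.3],
  "numCandidates" => 200,
  "limit"         => 10,
}

# Assumed shape of a read-permission predicate, for illustration only.
acl_match = { "$match" => { "_rperm" => { "$in" => [nil, "*", "user123"] } } }

pipeline = build_pipeline(
  vs_stage: vs_stage,
  acl_match: acl_match,
  filter: { "year" => { "$gte" => 2000 } },
)
stage_names = pipeline.map { |stage| stage.keys.first }
```

Here `stage_names` lists the search stage first, then the score projection, then the two `$match` stages, with the ACL one ahead of the caller's filter.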

ACL / CLP enforcement

Identity is resolved through ACLScope.resolve!, so the same kwargs accepted by mongo-direct paths are honored here: `session_token:`, `master: true`, `acl_user:`, `acl_role:`. The resolution drives:

  • CLP `find` boundary check: refuses calls the equivalent REST find would refuse.

  • Optional `pointerFields` post-filter: drops rows that don't name the current user_id in the configured pointer fields.

  • Post-`$vectorSearch` ACL `$match` injection (Parse Server's `_rperm` predicate).

  • Post-fetch `protectedFields` redaction.

`master: true` bypasses ACL/CLP injection (matches the standard mongo-direct semantics). The unconditional PipelineSecurity.strip_internal_fields pass runs on every result row regardless of mode, so `_hashed_password` and friends never appear in returned documents.
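
The difference between scoped and master post-processing might look roughly like the sketch below. All names here (`INTERNAL_FIELDS`, `strip_internal`, `post_process`) are hypothetical stand-ins for the SDK helpers, the denylist contents are assumed, and pointer fields are simplified to hold a bare user id.

```ruby
# Assumed denylist contents; the SDK's real list lives in PipelineSecurity.
INTERNAL_FIELDS = %w[_hashed_password _session_token].freeze

# Stand-in for the unconditional internal-field strip.
def strip_internal(doc)
  doc.reject { |key, _| INTERNAL_FIELDS.include?(key) }
end

# Stand-in for the post-fetch chain. Scoped callers get pointer-field
# filtering and protected-field redaction; every caller gets the strip.
def post_process(rows, master:, protected_fields: [], pointer_fields: nil, user_id: nil)
  unless master
    if pointer_fields
      # Keep only rows that name the current user in a configured pointer field.
      rows = rows.select { |doc| pointer_fields.any? { |f| doc[f] == user_id } }
    end
    # Drop fields the CLP marks as protected for this caller.
    rows = rows.map { |doc| doc.reject { |key, _| protected_fields.include?(key) } }
  end
  rows.map { |doc| strip_internal(doc) }
end

rows = [
  { "objectId" => "a", "owner" => "u1", "email" => "x@example.com", "_hashed_password" => "secret" },
  { "objectId" => "b", "owner" => "u2", "email" => "y@example.com" },
]

scoped = post_process(rows, master: false, protected_fields: ["email"],
                      pointer_fields: ["owner"], user_id: "u1")
master_rows = post_process(rows, master: true)
```

In this sketch the scoped call keeps only the row owned by `u1` and removes `email`, while the master call keeps both rows and their `email` values but still loses `_hashed_password`.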

Defined Under Namespace

Classes: ConstraintNotSupported, InvalidQueryVector, NotAvailable

Constant Summary

MAX_DIMENSIONS = 8192

Hard cap on query-vector dimensions to bound validator work and to refuse obvious garbage. For reference, Voyage `voyage-multimodal-3` is 1024-dim and OpenAI `text-embedding-3-large` is 3072-dim, both well under the cap.

MAX_K = 1000

Hard cap on `limit` (k). Atlas itself caps `$vectorSearch.limit` at 10_000, but practical RAG workloads stay well below that; the tighter cap here keeps a runaway caller from materializing a huge result set client-side.

DEFAULT_NUM_CANDIDATES_MULTIPLIER = 20

Default `numCandidates` multiplier when the caller doesn't pass one. Atlas's guidance: numCandidates ≥ 10 × limit, ≤ 10_000.
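
How these constants interact can be shown with a small standalone sketch. `resolve_limits` and `ATLAS_NUM_CANDIDATES_CAP` are hypothetical names introduced for illustration; the constant values and the checks follow the ones documented on this page.

```ruby
MAX_K = 1000
DEFAULT_NUM_CANDIDATES_MULTIPLIER = 20
ATLAS_NUM_CANDIDATES_CAP = 10_000 # hypothetical name for the 10_000 ceiling

# Returns [k, num_candidates] after applying the default and the caps.
def resolve_limits(k, num_candidates = nil)
  k_int = Integer(k)
  unless (1..MAX_K).cover?(k_int)
    raise ArgumentError, "k must be in 1..#{MAX_K} (got #{k_int})."
  end

  # Fall back to k * multiplier when the caller passes nothing.
  candidates = (num_candidates || k_int * DEFAULT_NUM_CANDIDATES_MULTIPLIER).to_i
  if candidates < k_int
    raise ArgumentError, "num_candidates (#{candidates}) must be >= k (#{k_int})."
  end
  if candidates > ATLAS_NUM_CANDIDATES_CAP
    raise ArgumentError, "num_candidates capped at #{ATLAS_NUM_CANDIDATES_CAP} (got #{candidates})."
  end

  [k_int, candidates]
end

default_pair  = resolve_limits(10)           # => [10, 200]
explicit_pair = resolve_limits(10, 50)       # => [10, 50]
large_pair    = resolve_limits(1000, 10_000) # => [1000, 10000]
```

One consequence worth noting: with the default multiplier of 20, any `k` above 500 computes more than 10_000 candidates, so such calls need an explicit `num_candidates:` to pass the cap.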

Class Attribute Details

.default_index ⇒ String?

Optional fallback for search's `index:` keyword.

Returns:

  • (String, nil)

# File 'lib/parse/vector_search.rb', line 280

def default_index
  @default_index
end

Class Method Details

.search(collection_name, field:, query_vector:, k: 10, num_candidates: nil, filter: nil, vector_filter: nil, index: nil, max_time_ms: nil, **scope_opts) ⇒ Array<Hash>

Low-level `$vectorSearch` entry point.

Parameters:

  • collection_name (String)

    Parse class name / Mongo collection name. Treated as a literal collection name; no property-type lookup happens at this layer.

  • field (String, Symbol)

    vector field path inside the document. Must match `path:` on the Atlas index definition.

  • query_vector (Array<Float>)

    the query embedding.

  • k (Integer) (defaults to: 10)

    number of hits to return. Capped at MAX_K.

  • num_candidates (Integer, nil) (defaults to: nil)

    Atlas's HNSW search width. Defaults to `k * DEFAULT_NUM_CANDIDATES_MULTIPLIER`.

  • filter (Hash, nil) (defaults to: nil)

    additional post-`$vectorSearch` match (validated by PipelineSecurity.validate_filter!). For pre-search filtering use `vector_filter:`.

  • vector_filter (Hash, nil) (defaults to: nil)

    Atlas-native pre-search filter, injected into `$vectorSearch.filter`. Atlas requires that the referenced fields be declared as `type: "filter"` in the index definition. Validated by PipelineSecurity.validate_filter!.

  • index (String, nil) (defaults to: nil)

    Atlas vectorSearch index name. If nil, falls back to default_index.

  • session_token (String, nil)

    session token for ACL/CLP resolution via ACLScope.resolve!.

  • master (Boolean)

    explicit master-key opt-in; bypasses ACL/CLP enforcement.

  • acl_user (Parse::User, Parse::Pointer, nil)

    pre-resolved user pointer for ACL scoping.

  • acl_role (String, Parse::Role, nil)

    role-only scope.

  • max_time_ms (Integer, nil) (defaults to: nil)

    server-side timeout.

Returns:

  • (Array<Hash>)

    raw result documents. Each row includes `_vscore` (the Atlas vectorSearchScore, projected under `_vscore` rather than `_score` so hybrid pipelines with Atlas Search don't collide on the same key).
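
Because the return value is a plain Array of Hashes, callers can post-process `_vscore` with ordinary Enumerable methods. The sketch below uses hand-written rows shaped like the documented return value; the row contents and the `MIN_SCORE` threshold are made up for illustration.

```ruby
# Hypothetical rows, shaped like the documented Array<Hash> return value.
rows = [
  { "objectId" => "a1", "title" => "First",  "_vscore" => 0.91 },
  { "objectId" => "b2", "title" => "Second", "_vscore" => 0.74 },
  { "objectId" => "c3", "title" => "Third",  "_vscore" => 0.52 },
]

MIN_SCORE = 0.7 # arbitrary cutoff chosen for this example

# Keep only hits at or above the cutoff, then collect their ids.
confident = rows.select { |row| row["_vscore"] >= MIN_SCORE }
ids = confident.map { |row| row["objectId"] }
```

A suitable cutoff depends on the embedding model and the similarity function configured on the index, so any threshold like this one should be tuned against real data.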



# File 'lib/parse/vector_search.rb', line 133

def search(collection_name, field:, query_vector:, k: 10,
           num_candidates: nil, filter: nil, vector_filter: nil,
           index: nil, max_time_ms: nil, **scope_opts)
  require_available!
  index_name = (index || @default_index)
  if index_name.nil? || index_name.to_s.empty?
    raise ArgumentError,
          "Parse::VectorSearch.search requires index: (or set Parse::VectorSearch.default_index)."
  end

  # `Parse::ACLScope.resolve!` mutates the options hash by deleting
  # auth kwargs. Pass a fresh hash so we don't accidentally drop
  # caller kwargs and so `resolve!` can refuse 2-of-N combinations.
  resolution = Parse::ACLScope.resolve!(scope_opts, method_name: :"VectorSearch.search")

  path = field.to_s
  if path.empty? || path.start_with?("$") || path.include?(".")
    raise ArgumentError,
          "field: must be a non-empty, non-$-prefixed, non-dotted field name."
  end
  if Parse::PipelineSecurity::INTERNAL_FIELDS_DENYLIST.include?(path) ||
     path.start_with?("_auth_data_")
    raise ArgumentError,
          "field: refuses internal/sensitive field path #{path.inspect}."
  end

  k_int = Integer(k)
  if k_int <= 0 || k_int > MAX_K
    raise ArgumentError, "k must be in 1..#{MAX_K} (got #{k_int})."
  end

  num_candidates_int = (num_candidates || (k_int * DEFAULT_NUM_CANDIDATES_MULTIPLIER)).to_i
  if num_candidates_int < k_int
    raise ArgumentError, "num_candidates (#{num_candidates_int}) must be >= k (#{k_int})."
  end
  if num_candidates_int > 10_000
    raise ArgumentError, "num_candidates capped at 10000 by Atlas (got #{num_candidates_int})."
  end

  validated_vector = validate_query_vector!(query_vector)

  Parse::PipelineSecurity.validate_filter!(filter) if filter
  Parse::PipelineSecurity.validate_filter!(vector_filter) if vector_filter

  # CLP `find` boundary + pointerFields. Mirrors
  # `Parse::AtlasSearch.search` — without this, a scoped caller
  # could issue $vectorSearch against a collection whose CLP
  # would refuse them on the equivalent REST find.
  assert_clp_find!(collection_name, resolution)
  pointer_fields = resolve_pointer_fields!(collection_name, resolution)
  protected_fields = Parse::CLPScope.protected_fields_for(
    collection_name, resolution.permission_strings,
  )

  vs_stage = {
    "index"         => index_name.to_s,
    "path"          => path,
    "queryVector"   => validated_vector,
    "numCandidates" => num_candidates_int,
    "limit"         => k_int,
  }
  vs_stage["filter"] = vector_filter if vector_filter && !vector_filter.empty?
  pipeline = [{ "$vectorSearch" => vs_stage }]

  pipeline << {
    "$addFields" => { "_vscore" => { "$meta" => "vectorSearchScore" } },
  }

  # Inject ACL $match AFTER $vectorSearch + the score projection
  # but BEFORE the caller-supplied filter, so the user-controlled
  # filter cannot exfiltrate restricted documents that passed the
  # $vectorSearch operator. NOTE: Atlas's `$vectorSearch.filter`
  # (the pre-filter) cannot enforce ACL here because `_rperm`
  # would need to be declared as `type: "filter"` in the index
  # definition — out of scope at the SDK layer. The post-stage
  # `$match` is the enforcement boundary.
  unless resolution.master?
    acl_match = Parse::ACLScope.match_stage_for(resolution)
    pipeline << acl_match if acl_match
  end

  pipeline << { "$match" => filter } if filter

  raw_results = run_pipeline!(collection_name, pipeline, max_time_ms: max_time_ms)

  # Post-fetch enforcement: walk the rows the same way
  # Parse::MongoDB.aggregate would. Master mode skips every
  # redaction layer (matches the helper's behavior).
  unless resolution.master?
    Parse::ACLScope.redact_results!(raw_results, resolution)
    Parse::CLPScope.redact_protected_fields!(raw_results, protected_fields) if protected_fields.any?
    if pointer_fields
      raw_results = Parse::CLPScope.filter_by_pointer_fields(
        raw_results, pointer_fields, resolution.user_id,
      )
    end
  end

  # Internal-fields denylist is the process-level floor: runs in
  # every mode, master included, so `_hashed_password` /
  # `_session_token` can never surface through this entry point.
  raw_results.map! { |doc| Parse::PipelineSecurity.strip_internal_fields(doc) }
  raw_results
end

.validate_query_vector!(vec, dimensions: nil) ⇒ Array<Float>

Validate a query vector. Public so callers (and tests) can invoke it independently of search.

Parameters:

  • vec (Array<Float>)

    candidate query vector.

  • dimensions (Integer, nil) (defaults to: nil)

    expected length; nil to skip the length check.

Returns:

  • (Array<Float>)

    the vector, coerced to Float and frozen.

Raises:

  • (InvalidQueryVector)

    if the vector is not an Array, is empty, is longer than MAX_DIMENSIONS, does not match `dimensions` when given, or contains a non-numeric or non-finite element.

# File 'lib/parse/vector_search.rb', line 248

def validate_query_vector!(vec, dimensions: nil)
  unless vec.is_a?(Array)
    raise InvalidQueryVector, "query_vector must be an Array (got #{vec.class})."
  end
  if vec.empty?
    raise InvalidQueryVector, "query_vector cannot be empty."
  end
  if vec.length > MAX_DIMENSIONS
    raise InvalidQueryVector,
          "query_vector length #{vec.length} exceeds MAX_DIMENSIONS=#{MAX_DIMENSIONS}."
  end
  if dimensions && vec.length != dimensions
    raise InvalidQueryVector,
          "query_vector length #{vec.length} != declared dimensions #{dimensions}."
  end
  out = Array.new(vec.length)
  vec.each_with_index do |v, i|
    unless v.is_a?(Numeric)
      raise InvalidQueryVector, "query_vector[#{i}] is not numeric (#{v.class})."
    end
    f = v.to_f
    unless f.finite?
      raise InvalidQueryVector, "query_vector[#{i}] is not finite (#{v.inspect})."
    end
    out[i] = f
  end
  out.freeze
end