Module: Parse::VectorSearch

Defined in:
lib/parse/vector_search.rb,
lib/parse/vector_search/hybrid.rb

Overview

Atlas Vector Search entry point. Routes through Parse::MongoDB rather than Parse Server's REST aggregate (REST aggregate is master-key-only and bypasses ACL/CLP; see CLAUDE.md).

v5.0 ships the low-level surface only:

  Parse::VectorSearch.search(
    "WikiArticle",
    field: :embedding,
    query_vector: vec,
    k: 10,
    index: "WikiArticle_embedding_voyage_multimodal_3_1024_idx",
    session_token: token,
  )

The high-level Class.find_similar(text: …) wrapper and the :vector property type land later in the v5.0 cycle. This module is callable today against any collection that has a queryable vectorSearch index — including the vector_prototype.Movie fixture in scripts/vector_prototype/.

== Stage 0 invariant

Atlas refuses any pipeline whose stage 0 is not $vectorSearch, $search, or $searchMeta. The module therefore bypasses Parse::MongoDB.aggregate (which prepends an ACL $match at stage 0) and reproduces the SDK-side enforcement chain inline — ACL $match is appended AFTER $vectorSearch, mirroring Parse::AtlasSearch.search.
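The stage ordering described above can be sketched without the SDK. This is a minimal stand-in, not the module's internals: build_vector_pipeline, the index name, and the simplified _rperm predicate are all assumptions made for illustration.

```ruby
# Hypothetical helper: assembles stages in the order the invariant requires.
# $vectorSearch must be stage 0; the ACL $match comes after the score
# projection and before any caller-supplied filter.
def build_vector_pipeline(index:, path:, query_vector:, k:, num_candidates:,
                          acl_match: nil, filter: nil)
  pipeline = [
    { "$vectorSearch" => {
      "index" => index, "path" => path, "queryVector" => query_vector,
      "numCandidates" => num_candidates, "limit" => k,
    } },
    { "$addFields" => { "_vscore" => { "$meta" => "vectorSearchScore" } } },
  ]
  pipeline << acl_match if acl_match
  pipeline << { "$match" => filter } if filter
  pipeline
end

# Assumed, simplified read-permission predicate; the real one is produced
# by the SDK's ACL scope resolution.
acl = { "$match" => { "_rperm" => { "$in" => ["*", "user123"] } } }

pipeline = build_vector_pipeline(
  index: "demo_idx", path: "embedding", query_vector: [0.1, 0.2],
  k: 5, num_candidates: 100, acl_match: acl, filter: { "lang" => "en" },
)
puts pipeline.map { |stage| stage.keys.first }.inspect
```

The printed stage names make the invariant easy to check by eye: $vectorSearch first, then $addFields, then the two $match stages with the ACL one ahead of the caller's.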

== ACL / CLP enforcement

Identity is resolved through ACLScope.resolve!, so the same kwargs accepted by mongo-direct paths are honored here: session_token:, master: true, acl_user:, acl_role:. The resolution drives:

  • CLP find boundary check — refuses calls the equivalent REST find would refuse.
  • Optional pointerFields post-filter — drops rows that don't name the current user_id in the configured pointer fields.
  • Post-$vectorSearch ACL $match injection (Parse Server's _rperm predicate).
  • Post-fetch protectedFields redaction.

master: true bypasses ACL/CLP injection (matches the standard mongo-direct semantics). The unconditional PipelineSecurity.strip_internal_fields pass runs on every result row regardless of mode, so _hashed_password and friends never appear in returned documents.
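As a rough illustration of the unconditional strip pass, the sketch below removes denylisted keys from every row regardless of mode. The denylist contents and the _auth_data_ prefix rule are assumptions; the real list belongs to Parse::PipelineSecurity and is not reproduced here.

```ruby
# Hypothetical denylist, for illustration only.
DEMO_INTERNAL_FIELDS = %w[_hashed_password _session_token].freeze

# Returns a copy of the document without internal or auth-data keys.
def demo_strip_internal_fields(doc)
  doc.reject do |key, _value|
    DEMO_INTERNAL_FIELDS.include?(key) || key.start_with?("_auth_data_")
  end
end

rows = [
  { "objectId" => "a1", "_hashed_password" => "x",
    "_auth_data_github" => {}, "_vscore" => 0.9 },
]

# Runs on every row, whether or not the caller used master mode.
rows.map! { |doc| demo_strip_internal_fields(doc) }
p rows
```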

Defined Under Namespace

Modules: Hybrid

Classes: ConstraintNotSupported, InvalidQueryVector, NotAvailable

Constant Summary

MAX_DIMENSIONS =

Hard cap on query-vector dimensions, to bound validator work and to refuse obvious garbage. Current production models sit well below it: Voyage voyage-multimodal-3 is 1024-dim and OpenAI text-embedding-3-large is 3072-dim.

8192
MAX_K =

Hard cap on limit (k). Atlas itself caps $vectorSearch.limit at 10_000, but practical RAG workloads stay well below that; the tighter cap here keeps a runaway caller from materializing a huge result set client-side.

1000
DEFAULT_NUM_CANDIDATES_MULTIPLIER =

Default numCandidates multiplier when the caller doesn't pass one. Atlas's guidance: numCandidates ≥ 10 × limit, ≤ 10_000.

20
INDEX_DRIFT_POLICIES =

Accepted index_drift_policy values.

%i[warn raise ignore].freeze
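How the multiplier interacts with the 10_000 ceiling is easiest to see with numbers. The sketch below mirrors the arithmetic in the search listing further down; demo_num_candidates and the DEMO_ constants are stand-in names, not part of the module.

```ruby
DEMO_MULTIPLIER = 20     # mirrors DEFAULT_NUM_CANDIDATES_MULTIPLIER
DEMO_ATLAS_MAX = 10_000  # ceiling checked before the pipeline is built

# Resolves the effective numCandidates for a given k.
def demo_num_candidates(k, num_candidates = nil)
  n = (num_candidates || (k * DEMO_MULTIPLIER)).to_i
  raise ArgumentError, "num_candidates (#{n}) must be >= k (#{k})." if n < k
  raise ArgumentError, "num_candidates capped at #{DEMO_ATLAS_MAX} (got #{n})." if n > DEMO_ATLAS_MAX
  n
end

puts demo_num_candidates(10)            # 10 * 20
puts demo_num_candidates(500)           # 500 * 20, exactly at the ceiling
puts demo_num_candidates(1000, 10_000)  # explicit value keeps k = MAX_K usable

begin
  demo_num_candidates(501)              # 501 * 20 exceeds the ceiling
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Read this way, the default multiplier only covers k up to 500. A caller asking for more hits would need to pass num_candidates explicitly, assuming the listing below reflects the shipped behavior.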

Class Attribute Details

.default_index ⇒ String?

Optional fallback for search's index: keyword.

Returns:

  • (String, nil)

    the fallback index name, or nil when none has been set.



# File 'lib/parse/vector_search.rb', line 314

def default_index
  @default_index
end

Class Method Details

.index_drift_policy ⇒ Symbol

Returns current drift policy (default :warn).

Returns:

  • (Symbol)

    current drift policy (default :warn).



# File 'lib/parse/vector_search.rb', line 130

def index_drift_policy
  @index_drift_policy ||= :warn
end

.index_drift_policy=(value) ⇒ Symbol

Policy applied when first-query index verification (see Core::VectorSearchable) finds the deployed Atlas vectorSearch index disagreeing with the model declaration — wrong numDimensions, wrong similarity, or a tenant-scope field missing from the index's filter paths.

  • :warn (default) — emit a [Parse::VectorSearch:DRIFT] warning once per (class, field, index) and continue. Drift usually means the index predates a model change; queries still run but return degraded or wrongly-scoped results.
  • :raise — fail the query with Core::VectorSearchable::IndexDriftError. Strict mode for deployments that treat drift as a release blocker.
  • :ignore — skip verification entirely.
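One way the three policies could be dispatched is sketched below. DemoDriftHandler and DemoIndexDriftError are hypothetical names; the real verification lives in Core::VectorSearchable and may be structured differently. The block stands in for the index check and returns true when drift is found.

```ruby
require "set"

class DemoIndexDriftError < StandardError; end

# Hypothetical dispatcher for the :warn / :raise / :ignore policies.
class DemoDriftHandler
  def initialize(policy)
    @policy = policy
    @warned = Set.new # remembers (class, field, index) triples already reported
  end

  def check(klass, field, index)
    return :skipped if @policy == :ignore # the verification block never runs
    return :ok unless yield               # no drift detected

    if @policy == :raise
      raise DemoIndexDriftError, "#{klass}.#{field} disagrees with index #{index}"
    end

    # Set#add? returns nil when the key is already present.
    return :already_warned unless @warned.add?([klass, field, index])

    warn "[Parse::VectorSearch:DRIFT] #{klass}.#{field} disagrees with index #{index}"
    :warned
  end
end

handler = DemoDriftHandler.new(:warn)
p handler.check("WikiArticle", :embedding, "demo_idx") { true }
p handler.check("WikiArticle", :embedding, "demo_idx") { true }
```

Passing the check as a block keeps :ignore cheap, since the index lookup is never attempted in that mode.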

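Parameters:

  • value (Symbol, String)

    one of :warn, :raise, or :ignore. Any value that responds to to_sym is normalized to a Symbol; anything else raises ArgumentError.

Returns:

  • (Symbol)

    the normalized policy.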



# File 'lib/parse/vector_search.rb', line 119

def index_drift_policy=(value)
  v = value.respond_to?(:to_sym) ? value.to_sym : nil
  unless v && INDEX_DRIFT_POLICIES.include?(v)
    raise ArgumentError,
          "Parse::VectorSearch.index_drift_policy must be one of " \
          "#{INDEX_DRIFT_POLICIES.inspect} (got #{value.inspect})."
  end
  @index_drift_policy = v
end

.search(collection_name, field:, query_vector:, k: 10, num_candidates: nil, filter: nil, vector_filter: nil, index: nil, max_time_ms: nil, **scope_opts) ⇒ Array<Hash>

Low-level $vectorSearch entry point.

Parameters:

  • collection_name (String)

    Parse class name / Mongo collection name. Treated as a literal collection name; no property-type lookup happens at this layer.

  • field (String, Symbol)

    vector field path inside the document. Must match path: on the Atlas index definition.

  • query_vector (Array<Float>)

    the query embedding.

  • k (Integer) (defaults to: 10)

    number of hits to return. Capped at MAX_K.

  • num_candidates (Integer, nil) (defaults to: nil)

    Atlas's HNSW search width. Defaults to k * DEFAULT_NUM_CANDIDATES_MULTIPLIER.

  • filter (Hash, nil) (defaults to: nil)

    additional post-$vectorSearch match (validated by PipelineSecurity.validate_filter!). For pre-search filtering use vector_filter:.

  • vector_filter (Hash, nil) (defaults to: nil)

    Atlas-native pre-search filter, injected into $vectorSearch.filter. Atlas requires the referenced fields be declared as type: "filter" in the index definition. Validated by PipelineSecurity.validate_filter!.

  • index (String, nil) (defaults to: nil)

    Atlas vectorSearch index name. If nil, falls back to default_index.

  • session_token (String, nil)

    session token for ACL/CLP resolution via ACLScope.resolve!.

  • master (Boolean)

    explicit master-key opt-in; bypasses ACL/CLP enforcement.

  • acl_user (Parse::User, Parse::Pointer, nil)

    pre-resolved user pointer for ACL scoping.

  • acl_role (String, Parse::Role, nil)

    role-only scope.

  • max_time_ms (Integer, nil) (defaults to: nil)

    server-side timeout.

Returns:

  • (Array<Hash>)

    raw result documents. Each row includes _vscore (the Atlas vectorSearchScore — projected under _vscore rather than _score so hybrid pipelines with Atlas Search don't collide on the same key).
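A caller would typically read _vscore straight off each returned Hash. The sketch below uses made-up rows in the shape described above; top_hits and the 0.8 threshold are illustrative choices, not part of the module.

```ruby
# Hypothetical rows shaped like the documented return value.
rows = [
  { "objectId" => "a1", "title" => "Alpha", "_vscore" => 0.91 },
  { "objectId" => "b2", "title" => "Beta",  "_vscore" => 0.62 },
  { "objectId" => "c3", "title" => "Gamma", "_vscore" => 0.88 },
]

# Keeps hits at or above a minimum score, best first. The sort is
# defensive: it does not rely on the order the rows arrived in.
def top_hits(rows, min_score:)
  rows.select { |row| row["_vscore"] >= min_score }
      .sort_by { |row| -row["_vscore"] }
end

top_hits(rows, min_score: 0.8).each do |row|
  puts format("%.2f  %s", row["_vscore"], row["title"])
end
```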



# File 'lib/parse/vector_search.rb', line 167

def search(collection_name, field:, query_vector:, k: 10,
           num_candidates: nil, filter: nil, vector_filter: nil,
           index: nil, max_time_ms: nil, **scope_opts)
  require_available!
  index_name = (index || @default_index)
  if index_name.nil? || index_name.to_s.empty?
    raise ArgumentError,
          "Parse::VectorSearch.search requires index: (or set Parse::VectorSearch.default_index)."
  end

  # `Parse::ACLScope.resolve!` mutates the options hash by deleting
  # auth kwargs. `scope_opts` is already a fresh hash built by the
  # `**` splat, so the caller's hash is untouched and `resolve!` can
  # refuse 2-of-N combinations.
  resolution = Parse::ACLScope.resolve!(scope_opts, method_name: :"VectorSearch.search")

  path = field.to_s
  if path.empty? || path.start_with?("$") || path.include?(".")
    raise ArgumentError,
          "field: must be a non-empty, non-$-prefixed, non-dotted field name."
  end
  if Parse::PipelineSecurity::INTERNAL_FIELDS_DENYLIST.include?(path) ||
     path.start_with?("_auth_data_")
    raise ArgumentError,
          "field: refuses internal/sensitive field path #{path.inspect}."
  end

  k_int = Integer(k)
  if k_int <= 0 || k_int > MAX_K
    raise ArgumentError, "k must be in 1..#{MAX_K} (got #{k_int})."
  end

  num_candidates_int = (num_candidates || (k_int * DEFAULT_NUM_CANDIDATES_MULTIPLIER)).to_i
  if num_candidates_int < k_int
    raise ArgumentError, "num_candidates (#{num_candidates_int}) must be >= k (#{k_int})."
  end
  if num_candidates_int > 10_000
    raise ArgumentError, "num_candidates capped at 10000 by Atlas (got #{num_candidates_int})."
  end

  validated_vector = validate_query_vector!(query_vector)

  Parse::PipelineSecurity.validate_filter!(filter) if filter
  Parse::PipelineSecurity.validate_filter!(vector_filter) if vector_filter

  # CLP `find` boundary + pointerFields. Mirrors
  # `Parse::AtlasSearch.search` — without this, a scoped caller
  # could issue $vectorSearch against a collection whose CLP
  # would refuse them on the equivalent REST find.
  assert_clp_find!(collection_name, resolution)
  pointer_fields = resolve_pointer_fields!(collection_name, resolution)
  protected_fields = Parse::CLPScope.protected_fields_for(
    collection_name, resolution.permission_strings,
  )

  vs_stage = {
    "index"         => index_name.to_s,
    "path"          => path,
    "queryVector"   => validated_vector,
    "numCandidates" => num_candidates_int,
    "limit"         => k_int,
  }
  vs_stage["filter"] = vector_filter if vector_filter && !vector_filter.empty?
  pipeline = [{ "$vectorSearch" => vs_stage }]

  pipeline << {
    "$addFields" => { "_vscore" => { "$meta" => "vectorSearchScore" } },
  }

  # Inject ACL $match AFTER $vectorSearch + the score projection
  # but BEFORE the caller-supplied filter, so the user-controlled
  # filter cannot exfiltrate restricted documents that passed the
  # $vectorSearch operator. NOTE: Atlas's `$vectorSearch.filter`
  # (the pre-filter) cannot enforce ACL here because `_rperm`
  # would need to be declared as `type: "filter"` in the index
  # definition — out of scope at the SDK layer. The post-stage
  # `$match` is the enforcement boundary.
  unless resolution.master?
    acl_match = Parse::ACLScope.match_stage_for(resolution)
    pipeline << acl_match if acl_match
  end

  pipeline << { "$match" => filter } if filter

  raw_results = run_pipeline!(collection_name, pipeline, max_time_ms: max_time_ms)

  # Post-fetch enforcement: walk the rows the same way
  # Parse::MongoDB.aggregate would. Master mode skips the ACL/CLP
  # redaction layers (matches the helper's behavior).
  unless resolution.master?
    Parse::ACLScope.redact_results!(raw_results, resolution)
    Parse::CLPScope.redact_protected_fields!(raw_results, protected_fields) if protected_fields.any?
    if pointer_fields
      raw_results = Parse::CLPScope.filter_by_pointer_fields(
        raw_results, pointer_fields, resolution.user_id,
      )
    end
  end

  # Internal-fields denylist is the process-level floor: runs in
  # every mode, master included, so `_hashed_password` /
  # `_session_token` can never surface through this entry point.
  raw_results.map! { |doc| Parse::PipelineSecurity.strip_internal_fields(doc) }
  raw_results
end

.validate_query_vector!(vec, dimensions: nil) ⇒ Array<Float>

Validate a query vector. Public so callers (and tests) can invoke it independently of search.

Parameters:

  • vec (Array<Float>)

    candidate query vector.

  • dimensions (Integer, nil) (defaults to: nil)

    expected length; nil to skip the length check.

Returns:

  • (Array<Float>)

    the vector, coerced to Float and frozen.

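Raises:

  • (InvalidQueryVector)

    if vec is not an Array, is empty, is longer than MAX_DIMENSIONS, does not match dimensions when given, or contains a non-numeric or non-finite element.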



# File 'lib/parse/vector_search.rb', line 282

def validate_query_vector!(vec, dimensions: nil)
  unless vec.is_a?(Array)
    raise InvalidQueryVector, "query_vector must be an Array (got #{vec.class})."
  end
  if vec.empty?
    raise InvalidQueryVector, "query_vector cannot be empty."
  end
  if vec.length > MAX_DIMENSIONS
    raise InvalidQueryVector,
          "query_vector length #{vec.length} exceeds MAX_DIMENSIONS=#{MAX_DIMENSIONS}."
  end
  if dimensions && vec.length != dimensions
    raise InvalidQueryVector,
          "query_vector length #{vec.length} != declared dimensions #{dimensions}."
  end
  out = Array.new(vec.length)
  vec.each_with_index do |v, i|
    unless v.is_a?(Numeric)
      raise InvalidQueryVector, "query_vector[#{i}] is not numeric (#{v.class})."
    end
    f = v.to_f
    unless f.finite?
      raise InvalidQueryVector, "query_vector[#{i}] is not finite (#{v.inspect})."
    end
    out[i] = f
  end
  out.freeze
end
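The validator's observable behavior can be exercised with a condensed stand-in. demo_validate_vector and DemoInvalidQueryVector below are assumptions written to run without the SDK; they follow the checks in the listing above but shorten the messages and fold the first two checks together.

```ruby
class DemoInvalidQueryVector < StandardError; end

# Condensed stand-in for the validator: same checks, shorter messages.
def demo_validate_vector(vec, dimensions: nil, max_dimensions: 8192)
  unless vec.is_a?(Array) && !vec.empty?
    raise DemoInvalidQueryVector, "query_vector must be a non-empty Array."
  end
  if vec.length > max_dimensions
    raise DemoInvalidQueryVector, "query_vector length #{vec.length} exceeds #{max_dimensions}."
  end
  if dimensions && vec.length != dimensions
    raise DemoInvalidQueryVector, "query_vector length #{vec.length} != #{dimensions}."
  end

  vec.each_with_index.map do |value, i|
    raise DemoInvalidQueryVector, "query_vector[#{i}] is not numeric." unless value.is_a?(Numeric)
    float = value.to_f
    raise DemoInvalidQueryVector, "query_vector[#{i}] is not finite." unless float.finite?
    float
  end.freeze
end

ok = demo_validate_vector([1, 0.5, Rational(1, 4)], dimensions: 3)
p ok          # => [1.0, 0.5, 0.25]
p ok.frozen?  # => true

# Each of these is rejected: empty, non-numeric, non-finite, wrong length.
[[[], nil], [[1, "2"], nil], [[Float::NAN], nil], [[0.1, 0.2], 3]].each do |vec, dims|
  begin
    demo_validate_vector(vec, dimensions: dims)
  rescue DemoInvalidQueryVector => e
    puts "rejected: #{e.message}"
  end
end
```

Integer and Rational inputs come back as Floats, and the result is a new frozen Array, so the caller's original vector is left unmodified.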