Module: Parse::VectorSearch
- Defined in:
- lib/parse/vector_search.rb
Overview
Atlas Vector Search entry point. Routes through `Parse::MongoDB` rather than Parse Server's REST aggregate (REST aggregate is master-key-only and bypasses ACL/CLP; see CLAUDE.md).
v5.0 ships the low-level surface only:
Parse::VectorSearch.search(
"WikiArticle",
field: :embedding,
query_vector: vec,
k: 10,
index: "WikiArticle_embedding_voyage_multimodal_3_1024_idx",
session_token: token,
)
The high-level `Class.find_similar(text: …)` wrapper and the `:vector` property type land later in the v5.0 cycle. This module is callable today against any collection that has a queryable `vectorSearch` index, including the `vector_prototype.Movie` fixture in `scripts/vector_prototype/`.
Stage 0 invariant
Atlas refuses any pipeline whose stage 0 is not `$vectorSearch`, `$search`, or `$searchMeta`. The module therefore bypasses `Parse::MongoDB.aggregate` (which prepends an ACL `$match` at stage 0) and reproduces the SDK-side enforcement chain inline: the ACL `$match` is appended AFTER `$vectorSearch`, mirroring `Parse::AtlasSearch.search`.
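The stage ordering can be sketched in plain Ruby. This is a minimal illustration, not the module's internals: the helper name `build_vector_pipeline`, the sample vector, and the `_rperm` predicate shape are assumptions made for the example. Only the order of the stages is taken from the description above and the `search` source.

```ruby
# Hypothetical helper (not part of Parse::VectorSearch): assembles stages
# in the order Atlas accepts, with $vectorSearch pinned at stage 0.
def build_vector_pipeline(vs_stage, acl_match: nil, filter: nil)
  pipeline = [{ "$vectorSearch" => vs_stage }] # must be stage 0
  # Score projection, as in the search source.
  pipeline << { "$addFields" => { "_vscore" => { "$meta" => "vectorSearchScore" } } }
  pipeline << acl_match if acl_match           # ACL $match after $vectorSearch
  pipeline << { "$match" => filter } if filter # caller-supplied filter last
  pipeline
end

vs_stage = {
  "index" => "WikiArticle_embedding_voyage_multimodal_3_1024_idx",
  "path" => "embedding",
  "queryVector" => [0.1, 0.2, 0.3], # toy vector for illustration
  "numCandidates" => 200,
  "limit" => 10,
}

# Placeholder read-permission predicate (assumed shape); in the SDK the
# real stage comes from Parse::ACLScope.match_stage_for.
acl_match = { "$match" => { "_rperm" => { "$in" => ["*", "user123"] } } }

pipeline = build_vector_pipeline(vs_stage, acl_match: acl_match, filter: { "lang" => "en" })
stage_names = pipeline.map { |stage| stage.keys.first }
```

Placing the ACL `$match` ahead of the caller's `$match` means the user-controlled filter only ever sees rows the caller is already allowed to read.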
ACL / CLP enforcement
Identity is resolved through ACLScope.resolve!, so the same kwargs accepted by mongo-direct paths are honored here: `session_token:`, `master: true`, `acl_user:`, `acl_role:`. The resolution drives:
- CLP `find` boundary check: refuses calls the equivalent REST find would refuse.
- Optional `pointerFields` post-filter: drops rows that don't name the current user_id in the configured pointer fields.
- Post-`$vectorSearch` ACL `$match` injection (Parse Server's `_rperm` predicate).
- Post-fetch `protectedFields` redaction.
`master: true` bypasses ACL/CLP injection (matches the standard mongo-direct semantics). The unconditional PipelineSecurity.strip_internal_fields pass runs on every result row regardless of mode, so `_hashed_password` and friends never appear in returned documents.
Defined Under Namespace
Classes: ConstraintNotSupported, InvalidQueryVector, NotAvailable
Constant Summary
- MAX_DIMENSIONS = 8192
  Hard cap on query-vector dimensions to bound validator work and to refuse obvious garbage (for scale, Voyage `voyage-multimodal-3` is 1024-dim and OpenAI `text-embedding-3-large` is 3072-dim).
- MAX_K = 1000
  Hard cap on `limit` (k). Atlas itself caps `$vectorSearch.limit` at 10_000, but practical RAG workloads stay well below that; the tighter cap here keeps a runaway caller from materializing a huge result set client-side.
- DEFAULT_NUM_CANDIDATES_MULTIPLIER = 20
  Default `numCandidates` multiplier when the caller doesn't pass one. Atlas's guidance: numCandidates ≥ 10 × limit, ≤ 10_000.
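To make the interaction between these caps concrete, the following sketch mirrors the bounds checks from `search` in a standalone method. The name `resolve_num_candidates` and its keyword defaults are hypothetical, introduced only for illustration; the arithmetic follows the constants listed above and the 10_000 cap that the source applies inline.

```ruby
# Hypothetical standalone mirror of the k / num_candidates checks in
# Parse::VectorSearch.search. Defaults copy MAX_K, the default multiplier,
# and the 10_000 cap applied in the source.
def resolve_num_candidates(k, num_candidates = nil, multiplier: 20, max_k: 1000, cap: 10_000)
  k_int = Integer(k)
  raise ArgumentError, "k must be in 1..#{max_k} (got #{k_int})." unless (1..max_k).cover?(k_int)

  n = (num_candidates || (k_int * multiplier)).to_i
  raise ArgumentError, "num_candidates (#{n}) must be >= k (#{k_int})." if n < k_int
  raise ArgumentError, "num_candidates capped at #{cap} (got #{n})." if n > cap

  n
end

default_for_10    = resolve_num_candidates(10)            # 10 * 20
default_for_500   = resolve_num_candidates(500)           # 500 * 20, exactly at the cap
explicit_for_1000 = resolve_num_candidates(1000, 10_000)  # explicit value within the cap

# With the default multiplier, k = 501 yields 10_020, which is rejected.
overflow_rejected =
  begin
    resolve_num_candidates(501)
    false
  rescue ArgumentError
    true
  end
```

One consequence worth noting: although MAX_K permits k up to 1000, the default multiplier only works for k ≤ 500. Callers asking for more than 500 results would need to pass `num_candidates:` explicitly.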
Class Attribute Summary
- .default_index ⇒ String?
  Optional fallback for VectorSearch.search's `index:` keyword.
Class Method Summary
- .search(collection_name, field:, query_vector:, k: 10, num_candidates: nil, filter: nil, vector_filter: nil, index: nil, max_time_ms: nil, **scope_opts) ⇒ Array<Hash>
  Low-level `$vectorSearch` entry point.
- .validate_query_vector!(vec, dimensions: nil) ⇒ Array<Float>
  Validate a query vector.
Class Attribute Details
Class Method Details
.search(collection_name, field:, query_vector:, k: 10, num_candidates: nil, filter: nil, vector_filter: nil, index: nil, max_time_ms: nil, **scope_opts) ⇒ Array<Hash>
Low-level `$vectorSearch` entry point.
# File 'lib/parse/vector_search.rb', line 133

def search(collection_name, field:, query_vector:, k: 10,
           num_candidates: nil, filter: nil, vector_filter: nil,
           index: nil, max_time_ms: nil, **scope_opts)
  require_available!
  index_name = (index || @default_index)
  if index_name.nil? || index_name.to_s.empty?
    raise ArgumentError,
          "Parse::VectorSearch.search requires index: (or set Parse::VectorSearch.default_index)."
  end

  # `Parse::ACLScope.resolve!` mutates the options hash by deleting
  # auth kwargs. Pass a fresh hash so we don't accidentally drop
  # caller kwargs and so `resolve!` can refuse 2-of-N combinations.
  resolution = Parse::ACLScope.resolve!(scope_opts, method_name: :"VectorSearch.search")

  path = field.to_s
  if path.empty? || path.start_with?("$") || path.include?(".")
    raise ArgumentError,
          "field: must be a non-empty, non-$-prefixed, non-dotted field name."
  end
  if Parse::PipelineSecurity::INTERNAL_FIELDS_DENYLIST.include?(path) || path.start_with?("_auth_data_")
    raise ArgumentError,
          "field: refuses internal/sensitive field path #{path.inspect}."
  end

  k_int = Integer(k)
  if k_int <= 0 || k_int > MAX_K
    raise ArgumentError, "k must be in 1..#{MAX_K} (got #{k_int})."
  end
  num_candidates_int = (num_candidates || (k_int * DEFAULT_NUM_CANDIDATES_MULTIPLIER)).to_i
  if num_candidates_int < k_int
    raise ArgumentError,
          "num_candidates (#{num_candidates_int}) must be >= k (#{k_int})."
  end
  if num_candidates_int > 10_000
    raise ArgumentError,
          "num_candidates capped at 10000 by Atlas (got #{num_candidates_int})."
  end

  validated_vector = validate_query_vector!(query_vector)
  Parse::PipelineSecurity.validate_filter!(filter) if filter
  Parse::PipelineSecurity.validate_filter!(vector_filter) if vector_filter

  # CLP `find` boundary + pointerFields. Mirrors
  # `Parse::AtlasSearch.search`: without this, a scoped caller
  # could issue $vectorSearch against a collection whose CLP
  # would refuse them on the equivalent REST find.
  assert_clp_find!(collection_name, resolution)
  pointer_fields = resolve_pointer_fields!(collection_name, resolution)
  protected_fields = Parse::CLPScope.protected_fields_for(
    collection_name, resolution.,
  )

  vs_stage = {
    "index" => index_name.to_s,
    "path" => path,
    "queryVector" => validated_vector,
    "numCandidates" => num_candidates_int,
    "limit" => k_int,
  }
  vs_stage["filter"] = vector_filter if vector_filter && !vector_filter.empty?

  pipeline = [{ "$vectorSearch" => vs_stage }]
  pipeline << {
    "$addFields" => { "_vscore" => { "$meta" => "vectorSearchScore" } },
  }

  # Inject ACL $match AFTER $vectorSearch + the score projection
  # but BEFORE the caller-supplied filter, so the user-controlled
  # filter cannot exfiltrate restricted documents that passed the
  # $vectorSearch operator. NOTE: Atlas's `$vectorSearch.filter`
  # (the pre-filter) cannot enforce ACL here because `_rperm`
  # would need to be declared as `type: "filter"` in the index
  # definition; out of scope at the SDK layer. The post-stage
  # `$match` is the enforcement boundary.
  unless resolution.master?
    acl_match = Parse::ACLScope.match_stage_for(resolution)
    pipeline << acl_match if acl_match
  end

  pipeline << { "$match" => filter } if filter

  raw_results = run_pipeline!(collection_name, pipeline, max_time_ms: max_time_ms)

  # Post-fetch enforcement: walk the rows the same way
  # Parse::MongoDB.aggregate would. Master mode skips every
  # redaction layer (matches the helper's behavior).
  unless resolution.master?
    Parse::ACLScope.redact_results!(raw_results, resolution)
    Parse::CLPScope.redact_protected_fields!(raw_results, protected_fields) if protected_fields.any?
    if pointer_fields
      raw_results = Parse::CLPScope.filter_by_pointer_fields(
        raw_results, pointer_fields, resolution.user_id,
      )
    end
  end

  # Internal-fields denylist is the process-level floor: runs in
  # every mode, master included, so `_hashed_password` /
  # `_session_token` can never surface through this entry point.
  raw_results.map! { |doc| Parse::PipelineSecurity.strip_internal_fields(doc) }
  raw_results
end
.validate_query_vector!(vec, dimensions: nil) ⇒ Array<Float>
Validate a query vector. Public so callers (and tests) can invoke it independently of search.
# File 'lib/parse/vector_search.rb', line 248

def validate_query_vector!(vec, dimensions: nil)
  unless vec.is_a?(Array)
    raise InvalidQueryVector, "query_vector must be an Array (got #{vec.class})."
  end
  if vec.empty?
    raise InvalidQueryVector, "query_vector cannot be empty."
  end
  if vec.length > MAX_DIMENSIONS
    raise InvalidQueryVector,
          "query_vector length #{vec.length} exceeds MAX_DIMENSIONS=#{MAX_DIMENSIONS}."
  end
  if dimensions && vec.length != dimensions
    raise InvalidQueryVector,
          "query_vector length #{vec.length} != declared dimensions #{dimensions}."
  end

  out = Array.new(vec.length)
  vec.each_with_index do |v, i|
    unless v.is_a?(Numeric)
      raise InvalidQueryVector, "query_vector[#{i}] is not numeric (#{v.class})."
    end
    f = v.to_f
    unless f.finite?
      raise InvalidQueryVector, "query_vector[#{i}] is not finite (#{v.inspect})."
    end
    out[i] = f
  end
  out.freeze
end