Module: Parse::Core::VectorSearchable

Included in:
Object
Defined in:
lib/parse/model/core/vector_searchable.rb

Overview

Class-level find_similar wrapper around VectorSearch.search for any Parse::Object subclass that has declared at least one :vector property.

The wrapper handles three things the low-level entry point doesn't:

  1. Field resolution. Defaults to the subclass's single :vector property; raises if the class has none, and requires an explicit field: if it has more than one.
  2. Declared-dimension validation. Compares the query vector's length against the dimensions: declared on the property, so callers get "expected 1536, got 768" instead of an Atlas-side error after a round-trip.
  3. Index auto-discovery. Looks up the Atlas vectorSearch index covering the field via AtlasSearch::IndexManager.find_vector_index when no explicit index: kwarg is given.
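
The first two steps can be sketched in plain Ruby. This is a standalone illustration under stated assumptions, not the library's implementation: the helper names resolve_field and check_dimensions! and the shape of the props hash are hypothetical, and the sketch raises ArgumentError where the library raises its own error classes.

```ruby
# Hypothetical stand-ins for the wrapper's field resolution and
# dimension check. props maps each :vector property name to its options.
def resolve_field(props, field = nil)
  raise ArgumentError, "no :vector property declared" if props.empty?

  if field
    name = field.to_sym
    return name if props.key?(name)

    raise ArgumentError, "#{name} is not a :vector property"
  end
  return props.keys.first if props.size == 1

  raise ArgumentError, "multiple :vector properties (#{props.keys.join(', ')}); pass field:"
end

# Fails fast on the client, before any round-trip to the server.
def check_dimensions!(vector, expected)
  return vector if expected.nil? || vector.length == expected

  raise ArgumentError, "expected #{expected}, got #{vector.length}"
end

props = { embedding: { dimensions: 4 } }
field = resolve_field(props) # only one property, so it is the default
check_dimensions!([0.1, 0.2, 0.3, 0.4], props.dig(field, :dimensions))
```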

ACL/CLP enforcement is inherited from VectorSearch.search (which routes through MongoDB — REST /aggregate is master-key-only and bypasses ACL/CLP, see CLAUDE.md). The full scope-kwarg surface (session_token:, master:, acl_user:, acl_role:) is forwarded as-is.

Examples:

default field, default index

WikiArticle.find_similar(vector: query_embedding, k: 5)

explicit field + post-filter, scoped to a session

Document.find_similar(
  vector: embed.call("ruby parse"),
  field: :body_embedding,
  k: 10,
  filter: { tag: "ruby" },
  session_token: user.session_token,
)

Defined Under Namespace

Classes: AmbiguousVectorField, EmbedderNotConfigured, IndexNotResolved, NoVectorProperty

Constant Summary

VECTOR_VISIBILITY_MODES = %i[owner_only public].freeze

Accepted #vector_visibility modes.

Instance Method Details

#find_similar(vector: nil, text: nil, k: 10, field: nil, filter: nil, vector_filter: nil, index: nil, num_candidates: nil, max_time_ms: nil, raw: false, **scope_opts) ⇒ Array<Parse::Object>, Array<Hash>

Note:

When text: is given, the text is sent over the wire to the embedding provider (e.g. OpenAI). Operators that enable global Faraday request logging on the embedding connection will capture the full query text in the JSON request body. Treat text: as user-visible content for log-handling purposes.

Note:

The provider is responsible for bounding its own request timeout. Embeddings::OpenAI self-bounds at 30 s read / 5 s connect with capped retries. Custom providers MUST self-bound — find_similar does not impose a wall-clock deadline on the embed step.
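
One way a custom provider could bound itself is with Ruby's standard Timeout module. The class below is a hypothetical sketch: the embed method name, its signature, and the backend callable are assumptions made for illustration, not the interface that Embeddings.provider expects.

```ruby
require "timeout"

# Hypothetical wrapper: bounds every embed call with a wall-clock deadline
# so a stalled backend cannot hang the caller indefinitely.
class BoundedEmbedder
  def initialize(backend, deadline_seconds: 5)
    @backend = backend # any callable taking (texts, input_type)
    @deadline = deadline_seconds
  end

  # Returns whatever the backend returns, or raises Timeout::Error
  # once the deadline has passed.
  def embed(texts, input_type: :search_query)
    Timeout.timeout(@deadline) { @backend.call(texts, input_type) }
  end
end

# Fake backend for demonstration: one two-element vector per input text.
fake_backend = ->(texts, _input_type) { texts.map { |t| [t.length.to_f, 0.0] } }
embedder = BoundedEmbedder.new(fake_backend, deadline_seconds: 1)
embedder.embed(["ruby parse"])
```

For HTTP-backed providers, the client's own connect and read timeouts are generally a safer bound than Timeout.timeout, which interrupts the block at an arbitrary point.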

Find documents whose declared :vector property is closest to vector: under the Atlas vectorSearch index's similarity function.

Parameters:

  • vector (Array<Float>, Parse::Vector, nil) (defaults to: nil)

    the query embedding. Mutually exclusive with text: — exactly one of the two must be given.

  • text (String, nil) (defaults to: nil)

    natural-language query. When given, the resolved field's declared provider: is looked up via Embeddings.provider, used to embed [text] with input_type: :search_query, and the resulting vector is used in place of vector:. Requires the property to have been declared with provider: metadata.

  • k (Integer) (defaults to: 10)

    number of hits to return.

  • field (Symbol, String, nil) (defaults to: nil)

    the :vector property to search. Auto-resolves when the class has exactly one :vector property.

  • filter (Hash, nil) (defaults to: nil)

    post-$vectorSearch $match filter.

  • vector_filter (Hash, nil) (defaults to: nil)

    Atlas-native pre-search filter (fields must be declared type: "filter" in the index).

  • index (String, nil) (defaults to: nil)

    explicit vectorSearch index name. Skips auto-discovery when given.

  • num_candidates (Integer, nil) (defaults to: nil)

    HNSW search width.

  • max_time_ms (Integer, nil) (defaults to: nil)

    server-side timeout.

  • raw (Boolean) (defaults to: false)

    when true, returns the raw Mongo documents (each enriched with _vscore); when false (default), builds instances of the calling class and attaches vector_score.

  • scope_opts (Hash)

    ACL/CLP scope kwargs forwarded to VectorSearch.search: session_token:, master:, acl_user:, acl_role:.
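
To make the difference between filter: and vector_filter: concrete, the sketch below builds the kind of aggregation pipeline these parameters correspond to. The $vectorSearch stage fields (index, path, queryVector, numCandidates, limit, filter) and the vectorSearchScore metadata follow MongoDB's documented syntax; the helper name build_pipeline, the fallback for num_candidates, and the exact stages the library emits are assumptions.

```ruby
# Hypothetical helper, not the library's pipeline builder.
def build_pipeline(index:, field:, query_vector:, k: 10, num_candidates: nil,
                   vector_filter: nil, filter: nil)
  stage = {
    "index" => index,
    "path" => field.to_s,
    "queryVector" => query_vector,
    "numCandidates" => num_candidates || k * 10, # assumed fallback
    "limit" => k,
  }
  # Pre-search filter: evaluated inside $vectorSearch on indexed filter fields.
  stage["filter"] = vector_filter if vector_filter

  pipeline = [
    { "$vectorSearch" => stage },
    { "$addFields" => { "_vscore" => { "$meta" => "vectorSearchScore" } } },
  ]
  # Post-search filter: an ordinary $match applied to the hits already returned.
  pipeline << { "$match" => filter } if filter
  pipeline
end

pipeline = build_pipeline(
  index: "doc_vectors", field: :body_embedding, query_vector: [0.1, 0.2],
  k: 5, vector_filter: { "tag" => "ruby" }, filter: { "lang" => "en" }
)
```

In a pipeline of this shape, the $match runs after the limit has been applied, so a post-filter can leave fewer than k hits, whereas the pre-search filter narrows the candidates before ranking.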

Returns:

  • (Array<Parse::Object>, Array<Hash>)

    hits in descending-similarity order. Each instance responds to vector_score (the Atlas vectorSearchScore).

Raises:

  • (ArgumentError)

    if neither vector: nor text: is given, or if both are.

# File 'lib/parse/model/core/vector_searchable.rb', line 177

def find_similar(vector: nil, text: nil, k: 10, field: nil, filter: nil,
                 vector_filter: nil, index: nil,
                 num_candidates: nil, max_time_ms: nil, raw: false,
                 **scope_opts)
  if vector.nil? && text.nil?
    raise ArgumentError,
          "#{self}.find_similar: must pass either `vector:` or `text:`."
  end
  if !vector.nil? && !text.nil?
    raise ArgumentError,
          "#{self}.find_similar: pass either `vector:` or `text:`, not both."
  end

  resolved_field = resolve_vector_field!(field)
  declared_dims = vector_properties.dig(resolved_field, :dimensions)

  query_vector =
    if text.nil?
      coerce_query_vector(vector)
    else
      embed_query_text!(text, resolved_field)
    end
  Parse::VectorSearch.validate_query_vector!(query_vector, dimensions: declared_dims)

  index_name = resolve_vector_index!(resolved_field, index)

  raw_hits = Parse::VectorSearch.search(
    parse_class,
    field: resolved_field,
    query_vector: query_vector,
    k: k,
    num_candidates: num_candidates,
    filter: filter,
    vector_filter: vector_filter,
    index: index_name,
    max_time_ms: max_time_ms,
    **scope_opts,
  )

  return raw_hits if raw
  build_vector_hits(raw_hits)
end

#hybrid_search(text: nil, query_vector: nil, lexical: {}, vector: {}, k: 20, fusion: nil, raw: false, **scope_opts) ⇒ Array<Parse::Object>

Hybrid (lexical + vector) search with reciprocal-rank fusion.

Runs a lexical Atlas Search branch and a $vectorSearch branch independently, then fuses their ranked results client-side via RRF (or, on Atlas 8.0+, server-side via native $rankFusion when detected). Both branches enforce ACL/CLP/protectedFields before fusion — see VectorSearch::Hybrid.
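
Reciprocal-rank fusion scores each document by summing weight / (k_constant + rank) over the branches that returned it. The sketch below is a minimal client-side version of that standard formula; the function name rrf_fuse, the 1-based ranks, the tie-break on id, and the input shape (one ranked array of ids per branch) are assumptions, not the code in VectorSearch::Hybrid.

```ruby
# Hypothetical client-side RRF over ranked id lists, best hit first.
def rrf_fuse(branches, k_constant: 60, weights: {}, k: 20)
  scores = Hash.new(0.0)
  branches.each do |name, ids|
    weight = weights.fetch(name, 1.0)
    ids.each_with_index do |id, i|
      scores[id] += weight / (k_constant + i + 1) # rank is 1-based
    end
  end
  # Highest fused score first; ties broken by id for a stable order.
  scores.sort_by { |id, score| [-score, id.to_s] }.first(k).map(&:first)
end

fused = rrf_fuse(
  { lexical: %w[a b c], vector: %w[b d a] },
  weights: { lexical: 0.4, vector: 0.6 }
)
# fused => ["b", "a", "d", "c"]
```

A document found by both branches ("a", "b") outranks one found by only a single branch, and the weights shift how much each branch's ordering counts.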

Examples:

Song.hybrid_search(
  text: "love songs about rain",
  lexical: { index: "song_search", query: "rain love" },
  vector:  { num_candidates: 200 },
  k: 20,
  fusion: { k_constant: 60, weights: { lexical: 0.4, vector: 0.6 } },
)

Parameters:

  • text (String, nil) (defaults to: nil)

    natural-language query. Embedded (via the resolved :vector property's provider:) for the vector branch, and used as the lexical query unless lexical[:query] overrides it.

  • query_vector (Array<Float>, Parse::Vector, nil) (defaults to: nil)

    pre-computed query embedding (alternative to text: for the vector branch).

  • lexical (Hash) (defaults to: {})

    lexical branch config (:query, :index, :fields, :filter, :fuzzy). :query defaults to text:.

  • vector (Hash) (defaults to: {})

    vector branch config (:field, :index, :num_candidates, :filter, :vector_filter). :field defaults to the class's sole :vector property; :index is auto-discovered when omitted.

  • k (Integer) (defaults to: 20)

    number of fused hits to return.

  • fusion (Hash, nil) (defaults to: nil)

    :method (:rrf / :rrf_client), :k_constant, :weights ({ lexical:, vector: }).

  • raw (Boolean) (defaults to: false)

    return fused raw rows instead of built Parse::Object instances.

  • scope_opts (Hash)

    ACL/CLP scope kwargs forwarded to both branches (session_token: / master: / acl_user: / acl_role:).

Returns:

  • (Array<Parse::Object>)

    fused, RRF-ordered; each carries #hybrid_score and #hybrid_ranks (and #vector_score / #search_score when the branch contributed). raw: true returns the fused Hashes.



# File 'lib/parse/model/core/vector_searchable.rb', line 261

def hybrid_search(text: nil, query_vector: nil, lexical: {}, vector: {},
                  k: 20, fusion: nil, raw: false, **scope_opts)
  require_relative "../../vector_search/hybrid"
  lex = (lexical || {}).transform_keys(&:to_sym)
  vec = (vector || {}).transform_keys(&:to_sym)

  field_sym = resolve_vector_field!(vec[:field])
  declared_dims = vector_properties.dig(field_sym, :dimensions)

  qv = query_vector || vec[:query_vector]
  qv =
    if qv.nil?
      unless text.is_a?(String) && !text.strip.empty?
        raise ArgumentError,
              "#{self}.hybrid_search: pass `text:` (to embed) or a `query_vector:`."
      end
      embed_query_text!(text, field_sym)
    else
      coerce_query_vector(qv)
    end
  Parse::VectorSearch.validate_query_vector!(qv, dimensions: declared_dims)

  lexical_query = lex[:query] || text
  unless lexical_query.is_a?(String) && !lexical_query.strip.empty?
    raise ArgumentError,
          "#{self}.hybrid_search: needs a lexical query — pass `text:` or `lexical: { query: }`."
  end

  vector_index = vec[:index] || resolve_vector_index!(field_sym, nil)

  fused = Parse::VectorSearch::Hybrid.search(
    parse_class,
    lexical: {
      query: lexical_query, index: lex[:index], fields: lex[:fields],
      filter: lex[:filter], fuzzy: lex[:fuzzy],
    },
    vector: {
      query_vector: qv, field: field_sym, index: vector_index,
      num_candidates: vec[:num_candidates], filter: vec[:filter],
      vector_filter: vec[:vector_filter],
    },
    k: k,
    fusion: fusion,
    **scope_opts,
  )

  return fused if raw
  build_hybrid_hits(fused)
end

#vector_visibility(mode = nil) ⇒ Symbol

Class-level default for whether this class's :vector properties are included in as_json serialization.

  • :owner_only (default) — vectors are OMITTED from as_json unless the caller passes include_vectors: true. Embeddings are large and leak ML signal; the safe default keeps them off the wire and out of API responses. Row-level read access is still governed by ACL as usual — this controls serialization exposure, not row authorization.
  • :public — vectors are INCLUDED in as_json by default (a caller can still suppress per-call with include_vectors: false).

class Article < Parse::Object
  vector_visibility :public # expose embeddings in as_json
  property :embedding, :vector, dimensions: 1536, provider: :openai
end

Read the effective mode by calling with no argument; it inherits from the superclass when unset on the subclass.
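
The inheritance rule can be seen with a small stand-in. The module below mirrors the reader/writer logic of vector_visibility from the source listing so that it runs without the library; VisibilitySketch, BaseDoc, PublicDoc, and ChildDoc are hypothetical names.

```ruby
# Stand-in for the class-level setting, runnable without Parse::Object.
module VisibilitySketch
  MODES = %i[owner_only public].freeze

  def vector_visibility(mode = nil)
    if mode.nil?
      # Reader: own setting first, then the superclass chain, then the default.
      return @vector_visibility if defined?(@vector_visibility) && @vector_visibility
      return superclass.vector_visibility if superclass.respond_to?(:vector_visibility)

      return :owner_only
    end
    m = mode.to_sym
    raise ArgumentError, "unknown mode #{mode.inspect}" unless MODES.include?(m)

    @vector_visibility = m
  end
end

class BaseDoc
  extend VisibilitySketch
end

class PublicDoc < BaseDoc
  vector_visibility :public
end

class ChildDoc < PublicDoc; end

BaseDoc.vector_visibility  # => :owner_only (nothing set anywhere up the chain)
ChildDoc.vector_visibility # => :public (inherited from PublicDoc)
```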

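Parameters:

  • mode (Symbol, String, nil) (defaults to: nil)

    the mode to set, one of VECTOR_VISIBILITY_MODES; omit it (or pass nil) to read the effective mode.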

Returns:

  • (Symbol)

    the effective mode (when reading) or the mode set.

Raises:

  • (ArgumentError)

    on an unknown mode.



# File 'lib/parse/model/core/vector_searchable.rb', line 93

def vector_visibility(mode = nil)
  if mode.nil?
    return @vector_visibility if defined?(@vector_visibility) && @vector_visibility
    return superclass.vector_visibility if superclass.respond_to?(:vector_visibility)
    return :owner_only
  end
  m = mode.to_sym
  unless VECTOR_VISIBILITY_MODES.include?(m)
    raise ArgumentError,
          "#{self}.vector_visibility: mode must be one of " \
          "#{VECTOR_VISIBILITY_MODES.inspect} (got #{mode.inspect})."
  end
  @vector_visibility = m
end

#vectors_public_by_default? ⇒ Boolean

Returns whether :vector fields are serialized into as_json by default for this class (true only for :public).

Returns:

  • (Boolean)

    whether :vector fields are serialized into as_json by default for this class (true only for :public).



# File 'lib/parse/model/core/vector_searchable.rb', line 110

def vectors_public_by_default?
  vector_visibility == :public
end