Atlas Vector Search Guide

Parse Stack ships first-class support for MongoDB Atlas $vectorSearch against Parse classes. This guide covers the full surface: declaring :vector properties, registering embedding providers, running find_similar queries, the embed and embed_image write-side macros, Atlas index management, ActiveSupport::Notifications (AS::N) telemetry, and the constraint and logging behavior callers need to know about.

v5.0 introduced the text-embedding path; v5.1 adds image embedding via the new embed_image macro, Voyage#embed_image (voyage-multimodal-3, 1024-dim), and Cohere#embed_image (embed-v4.0, 1536-dim). Image inputs are URL-only in v5.1 (the SDK forwards the file URL to the provider; the SDK does not fetch image bytes) and are gated behind an explicit operator opt-in plus a CDN allowlist — see §Image embedding below.

For the underlying mongo-direct enforcement model that vector search inherits, see mongodb_direct_guide.md.

When to use vector search

Use Atlas vector search when:

You need semantic similarity ("articles about X" where "about X" is a meaning, not a substring) rather than substring / token matches.
Your records have a natural text or image embedding source (title + body, transcript, caption, etc.).
You are already running on MongoDB Atlas, or on a self-managed cluster with the search/vectorSearch extension available. Atlas Local works for development and integration tests.

Do NOT use vector search for:

Exact / substring matching — use Parse's normal query operators or Atlas $search text indexes (see Atlas Search docs).
Tiny corpora (< a few hundred docs) where a brute-force cosine in application code would be cheaper than maintaining an index.
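
For a sense of scale, the brute-force alternative mentioned above fits in a few lines of plain Ruby. This is a minimal sketch, not part of Parse Stack: the helper names cosine and brute_force_top_k and the toy two-dimensional corpus are assumptions made for illustration.

```ruby
# Illustrative only: cosine and brute_force_top_k are hypothetical helpers,
# not Parse Stack APIs.
def cosine(a, b)
  raise ArgumentError, "dimension mismatch" unless a.length == b.length

  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Score every vector in the corpus and keep the k best.
def brute_force_top_k(query, corpus, k: 3)
  corpus
    .map { |id, vec| [id, cosine(query, vec)] }
    .sort_by { |_, score| -score }
    .first(k)
end

corpus = {
  "a" => [1.0, 0.0],
  "b" => [0.7, 0.7],
  "c" => [0.0, 1.0],
}
top = brute_force_top_k([1.0, 0.1], corpus, k: 2)
```

Every query touches every vector, so the cost grows linearly with corpus size; that is the trade-off against maintaining an index.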

Declaring a `:vector` property

:vector is a first-class Parse property type. The declaration captures the vector's width, the provider that produces it, the model name, and the similarity function the Atlas index will use.

class Document < Parse::Object
  property :title, :string
  property :body,  :string

  property :body_embedding, :vector,
           dimensions: 1536,
           provider:   :openai,
           model:      "text-embedding-3-small",
           similarity: :cosine
end

dimensions: (required) — fixed output width. Must match what the registered provider returns and what the Atlas vectorSearch index declares. Mismatches raise Parse::Embeddings::InvalidResponseError on write or Parse::VectorSearch::InvalidQueryVector on read.
provider: — name registered via Parse::Embeddings.register or Parse::Embeddings.configure. Required for the embed macro and for the find_similar(text:) overload; optional if you only ever pass pre-computed vector: Arrays.
model: — stable identifier, persisted to the <into>_meta sibling (e.g. body_embedding_meta) and used in cache keys. Changing this on an existing field is a migration; see the §Re-embedding section below.
similarity: — one of :cosine, :dotProduct, :euclidean. Determines how the Atlas index ranks. Pick :cosine for normalized text embeddings; :dotProduct for raw OpenAI/Cohere output if you want to skip the unit-normalize step.

Storage shape

:vector properties serialize as plain BSON arrays of floats. There is no Parse-side wrapper class on the wire. In memory they are Parse::Vector instances which respond to to_a, dimensions, and arithmetic helpers.

Constraint refusal

Vector fields are NOT general-purpose query targets. The query builder refuses every operator on :vector columns except :exists and :null. Attempting where(body_embedding: <Array>) or where(:body_embedding.gt => 0.5) raises at query build time — semantic similarity must go through find_similar, not the normal where DSL.

Body builder compaction

When Parse::Object#inspect or the request logger has to print a record carrying a vector, the formatter replaces the array with a compact <vector dims=N> placeholder once the length is ≥ 32. This keeps multi-thousand-dim arrays out of error trackers and stack traces. The wire payload itself is unchanged.
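
The compaction rule can be sketched as a recursive walk over a payload hash. This is a hypothetical formatter, not the SDK's implementation: the method name compact_vectors and the "all elements numeric" test are assumptions; only the threshold of 32 and the placeholder text come from the paragraph above.

```ruby
# Hypothetical formatter. Only the threshold and the placeholder format are
# taken from the guide; everything else is illustrative.
COMPACT_THRESHOLD = 32

def compact_vectors(value)
  case value
  when Hash
    value.transform_values { |v| compact_vectors(v) }
  when Array
    if value.length >= COMPACT_THRESHOLD && value.all? { |v| v.is_a?(Numeric) }
      "<vector dims=#{value.length}>"
    else
      value.map { |v| compact_vectors(v) }
    end
  else
    value
  end
end

record = { "title" => "hello", "body_embedding" => Array.new(1536, 0.1) }
printable = compact_vectors(record)
```

The input hash is left untouched; only the printable copy carries the placeholder.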

Registering an embedding provider

Parse::Embeddings is a pluggable registry. v5.1 ships seven built-in providers:

Parse::Embeddings::OpenAI — text-only. text-embedding-3-small (1536-dim, default), text-embedding-3-large (3072-dim, Matryoshka via dimensions:), legacy text-embedding-ada-002. Forwards OpenAI-Organization / OpenAI-Project headers when supplied.
Parse::Embeddings::Cohere — v3 family (embed-english-v3.0, embed-multilingual-v3.0, and -light-v3.0 siblings; 1024 / 384 dim) plus embed-v4.0 (1536 native, 128k token context, Matryoshka-truncatable to 512, 1024, 1536 via dimensions:). embed-v4.0 is Cohere's text+image multimodal endpoint; the text path routes through /v1/embed and the v5.1 image path routes through /v2/embed with OpenAI-style nested { type: "image_url", image_url: { url: ... } } content rows.
Parse::Embeddings::Voyage — voyage-4 family (voyage-4-large 2048, Matryoshka; voyage-4 1024; voyage-4-lite 512; voyage-4-nano 256), voyage-3 family, domain models (voyage-code-3, voyage-finance-2, voyage-law-2), and voyage-multimodal-3 (1024-dim, 32k token context, routes to /v1/multimodalembeddings with the wrapped {inputs: [{content: [{type: "text", text: ...}]}]} envelope for text and {type: "image_url", image_url: <url>} content rows for the v5.1 embed_image path).
Parse::Embeddings::Jina — jina-embeddings-v3 (1024, Matryoshka 32–1024), jina-embeddings-v4 (2048, Matryoshka), v5 family (jina-embeddings-v5-text-{small,nano}, jina-embeddings-v5-omni-{small,nano} — omni accepts plain-text here), and jina-code-embeddings-{0.5b,1.5b}. Distinguishes input_type: via Jina's task field (retrieval.query / retrieval.passage / classification / separation). Rerankers and image-only models are out of scope.
Parse::Embeddings::Qwen — qwen3-embedding-0.6b (1024), qwen3-embedding-4b (2560), qwen3-embedding-8b (4096), all Matryoshka. Targets Alibaba Cloud DashScope's OpenAI-compatible endpoint; operators in mainland China override base_url: to https://dashscope.aliyuncs.com/compatible-mode/v1. Same checkpoints are open-weight on Hugging Face (Apache 2.0) — self-host with LocalHTTP.
Parse::Embeddings::LocalHTTP — generic OpenAI-compatible client for self-hosted gateways (Ollama, LM Studio, vLLM, Text Embeddings Inference, llama.cpp). Configure-time SSRF gate refuses loopback / RFC1918 / link-local / cloud-metadata bases unless opted in with allow_private_endpoint: true (emits a Kernel#warn audit line).
Parse::Embeddings::Fixture — deterministic, zero-network. Used by the test suite. Auto-registered under :fixture, no setup required.

Production: OpenAI

Parse::Embeddings.configure do |c|
  c.providers[:openai] = Parse::Embeddings::OpenAI.new(
    api_key: ENV.fetch("OPENAI_API_KEY"),
    model:   "text-embedding-3-small",
  )
end

The OpenAI provider self-bounds at 30 s read / 5 s connect with capped exponential retry on 429 and 5xx. There is no implicit wall-clock deadline imposed by find_similar or by the embed macro — the provider is responsible for bounding its own request time. Custom providers MUST follow the same convention.

Tests: Fixture

provider = Parse::Embeddings.provider(:fixture)  # zero-config
vec = provider.embed_text(["hello"]).first       # deterministic

Vectors are derived from SHA-256 over (model_name, input_type, input) and unit-normalized. Same input always yields the same vector; :search_query and :search_document yield different vectors for the same string, so cache-key bugs and input-type confusion in higher layers surface in tests rather than only against real providers in production.
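
The idea behind the fixture vectors can be sketched in plain Ruby. The exact byte-to-float mapping used by the real Fixture provider is not documented here, so the helper below (fixture_vector, a hypothetical name) is an assumption that only reproduces the two stated properties: determinism and unit length.

```ruby
require "digest"

# Hypothetical helper, not the Fixture provider's code. It expands SHA-256
# blocks with a counter until enough bytes exist, then unit-normalizes.
def fixture_vector(model_name, input_type, input, dims: 8)
  seed = [model_name, input_type, input].join("\x00")
  bytes = []
  counter = 0
  while bytes.length < dims
    bytes.concat(Digest::SHA256.digest("#{seed}\x00#{counter}").bytes)
    counter += 1
  end
  # Map each byte to a non-zero float in (-1, 1), then scale to unit length.
  raw  = bytes.first(dims).map { |b| (b - 127.5) / 127.5 }
  norm = Math.sqrt(raw.sum { |x| x * x })
  raw.map { |x| x / norm }
end

query_vec = fixture_vector("fixture-model", :search_query, "hello")
doc_vec   = fixture_vector("fixture-model", :search_document, "hello")
```

Because input_type is part of the hashed seed, the query and document vectors for the same string differ, which is what lets input-type mix-ups show up in tests.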

Custom providers

Subclass Parse::Embeddings::Provider and override embed_text, dimensions, and model_name. Call instrument_embed(input_count, input_type) { ... } inside embed_text to emit the standard AS::N event (see §Telemetry below). Always call validate_response! before returning so off-by-one batches and NaN/±Inf poisoning surface as typed InvalidResponseError at the provider boundary, not deep inside a later $vectorSearch call.

Creating the Atlas vectorSearch index

find_similar requires a deployed Atlas vectorSearch index covering the target field. Create one via Parse::AtlasSearch::IndexCatalog:

Parse::AtlasSearch::IndexCatalog.create_index(
  "Document",                       # Parse class / collection name
  "body_embedding_v1",              # index name (your choice)
  {
    type: "vectorSearch",
    fields: [
      {
        type: "vector",
        path: "body_embedding",
        numDimensions: 1536,
        similarity: "cosine",
      },
      # Optional: filter fields for pre-search $match acceleration.
      { type: "filter", path: "tag" },
      { type: "filter", path: "_rperm" },
    ],
  },
)

Including _rperm as a filter field lets the per-row ACL match short-circuit at the index level — strongly recommended for any field that ACL-scoped agents will search against.

Index creation runs asynchronously. Use wait_for_ready to block until the index is queryable:

Parse::AtlasSearch::IndexCatalog.wait_for_ready(
  "Document", "body_embedding_v1", timeout: 600,
)
# => :ready | :failed | :timeout

Auto-discovery: when find_similar is called without an explicit index: kwarg, the catalog scans the collection's vectorSearch indexes for one whose definition covers the requested path. The first match wins; pass index: explicitly when you have more than one covering index and want a specific one.

Running similarity queries: `find_similar`

# Pre-computed vector
hits = Document.find_similar(vector: query_embedding, k: 10)

# Auto-embed query text using the field's declared provider
hits = Document.find_similar(text: "ruby parse stack", k: 10)

hits.first.vector_score   # => Float, Atlas vectorSearchScore
hits.first.title          # => String, normal Parse attribute

Full kwarg surface:

vector: — Array<Float> or Parse::Vector. Mutually exclusive with text:.
text: — String. Embedded with input_type: :search_query using the field's declared provider:. Capped at 256 KiB; chunk client-side before calling if larger.
k: — number of hits to return (default 10).
field: — explicit :vector property. Auto-resolves when the class has exactly one; required when multiple are declared.
filter: — post-$vectorSearch $match. Use for ordinary Parse-side filtering (e.g. { status: "published" }).
vector_filter: — Atlas-native pre-search filter. Fields must be declared type: "filter" in the index. Faster than filter: when the field is filter-indexed.
index: — explicit vectorSearch index name. Skips auto-discovery.
num_candidates: — HNSW search width hint. Higher = better recall, slower. Default ~10×k.
max_time_ms: — server-side timeout; translates to Parse::MongoDB::ExecutionTimeout on cancel.
raw: — when true, return raw BSON::Document hashes (each carries _vscore). When false (default), build Parse::Object instances.
session_token: / master: / acl_user: / acl_role: — scope kwargs forwarded to the underlying Parse::MongoDB.aggregate so the 5-layer enforcement (denylist, ACL _rperm match, CLP, protectedFields, master-key escape) runs against the result rows.

Dimension validation

find_similar compares the query vector's length to the property's declared dimensions: before sending the pipeline. A mismatch raises Parse::VectorSearch::InvalidQueryVector locally, before Atlas sees it — callers get "expected 1536, got 768" instead of a server-side error after a round-trip.

Index drift verification (v5.5)

On the first auto-discovered use of a vectorSearch index per (class, field, index) per process, the SDK compares the deployed index's latestDefinition against the model declaration:

numDimensions vs the property's declared dimensions: — a mismatch means every query will be rejected or return nonsense (usually an index that predates a model change).
similarity vs the property's declared similarity: (checked only when both sides declare one).
When the class registers an agent_tenant_scope, the scope field must appear among the index's type: "filter" paths — without it, every tenant-scoped $vectorSearch.filter fails Atlas-side at query time.

Findings are computed once per (class, field, index) per process and governed by Parse::VectorSearch.index_drift_policy:

Parse::VectorSearch.index_drift_policy = :warn   # default — [Parse::VectorSearch:DRIFT] warning on first check
Parse::VectorSearch.index_drift_policy = :raise  # IndexDriftError on EVERY query against a drifted index
Parse::VectorSearch.index_drift_policy = :ignore # skip verification

Under :raise the cached findings keep raising — strict mode means a drifted index never serves results, not "fails once, then passes". Auto-discovery verification costs no extra round-trip (the definition is already in hand from index discovery). An explicit index: kwarg is verified best-effort: when the catalog's covering index for the field carries the same name, its definition is checked too; catalog lookup failures never fail the query.
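
The three drift checks can be sketched as a pure function over the index definition. This is a hypothetical illustration: drift_findings is not an SDK method, and the hash shapes simply mirror the create_index definition shown earlier in this guide.

```ruby
# Hypothetical sketch of the drift comparison; not the SDK's implementation.
def drift_findings(declared, index_definition, tenant_scope_field: nil)
  fields = index_definition.fetch(:fields)
  vector = fields.find { |f| f[:type] == "vector" && f[:path] == declared[:path] }
  return ["no vector field for #{declared[:path]}"] if vector.nil?

  findings = []
  if vector[:numDimensions] != declared[:dimensions]
    findings << "numDimensions #{vector[:numDimensions]} != declared #{declared[:dimensions]}"
  end
  # Similarity is compared only when both sides declare one.
  if vector[:similarity] && declared[:similarity] &&
     vector[:similarity] != declared[:similarity].to_s
    findings << "similarity #{vector[:similarity]} != declared #{declared[:similarity]}"
  end
  if tenant_scope_field
    filter_paths = fields.select { |f| f[:type] == "filter" }.map { |f| f[:path] }
    unless filter_paths.include?(tenant_scope_field.to_s)
      findings << "tenant scope field #{tenant_scope_field} is not a filter path"
    end
  end
  findings
end

deployed = {
  type: "vectorSearch",
  fields: [
    { type: "vector", path: "body_embedding", numDimensions: 768, similarity: "cosine" },
    { type: "filter", path: "tag" },
  ],
}
declared = { path: "body_embedding", dimensions: 1536, similarity: :cosine }
findings = drift_findings(declared, deployed, tenant_scope_field: :tenant_id)
```

An empty findings array corresponds to "no drift"; how a non-empty one is handled is what index_drift_policy decides.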

Query-embed caching and spend caps (v5.5)

Every text:-overload query funnels through one embed path (find_similar(text:), hybrid_search(text:), Parse::Retrieval.retrieve all share it), which gives two controls:

# Opt-in query-embed cache: repeated identical queries skip the
# provider round-trip. Keyed by (provider, model, dimensions,
# input_type, SHA-256(input)) — plaintext never lands in the store.
Parse::Embeddings::Cache.enable!(max_entries: 2048, ttl: 600)
Parse::Embeddings::Cache.stats   # => { enabled:, hits:, misses:, size: }

# Per-tenant spend cap now covers DIRECT callers too, not just the
# semantic_search agent tool. Tenant identity resolves to the ambient
# Parse.with_cache_tenant scope when set, else a shared default bucket.
# warn_at: adds a soft cap — crossing 80% of the limit emits a
# parse.embeddings.spend_cap_warning AS::N event (alert, never refuse).
Parse::Embeddings::SpendCap.configure(limit_tokens: 1_000_000, window: 3600,
                                      warn_at: 0.8)
Parse.with_cache_tenant("tenant_abc") do
  Document.find_similar(text: query)   # charged against tenant_abc
end

Cache hits emit the standard parse.embeddings.embed notification with cached: true, so existing spend subscribers see hits and misses on one stream. The cache is in-process by default; for a persistent layer shared across processes, wrap any Moneta-compatible backend in the bundled adapter:

moneta = Moneta.new(:Redis, url: ENV["REDIS_URL"])
Parse::Embeddings::Cache.enable!(
  store: Parse::Embeddings::Cache::MonetaStore.new(moneta, ttl: 30 * 24 * 3600),
)

MonetaStore namespaces keys, forwards TTL via Moneta's expires:, and fails open (a backend error is a cache miss, never a failed embed). Keys are input hashes — plaintext queries never land in the shared store; the VALUES are embeddings, so give the store the same access controls as the database. A query the agent tool already charged per-tenant is not double-billed (SpendCap.with_precharged wraps the tool's retrieval).
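
To make the "keys are input hashes" point concrete, here is a minimal sketch of a key built from the tuple named above. The function name query_embed_cache_key and the colon-joined layout are assumptions; the SDK's real key format is not shown in this guide.

```ruby
require "digest"

# Hypothetical key builder. The tuple members come from the guide; the
# separator and ordering are illustrative.
def query_embed_cache_key(provider:, model:, dimensions:, input_type:, input:)
  [provider, model, dimensions, input_type, Digest::SHA256.hexdigest(input)].join(":")
end

key = query_embed_cache_key(
  provider: :openai, model: "text-embedding-3-small",
  dimensions: 1536, input_type: :search_query, input: "secret query",
)
```

Only the digest of the query enters the key, so a shared store never sees the plaintext, and changing any of provider, model, dimensions, or input_type yields a different key.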

ACL/CLP inheritance

Vector search routes through Parse::MongoDB.aggregate. Every layer documented in mongodb_direct_guide.md §Security applies to vector search result rows too:

Pipeline-security denylist (always on).
Row-level ACL _rperm match — scoped agents only.
CLP read enforcement — scoped agents only.
protectedFields stripping — scoped agents only.
Master-key escape hatch.

REST /aggregate is NOT a valid path for vector search with a scoped caller. Parse Server's REST aggregate endpoint is master-key-only and would bypass every per-row ACL and CLP check. The built-in agent tools auto-promote mongo_direct: false to true for any agent carrying session_token, acl_user, acl_role, or a non-master scope so this enforcement always runs.

Managing embeddings on write: `embed` macro

The embed class macro declares which source fields feed a managed vector. The embedding is recomputed automatically on save whenever the source fields change.

class Document < Parse::Object
  property :title, :string
  property :body,  :string
  property :body_embedding, :vector, dimensions: 1536, provider: :openai

  embed :title, :body, into: :body_embedding
end

doc = Document.new(title: "hello", body: "world")
doc.save        # provider :openai called once; body_embedding populated

doc.body = "updated body"
doc.save        # provider called again; new embedding written

doc.save        # no source field changed → zero provider calls

Mechanics:

A <into>_digest :string sibling field is auto-declared (override with digest_field:). The before_save callback computes SHA-256 over the concatenated source text; if it matches the stored digest AND the target vector is non-nil, the callback returns without contacting the provider.
The target :vector property is write-protected. Direct assignment (doc.body_embedding = some_vector) raises ProtectedFieldError. The guard lifts only inside the managed write path. This prevents silent desync between the stored vector and the digest.
Source fields are concatenated with "\n\n", nil and blank values skipped. If every source is blank, the target and digest are both cleared on save.
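
The skip decision described in these mechanics can be sketched without the SDK. The helpers below (source_text, embed_needed?) are hypothetical stand-ins for the before_save callback, and the use of a hex digest is an assumption.

```ruby
require "digest"

# Hypothetical stand-ins for the managed-embed callback logic.
def source_text(values)
  values.reject { |v| v.nil? || v.strip.empty? }.join("\n\n")
end

def embed_needed?(values, stored_digest:, stored_vector:)
  text = source_text(values)
  return false if text.empty? # all-blank sources clear the target instead

  digest = Digest::SHA256.hexdigest(text)
  # Skip only when the digest matches AND a vector is already stored.
  !(digest == stored_digest && !stored_vector.nil?)
end

digest = Digest::SHA256.hexdigest("hello\n\nworld")
unchanged = embed_needed?(["hello", nil, "  ", "world"],
                          stored_digest: digest, stored_vector: [0.1, 0.2])
changed   = embed_needed?(["hello", "updated body"],
                          stored_digest: digest, stored_vector: [0.1, 0.2])
```

Requiring a non-nil vector alongside the digest match means a row whose vector was lost is re-embedded even though its text has not changed.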

Single vector per record

embed produces exactly one vector per record. Long source text whose concatenation exceeds the provider's per-call token budget is truncated provider-side, and the stored vector represents only the leading portion of the document. Chunking happens at retrieval time, not embed time (see Retrieval (RAG) below): the embedding stays one-vector-per-record by design.

If you instead want each passage to have its OWN embedding (true embed-time chunking), use one of these patterns:

Pre-chunk client-side and write each chunk as its own Parse::Object record with its own embed declaration.
Dedicated chunk subclass that belongs_to the parent, with embed :content, into: :embedding on the chunk class itself. Run similarity search against the chunk collection, then hydrate parents as needed.

Retrieval (RAG)

For an end-to-end runnable script — managed embed, agent_searchable, semantic_search, and an OpenAI/Anthropic generation add-in — see examples/rag_chatbot.rb.

Parse::Retrieval (Parse::RAG is an alias) sits on top of find_similar. Parse::Retrieval.retrieve embeds a natural-language query, runs Atlas $vectorSearch through find_similar (so ACL/CLP are enforced mongo-direct — there is no REST two-stage re-query), and splits each retrieved document's text field into scored, citable chunks. Chunking here is presentation-only: every chunk inherits its parent document's single $vectorSearch score.

chunks = Parse::Retrieval.retrieve(
  query:         "how do I reset my password?",
  klass:         KnowledgeArticle,   # or "KnowledgeArticle"
  field:         :embedding,         # optional; auto-resolves a single :vector field
  k:             5,
  filter:        { published: true }, # post-$vectorSearch $match
  vector_filter: nil,                 # Atlas-native pre-filter (fields must be type:"filter")
  tenant_scope:  nil,                 # { field:, value: } merged into vector_filter
  score_quantize: false,
  session_token: user.session_token,  # ACL scope kwargs pass through to find_similar
)
# => Array<Parse::Retrieval::Chunk> — { id, score, content, source, metadata }

retrieve also accepts hybrid: (fuse a lexical branch with the vector branch — see Hybrid search below) and rerank: (reorder retrieved documents with a cross-encoder before chunking — see Reranking). Both were reserved in earlier releases and now ship in 5.4.0.

Pointer values in filters translate automatically (v5.5). A filter like { owner: some_user } (a Parse::Pointer / Parse::Object, or a wire-form {"__type" => "Pointer", ...} hash — including inside $in / $eq / $ne operator hashes) is rewritten to its MongoDB storage form { "_p_owner" => "_User$abc123" } before the $match / $vectorSearch.filter is built, so pointer filters match rows instead of silently matching nothing. Translation runs after the underscore-key gate (callers still cannot name _p_* columns directly) and before the tenant-scope fold; the semantic_search agent tool inherits it. For vector_filter: use, the pointer column (_p_owner) must be declared type: "filter" in the index.
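
The rewrite can be pictured with a small stand-alone sketch. Everything here is hypothetical: Pointer is a local Struct rather than Parse::Pointer, translate_pointers is not an SDK method, and operator hashes are assumed to contain only pointer operands.

```ruby
# Hypothetical translation sketch; not the SDK's implementation.
Pointer = Struct.new(:class_name, :id)

def pointer?(value)
  value.is_a?(Pointer) || (value.is_a?(Hash) && value["__type"] == "Pointer")
end

def to_storage(value)
  if value.is_a?(Pointer)
    "#{value.class_name}$#{value.id}"
  else
    "#{value['className']}$#{value['objectId']}"
  end
end

def translate_pointers(filter)
  filter.each_with_object({}) do |(key, value), out|
    if pointer?(value)
      out["_p_#{key}"] = to_storage(value)
    elsif value.is_a?(Hash) && value.values.flatten.any? { |v| pointer?(v) }
      # Operator hash such as { "$in" => [ptr, ptr] }.
      out["_p_#{key}"] = value.transform_values do |operand|
        operand.is_a?(Array) ? operand.map { |v| to_storage(v) } : to_storage(operand)
      end
    else
      out[key.to_s] = value
    end
  end
end

translated = translate_pointers(owner: Pointer.new("_User", "abc123"), status: "published")
```

Non-pointer keys pass through unchanged, so ordinary filters such as status are unaffected.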

Hybrid search (vector + lexical)

Class.hybrid_search runs a lexical Atlas Search ($search) branch and a $vectorSearch branch as two independent aggregations, then fuses their ranked results with reciprocal-rank fusion (RRF). Two aggregations (not a single $facet) is mandatory: $vectorSearch is prohibited inside $facet / $lookup / $unionWith and must be stage 0 of its pipeline. Each branch enforces ACL/CLP/protectedFields independently before fusion (via Parse::AtlasSearch.search and Parse::VectorSearch.search), so the fused rows are already access-filtered — there is no separate hydration fetch.

hits = Article.hybrid_search(
  text:    "how do I reset my password",   # embedded for the vector branch;
                                            # also the default lexical query
  lexical: { index: "article_search", fields: %w[title body] },
  vector:  { index: "article_embedding_idx", num_candidates: 200 },
  k:       20,
  fusion:  { k_constant: 60, weights: { lexical: 0.4, vector: 0.6 } },
  session_token: user.session_token,        # ACL scope, applied to BOTH branches
)
# => Array<Parse::Object>; each carries #hybrid_score, #hybrid_ranks,
#    and #vector_score / #search_score when that branch contributed.

RRF math. fused_score(d) = Σ_b weight_b / (k_constant + rank_b(d)), where rank_b(d) is the document's 1-based rank in branch b. A larger k_constant (default 60) flattens the contribution curve. weights defaults to 1.0 per branch. Parse::VectorSearch::Hybrid.rrf exposes the pure fusion if you want to fuse pre-fetched ranked lists yourself.
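
The formula above can be checked by hand with a short pure-Ruby sketch. The function below is an illustration of reciprocal-rank fusion under the stated defaults; it is not Parse::VectorSearch::Hybrid.rrf, whose signature this guide does not show.

```ruby
# Hypothetical pure-Ruby RRF. ranked_lists maps a branch name to an array of
# document ids, best first.
def rrf(ranked_lists, k_constant: 60, weights: {})
  scores = Hash.new(0.0)
  ranked_lists.each do |branch, ids|
    weight = weights.fetch(branch, 1.0)
    ids.each_with_index do |id, index|
      scores[id] += weight / (k_constant + index + 1) # rank is 1-based
    end
  end
  scores.sort_by { |_, score| -score }
end

fused = rrf({ lexical: %w[a b c], vector: %w[b c d] })
```

With equal weights, document b (ranks 2 and 1) scores 1/62 + 1/61 and outranks a, which is first in one branch but absent from the other.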

Native $rankFusion (Atlas 8.0+). Parse::VectorSearch::Hybrid.rank_fusion_supported?(collection) detects the native server-side fusion stage via a cached behavioural probe (1-hour TTL — not version-string parsing). Native execution is opt-in (fusion: { method: :rrf_native }) and falls back to the client-side path when the cluster does not support it; the default :rrf always fuses client-side, which is the fully-enforced, deterministic path. $rankFusion is admitted to PipelineSecurity::ALLOWED_STAGES for the native path.

Parse::Retrieval.retrieve(hybrid: true, ...) routes through hybrid_search and chunks the fused results; pass hybrid: { lexical:, vector:, fusion: } to configure the branches. Tenant scope is folded into both branches (the vector Atlas pre-filter and the lexical post-$search $match) so neither leaks cross-tenant document existence.

Reranking

A reranker reorders retrieved documents by a cross-encoder relevance score before chunking. Pass any object answering #rerank(query:, documents:, top_n:) — typically a Parse::Retrieval::Reranker::Base subclass:

reranker = Parse::Retrieval::Reranker::Cohere.new(
  api_key: ENV.fetch("COHERE_API_KEY"), model: "rerank-v3.5",
)
chunks = Parse::Retrieval.retrieve(
  query: "reset my password", klass: Article, k: 30,
  rerank: reranker, rerank_top_n: 5,    # keep the 5 most relevant docs
)
# Reranked chunks' score is the cross-encoder relevance_score.

Reranker::Fixture is a deterministic, zero-network reranker (lexical token overlap) for tests. The Reranker::Base protocol validates inputs, bounds top_n, rejects out-of-range indices, and sorts descending — adapters implement only the network call (#rerank_scores).

Spend cap. The semantic_search agent tool charges the estimated query-embedding tokens against the caller's tenant budget via Parse::Embeddings::SpendCap (opt-in; configure(limit_tokens:, window:)). A breach hard-refuses (surfaced to the agent as a rate-limited tool error). Admin agents are exempt. Before v5.5, direct find_similar / retrieve callers were not metered; as of v5.5 the cap covers direct callers too (see Query-embed caching and spend caps above).

Chunkers

The default is a fixed-size sliding window with overlap. Subclass Parse::Retrieval::Chunker::Base (implement #chunk(text) -> Array<String>) for semantic / sentence-aware strategies.

Parse::Retrieval::Chunker::FixedSizeOverlap.new(
  size: 800,                    # window width
  overlap: 100,                 # units shared between consecutive windows (must be < size)
  by: :chars,                   # :chars (default) or :tokens (whitespace tokens)
  max_chunks_per_document: 200, # amplification cap — TRUNCATES with a signal, never raises
)
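
To show what the window parameters mean, here is a character-based sliding window written from scratch. It is a hypothetical re-implementation, not the FixedSizeOverlap class: edge-case behavior and the way truncation is signalled (a boolean in the return value here) are assumptions.

```ruby
# Hypothetical fixed-size sliding window over characters.
# Returns [chunks, truncated].
def fixed_size_overlap(text, size:, overlap:, max_chunks: 200)
  raise ArgumentError, "overlap must be < size" unless overlap < size

  step = size - overlap
  chunks = []
  truncated = false
  start = 0
  while start < text.length
    if chunks.length >= max_chunks
      truncated = true # cap reached with text remaining: signal, do not raise
      break
    end
    chunks << text[start, size]
    break if start + size >= text.length

    start += step
  end
  [chunks, truncated]
end

chunks, truncated = fixed_size_overlap("abcdefghij", size: 4, overlap: 1)
```

Each window starts size - overlap characters after the previous one, so consecutive chunks share exactly overlap characters.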

`agent_searchable` + the `semantic_search` agent tool

Opt a model in to agentic retrieval, declaring the vector field and the fields an agent may filter on:

class KnowledgeArticle < Parse::Object
  property :title, :string
  property :body, :string
  property :embedding, :vector, dimensions: 1536, provider: :openai
  embed :title, :body, into: :embedding
  agent_searchable field: :embedding, filter_fields: %i[published category]
end

Every property referenced by embed must be declared — omitting property :title here raises InvalidEmbedDeclaration at class load.

Because this model embeds two text sources (:title and :body), semantic_search cannot guess which one to chunk and return as the result content. Pass text_field: to choose (it must name one of the embedded sources); a single-source model infers it automatically and the parameter is optional:

# via the agent tool (LLM-facing parameter)
semantic_search(class_name: "KnowledgeArticle", query: "vector indexes",
                text_field: "body")

# or directly
Parse::Retrieval.retrieve(query: "vector indexes", klass: KnowledgeArticle,
                          text_field: :body)

The readonly, client_safe semantic_search tool then routes through Parse::Retrieval.retrieve with the full agent security envelope: searchable-class allowlist (MetadataRegistry.resolve_searchable!), recursive underscore-key refusal + filter-field allowlist on caller input, tenant scope folded into the Atlas pre-filter AND re-asserted on every returned record, field_allowlist projection of each source, and score quantization in non-admin contexts. In a tenant-aware deployment (any class declares agent_tenant_scope), a searchable class without its own tenant scope is refused at dispatch. See the MCP guide for the agent-side wiring.

Result shape (token-economy). The tool returns { chunks:, documents:, count: }. Each chunk's parent record is hoisted once into documents (keyed by objectId) rather than duplicated on every chunk — map a chunk to its source via metadata.object_id. A max_total_tokens: budget (default 20,000; estimated chars/4) trims the lowest-ranked chunks so a few long documents can't silently blow the context window, adding budget_truncated: true / budget_dropped: <n> when it trims (pass 0 to disable). The library-level Parse::Retrieval.retrieve still returns the flat Array<Chunk> with source on each chunk — the dedup and budget live in the agent tool's envelope. See the MCP guide's Token Economy section.
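
The budget trim can be approximated as follows. This is a sketch under assumptions: trim_to_budget is a hypothetical name, chunks are plain hashes with :score and :content, and the chars/4 estimate is rounded up.

```ruby
# Hypothetical budget trim; not the agent tool's code.
def trim_to_budget(chunks, max_total_tokens: 20_000)
  return { chunks: chunks, count: chunks.length } if max_total_tokens.zero?

  kept = []
  used = 0
  chunks.sort_by { |c| -c[:score] }.each do |chunk|
    cost = (chunk[:content].length / 4.0).ceil # estimated tokens = chars / 4
    break if used + cost > max_total_tokens

    kept << chunk
    used += cost
  end

  result = { chunks: kept, count: kept.length }
  dropped = chunks.length - kept.length
  result.merge!(budget_truncated: true, budget_dropped: dropped) if dropped.positive?
  result
end

chunks = [
  { score: 0.7, content: "a" * 40 },
  { score: 0.9, content: "b" * 40 },
  { score: 0.8, content: "c" * 40 },
]
trimmed = trim_to_budget(chunks, max_total_tokens: 25)
```

Each 40-character chunk is estimated at 10 tokens, so a 25-token budget keeps the two highest-scoring chunks and reports one dropped.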

Image embedding: `embed_image` macro (v5.1 URL mode, v5.5 bytes mode)

embed_image is the image-source counterpart to embed. The source property must be :file-typed; the target must be a :vector property whose declared provider: supports multimodal input (currently :voyage with voyage-multimodal-3, or :cohere with embed-v4.0).

Two fetch modes, selected per declaration with the source: option:

source: :url (default) — the SDK validates the file's URL and forwards it; the provider performs the fetch from its own network. Requires the trust_provider_url_fetch sentinel (see operator setup below).
source: :bytes (v5.5) — the SDK downloads the image through Parse::File.safe_open_url, verifies the content by magic-byte sniff, strips EXIF/XMP metadata, and forwards the bytes to the provider as a base64 data URI. No provider-side URL fetch occurs, so the sentinel is NOT required — the allowed_image_hosts allowlist still is.

class Post < Parse::Object
  property :cover_image,            :file
  property :cover_image_embedding,  :vector,
           dimensions: 1024,
           provider:   :voyage,
           model:      "voyage-multimodal-3"

  embed_image :cover_image, into: :cover_image_embedding
end

Operator setup (required before any save)

Image embedding hands an attacker-influenced URL (a user-uploaded Parse::File, a chat message, an agent tool argument) to a third-party provider that will issue an HTTP request from its own network. The provider's fetch happens after SDK-side validation, so DNS rebinding and redirect-following are residual risks the SDK cannot eliminate.

The setup must happen in this exact order — skipping (1) or (2) raises a typed error at save time with a message naming the missing prerequisite:

# (1) Declare which CDNs the validator will accept. Empty allowlist
# denies every host — opposite of Parse::File.allowed_remote_hosts.
Parse::Embeddings.allowed_image_hosts = [
  ".cloudfront.net",                # suffix match (leading ".")
  "files.example.com",              # exact match
]

# (2) Sentinel-gated opt-in. Only the exact frozen String unlocks;
# `true`, `"true"`, `1`, or any other value raises
# Parse::Embeddings::ConfirmationRequired.
Parse::Embeddings.trust_provider_url_fetch = "PROVIDER_EGRESS_VERIFIED"

# (3) Declare embed_image on the model.
class Post < Parse::Object
  embed_image :cover_image, into: :cover_image_embedding
end

URL validator (`Parse::Embeddings.validate_image_url!`)

Every embed_image save path routes through Parse::Embeddings.validate_image_url!(url, allow_insecure:), which runs layered cheap-first checks: sentinel set, https:// (or http:// with allow_insecure: true), no userinfo, host not an obfuscated-IP form (0x7f.0.0.1, 127.1, 2130706433), host in the allowlist, port in Parse::File.allowed_remote_ports, host resolves only to public addresses (delegated to Parse::File.assert_host_allowed! so the SSRF mechanism is shared with Parse::File rather than duplicated in a second implementation). Failures raise Parse::Embeddings::InvalidImageURL with a :reason Symbol (:scheme, :port, :userinfo, :host_blocked, :host_not_allowlisted, :parse).
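
A subset of these checks can be sketched with Ruby's standard URI library. This is not validate_image_url!: the helper names are hypothetical, the sentinel, port, and DNS-resolution checks are omitted, and mapping obfuscated-IP hosts to :host_blocked is an assumption.

```ruby
require "uri"

# Hypothetical subset of the validator: scheme, userinfo, obfuscated-IP and
# allowlist checks only. No DNS resolution or port check happens here.
ALLOWED_HOSTS = [".cloudfront.net", "files.example.com"].freeze

def host_allowlisted?(host, allowlist)
  allowlist.any? do |entry|
    # Leading "." means suffix match; anything else is an exact match.
    entry.start_with?(".") ? host.end_with?(entry) : host == entry
  end
end

def obfuscated_ip?(host)
  host.match?(/\A\d+\z/) ||               # 2130706433
    host.match?(/\A0x/i) ||               # 0x7f.0.0.1
    host.match?(/\A\d+(\.\d+){1,2}\z/)    # 127.1
end

# Returns nil when the URL passes, otherwise a reason Symbol.
def image_url_rejection(url, allowlist: ALLOWED_HOSTS)
  uri = URI.parse(url)
  return :scheme unless uri.scheme == "https"
  return :userinfo if uri.userinfo

  host = uri.host.to_s.downcase
  return :host_blocked if obfuscated_ip?(host)
  return :host_not_allowlisted unless host_allowlisted?(host, allowlist)

  nil
rescue URI::InvalidURIError
  :parse
end
```

Because the suffix entry keeps its leading dot, a lookalike such as evilcloudfront.net does not match .cloudfront.net.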

Bytes mode (`source: :bytes`, v5.5)

# Operator setup — only the host allowlist is required (the sentinel
# applies to URL forwarding, not SDK-side fetches):
Parse::Embeddings.allowed_image_hosts = [".cloudfront.net"]

class Post < Parse::Object
  property :cover_image,           :file
  property :cover_image_embedding, :vector,
           dimensions: 1024, provider: :voyage, model: "voyage-multimodal-3"

  embed_image :cover_image, into: :cover_image_embedding,
              source: :bytes            # exif_strip: true is the default
end

What happens on each (digest-miss) save:

The file URL is validated through Parse::Embeddings.validate_image_url!(url, mode: :fetch) — the same host allowlist (deny-all when empty), obfuscated-IP screen, port allowlist, and CIDR resolution check as URL mode, minus the provider-egress sentinel.
Parse::File.safe_open_url downloads the bytes — CIDR blocks, DNS-rebinding re-check, port allowlist, max_remote_size cap, timeouts. No parallel fetch mechanism exists.
Magic-byte verification (Parse::Embeddings::ImageFetch): the MIME type is determined exclusively from the leading bytes (JPEG / PNG / GIF / WebP). The HTTP Content-Type header is never consulted. The sniffed type must be in Parse::Embeddings.allowed_image_types (default those four; SVG is deliberately excluded as script-capable active content), and when the URL carries a recognized image extension, the extension must AGREE with the magic bytes — a .jpg URL serving PNG bytes (or HTML) is refused as MIME laundering (ImageFetch::InvalidImageType, with a :reason tag).
EXIF/XMP stripping, default ON. JPEG APP1 segments (Exif and XMP), PNG eXIf chunks, and WebP EXIF/XMP RIFF chunks (with the VP8X flag bits cleared) are removed before the bytes leave the process — user photos commonly carry GPS coordinates and device serials. Opt out per declaration with exif_strip: false when orientation metadata must survive.
The verified bytes ride to the provider as a base64 data URI (Voyage image_base64 content row; Cohere image_url data-URI form).
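
The magic-byte step can be illustrated with the standard file signatures for the four allowed formats. The helpers below are hypothetical and raise a plain ArgumentError; the SDK's ImageFetch raises its own typed errors and performs more checks than this sketch.

```ruby
# Hypothetical sniffer using the well-known format signatures.
def sniff_image_type(bytes)
  data = bytes.b
  return "image/jpeg" if data.start_with?("\xFF\xD8\xFF".b)
  return "image/png"  if data.start_with?("\x89PNG\r\n\x1A\n".b)
  return "image/gif"  if data.start_with?("GIF87a".b, "GIF89a".b)
  return "image/webp" if data[0, 4] == "RIFF".b && data[8, 4] == "WEBP".b

  nil
end

EXTENSION_TYPES = {
  ".jpg" => "image/jpeg", ".jpeg" => "image/jpeg",
  ".png" => "image/png", ".gif" => "image/gif", ".webp" => "image/webp",
}.freeze

# The type comes from the bytes alone; a recognized extension must agree.
def verify_image!(url_path, bytes)
  sniffed = sniff_image_type(bytes)
  raise ArgumentError, "unrecognized image bytes" if sniffed.nil?

  expected = EXTENSION_TYPES[File.extname(url_path).downcase]
  raise ArgumentError, "extension/content mismatch" if expected && expected != sniffed

  sniffed
end

png_bytes = "\x89PNG\r\n\x1A\n".b + "rest-of-file"
sniffed = verify_image!("/uploads/cover.png", png_bytes)
```

A path with no recognized extension is accepted on the strength of the bytes alone, while a .jpg path serving PNG bytes is refused.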

Direct provider calls accept the same shape: provider.embed_image([Parse::Embeddings::ImageFetch.fetch!(url)]) — FetchedImage sources and URL Strings may be mixed in one batch.

Save-side semantics

Digest is the SHA-256 of the URL String, not the file bytes. Replacing the Parse::File with one pointing at a different URL re-embeds; resaving the same URL is a no-op (zero provider calls). Parse-managed file URLs are stable unless overwritten in place — if you PUT-replace bytes at the same URL (S3 without renaming), null the digest field to force re-embed.
The same EmbedManaged write-guard applies: direct assignment to the managed vector raises ProtectedFieldError. The write path is the only way to populate the target vector.
embed and embed_image can co-declare on the same record (different source properties → different :vector targets), so a record can have one text-embedding column and one image-embedding column queried by separate Atlas vectorSearch indexes.
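
The URL-digest rule above amounts to a simple comparison. The sketch below assumes the digest is stored as a hex string, which this guide does not state; reembed_needed? is a hypothetical helper used only to illustrate the decision.

```ruby
require "digest"

# Illustrative sketch: decide whether a save needs a provider call.
# Assumption: the stored digest is the hex SHA-256 of the URL String.
def reembed_needed?(stored_digest, url)
  stored_digest.nil? || stored_digest != Digest::SHA256.hexdigest(url)
end

url    = "https://files.example.com/cover.jpg"
digest = Digest::SHA256.hexdigest(url)

reembed_needed?(digest, url)                                     # same URL: no provider call
reembed_needed?(digest, "https://files.example.com/cover-2.jpg") # new URL: re-embed
reembed_needed?(nil, url)                                        # digest nulled: forced re-embed
```

Because the bytes never enter the hash, replacing content behind an unchanged URL is invisible to this check, which is why nulling the digest is the escape hatch.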

Re-embedding existing rows

Provenance: the `<into>_meta` sibling (v5.5)

Every embed / embed_image declaration auto-declares an <into>_meta :object sibling (override with meta_field:) stamped on each recompute and cleared with the vector:

doc.body_embedding_meta
# => { "provider"    => "openai",
#      "model"       => "text-embedding-3-small",
#      "dimensions"  => 1536,
#      "modality"    => "text",
#      "embedded_at" => "2026-06-09T17:32:11Z" }

This is the record migration tooling reads to know which model produced any stored vector.

Same-shape migrations: `Class.reembed!` (v5.5)

When the new model has the same dimensions (e.g. swapping text-embedding-3-small for a same-width replacement, or a provider change at equal width), re-embed in place:

# Re-embed every row through the CURRENT provider/model declaration.
Document.reembed!(batch_size: 100)

# Resumable: skip rows whose <into>_meta already matches the current
# provider + model + dimensions (rows with no meta count as stale).
Document.reembed!(only_stale: true)

# Scope it
Document.reembed!(field: :body_embedding, where: { published: true }, limit: 10_000)

reembed! walks the class with objectId-cursor pagination, clears each row's digest sibling (so the save-path recompute cannot elide the provider call), and saves. Unlike embed_pending! — which only fills NULL vectors — reembed! recomputes populated rows too. Run it with a master-key client (or pass save_opts: with a session token that can write every row). Each row's save makes one provider call; pace bulk runs against provider rate limits (see BatchEmbedder below for the pattern, or just throttle the loop).
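
The only_stale: comparison can be pictured as a three-key match between a row's meta hash and the current declaration. This is a sketch of the rule as described above, not the library's code; stale? and CURRENT_DECLARATION are hypothetical names.

```ruby
# Illustrative sketch of the only_stale: rule. Names are hypothetical.
CURRENT_DECLARATION = {
  "provider"   => "openai",
  "model"      => "text-embedding-3-small",
  "dimensions" => 1536,
}.freeze

# A row is stale when it has no meta, or when provider, model, or
# dimensions differ from the current declaration. Other meta keys
# (modality, embedded_at) do not affect the result.
def stale?(meta, current = CURRENT_DECLARATION)
  return true if meta.nil?

  %w[provider model dimensions].any? { |key| meta[key] != current[key] }
end

stale?(nil)                                           # no meta counts as stale
stale?(CURRENT_DECLARATION.merge("modality" => "text")) # matches: skipped
stale?(CURRENT_DECLARATION.merge("model" => "text-embedding-ada-002")) # model changed: stale
```

This is what makes an interrupted run resumable: rows already re-embedded under the current declaration match and are skipped on the next pass.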

Changed-width migrations: dual-field workflow

Changing dimensions: is a different beast — the existing vectorSearch index can't serve the new width. Use the shadow-field workflow:

Add the new property alongside the old one (property :body_embedding_v2, :vector, ...) and an embed or embed_image block targeting it.
Backfill with embed_pending!(field: :body_embedding_v2) — the new field is null everywhere, so the null-filling walk is exactly right.
Deploy a new vectorSearch index covering the new field and migrate find_similar callers.
Drop the old property and index.

Do NOT mutate a model's dimensions: in place — the digest mechanism will see unchanged source text and skip recompute, leaving stale vectors, and the drift verifier will flag every query against the old index (index numDimensions=1536 but property declares ...). For embed_image, also remember the digest is over the URL String: if you replace bytes at the same URL (PUT-replace on S3 without renaming), null the digest field — or run reembed! — to force re-embed.

Bulk embedding: `BatchEmbedder` (v5.5)

Provider#embed_text_batched only slices input into provider-sized chunks; retry lives inside each provider's single HTTP call. For bulk jobs (ingest pipelines, chunk-corpus embedding) use Parse::Embeddings::BatchEmbedder, which adds batch-level pacing and backoff:

embedder = Parse::Embeddings::BatchEmbedder.new(
  Parse::Embeddings.provider(:openai),
  requests_per_minute: 60,        # inter-batch pacing
  max_attempts: 5,                # per-batch tries (exponential backoff + jitter)
  on_progress: ->(done:, total:, batch_index:, batch_count:) {
    puts "#{done}/#{total}"
  },
)
vectors = embedder.embed_text(texts, input_type: :search_document)

Rate-limit and transient errors (any provider error class ending in RateLimitError / TransientError; override with retry_on:) retry with exponential backoff; other errors propagate immediately. A batch that exhausts its attempts raises BatchEmbedder::BatchFailed carrying batch_index and completed_count, so a resumable job knows exactly where to pick up.
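
The retry policy described above (match on error class-name suffix, back off exponentially with jitter, propagate everything else) can be sketched independently of the SDK. The error classes, the delay constants, and the with_backoff / embed_in_batches helpers below are all stand-ins chosen for illustration; BatchEmbedder's actual internals and defaults may differ.

```ruby
# Illustrative sketch; all names and constants here are assumptions.
class RateLimitError < StandardError; end # stand-in provider errors
class TransientError < StandardError; end

# Retry only errors whose class name ends in a known suffix.
def retryable?(error)
  error.class.name.to_s.end_with?("RateLimitError", "TransientError")
end

# Run the block, retrying retryable errors with exponential backoff.
# The sleeper is injectable so tests need not actually sleep.
def with_backoff(max_attempts: 5, base_delay: 0.5, max_delay: 30.0,
                 sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError => e
    raise unless retryable?(e) && attempt < max_attempts

    delay = [base_delay * (2**(attempt - 1)), max_delay].min
    sleeper.call(delay + rand * delay * 0.25) # up to 25% jitter
    retry
  end
end

# Slice the inputs into batches and embed each batch under the retry policy.
def embed_in_batches(texts, batch_size:, **backoff_opts)
  texts.each_slice(batch_size).flat_map do |batch|
    with_backoff(**backoff_opts) { yield batch }
  end
end
```

A caller would pass a block that performs one provider request per batch; a resumable job would additionally record how many batches completed before a final failure.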

Telemetry: `parse.embeddings.embed` AS::N

Every provider emits parse.embeddings.embed via ActiveSupport::Notifications.instrument. Subscribe to track cost, latency, and error rate across all embedding spend:

ActiveSupport::Notifications.subscribe("parse.embeddings.embed") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  StatsD.increment(
    "parse.embeddings.embed",
    tags: [
      "provider:#{event.payload[:provider]}",
      "model:#{event.payload[:model]}",
      "input_type:#{event.payload[:input_type]}",
      "error:#{event.payload[:error] || 'none'}",
    ],
  )
  StatsD.histogram("parse.embeddings.tokens", event.payload[:total_tokens]) if event.payload[:total_tokens]
  StatsD.timing("parse.embeddings.duration_ms", event.duration)
end

Payload contract (keys always present; values may be nil):

Key	Type	Notes
`:provider`	`String`	`provider.class.name` (e.g. `"Parse::Embeddings::OpenAI"`)
`:model`	`String`	`provider.model_name`
`:dimensions`	`Integer`	`provider.dimensions`
`:input_count`	`Integer`	batch size
`:input_type`	`Symbol`	`:search_query` / `:search_document`
`:total_tokens`	`Integer`/nil	provider-reported usage; nil for Fixture and providers without usage
`:cached`	`Boolean`	always false in v5.0; reserved for v5.1 embed cache
`:error`	`String`/nil	`exception.class.name` when the block raised — class name only

Notes:

:error is the class name, never the message. Provider exceptions can contain user-supplied text from the API; surfacing only the class name keeps PII out of operator dashboards.
Pre-validation failures (embed_text called with non-Array, or with non-String elements) do not emit an event. The validation runs before the instrument block so caller-shape errors aren't recorded as embed attempts.
Subscribers run synchronously on the request thread. A slow subscriber blocks every embed call. Push to non-blocking sinks (StatsD-over-UDP, batched OTel exporters) rather than doing filesystem or HTTP I/O inside the subscriber.
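
One way to keep a subscriber off the critical path is to hand events to a bounded in-process queue and drain it on a background thread. The sketch below uses only Ruby's standard SizedQueue; MetricsBuffer is a hypothetical class, and dropping events when the buffer is full is a design choice made here for illustration, not SDK behavior.

```ruby
# Illustrative sketch; MetricsBuffer is a hypothetical helper class.
class MetricsBuffer
  attr_reader :dropped

  def initialize(capacity: 1_000)
    @queue = SizedQueue.new(capacity)
    @dropped = 0
  end

  # Called from the subscriber. Never blocks: when the buffer is full the
  # event is counted as dropped instead of stalling the request thread.
  def enqueue(event)
    @queue.push(event, true) # non_block = true raises ThreadError when full
    true
  rescue ThreadError
    @dropped += 1
    false
  end

  # Start a background thread that passes each event to the (possibly slow)
  # sink. The thread exits once the queue is closed and fully drained.
  def start_drain(&sink)
    Thread.new do
      while (event = @queue.pop)
        sink.call(event)
      end
    end
  end

  def close
    @queue.close
  end
end
```

Inside the subscribe block, the only work left on the request thread would then be a call such as buffer.enqueue(event.payload), with the StatsD or HTTP calls moved into the drain block.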

Logging and PII considerations

When find_similar(text:) is called, the query text is sent over the wire to the embedding provider. Operators with global Faraday request logging enabled on the embedding connection will capture the full query text in the JSON request body. Treat text: as user-visible content for log-handling purposes; redact at the Faraday middleware layer if your logging pipeline retains payloads.
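
A redaction step of the kind suggested above could look like the following. It is a sketch of the transformation only, not a Faraday middleware: redact_embedding_body is a hypothetical helper, and it assumes the request body is JSON with the text under an "input" key, as in OpenAI-style embedding requests. Other providers may use different field names.

```ruby
require "json"

# Illustrative sketch; redact_embedding_body is a hypothetical helper.
# Assumption: the embedded text lives under the "input" key of a JSON body.
def redact_embedding_body(body)
  payload = JSON.parse(body)
  return body unless payload.is_a?(Hash) && payload.key?("input")

  # Replace each input with a length-only placeholder so logs keep the
  # request shape (model, batch size) without the user's text.
  payload["input"] = Array(payload["input"]).map do |text|
    "[REDACTED #{text.to_s.length} chars]"
  end
  JSON.generate(payload)
rescue JSON::ParserError
  # Unparseable bodies are withheld entirely rather than logged raw.
  "[UNPARSEABLE BODY REDACTED]"
end
```

Applied to a body whose input is ["my ssn is 123"], the logged form keeps the model name and carries "[REDACTED 13 chars]" in place of the text.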

The vector itself never appears in OpenAI request bodies (text in, floats out). Vectors only flow through the Parse↔Mongo path, where the body builder's <vector dims=N> compaction prevents them from landing in stdout / error trackers.

When the embedded source is PII: deployment checklist

An embedding of PII is PII-equivalent. Inversion attacks reconstruct substantial source text from dense embeddings, and a vector's nearest neighbors leak the source's meaning even without reconstruction. If the fields you embed contain personal data (names, addresses, health or financial details, free-text user messages), treat the vector column with the same handling as the source column:

Provider contract. You are sending the raw source text (and in bytes mode, image content) to the embedding provider on every recompute. Confirm the provider's data-retention and training-use terms cover PII, and that a DPA is in place where required. Self-hosting via LocalHTTP (Ollama / vLLM / TEI) keeps the text in your network.
Keep vectors off the wire. Leave vector_visibility at its :owner_only default so vectors are omitted from as_json and webhook payloads. Do not flip a PII class to :public.
Row ACL still governs. Vector hits route mongo-direct with _rperm enforcement — verify your rows carry real ACLs and that callers use scoped credentials (session_token: / acl_user:), not blanket master key.
Tenant isolation. Multi-tenant deployments must declare agent_tenant_scope on searchable classes; the scope folds into $vectorSearch.filter (and v5.5's drift verification confirms the index covers it). Without it, similarity scores leak cross-tenant document existence.
Score exposure. Keep score quantization on for non-admin agent contexts (the default) — full-precision scores enable membership-inference probing.
EXIF stays stripped. For image embedding, keep the bytes-mode default exif_strip: true; user photos carry GPS coordinates and device serials that would otherwise reach the provider.
Log and cache hygiene. Redact query text at the Faraday layer (above); if you enable the persistent L2 cache, note that cache KEYS are hashes (no plaintext) but cache VALUES are the embeddings themselves — point MonetaStore at a store with the same access controls as the database.
Deletion propagation. When a user exercises erasure rights, the vector, its <field>_digest, and its <field>_meta siblings live on the same row and delete with it — but check external copies: provider-side logs (their retention policy), your L2 embedding cache (TTL or explicit flush), and any analytics sink subscribed to embedding events.
Migration hygiene. reembed! re-sends every row's source text to the provider — schedule PII-class migrations under the same approvals as a data export.

Troubleshooting

NoVectorProperty: no :vector property declared on this class
The class has no field declared as :vector. Add one.

AmbiguousVectorField: class declares multiple :vector properties
Pass field: :which_one to disambiguate.

IndexNotResolved: no vectorSearch index found covering Class.field
Create the index (see §Creating the Atlas vectorSearch index) or pass index: explicitly.

InvalidQueryVector: expected 1536, got 768
The query vector's length doesn't match the declared dimensions:. Almost always means the query embedding came from a different model than the stored embeddings.

EmbedderNotConfigured
The :vector property has no provider: declared but find_similar was called with text:. Either declare a provider on the property, or pass an explicit vector: Array.

ProtectedFieldError: <Class>#<field> is managed by 'embed'
User code tried to assign directly to a managed vector field. Update the declared source fields instead and save.

InvalidResponseError: response length 5 != input count 4
The provider returned a different number of vectors than inputs. The provider has a bug; the validation in Parse::Embeddings::Provider#validate_response! caught it before the misaligned vectors could be stored.

Atlas Local: index stays BUILDING forever
Atlas Local's internal supervisor periodically restarts mongod during replica-set sync. Use IndexCatalog.wait_for_ready (which bypasses the IndexManager's 300-second cache via force_refresh: true on every poll) rather than an until index_ready?; sleep loop.

Reference

Key files:

lib/parse/embeddings.rb — registry, Configuration, register, provider, configure, validate_image_url! (mode: :forward | :fetch), trust_provider_url_fetch=, allowed_image_hosts=, allowed_image_types=.
lib/parse/embeddings/provider.rb — abstract base, validate_response!, instrument_embed, AS::N payload contract.
lib/parse/embeddings/image_fetch.rb — bytes-fetch path: ImageFetch.fetch!, magic-byte sniff_mime/verify!, EXIF/XMP stripping, FetchedImage.
lib/parse/embeddings/batch_embedder.rb — BatchEmbedder bulk orchestration (pacing, batch-level backoff, BatchFailed).
lib/parse/embeddings/cache.rb — opt-in query-embed cache (Cache.enable! / fetch_vector / stats).
lib/parse/embeddings/spend_cap.rb — per-tenant token cap (charge!, charge_query!, with_precharged).
lib/parse/embeddings/openai.rb — OpenAI provider.
lib/parse/embeddings/cohere.rb — Cohere v3 + v4.0 text-mode provider.
lib/parse/embeddings/voyage.rb — Voyage text + multimodal-3 text-mode provider.
lib/parse/embeddings/jina.rb — Jina v3 / v4 / v5 / code provider.
lib/parse/embeddings/qwen.rb — Qwen3-Embedding via DashScope.
lib/parse/embeddings/local_http.rb — generic OpenAI-compatible local-gateway client.
lib/parse/embeddings/fixture.rb — deterministic test provider.
lib/parse/model/core/vector_searchable.rb — find_similar, hybrid_search, index drift verification (Parse::VectorSearch.index_drift_policy).
lib/parse/model/core/embed_managed.rb — embed and embed_image macros, EmbedDirective (carries modality:, allow_insecure:, source_mode:, exif_strip:, meta_field:), embed_pending!, reembed!.
lib/parse/vector_search.rb — low-level Parse::VectorSearch.search.
lib/parse/atlas_search/index_manager.rb — IndexCatalog.create_index, find_vector_index, wait_for_ready.
lib/parse/mongodb.rb — direct MongoDB access, 5-layer enforcement.