Module: Parse::Core::EmbedManaged::ClassMethods

Defined in:
lib/parse/model/core/embed_managed.rb

Instance Method Details

#embed(*source_fields, into:, input_type: :search_document, digest_field: nil, meta_field: nil) ⇒ Symbol

Declare a managed embedding. See Parse::Core::EmbedManaged for the full description.

Parameters:

  • source_fields (Array<Symbol>)

    one or more scalar property names whose values are concatenated (joined with "\n\n", nil skipped) to form the embed input.

  • into (Symbol)

    the :vector property to populate. Must already be declared with provider: metadata.

  • input_type (Symbol) (defaults to: :search_document)

    forwarded to Embeddings::Provider#embed_text. Defaults to :search_document (the write-side counterpart to find_similar(text:)'s :search_query).

  • digest_field (Symbol, nil) (defaults to: nil)

    override for the digest sibling property. Defaults to :"#{into}_digest". Auto-declared as :string if not already declared.

  • meta_field (Symbol, nil) (defaults to: nil)

    override for the provenance sibling property. Defaults to :"#{into}_meta". Auto-declared as :object if not already declared; populated with { provider:, model:, dimensions:, modality:, embedded_at: } on every recompute. Read by #reembed! to skip rows already embedded by the current provider/model.

Returns:

  • (Symbol)

    the target vector field name.

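Raises:

  • (InvalidEmbedDeclaration)

    when no source field is given, when into: is not a declared :vector property, when that property has no provider: declared, or when a source field is not declared on the class.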



# File 'lib/parse/model/core/embed_managed.rb', line 213

def embed(*source_fields, into:, input_type: :search_document, digest_field: nil,
          meta_field: nil)
  if source_fields.empty?
    raise InvalidEmbedDeclaration,
          "#{self}.embed: at least one source field is required."
  end
  into = into.to_sym
  unless vector_properties.key?(into)
    raise InvalidEmbedDeclaration,
          "#{self}.embed: `into: :#{into}` is not a declared :vector property " \
          "(declared :vector fields: #{vector_properties.keys.inspect})."
  end
  provider_name = vector_properties.dig(into, :provider)
  if provider_name.nil?
    raise InvalidEmbedDeclaration,
          "#{self}.embed: `into: :#{into}` has no `provider:` declared on its :vector " \
          "property. Add `provider: :openai` (or another registered name) to the " \
          "property declaration."
  end
  sources = source_fields.map(&:to_sym)
  missing = sources.reject { |f| fields.key?(f) }
  unless missing.empty?
    raise InvalidEmbedDeclaration,
          "#{self}.embed: source fields #{missing.inspect} are not declared on this class."
  end

  digest_field = (digest_field || :"#{into}_digest").to_sym
  unless fields.key?(digest_field)
    property digest_field, :string
  end
  meta_field = (meta_field || :"#{into}_meta").to_sym
  unless fields.key?(meta_field)
    property meta_field, :object
  end

  directive = EmbedDirective.new(
    sources: sources,
    into: into,
    digest_field: digest_field,
    input_type: input_type,
    provider_name: provider_name,
    meta_field: meta_field,
  ).freeze
  embed_directives[into] = directive

  callback_method = :"_auto_embed_#{into}!"
  define_method(callback_method) do
    Parse::Core::EmbedManaged.recompute_embedding!(self, directive)
  end

  already_registered = _save_callbacks.any? do |cb|
    cb.kind == :before && (cb.filter.to_sym rescue cb.filter) == callback_method
  end
  before_save callback_method unless already_registered

  install_embed_writer_guard!(into, sources)

  into
end
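
The input-building rule from the source_fields parameter (values joined with "\n\n", nil skipped) and the default sibling names can be sketched in plain Ruby. This is a minimal illustration, not the SDK's code: build_embed_input and default_siblings are hypothetical helper names, and the record is a plain Hash standing in for a Parse object.

```ruby
# Hypothetical helper (not part of the SDK). It mirrors the documented rule:
# source values are joined with "\n\n" and nil values are skipped.
def build_embed_input(record, source_fields)
  source_fields
    .map { |field| record[field] }
    .compact
    .map(&:to_s)
    .join("\n\n")
end

# Default sibling property names derived from the into: target, matching
# the :"#{into}_digest" and :"#{into}_meta" defaults described above.
def default_siblings(into)
  { digest_field: :"#{into}_digest", meta_field: :"#{into}_meta" }
end

# A plain Hash stands in for a record; :subtitle is nil and gets skipped.
record = { title: "Ruby tips", subtitle: nil, body: "Use frozen string literals." }
input = build_embed_input(record, %i[title subtitle body])
siblings = default_siblings(:embedding)
```

With into: :embedding, the defaults resolve to :embedding_digest and :embedding_meta unless digest_field: or meta_field: overrides them.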

#embed_directives ⇒ Object

Per-class registry of EmbedDirectives keyed by target vector property symbol. Read by tests and tooling; written only by #embed.



# File 'lib/parse/model/core/embed_managed.rb', line 185

def embed_directives
  @embed_directives ||= {}
end
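
As a small sketch of why this accessor yields a per-class registry: the memoized instance variable lives on whichever class object receives the call. The module and class names below (DirectiveRegistry, Article, Photo) are hypothetical stand-ins, and the stored values are placeholder symbols rather than real EmbedDirective objects.

```ruby
# Stand-in module (hypothetical). Only the memoized accessor matches the source.
module DirectiveRegistry
  def embed_directives
    @embed_directives ||= {}
  end
end

class Article
  extend DirectiveRegistry
end

class Photo
  extend DirectiveRegistry
end

# Placeholder values; the real registry stores frozen EmbedDirective objects.
Article.embed_directives[:embedding] = :text_directive
Photo.embed_directives[:image_vector] = :image_directive

article_keys = Article.embed_directives.keys
photo_keys = Photo.embed_directives.keys
```

Each class that extends the module gets its own Hash, so declaring a directive on one class does not leak into another.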

#embed_image(source_field, into:, input_type: :search_document, digest_field: nil, allow_insecure: false, source: :url, exif_strip: true, meta_field: nil) ⇒ Symbol

Declare a managed image embedding. Mirrors #embed, but the source field is a :file property (Parse::File) and the provider call routes through Embeddings::Provider#embed_image rather than #embed_text. The source: option selects one of two fetch modes:

  • :url (default, v5.1 behavior) — the SDK extracts the file's URL, validates it through Embeddings.validate_image_url! (sentinel-gated egress opt-in, CIDR / port / host allowlist), and forwards the canonicalized URL to the provider, which performs its own fetch. The SDK does NOT download image bytes.
  • :bytes (v5.5) — the SDK downloads the image itself via File.safe_open_url (through Embeddings::ImageFetch.fetch!), verifies the content by magic-byte sniff against Embeddings.allowed_image_types (the Content-Type header is never trusted), strips EXIF/XMP metadata by default, and forwards the bytes to the provider as a base64 data URI. Does NOT require the trust_provider_url_fetch sentinel (no third-party URL egress), but the file's host must still be in Embeddings.allowed_image_hosts.
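
The :bytes steps of sniffing magic bytes and forwarding a base64 data URI can be sketched as follows. This is an illustration under stated assumptions, not the SDK's implementation: magic_bytes, sniff_image_type, and to_data_uri are hypothetical names, the two-entry signature table stands in for Embeddings.allowed_image_types, and EXIF stripping and the host allowlist are omitted.

```ruby
# Magic-byte prefixes for two common formats. The real allowlist is
# Embeddings.allowed_image_types; this table is an illustrative assumption.
def magic_bytes
  {
    "image/png"  => "\x89PNG\r\n\x1A\n".b,
    "image/jpeg" => "\xFF\xD8\xFF".b,
  }
end

# Decide the type from the leading bytes only. No Content-Type header is
# consulted, mirroring the rule that the header is never trusted.
def sniff_image_type(bytes)
  data = bytes.b
  magic_bytes.each do |mime, prefix|
    return mime if data.start_with?(prefix)
  end
  nil
end

# Build a base64 data URI; "m0" packs strict base64 without line breaks.
def to_data_uri(bytes)
  mime = sniff_image_type(bytes)
  raise ArgumentError, "unrecognized image type" if mime.nil?
  "data:#{mime};base64,#{[bytes].pack("m0")}"
end

png_bytes = "\x89PNG\r\n\x1A\n".b + "rest-of-file".b
uri = to_data_uri(png_bytes)
```

Content that matches no known signature is rejected before anything is forwarded, whatever type the server claimed.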

Digest is the URL string, not the file contents. Replacing the Parse::File with one pointing to a different URL re-embeds; re-saving the same URL is a no-op (zero provider calls). Cloud-stored Parse files have stable URLs unless overwritten, so this is the right cache key for most uploads. If you mutate the underlying bytes at the SAME URL (e.g. PUT-replace on S3 without renaming), the embedding will NOT refresh; rename the file or set :#{into}_digest to nil and resave to force re-embed.
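
The caching behavior described above can be sketched with a plain Hash standing in for a record. Everything here is hypothetical scaffolding: UrlDigestCache is not an SDK class, the field names are examples, and SHA-256 is an assumption, since the source only says the digest is keyed on the URL string.

```ruby
require "digest"

# Hypothetical stand-in for the recompute step. SHA-256 is an assumption;
# the source only states that the digest is keyed on the URL string.
class UrlDigestCache
  attr_reader :provider_calls

  def initialize
    @provider_calls = 0
  end

  # record is a plain Hash standing in for a Parse object.
  def recompute!(record)
    digest = Digest::SHA256.hexdigest(record[:image_url])
    return false if record[:embedding_digest] == digest # same URL: no provider call

    @provider_calls += 1
    record[:embedding] = [0.0, 0.0, 0.0] # placeholder vector
    record[:embedding_digest] = digest
    true
  end
end

cache = UrlDigestCache.new
record = { image_url: "https://files.example.com/a.png" }
cache.recompute!(record) # first save embeds
cache.recompute!(record) # same URL: skipped
record[:image_url] = "https://files.example.com/b.png"
cache.recompute!(record) # new URL: re-embeds
record[:embedding_digest] = nil
cache.recompute!(record) # digest cleared: forced re-embed
```

Clearing the digest is the escape hatch for bytes replaced in place at the same URL, because the URL alone cannot reveal that change.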

Parameters:

  • source_field (Symbol)

    one :file property whose URL feeds the provider. (v5.1 accepts a single source per directive; multi-image-per-record support is deferred.)

  • into (Symbol)

    the :vector property to populate. Must already be declared with provider: metadata.

  • input_type (Symbol) (defaults to: :search_document)

    forwarded to Provider#embed_image. Defaults to :search_document.

  • digest_field (Symbol, nil) (defaults to: nil)

    override for the URL-digest sibling. Defaults to :"#{into}_digest". Auto-declared as :string if not already declared.

  • allow_insecure (Boolean) (defaults to: false)

    forwarded to Embeddings.validate_image_url!; permit http:// for local-dev CDN proxies. Default false.

  • source (Symbol) (defaults to: :url)

    :url (provider fetches; default) or :bytes (SDK fetches, verifies, strips, forwards base64).

  • exif_strip (Boolean) (defaults to: true)

    strip EXIF/XMP metadata before forwarding bytes (default true; :bytes mode only — ignored for :url, where the SDK never sees the bytes).

  • meta_field (Symbol, nil) (defaults to: nil)

    override for the provenance sibling property. Defaults to :"#{into}_meta"; see #embed.

Returns:

  • (Symbol)

    the target vector field name.

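Raises:

  • (InvalidEmbedDeclaration)

    when into: is not a declared :vector property, when that property has no provider: declared, or when the source field is not declared on the class or is not a :file property.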



# File 'lib/parse/model/core/embed_managed.rb', line 327

def embed_image(source_field, into:, input_type: :search_document,
                digest_field: nil, allow_insecure: false,
                source: :url, exif_strip: true, meta_field: nil)
  # Capture the fetch mode immediately — the legacy local
  # `source = source_field.to_sym` below shadows the kwarg.
  source_mode = source_mode_for_embed_image!(source)
  into = into.to_sym
  unless vector_properties.key?(into)
    raise InvalidEmbedDeclaration,
          "#{self}.embed_image: `into: :#{into}` is not a declared :vector property " \
          "(declared :vector fields: #{vector_properties.keys.inspect})."
  end
  provider_name = vector_properties.dig(into, :provider)
  if provider_name.nil?
    raise InvalidEmbedDeclaration,
          "#{self}.embed_image: `into: :#{into}` has no `provider:` declared on its " \
          ":vector property. Add `provider: :voyage` (or another registered name) " \
          "to the property declaration."
  end

  source = source_field.to_sym
  unless fields.key?(source)
    raise InvalidEmbedDeclaration,
          "#{self}.embed_image: source field #{source.inspect} is not declared on this class."
  end
  unless fields[source] == :file
    raise InvalidEmbedDeclaration,
          "#{self}.embed_image: source field #{source.inspect} must be a :file property " \
          "(got #{fields[source].inspect}). v5.1 image embedding accepts Parse::File " \
          "sources only — text sources go through `embed`."
  end

  digest_field = (digest_field || :"#{into}_digest").to_sym
  unless fields.key?(digest_field)
    property digest_field, :string
  end
  meta_field = (meta_field || :"#{into}_meta").to_sym
  unless fields.key?(meta_field)
    property meta_field, :object
  end

  directive = EmbedDirective.new(
    sources: [source],
    into: into,
    digest_field: digest_field,
    input_type: input_type,
    provider_name: provider_name,
    modality: :image,
    allow_insecure: allow_insecure,
    source_mode: source_mode,
    exif_strip: exif_strip ? true : false,
    meta_field: meta_field,
  ).freeze
  embed_directives[into] = directive

  callback_method = :"_auto_embed_#{into}!"
  define_method(callback_method) do
    Parse::Core::EmbedManaged.recompute_embedding!(self, directive)
  end

  already_registered = _save_callbacks.any? do |cb|
    cb.kind == :before && (cb.filter.to_sym rescue cb.filter) == callback_method
  end
  before_save callback_method unless already_registered

  install_embed_writer_guard!(into, [source])

  into
end

#embed_pending!(field: nil, batch_size: 100, limit: nil, where: nil, save_opts: {}) ⇒ Integer

Backfill embeddings for records whose managed vector field is still null — the bulk counterpart to the per-save embed path. Walks the class with objectId-cursor pagination (robust to the result set shrinking as records are embedded; terminates even when a record has no source text and stays null), saving each pending record so its before_save embed callback runs.
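
The objectId-cursor walk described above can be sketched against an in-memory array. This is an illustration only: fetch_pending, save_with_embed, and backfill are hypothetical names, the rows are plain Hashes, and the fake vector stands in for a provider call.

```ruby
# Hypothetical query: rows whose target is still null and whose objectId
# sorts after the cursor, returned in objectId order.
def fetch_pending(rows, after:, batch_size:)
  rows
    .select { |r| r[:embedding].nil? && (after.nil? || r[:object_id] > after) }
    .sort_by { |r| r[:object_id] }
    .first(batch_size)
end

# Stand-in for record.save running the before_save embed callback.
# A row without source text keeps a null vector.
def save_with_embed(row)
  row[:embedding] = [row[:body].length.to_f] unless row[:body].nil?
end

def backfill(rows, batch_size:)
  cursor = nil
  saved = 0
  loop do
    batch = fetch_pending(rows, after: cursor, batch_size: batch_size)
    break if batch.empty?
    batch.each do |row|
      save_with_embed(row)
      saved += 1
    end
    # Advance past this batch even when some rows stayed null, so the
    # walk cannot revisit them and always terminates.
    cursor = batch.last[:object_id]
  end
  saved
end

rows = [
  { object_id: "a1", body: "first", embedding: nil },
  { object_id: "b2", body: nil, embedding: nil },       # no source text
  { object_id: "c3", body: "third", embedding: nil },
  { object_id: "d4", body: "fourth", embedding: [0.1] } # already embedded
]
saved = backfill(rows, batch_size: 2)
```

An offset-based walk would skip rows as the pending set shrinks, and re-querying from the start would loop forever on the row with no source text. The cursor avoids both.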

Intended as an admin / maintenance operation: it reads and writes through the default client, so run it with a master-key client (or pass save_opts: carrying a session_token: that can write every row).

Parameters:

  • field (Symbol, nil) (defaults to: nil)

    limit the backfill to one embed target; nil processes every declared directive.

  • batch_size (Integer) (defaults to: 100)

    rows fetched per round (default 100).

  • limit (Integer, nil) (defaults to: nil)

    stop after embedding at most this many records across all directives; nil = no cap.

  • where (Hash, nil) (defaults to: nil)

    extra query constraints AND-ed with the null-target filter (e.g. { published: true }).

  • save_opts (Hash) (defaults to: {})

    options forwarded to each record.save (e.g. session_token:).

Returns:

  • (Integer)

    number of records saved (embedded).

Raises:

  • (ArgumentError)

    when field: names no embed target, or the class declares no embed directives.



# File 'lib/parse/model/core/embed_managed.rb', line 541

def embed_pending!(field: nil, batch_size: 100, limit: nil, where: nil, save_opts: {})
  bs = Integer(batch_size)
  raise ArgumentError, "#{self}.embed_pending!: batch_size must be positive." if bs <= 0
  directives = resolve_embed_directives_for_backfill(field)

  processed = 0
  directives.each do |directive|
    remaining = limit ? (limit - processed) : nil
    break if remaining && remaining <= 0
    processed += backfill_embed_directive!(directive, bs, where, remaining, save_opts)
  end
  processed
end
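
The loop above shares one limit: budget across all directives. A minimal sketch of that accounting follows, with run_directives as a hypothetical stand-in for the backfill helper: each directive is reduced to a count of pending rows and processes at most the remaining budget.

```ruby
# Stand-in for the per-directive backfill: each directive has some number of
# pending rows and processes at most `remaining` of them (all when nil).
def run_directives(pending_by_directive, limit:)
  processed = 0
  pending_by_directive.each_value do |pending|
    remaining = limit ? (limit - processed) : nil
    break if remaining && remaining <= 0
    processed += remaining ? [pending, remaining].min : pending
  end
  processed
end

# Hypothetical targets with 70 and 50 pending rows.
pending = { title_vector: 70, body_vector: 50 }
capped = run_directives(pending, limit: 100)
uncapped = run_directives(pending, limit: nil)
exhausted = run_directives(pending, limit: 70)
```

With limit: 100 the first target consumes 70 and the second only the remaining 30, so the cap applies to the whole call rather than to each directive.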

#reembed!(field: nil, batch_size: 100, limit: nil, where: nil, only_stale: false, save_opts: {}) ⇒ Integer

Re-embed records through the CURRENT provider/model — the bulk migration counterpart to #embed_pending! (which only fills null vectors). Use after changing a :vector property's provider: / model: / dimensions: declaration: walks the class with objectId-cursor pagination, clears each record's digest sibling so the before_save recompute cannot elide the provider call, and saves.

With only_stale: true, rows whose <into>_meta provenance already matches the current provider name, model, and declared dimensions are skipped without a provider call — making the operation resumable: re-running after a partial failure only touches rows still carrying old-model vectors. Rows with no meta record (embedded before v5.5) always count as stale.
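
The only_stale: rule can be sketched as a predicate over the meta provenance Hash. This is a hypothetical illustration: stale? is not an SDK method, symbol keys are assumed for the meta record, and the provider, model, and dimension values are examples only.

```ruby
# Hypothetical predicate mirroring the documented only_stale rule: a row is
# current only when provider, model, and dimensions all match.
def stale?(meta, current)
  return true if meta.nil? # no meta record: always counts as stale
  %i[provider model dimensions].any? { |key| meta[key] != current[key] }
end

# Example values only; the real ones come from the :vector declaration.
current = { provider: :openai, model: "text-embedding-3-small", dimensions: 1536 }

fresh_meta = current.merge(modality: :text, embedded_at: "2025-01-01T00:00:00Z")
old_model_meta = fresh_meta.merge(model: "text-embedding-ada-002")

flags = [fresh_meta, old_model_meta, nil].map { |meta| stale?(meta, current) }
```

Because rows with matching provenance are skipped, re-running after a partial failure only touches rows that still carry old vectors.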

Intended as an admin / maintenance operation: run it with a master-key client (or pass save_opts: carrying a session_token: that can write every row). Combine with Embeddings::BatchEmbedder-style pacing externally if the provider rate-limits — each record's save makes one provider call.

Parameters:

  • field (Symbol, nil) (defaults to: nil)

    limit to one embed target; nil processes every declared directive.

  • batch_size (Integer) (defaults to: 100)

    rows fetched per round (default 100).

  • limit (Integer, nil) (defaults to: nil)

    stop after re-embedding at most this many records across all directives; nil = no cap.

  • where (Hash, nil) (defaults to: nil)

    extra query constraints (e.g. { published: true }).

  • only_stale (Boolean) (defaults to: false)

    skip rows whose meta provenance matches the current provider/model/dimensions (default false — re-embed everything).

  • save_opts (Hash) (defaults to: {})

    options forwarded to each record.save.

Returns:

  • (Integer)

    number of records re-embedded (saved).

Raises:

  • (ArgumentError)

    when field: names no embed target, or the class declares no embed directives.



# File 'lib/parse/model/core/embed_managed.rb', line 444

def reembed!(field: nil, batch_size: 100, limit: nil, where: nil,
             only_stale: false, save_opts: {})
  bs = Integer(batch_size)
  raise ArgumentError, "#{self}.reembed!: batch_size must be positive." if bs <= 0
  directives = resolve_embed_directives_for_backfill(field, caller_label: "reembed!")

  processed = 0
  directives.each do |directive|
    remaining = limit ? (limit - processed) : nil
    break if remaining && remaining <= 0
    processed += reembed_directive!(directive, bs, where, remaining, only_stale, save_opts)
  end
  processed
end