Module: Parse::Core::EmbedManaged::ClassMethods

Defined in:
lib/parse/model/core/embed_managed.rb

Instance Method Details

#embed(*source_fields, into:, input_type: :search_document, digest_field: nil) ⇒ Symbol

Declare a managed embedding. See Parse::Core::EmbedManaged for the full description.

Parameters:

  • source_fields (Array<Symbol>)

    one or more scalar property names whose values are concatenated (joined with "\n\n", nil skipped) to form the embed input.

  • into (Symbol)

    the :vector property to populate. Must already be declared with provider: metadata.

  • input_type (Symbol) (defaults to: :search_document)

    forwarded to Embeddings::Provider#embed_text. Defaults to :search_document (the write-side counterpart to find_similar(text:)'s :search_query).

  • digest_field (Symbol, nil) (defaults to: nil)

    override for the digest sibling property. Defaults to :"#{into}_digest". Auto-declared as :string if not already declared.

Returns:

  • (Symbol)

    the target vector field name.
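
The concatenation rule stated for source_fields (values joined with "\n\n", nil skipped) can be sketched in plain Ruby. This is an illustrative sketch only: Article and build_embed_input are hypothetical names, not part of the SDK, and the page does not show how the SDK reads the property values.

```ruby
# Hypothetical record type standing in for a Parse model.
Article = Struct.new(:title, :summary, :body, keyword_init: true)

# Hypothetical helper, not an SDK method: builds the embed input the way the
# parameter description states (values joined with "\n\n", nil skipped).
def build_embed_input(record, source_fields)
  source_fields
    .map { |field| record.public_send(field) } # read each source property
    .compact                                   # nil values are skipped
    .map(&:to_s)
    .join("\n\n")
end

article = Article.new(title: "Vector search", summary: nil, body: "Cosine similarity basics.")
puts build_embed_input(article, %i[title summary body]).inspect
```

Because nil sources are dropped before joining, a record with a missing summary produces no empty segment between the title and the body.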

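Raises:

  • (InvalidEmbedDeclaration)

    when no source field is given, into: is not a declared :vector property, that property has no provider:, or a source field is not declared on the class.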

# File 'lib/parse/model/core/embed_managed.rb', line 189

def embed(*source_fields, into:, input_type: :search_document, digest_field: nil)
  if source_fields.empty?
    raise InvalidEmbedDeclaration,
          "#{self}.embed: at least one source field is required."
  end
  into = into.to_sym
  unless vector_properties.key?(into)
    raise InvalidEmbedDeclaration,
          "#{self}.embed: `into: :#{into}` is not a declared :vector property " \
          "(declared :vector fields: #{vector_properties.keys.inspect})."
  end
  provider_name = vector_properties.dig(into, :provider)
  if provider_name.nil?
    raise InvalidEmbedDeclaration,
          "#{self}.embed: `into: :#{into}` has no `provider:` declared on its :vector " \
          "property. Add `provider: :openai` (or another registered name) to the " \
          "property declaration."
  end
  sources = source_fields.map(&:to_sym)
  missing = sources.reject { |f| fields.key?(f) }
  unless missing.empty?
    raise InvalidEmbedDeclaration,
          "#{self}.embed: source fields #{missing.inspect} are not declared on this class."
  end

  digest_field = (digest_field || :"#{into}_digest").to_sym
  unless fields.key?(digest_field)
    property digest_field, :string
  end

  directive = EmbedDirective.new(
    sources: sources,
    into: into,
    digest_field: digest_field,
    input_type: input_type,
    provider_name: provider_name,
  ).freeze
  embed_directives[into] = directive

  callback_method = :"_auto_embed_#{into}!"
  define_method(callback_method) do
    Parse::Core::EmbedManaged.recompute_embedding!(self, directive)
  end

  already_registered = _save_callbacks.any? do |cb|
    cb.kind == :before && (cb.filter.to_sym rescue cb.filter) == callback_method
  end
  before_save callback_method unless already_registered

  install_embed_writer_guard!(into, sources)

  into
end

#embed_directives ⇒ Object

Per-class registry of EmbedDirectives keyed by target vector property symbol. Read by tests and tooling; written only by #embed.



# File 'lib/parse/model/core/embed_managed.rb', line 168

def embed_directives
  @embed_directives ||= {}
end
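
The memoized class-level instance variable above gives every class its own registry. A minimal sketch of that Ruby behavior follows, using hypothetical classes (Song, Album, LiveSong) and a stand-in module; whether the SDK copies directives to subclasses elsewhere is not shown on this page.

```ruby
# Stand-in for the ClassMethods module, reduced to the registry accessor.
module RegistryMethods
  # The ivar lives on whichever class object receives the call,
  # so each class lazily gets its own Hash.
  def embed_directives
    @embed_directives ||= {}
  end
end

class Song
  extend RegistryMethods
end

class Album
  extend RegistryMethods
end

# A subclass inherits the accessor but not the parent's ivar.
class LiveSong < Song
end

Song.embed_directives[:lyrics_vector] = :placeholder_directive

puts Song.embed_directives.keys.inspect     # Song's own registry
puts Album.embed_directives.keys.inspect    # separate, still empty
puts LiveSong.embed_directives.keys.inspect # separate, still empty
```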

#embed_image(source_field, into:, input_type: :search_document, digest_field: nil, allow_insecure: false) ⇒ Symbol

Declare a managed image embedding. Mirrors #embed but the source field is a :file property (Parse::File) and the provider call routes through Embeddings::Provider#embed_image rather than #embed_text. v5.1 ships URL-only: the SDK extracts the file's URL, validates it through Embeddings.validate_image_url! (sentinel-gated egress opt-in, CIDR / port / host allowlist), and forwards the canonicalized URL to the provider. The SDK does NOT download image bytes — bytes-fetch is the v5.3 path.

Digest is the URL string, not the file contents. Replacing the Parse::File with one pointing to a different URL re-embeds; re-saving the same URL is a no-op (zero provider calls). Cloud-stored Parse files have stable URLs unless overwritten, so this is the right cache key for most uploads. If you mutate the underlying bytes at the SAME URL (e.g. PUT-replace on S3 without renaming), the embedding will NOT refresh; rename the file or set :#{into}_digest to nil and resave to force re-embed.
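
The URL-keyed caching described above can be sketched with a stand-in provider that counts calls. Everything here is hypothetical (CountingProvider, Photo, recompute_image_embedding!), and hashing the URL with SHA-256 is an assumption: the page says only that the digest is based on the URL string, not how it is stored.

```ruby
require "digest"

# Hypothetical stand-in for a provider; counts calls so the no-op is visible.
class CountingProvider
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def embed_image(url)
    @calls += 1
    [url.length.to_f, 0.0, 1.0] # fake vector, for illustration only
  end
end

Photo = Struct.new(:image_url, :image_vector, :image_vector_digest, keyword_init: true)

# Assumption: the digest is a SHA-256 hex of the URL string.
# Returns true when the provider was called, false when the save was a no-op.
def recompute_image_embedding!(record, provider)
  return false if record.image_url.nil?

  digest = Digest::SHA256.hexdigest(record.image_url)
  return false if digest == record.image_vector_digest # same URL: skip

  record.image_vector = provider.embed_image(record.image_url)
  record.image_vector_digest = digest
  true
end

provider = CountingProvider.new
photo = Photo.new(image_url: "https://files.example.com/a.png")
recompute_image_embedding!(photo, provider) # first save embeds
recompute_image_embedding!(photo, provider) # same URL, no provider call
photo.image_vector_digest = nil             # force a refresh
recompute_image_embedding!(photo, provider) # embeds again
puts provider.calls
```

Clearing the digest is what forces the third call here, which mirrors the documented workaround for bytes replaced at an unchanged URL.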

Parameters:

  • source_field (Symbol)

    one :file property whose URL feeds the provider. (v5.1 accepts a single source per directive; multi-image-per-record support is deferred.)

  • into (Symbol)

    the :vector property to populate. Must already be declared with provider: metadata.

  • input_type (Symbol) (defaults to: :search_document)

    forwarded to Provider#embed_image. Defaults to :search_document.

  • digest_field (Symbol, nil) (defaults to: nil)

    override for the URL-digest sibling. Defaults to :"#{into}_digest". Auto-declared as :string if not already declared.

  • allow_insecure (Boolean) (defaults to: false)

    forwarded to Embeddings.validate_image_url!; permit http:// for local-dev CDN proxies. Default false.

Returns:

  • (Symbol)

    the target vector field name.
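
The effect of allow_insecure: on the URL scheme can be illustrated with a deliberately simplified check. This is not Embeddings.validate_image_url!: image_url_scheme_ok? is a hypothetical helper that covers only the http/https gate and omits the egress opt-in and the CIDR / port / host allowlist the real validator applies.

```ruby
require "uri"

# Hypothetical, simplified scheme gate (assumption: https is always accepted,
# http only when allow_insecure is true, anything else is rejected).
def image_url_scheme_ok?(url, allow_insecure: false)
  scheme = URI.parse(url).scheme
  return true if scheme == "https"

  allow_insecure && scheme == "http"
rescue URI::InvalidURIError
  false # unparseable input is treated as rejected in this sketch
end

puts image_url_scheme_ok?("https://cdn.example.com/a.png")
puts image_url_scheme_ok?("http://localhost:3000/a.png")
puts image_url_scheme_ok?("http://localhost:3000/a.png", allow_insecure: true)
```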

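Raises:

  • (InvalidEmbedDeclaration)

    when into: is not a declared :vector property, that property has no provider:, or source_field is not declared on the class or is not a :file property.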

# File 'lib/parse/model/core/embed_managed.rb', line 277

def embed_image(source_field, into:, input_type: :search_document,
                digest_field: nil, allow_insecure: false)
  into = into.to_sym
  unless vector_properties.key?(into)
    raise InvalidEmbedDeclaration,
          "#{self}.embed_image: `into: :#{into}` is not a declared :vector property " \
          "(declared :vector fields: #{vector_properties.keys.inspect})."
  end
  provider_name = vector_properties.dig(into, :provider)
  if provider_name.nil?
    raise InvalidEmbedDeclaration,
          "#{self}.embed_image: `into: :#{into}` has no `provider:` declared on its " \
          ":vector property. Add `provider: :voyage` (or another registered name) " \
          "to the property declaration."
  end

  source = source_field.to_sym
  unless fields.key?(source)
    raise InvalidEmbedDeclaration,
          "#{self}.embed_image: source field #{source.inspect} is not declared on this class."
  end
  unless fields[source] == :file
    raise InvalidEmbedDeclaration,
          "#{self}.embed_image: source field #{source.inspect} must be a :file property " \
          "(got #{fields[source].inspect}). v5.1 image embedding accepts Parse::File " \
          "sources only — text sources go through `embed`."
  end

  digest_field = (digest_field || :"#{into}_digest").to_sym
  unless fields.key?(digest_field)
    property digest_field, :string
  end

  directive = EmbedDirective.new(
    sources: [source],
    into: into,
    digest_field: digest_field,
    input_type: input_type,
    provider_name: provider_name,
    modality: :image,
    allow_insecure: allow_insecure,
  ).freeze
  embed_directives[into] = directive

  callback_method = :"_auto_embed_#{into}!"
  define_method(callback_method) do
    Parse::Core::EmbedManaged.recompute_embedding!(self, directive)
  end

  already_registered = _save_callbacks.any? do |cb|
    cb.kind == :before && (cb.filter.to_sym rescue cb.filter) == callback_method
  end
  before_save callback_method unless already_registered

  install_embed_writer_guard!(into, [source])

  into
end

#embed_pending!(field: nil, batch_size: 100, limit: nil, where: nil, save_opts: {}) ⇒ Integer

Backfill embeddings for records whose managed vector field is still null — the bulk counterpart to the per-save embed path. Walks the class with objectId-cursor pagination (robust to the result set shrinking as records are embedded; terminates even when a record has no source text and stays null), saving each pending record so its before_save embed callback runs.

Intended as an admin / maintenance operation: it reads and writes through the default client, so run it with a master-key client (or pass save_opts: carrying a session_token: that can write every row).

Parameters:

  • field (Symbol, nil) (defaults to: nil)

    limit the backfill to one embed target; nil processes every declared directive.

  • batch_size (Integer) (defaults to: 100)

    rows fetched per round (default 100).

  • limit (Integer, nil) (defaults to: nil)

    stop after embedding at most this many records across all directives; nil = no cap.

  • where (Hash, nil) (defaults to: nil)

    extra query constraints AND-ed with the null-target filter (e.g. { published: true }).

  • save_opts (Hash) (defaults to: {})

    options forwarded to each record.save (e.g. session_token:).

Returns:

  • (Integer)

    number of records saved (embedded).

Raises:

  • (ArgumentError)

    when field: names no embed target, the class declares no embed directives, or batch_size is not positive.



# File 'lib/parse/model/core/embed_managed.rb', line 360

def embed_pending!(field: nil, batch_size: 100, limit: nil, where: nil, save_opts: {})
  bs = Integer(batch_size)
  raise ArgumentError, "#{self}.embed_pending!: batch_size must be positive." if bs <= 0
  directives = resolve_embed_directives_for_backfill(field)

  processed = 0
  directives.each do |directive|
    remaining = limit ? (limit - processed) : nil
    break if remaining && remaining <= 0
    processed += backfill_embed_directive!(directive, bs, where, remaining, save_opts)
  end
  processed
end
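
The objectId-cursor walk that embed_pending! is described as using is delegated to backfill_embed_directive!, which this page does not show. The sketch below illustrates the general technique against an in-memory array. All names (Row, fetch_pending, save_row, backfill) are hypothetical, and the real method queries Parse rather than filtering an array.

```ruby
Row = Struct.new(:id, :text, :vector, keyword_init: true)

# Hypothetical in-memory query: rows with a null vector and id greater than
# the cursor, ordered by id, capped at batch_size.
def fetch_pending(rows, after_id, batch_size)
  rows.select { |r| r.vector.nil? && (after_id.nil? || r.id > after_id) }
      .sort_by(&:id)
      .first(batch_size)
end

# Stand-in for record.save: embeds only when there is source text,
# so a row without text stays null after being saved.
def save_row(row)
  row.vector = [row.text.length.to_f] unless row.text.nil?
end

# Walks pending rows page by page; returns the number of rows saved.
def backfill(rows, batch_size:, limit: nil)
  saved = 0
  cursor = nil
  loop do
    batch = fetch_pending(rows, cursor, batch_size)
    break if batch.empty?

    batch.each do |row|
      return saved if limit && saved >= limit

      save_row(row)
      saved += 1
    end
    cursor = batch.last.id # advance past this page even if rows stayed null
  end
  saved
end

rows = [
  Row.new(id: "a1", text: "alpha"),
  Row.new(id: "b2", text: "beta"),
  Row.new(id: "c3", text: nil), # no source text: stays null after save
  Row.new(id: "d4", text: "delta"),
]
puts backfill(rows, batch_size: 2)
```

Advancing the cursor by id rather than by offset is what keeps the loop finite here: the row without text is visited once and never fetched again, even though it still matches the null-vector filter.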