Class: Parse::Embeddings::Cohere

Inherits:
Provider
  • Object
Defined in:
lib/parse/embeddings/cohere.rb

Overview

Cohere embeddings provider. Wraps POST /v1/embed.

Supported models:

  • v4: embed-v4.0 (1536 native; Matryoshka widths 256, 512, 1024, 1536; 128k-token context). Unified text + image model at the network boundary. The text path uses Cohere's /v1/embed endpoint; the image path (#embed_image, v5.1+) uses the /v2/embed multimodal endpoint with OpenAI-style { type: "image_url", image_url: { url: ... } } content rows. Text vectors stored today share the vector space with the eventual image vectors (no re-embed required when adding image-side data).
  • v3: embed-english-v3.0, embed-multilingual-v3.0 (both 1024-dim), embed-english-light-v3.0, embed-multilingual-light-v3.0 (both 384-dim). Text-only.

Asymmetric input types

Cohere is one of the providers that DOES distinguish queries from documents at the wire level via the input_type request field. Sending input_type: "search_query" for a query and "search_document" for a corpus item is required for good recall on Cohere's v3 models — using the same type for both halves of a retrieval pair degrades nDCG by a noticeable margin (Cohere's own benchmarks). Provider#supports_input_type? returns true here so callers / cache-keying middleware can branch on this.

The accepted Symbol values map to the Cohere wire strings:

  • :search_query → "search_query"
  • :search_document → "search_document"
  • :classification → "classification"
  • :clustering → "clustering"
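
The mapping above can be sketched as a strict lookup. The constant mirrors INPUT_TYPE_WIRE_VALUES from this class; the helper name wire_input_type is hypothetical and not part of the SDK.

```ruby
# Mirrors Parse::Embeddings::Cohere::INPUT_TYPE_WIRE_VALUES.
INPUT_TYPE_WIRE_VALUES = {
  search_query:    "search_query",
  search_document: "search_document",
  classification:  "classification",
  clustering:      "clustering",
}.freeze

# Hypothetical helper: strict lookup that raises instead of silently
# falling back to a default wire value.
def wire_input_type(input_type)
  INPUT_TYPE_WIRE_VALUES.fetch(input_type) do
    raise ArgumentError,
          "input_type #{input_type.inspect} not in #{INPUT_TYPE_WIRE_VALUES.keys.inspect}"
  end
end

query_type = wire_input_type(:search_query)    # use for the query half of a retrieval pair
doc_type   = wire_input_type(:search_document) # use for corpus items
```

Raising on an unknown symbol keeps a typo from being embedded (and cached) under the wrong type.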

== Security


  • The Faraday connection refuses proxy: unless the caller opts in via allow_faraday_proxy: true. Env-proxy autodiscovery (HTTPS_PROXY etc.) is suppressed by default — same model as Parse::Client and OpenAI.
  • #inspect (inherited from Provider) never surfaces @api_key.
  • Authorization and Cohere-Api-Key are in Middleware::BodyBuilder::REDACTED_HEADERS.

Examples:

registration

Parse::Embeddings.register(:cohere,
  Parse::Embeddings::Cohere.new(
    api_key: ENV.fetch("COHERE_API_KEY"),
    model:   "embed-english-v3.0",
  ))

Defined Under Namespace

Classes: AuthenticationError, BadRequestError, RateLimitError, TransientError

Constant Summary

DEFAULT_BASE_URL =
"https://api.cohere.com/v1"
DEFAULT_MODEL =
"embed-english-v3.0"
DEFAULT_TIMEOUT =
30
DEFAULT_OPEN_TIMEOUT =
5
DEFAULT_MAX_RETRIES =
3
DEFAULT_BATCH_SIZE =

Cohere documents a hard cap of 96 inputs per /embed call.

96
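
As an illustration of what the 96-input cap means for a larger corpus, the sketch below splits inputs into per-request chunks with plain Ruby. The document list is made up, and the inherited #embed_text_batched is not shown on this page, so this is not a description of its internals.

```ruby
DEFAULT_BATCH_SIZE = 96 # mirrors the constant above

# Hypothetical corpus of 250 short documents.
documents = Array.new(250) { |i| "document #{i}" }

# Each slice would become one embed call; no slice exceeds the cap.
batches = documents.each_slice(DEFAULT_BATCH_SIZE).to_a
batch_sizes = batches.map(&:length)
```

With 250 inputs this yields three requests of 96, 96, and 58 items.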
MAX_RESPONSE_BYTES =
16 * 1024 * 1024
MODEL_DEFAULT_DIMENSIONS =
{
  "embed-v4.0"                     => 1536,
  "embed-english-v3.0"             => 1024,
  "embed-multilingual-v3.0"        => 1024,
  "embed-english-light-v3.0"       => 384,
  "embed-multilingual-light-v3.0"  => 384,
}.freeze
MODEL_MAX_INPUT_TOKENS =
{
  "embed-v4.0"                     => 128_000,
  "embed-english-v3.0"             => 512,
  "embed-multilingual-v3.0"        => 512,
  "embed-english-light-v3.0"       => 512,
  "embed-multilingual-light-v3.0"  => 512,
}.freeze
MATRYOSHKA_MODELS =

Models that accept Cohere's output_dimension Matryoshka truncation parameter. v4.0 is the only such row today; v3 models reject the field with a 400.

%w[embed-v4.0].freeze
MULTIMODAL_MODELS =

Models that accept image inputs via the /v2/embed multimodal endpoint. Currently only embed-v4.0 — v3 is text-only.

%w[embed-v4.0].freeze
MATRYOSHKA_WIDTHS =

Allowed Matryoshka widths per model (Cohere quantizes the available truncations rather than accepting any integer ≤ native). Empty allowlist = any integer ≤ native is fine, but for v4.0 Cohere documents exactly these four widths.

{
  "embed-v4.0" => [256, 512, 1024, 1536].freeze,
}.freeze
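
A minimal sketch of how the three Matryoshka constants could combine into a width check. The body of validate_dimensions! is not shown on this page, so the helper dimension_allowed? below is a hypothetical reading of the rules described above, not the SDK's implementation. The constants are trimmed copies.

```ruby
MODEL_DEFAULT_DIMENSIONS = {
  "embed-v4.0"         => 1536,
  "embed-english-v3.0" => 1024,
}.freeze
MATRYOSHKA_MODELS = %w[embed-v4.0].freeze
MATRYOSHKA_WIDTHS = {
  "embed-v4.0" => [256, 512, 1024, 1536].freeze,
}.freeze

# Hypothetical check (assumption): nil or the native width is always fine;
# any other width needs a Matryoshka-capable model and, when an allowlist
# exists for that model, membership in it.
def dimension_allowed?(model, dimensions)
  native = MODEL_DEFAULT_DIMENSIONS.fetch(model)
  return true if dimensions.nil? || dimensions == native
  return false unless MATRYOSHKA_MODELS.include?(model)

  allowed = MATRYOSHKA_WIDTHS.fetch(model, [])
  if allowed.empty?
    # Empty allowlist: any positive integer up to the native width.
    return dimensions.is_a?(Integer) && dimensions.positive? && dimensions <= native
  end
  allowed.include?(dimensions)
end
```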
INPUT_TYPE_WIRE_VALUES =

Map SDK-canonical input_type symbols to Cohere wire strings. Symbols outside this set raise — silently downgrading :unknown_type to "search_document" would mask cache-key bugs in higher layers (the value participates in cache keys).

{
  search_query:    "search_query",
  search_document: "search_document",
  classification:  "classification",
  clustering:      "clustering",
}.freeze

Constants inherited from Provider

Provider::AS_NOTIFICATION_NAME

Instance Method Summary

Methods inherited from Provider

#embed_text_batched, #inspect, #instrument_embed, #validate_response!

Constructor Details

#initialize(api_key:, model: DEFAULT_MODEL, dimensions: nil, base_url: DEFAULT_BASE_URL, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_faraday_proxy: false, allow_insecure_base_url: false, connection: nil) ⇒ Cohere

Returns a new instance of Cohere.

Parameters:

  • api_key (String)

    required. Sent as Authorization: Bearer ….

  • model (String) (defaults to: DEFAULT_MODEL)

    one of MODEL_DEFAULT_DIMENSIONS's keys.

  • dimensions (Integer, nil) (defaults to: nil)

    output width. nil selects the model's native width from MODEL_DEFAULT_DIMENSIONS; any other value is checked against the model by validate_dimensions!.

  • base_url (String) (defaults to: DEFAULT_BASE_URL)

    override. Must be HTTPS unless allow_insecure_base_url: true.

  • timeout (Integer) (defaults to: DEFAULT_TIMEOUT)

    read timeout, seconds.

  • open_timeout (Integer) (defaults to: DEFAULT_OPEN_TIMEOUT)

    connect timeout, seconds.

  • max_retries (Integer) (defaults to: DEFAULT_MAX_RETRIES)

    retry attempts on 429/5xx/timeouts.

  • embed_batch_size (Integer) (defaults to: DEFAULT_BATCH_SIZE)

    inputs per request (max 96).

  • allow_faraday_proxy (Boolean) (defaults to: false)

    opt in to proxy / env-proxy autodiscovery. Defaults false.

  • allow_insecure_base_url (Boolean) (defaults to: false)

    permit http:// base (local proxies). Defaults false.

  • connection (Faraday::Connection, nil) (defaults to: nil)

    injection seam for tests.



# File 'lib/parse/embeddings/cohere.rb', line 138

def initialize(
  api_key:,
  model: DEFAULT_MODEL,
  dimensions: nil,
  base_url: DEFAULT_BASE_URL,
  timeout: DEFAULT_TIMEOUT,
  open_timeout: DEFAULT_OPEN_TIMEOUT,
  max_retries: DEFAULT_MAX_RETRIES,
  embed_batch_size: DEFAULT_BATCH_SIZE,
  allow_faraday_proxy: false,
  allow_insecure_base_url: false,
  connection: nil
)
  validate_api_key!(api_key)
  validate_model!(model)
  validate_dimensions!(model, dimensions)
  sanitized_base_url = validate_base_url!(base_url, allow_insecure_base_url)
  validate_positive_integer!(:timeout, timeout)
  validate_positive_integer!(:open_timeout, open_timeout)
  validate_non_negative_integer!(:max_retries, max_retries)
  validate_positive_integer!(:embed_batch_size, embed_batch_size)
  if embed_batch_size > 96
    raise ArgumentError,
          "Parse::Embeddings::Cohere: embed_batch_size #{embed_batch_size} exceeds Cohere's per-request cap (96)."
  end

  @api_key = api_key
  @model = model
  @dimensions = dimensions || MODEL_DEFAULT_DIMENSIONS.fetch(model)
  @base_url = sanitized_base_url
  @timeout = timeout
  @open_timeout = open_timeout
  @max_retries = max_retries
  @embed_batch_size = embed_batch_size
  @allow_faraday_proxy = allow_faraday_proxy
  @connection = connection || build_connection
end

Instance Method Details

#backoff_seconds(attempt) ⇒ Object (protected)



# File 'lib/parse/embeddings/cohere.rb', line 510

def backoff_seconds(attempt)
  [0.5 * (2**(attempt - 1)), 30.0].min
end
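
To make the schedule concrete, this sketch evaluates the same formula for the first eight attempts. The method body is copied from the listing above; the variable names are illustrative.

```ruby
# Same formula as #backoff_seconds: 0.5s, doubling per attempt, capped at 30s.
def backoff_seconds(attempt)
  [0.5 * (2**(attempt - 1)), 30.0].min
end

schedule = (1..8).map { |attempt| backoff_seconds(attempt) }

# With DEFAULT_MAX_RETRIES = 3, sleeps follow failed attempts 1, 2 and 3.
total_default_sleep = schedule.first(3).sum
```

The cap takes effect from the seventh attempt onward, where the uncapped value would be 32 seconds.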

#build_connection ⇒ Object (protected)



# File 'lib/parse/embeddings/cohere.rb', line 362

def build_connection
  headers = {
    "Authorization" => "Bearer #{@api_key}",
    "Content-Type" => "application/json",
    "Accept" => "application/json",
    "User-Agent" => "parse-stack-embeddings/#{user_agent_version}",
  }

  faraday_opts = { url: @base_url, headers: headers }
  faraday_opts[:proxy] = nil unless @allow_faraday_proxy

  conn = Faraday.new(**faraday_opts) do |f|
    f.options.timeout = @timeout
    f.options.open_timeout = @open_timeout
    f.adapter Faraday.default_adapter
  end
  conn.proxy = nil if !@allow_faraday_proxy && conn.respond_to?(:proxy=)
  conn
end

#dimensions ⇒ Object



# File 'lib/parse/embeddings/cohere.rb', line 176

def dimensions
  @dimensions
end

#embed_batch_size ⇒ Object



# File 'lib/parse/embeddings/cohere.rb', line 184

def embed_batch_size
  @embed_batch_size
end

#embed_image(sources, input_type: :search_document, allow_insecure: false) ⇒ Array<Array<Float>>

Embed a batch of image URLs through Cohere's /v2/embed multimodal endpoint. v5.1 ships URL-only — the provider receives a public URL and issues its own fetch. The SDK does NOT download the image; it validates the URL through Parse::Embeddings.validate_image_url! (sentinel-gated egress opt-in, CIDR / port / host allowlist) and forwards the canonicalized URL string in the { type: "image_url", image_url: { url: ... } } content row.

Multimodal model required. Cohere's v3 models do not accept image inputs; calling embed_image on a v3-configured provider raises BadRequestError before any network call.

Wire shape differs from Voyage#embed_image. Voyage uses { type: "image_url", image_url: "<url>" } (flat String); Cohere v2 uses { type: "image_url", image_url: { url: "<url>" } } (nested object), matching the OpenAI chat-completions content convention. The high-level SDK contract is identical — callers pass an Array<String> of URLs.

Parameters:

  • sources (Array<String>)

    image URLs. Each must satisfy Parse::Embeddings.validate_image_url!; failing entries abort the whole batch (no partial forwarding).

  • input_type (Symbol) (defaults to: :search_document)

    one of INPUT_TYPE_WIRE_VALUES's keys; mapped to Cohere's input_type field. Defaults to :search_document.

  • allow_insecure (Boolean) (defaults to: false)

    forwarded to the URL validator; permit http:// for local-dev CDN proxies.

Returns:

  • (Array<Array<Float>>)

    vectors aligned 1:1 with sources.



# File 'lib/parse/embeddings/cohere.rb', line 292

def embed_image(sources, input_type: :search_document, allow_insecure: false)
  unless MULTIMODAL_MODELS.include?(@model)
    raise BadRequestError,
          "Parse::Embeddings::Cohere#embed_image: model #{@model.inspect} does not " \
          "accept image inputs. Configure the provider with a multimodal model " \
          "(supported: #{MULTIMODAL_MODELS.inspect})."
  end
  unless sources.is_a?(Array)
    raise ArgumentError,
          "Parse::Embeddings::Cohere#embed_image expects Array of image URLs " \
          "(got #{sources.class})."
  end
  return [] if sources.empty?

  wire_input_type = INPUT_TYPE_WIRE_VALUES[input_type]
  unless wire_input_type
    raise ArgumentError,
          "Parse::Embeddings::Cohere#embed_image input_type #{input_type.inspect} not in " \
          "#{INPUT_TYPE_WIRE_VALUES.keys.inspect}."
  end
  # Cohere caps `/v2/embed` at the same 96-input per-call limit
  # as `/v1/embed`. Guard direct-API callers against a silent
  # 400 — the DSL passes a single URL per directive.
  if sources.length > @embed_batch_size
    raise ArgumentError,
          "Parse::Embeddings::Cohere#embed_image: batch size #{sources.length} exceeds " \
          "the configured cap #{@embed_batch_size} (Cohere per-request max: 96). " \
          "Split the input and call embed_image once per chunk."
  end

  # Validate every URL up-front so a malformed entry in slot N
  # does not slip through after slots 0..N-1 are already in the
  # wire body. Forward the canonicalized URL the validator
  # returned — not the caller's raw input.
  canonical_urls = sources.each_with_index.map do |url, i|
    unless url.is_a?(String)
      raise ArgumentError,
            "Parse::Embeddings::Cohere#embed_image sources[#{i}] is not a String " \
            "(#{url.class}). v5.1 ships URL-only — bytes/IO support is v5.3."
    end
    Parse::Embeddings.validate_image_url!(url, allow_insecure: allow_insecure)
  end

  body = {
    model: @model,
    input_type: wire_input_type,
    embedding_types: ["float"],
    inputs: canonical_urls.map { |u|
      { content: [{ type: "image_url", image_url: { url: u } }] }
    },
  }

  instrument_embed(sources.length, input_type, modality: :image) do |emit_payload|
    payload = post_embeddings(body, path: v2_embed_path)
    if payload.is_a?(Hash) && payload["meta"].is_a?(Hash) &&
       payload["meta"]["billed_units"].is_a?(Hash)
      tt = payload["meta"]["billed_units"]["input_tokens"]
      emit_payload[:total_tokens] = tt if tt.is_a?(Integer) && tt >= 0
    end
    vectors = extract_vectors!(payload, sources.length)
    validate_response!(sources.length, vectors)
  end
end
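
The request body that #embed_image assembles can be sketched in isolation. The URLs are hypothetical and assumed to have already passed validation; the Voyage-style row is included only to contrast the two shapes described above.

```ruby
require "json"

# Hypothetical, already-validated image URLs.
urls = ["https://cdn.example.com/a.png", "https://cdn.example.com/b.png"]

# Cohere /v2/embed shape: one input per image, nested image_url object.
body = {
  model: "embed-v4.0",
  input_type: "search_document",
  embedding_types: ["float"],
  inputs: urls.map { |u| { content: [{ type: "image_url", image_url: { url: u } }] } },
}

# Voyage-style flat row, per the comparison above, for contrast only.
voyage_row = { type: "image_url", image_url: urls.first }

wire_json = JSON.generate(body)
```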

#embed_text(strings, input_type: :search_document) ⇒ Array<Array<Float>>

Returns vectors aligned 1:1 with strings.

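Parameters:

  • strings (Array<String>)

    texts to embed. Each entry must be a non-empty String; a non-String or empty entry raises ArgumentError before any network call.

  • input_type (Symbol) (defaults to: :search_document)

    one of INPUT_TYPE_WIRE_VALUES's keys; mapped to Cohere's input_type field.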

Returns:

  • (Array<Array<Float>>)

    vectors aligned 1:1 with strings.



# File 'lib/parse/embeddings/cohere.rb', line 204

def embed_text(strings, input_type: :search_document)
  unless strings.is_a?(Array)
    raise ArgumentError,
          "Parse::Embeddings::Cohere#embed_text expects Array<String> (got #{strings.class})."
  end
  return [] if strings.empty?
  strings.each_with_index do |s, i|
    unless s.is_a?(String)
      raise ArgumentError,
            "Parse::Embeddings::Cohere#embed_text strings[#{i}] is not a String (#{s.class})."
    end
    if s.empty?
      raise ArgumentError,
            "Parse::Embeddings::Cohere#embed_text strings[#{i}] is empty; Cohere rejects empty inputs."
    end
  end
  wire_input_type = INPUT_TYPE_WIRE_VALUES[input_type]
  unless wire_input_type
    raise ArgumentError,
          "Parse::Embeddings::Cohere#embed_text input_type #{input_type.inspect} not in " \
          "#{INPUT_TYPE_WIRE_VALUES.keys.inspect}."
  end

  body = {
    texts: strings,
    model: @model,
    input_type: wire_input_type,
    embedding_types: ["float"],
  }
  # Forward `output_dimension` only for Matryoshka-capable models
  # whose active width differs from native. Sending it to a v3
  # row would yield a 400 from Cohere.
  if MATRYOSHKA_MODELS.include?(@model) &&
     @dimensions != MODEL_DEFAULT_DIMENSIONS.fetch(@model)
    body[:output_dimension] = @dimensions
  end

  instrument_embed(strings.length, input_type) do |emit_payload|
    payload = post_embeddings(body)
    # Cohere's response carries `meta.billed_units.input_tokens`
    # (and `output_tokens`, though for embeddings it's 0). Forward
    # input_tokens as the operator-facing cost number on the AS::N
    # payload so cost subscribers can budget across providers.
    if payload.is_a?(Hash) && payload["meta"].is_a?(Hash) &&
       payload["meta"]["billed_units"].is_a?(Hash)
      tt = payload["meta"]["billed_units"]["input_tokens"]
      emit_payload[:total_tokens] = tt if tt.is_a?(Integer) && tt >= 0
    end
    vectors = extract_vectors!(payload, strings.length)
    validate_response!(strings.length, vectors)
  end
end
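
The conditional output_dimension handling is easier to see outside the method. The sketch below reproduces the body-building step as a standalone function; text_embed_body is a hypothetical name, and the constants are trimmed copies of the ones defined on this class.

```ruby
MODEL_DEFAULT_DIMENSIONS = {
  "embed-v4.0"         => 1536,
  "embed-english-v3.0" => 1024,
}.freeze
MATRYOSHKA_MODELS = %w[embed-v4.0].freeze

# Hypothetical standalone version of the body construction in #embed_text.
def text_embed_body(strings, model:, dimensions:, wire_input_type:)
  body = {
    texts: strings,
    model: model,
    input_type: wire_input_type,
    embedding_types: ["float"],
  }
  # Only Matryoshka-capable models with a non-native width get the field.
  if MATRYOSHKA_MODELS.include?(model) &&
     dimensions != MODEL_DEFAULT_DIMENSIONS.fetch(model)
    body[:output_dimension] = dimensions
  end
  body
end

truncated = text_embed_body(["hello"], model: "embed-v4.0", dimensions: 512,
                                       wire_input_type: "search_query")
native_v4 = text_embed_body(["hello"], model: "embed-v4.0", dimensions: 1536,
                                       wire_input_type: "search_query")
v3_body   = text_embed_body(["hello"], model: "embed-english-v3.0", dimensions: 1024,
                                       wire_input_type: "search_document")
```

Only the first body carries output_dimension; the other two omit it, which keeps v3 requests from triggering the 400 mentioned in the comment above.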

#extract_vectors!(payload, input_count) ⇒ Object (protected)

Cohere's v1 /embed response shape:

    {
      "id": "...",
      "embeddings": { "float": [[...], [...]] },  # when embedding_types=["float"]
      "texts": [...],
      "meta": { "billed_units": { "input_tokens": N } }
    }

A legacy/no-embedding_types call returns embeddings: [[...]] as a bare Array. We accept both shapes — the request always sends embedding_types: ["float"], but proxies / Cohere's versioned endpoints may strip it.



# File 'lib/parse/embeddings/cohere.rb', line 482

def extract_vectors!(payload, input_count)
  unless payload.is_a?(Hash)
    raise InvalidResponseError,
          "Parse::Embeddings::Cohere: response body is not a JSON object."
  end
  embeddings = payload["embeddings"]
  vectors =
    case embeddings
    when Hash
      f = embeddings["float"]
      unless f.is_a?(Array)
        raise InvalidResponseError,
              "Parse::Embeddings::Cohere: response.embeddings.float is not an Array."
      end
      f
    when Array
      embeddings
    else
      raise InvalidResponseError,
            "Parse::Embeddings::Cohere: response.embeddings is neither Hash nor Array."
    end
  if vectors.length != input_count
    raise InvalidResponseError,
          "Parse::Embeddings::Cohere: response embeddings count #{vectors.length} != input count #{input_count}."
  end
  vectors
end
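
A small sketch of the two accepted response shapes, using hand-written JSON rather than real API output. vectors_from is a simplified, hypothetical stand-in for #extract_vectors! that raises ArgumentError in place of the SDK's InvalidResponseError and skips the count check.

```ruby
require "json"

# Simplified stand-in for #extract_vectors! (assumption: ArgumentError
# replaces InvalidResponseError; no input-count comparison).
def vectors_from(payload)
  embeddings = payload["embeddings"]
  case embeddings
  when Hash  then embeddings.fetch("float")
  when Array then embeddings
  else raise ArgumentError, "embeddings is neither Hash nor Array"
  end
end

# Typed shape, returned when embedding_types: ["float"] is honoured.
typed  = JSON.parse('{"embeddings": {"float": [[0.1, 0.2], [0.3, 0.4]]}}')
# Legacy shape: a bare Array of vectors.
legacy = JSON.parse('{"embeddings": [[0.1, 0.2], [0.3, 0.4]]}')

typed_vectors  = vectors_from(typed)
legacy_vectors = vectors_from(legacy)
```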

#inspect_attrs ⇒ Object



# File 'lib/parse/embeddings/cohere.rb', line 356

def inspect_attrs
  super.merge(base: safe_base_host, retries: @max_retries)
end

#max_input_tokens ⇒ Object



# File 'lib/parse/embeddings/cohere.rb', line 188

def max_input_tokens
  MODEL_MAX_INPUT_TOKENS[@model]
end

#modalities ⇒ Array<Symbol>

Returns [:text, :image] for embed-v4.0, [:text] for v3 models.

Returns:

  • (Array<Symbol>)

    [:text, :image] for embed-v4.0, [:text] for v3 models.



# File 'lib/parse/embeddings/cohere.rb', line 259

def modalities
  MULTIMODAL_MODELS.include?(@model) ? %i[text image] : [:text]
end

#model_name ⇒ Object



# File 'lib/parse/embeddings/cohere.rb', line 180

def model_name
  @model
end

#normalize? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/parse/embeddings/cohere.rb', line 192

def normalize?
  # Cohere v3 embeddings are documented unit-normalized.
  true
end

#parse_json_body!(body) ⇒ Object (protected)



# File 'lib/parse/embeddings/cohere.rb', line 456

def parse_json_body!(body)
  s = body.to_s
  if s.bytesize > MAX_RESPONSE_BYTES
    raise InvalidResponseError,
          "Parse::Embeddings::Cohere: response body exceeds #{MAX_RESPONSE_BYTES} bytes " \
          "(#{s.bytesize}). Refusing to parse."
  end
  JSON.parse(s, max_nesting: 32)
rescue JSON::ParserError => e
  raise InvalidResponseError,
        "Parse::Embeddings::Cohere: response is not valid JSON (#{e.message})."
end
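
The two guards in this method (a byte cap and a nesting cap) can be exercised with a standalone sketch. parse_bounded is a hypothetical name, the max_bytes keyword is added here only so the cap can be tested with a small value, and ArgumentError stands in for the SDK's InvalidResponseError.

```ruby
require "json"

MAX_RESPONSE_BYTES = 16 * 1024 * 1024 # mirrors the constant on this class

# Hypothetical standalone version of the size and nesting guards.
def parse_bounded(body, max_bytes: MAX_RESPONSE_BYTES)
  s = body.to_s
  if s.bytesize > max_bytes
    raise ArgumentError, "body exceeds #{max_bytes} bytes (#{s.bytesize})"
  end
  JSON.parse(s, max_nesting: 32)
rescue JSON::ParserError => e
  # Malformed JSON and nesting beyond the limit both land here.
  raise ArgumentError, "not valid JSON (#{e.class})"
end

ok = parse_bounded('{"embeddings": [[0.1]]}')
```

Checking bytesize before parsing means an oversized body is rejected without building any Ruby objects from it.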

#post_embeddings(body, path: "embed") ⇒ Object (protected)

path: accepts either a Faraday-relative segment (default "embed", which resolves under the configured /v1/ base) or an absolute path ("/v2/embed") for endpoints outside the configured base — used by #embed_image to reach /v2/embed.



# File 'lib/parse/embeddings/cohere.rb', line 412

def post_embeddings(body, path: "embed")
  attempts = 0
  loop do
    attempts += 1
    begin
      response = @connection.post(path) do |req|
        req.body = body.to_json
      end
    rescue Faraday::TimeoutError, Faraday::ConnectionFailed => e
      if attempts > @max_retries
        raise TransientError, "Parse::Embeddings::Cohere: #{e.class} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end

    status = response.status
    return parse_json_body!(response.body) if status >= 200 && status < 300

    if status == 401
      raise AuthenticationError,
            "Parse::Embeddings::Cohere: 401 Unauthorized — check api_key."
    end
    if status == 429
      if attempts > @max_retries
        raise RateLimitError,
              "Parse::Embeddings::Cohere: 429 rate limited after #{attempts} attempt(s)."
      end
      sleep(retry_after_seconds(response) || backoff_seconds(attempts))
      next
    end
    if status >= 500
      if attempts > @max_retries
        raise TransientError,
              "Parse::Embeddings::Cohere: #{status} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end
    raise BadRequestError,
          "Parse::Embeddings::Cohere: #{status} from POST #{path.start_with?('/') ? path : "/#{path}"}."
  end
end
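
The status handling in the loop above reduces to a small decision table. The sketch below restates it as a pure function; classify_status and its symbols are illustrative names, not part of the SDK.

```ruby
# Hypothetical summary of how the loop above treats each HTTP status.
def classify_status(status)
  return :success        if status >= 200 && status < 300 # parse and return the body
  return :authentication if status == 401                 # AuthenticationError, no retry
  return :rate_limited   if status == 429                 # retry, honouring Retry-After
  return :transient      if status >= 500                 # retry with backoff
  :bad_request                                            # BadRequestError, no retry
end

outcomes = [200, 401, 429, 503, 404, 302].map { |s| [s, classify_status(s)] }.to_h
```

Note that everything not matched by an earlier branch, including 3xx and other 4xx responses, ends up as a non-retried BadRequestError.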

#retry_after_seconds(response) ⇒ Object (protected)



# File 'lib/parse/embeddings/cohere.rb', line 514

def retry_after_seconds(response)
  ra = response.respond_to?(:headers) ? response.headers["retry-after"] || response.headers["Retry-After"] : nil
  return nil unless ra
  v = ra.to_f
  v.positive? ? [v, 60.0].min : nil
end
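
A sketch of how the numeric parsing above behaves for a few header values. retry_after_value is a hypothetical helper that takes the raw header string directly instead of a response object.

```ruby
# Same arithmetic as #retry_after_seconds, applied to a raw header value.
def retry_after_value(raw)
  return nil unless raw
  v = raw.to_f
  v.positive? ? [v, 60.0].min : nil
end

short_wait = retry_after_value("2")    # delta-seconds, used as-is
long_wait  = retry_after_value("120")  # clamped to the 60s ceiling
http_date  = retry_after_value("Wed, 21 Oct 2015 07:28:00 GMT") # to_f gives 0.0
missing    = retry_after_value(nil)
```

Because String#to_f returns 0.0 for text that does not start with a number, an HTTP-date Retry-After yields nil, and the caller falls back to backoff_seconds.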

#supports_input_type? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/parse/embeddings/cohere.rb', line 197

def supports_input_type?
  true
end

#v2_embed_path ⇒ Object (protected)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Compute the v2/embed path relative to the configured base_url's path component. For the default base https://api.cohere.com/v1 this produces /v2/embed; for a custom-proxy base like https://corp-proxy.example.com/cohere/v1 it produces /cohere/v2/embed — so the operator's proxy / egress-logging / API-key custody layer is NOT silently bypassed by image embedding calls. The substitution targets the trailing /v1 segment specifically; bases without that segment fall back to appending /v2/embed to the host root with a warning so the caller sees the asymmetry rather than discovering it via a 404 from a misrouted request.



# File 'lib/parse/embeddings/cohere.rb', line 394

def v2_embed_path
  uri = URI.parse(@base_url)
  path = uri.path.to_s
  if path =~ %r{/v1/?\z}i
    # Replace `/v1` (with optional trailing slash) with `/v2/embed`.
    path.sub(%r{/v1/?\z}i, "/v2/embed")
  else
    warn "[Parse::Embeddings::Cohere] base_url path #{path.inspect} does not end " \
         "in `/v1` — embed_image will POST to host-root `/v2/embed`, which may " \
         "bypass a configured proxy path. Configure base_url to end with `/v1`."
    "/v2/embed"
  end
end
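
The rewrite rule can be checked against a few base URLs with a standalone sketch. v2_embed_path_for is a hypothetical helper that takes the base URL as an argument and omits the warning; the proxy hosts are made up.

```ruby
require "uri"

# Hypothetical standalone version of the path rewrite, without the warning.
def v2_embed_path_for(base_url)
  path = URI.parse(base_url).path.to_s
  # Swap a trailing /v1 (optionally followed by a slash) for /v2/embed.
  return path.sub(%r{/v1/?\z}i, "/v2/embed") if path.match?(%r{/v1/?\z}i)
  # No trailing /v1: fall back to the host-root path.
  "/v2/embed"
end

default_path  = v2_embed_path_for("https://api.cohere.com/v1")
proxied_path  = v2_embed_path_for("https://corp-proxy.example.com/cohere/v1")
trailing_path = v2_embed_path_for("https://corp-proxy.example.com/cohere/v1/")
fallback_path = v2_embed_path_for("https://corp-proxy.example.com/gateway")
```

The last case shows the fallback: the proxy prefix /gateway is dropped, which is the situation the warning in the real method is meant to flag.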