Class: Parse::Embeddings::Cohere
- Defined in:
- lib/parse/embeddings/cohere.rb
Overview
Cohere embeddings provider. Wraps POST /v1/embed.
Supported models:
- v4: embed-v4.0 (1536 native; Matryoshka widths 256, 512, 1024, 1536; 128k-token context). Unified text + image model at the network boundary. The text path uses Cohere's /v1/embed endpoint; the image path (#embed_image, v5.1+) uses the /v2/embed multimodal endpoint with OpenAI-style { type: "image_url", image_url: { url: ... } } content rows. Text vectors stored today share the vector space with the eventual image vectors (no re-embed required when adding image-side data).
- v3: embed-english-v3.0, embed-multilingual-v3.0 (both 1024-dim), embed-english-light-v3.0, embed-multilingual-light-v3.0 (both 384-dim). Text-only.
== Asymmetric input types
Cohere is one of the providers that DOES distinguish queries from
documents at the wire level via the input_type request field.
Sending input_type: "search_query" for a query and
"search_document" for a corpus item is required for good recall
on Cohere's v3 models — using the same type for both halves of a
retrieval pair degrades nDCG by a noticeable margin (Cohere's own
benchmarks). Provider#supports_input_type? returns true here
so callers / cache-keying middleware can branch on this.
The accepted Symbol values map to the Cohere wire strings:
- :search_query → "search_query"
- :search_document → "search_document"
- :classification → "classification"
- :clustering → "clustering"
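The mapping above can be sketched as a frozen lookup that raises on anything outside the four accepted Symbols. This is a minimal illustration only: `WIRE_INPUT_TYPES` and `wire_input_type_for` are hypothetical names, not part of the provider's API.

```ruby
# Hypothetical stand-in for the Symbol -> wire-string table listed above.
WIRE_INPUT_TYPES = {
  search_query: "search_query",
  search_document: "search_document",
  classification: "classification",
  clustering: "clustering",
}.freeze

# Resolve a Symbol to its wire string; unknown Symbols raise instead of
# being silently downgraded to a default.
def wire_input_type_for(input_type)
  WIRE_INPUT_TYPES.fetch(input_type) do
    raise ArgumentError,
          "input_type #{input_type.inspect} not in #{WIRE_INPUT_TYPES.keys.inspect}"
  end
end

query_type = wire_input_type_for(:search_query)       # used for the query side
document_type = wire_input_type_for(:search_document) # used for corpus items
```

Raising rather than defaulting keeps a typo such as `:search_querry` from quietly producing document-type vectors for a query.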
== Security
- The Faraday connection refuses proxy: unless the caller opts in via allow_faraday_proxy: true. Env-proxy autodiscovery (HTTPS_PROXY etc.) is suppressed by default, the same model as Parse::Client and OpenAI.
- #inspect (inherited from Provider) never surfaces @api_key.
- Authorization and Cohere-Api-Key are in Middleware::BodyBuilder::REDACTED_HEADERS.
Defined Under Namespace
Classes: AuthenticationError, BadRequestError, RateLimitError, TransientError
Constant Summary
- DEFAULT_BASE_URL =
"https://api.cohere.com/v1"
- DEFAULT_MODEL =
"embed-english-v3.0"
- DEFAULT_TIMEOUT =
30
- DEFAULT_OPEN_TIMEOUT =
5
- DEFAULT_MAX_RETRIES =
3
- DEFAULT_BATCH_SIZE =
Cohere documents a hard cap of 96 inputs per /embed call.
96
- MAX_RESPONSE_BYTES =
16 * 1024 * 1024
- MODEL_DEFAULT_DIMENSIONS =
{ "embed-v4.0" => 1536, "embed-english-v3.0" => 1024, "embed-multilingual-v3.0" => 1024, "embed-english-light-v3.0" => 384, "embed-multilingual-light-v3.0" => 384, }.freeze
- MODEL_MAX_INPUT_TOKENS =
{ "embed-v4.0" => 128_000, "embed-english-v3.0" => 512, "embed-multilingual-v3.0" => 512, "embed-english-light-v3.0" => 512, "embed-multilingual-light-v3.0" => 512, }.freeze
- MATRYOSHKA_MODELS =
Models that accept Cohere's output_dimension Matryoshka truncation parameter. v4.0 is the only such row today; v3 models reject the field with a 400.
%w[embed-v4.0].freeze
- MULTIMODAL_MODELS =
Models that accept image inputs via the /v2/embed multimodal endpoint. Currently only embed-v4.0; v3 is text-only.
%w[embed-v4.0].freeze
- MATRYOSHKA_WIDTHS =
Allowed Matryoshka widths per model (Cohere quantizes the available truncations rather than accepting any integer ≤ native). Empty allowlist = any integer ≤ native is fine, but for v4.0 Cohere documents exactly these four widths.
{ "embed-v4.0" => [256, 512, 1024, 1536].freeze, }.freeze
- INPUT_TYPE_WIRE_VALUES =
Map SDK-canonical input_type symbols to Cohere wire strings. Symbols outside this set raise: silently downgrading :unknown_type to "search_document" would mask cache-key bugs in higher layers (the value participates in cache keys).
{ search_query: "search_query", search_document: "search_document", classification: "classification", clustering: "clustering", }.freeze
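How the native-dimension and Matryoshka-width tables might combine when a caller requests a width can be sketched as follows. The body of the class's own `validate_dimensions!` is not shown on this page, so `resolve_dimensions` and both constants below are hypothetical names, and the rules are an assumption drawn from the constant descriptions above.

```ruby
# Trimmed, illustrative copies of the tables described above.
SKETCH_NATIVE_DIMENSIONS = {
  "embed-v4.0" => 1536,
  "embed-english-v3.0" => 1024,
}.freeze
SKETCH_ALLOWED_WIDTHS = {
  "embed-v4.0" => [256, 512, 1024, 1536].freeze,
}.freeze

# Hypothetical helper: returns the effective width or raises ArgumentError.
def resolve_dimensions(model, requested)
  native = SKETCH_NATIVE_DIMENSIONS.fetch(model)
  # nil or the native width needs no truncation on any model.
  return native if requested.nil? || requested == native

  allowed = SKETCH_ALLOWED_WIDTHS[model]
  unless allowed
    raise ArgumentError,
          "#{model} does not support Matryoshka truncation (native #{native})"
  end
  unless allowed.include?(requested)
    raise ArgumentError,
          "#{model} accepts widths #{allowed.inspect}, got #{requested.inspect}"
  end
  requested
end

resolve_dimensions("embed-v4.0", 512)          # a documented v4.0 width
resolve_dimensions("embed-english-v3.0", nil)  # falls back to the native 1024
```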
Constants inherited from Provider
Provider::AS_NOTIFICATION_NAME
Instance Method Summary
- #backoff_seconds(attempt) ⇒ Object protected
- #build_connection ⇒ Object protected
- #dimensions ⇒ Object
- #embed_batch_size ⇒ Object
- #embed_image(sources, input_type: :search_document, allow_insecure: false) ⇒ Array<Array<Float>>
Embed a batch of image URLs through Cohere's /v2/embed multimodal endpoint.
- #embed_text(strings, input_type: :search_document) ⇒ Array<Array<Float>>
Vectors aligned 1:1 with strings.
- #extract_vectors!(payload, input_count) ⇒ Object protected
Cohere's v1 /embed response shape.
- #initialize(api_key:, model: DEFAULT_MODEL, dimensions: nil, base_url: DEFAULT_BASE_URL, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_faraday_proxy: false, allow_insecure_base_url: false, connection: nil) ⇒ Cohere constructor
A new instance of Cohere.
- #inspect_attrs ⇒ Object
- #max_input_tokens ⇒ Object
- #modalities ⇒ Array<Symbol>
[:text, :image] for embed-v4.0, [:text] for v3 models.
- #model_name ⇒ Object
- #normalize? ⇒ Boolean
- #parse_json_body!(body) ⇒ Object protected
- #post_embeddings(body, path: "embed") ⇒ Object protected
path: accepts either a Faraday-relative segment (default "embed", which resolves under the configured /v1/ base) or an absolute path ("/v2/embed") for endpoints outside the configured base; used by #embed_image to reach /v2/embed.
- #retry_after_seconds(response) ⇒ Object protected
- #supports_input_type? ⇒ Boolean
- #v2_embed_path ⇒ Object protected private
Compute the v2/embed path relative to the configured base_url's path component.
Methods inherited from Provider
#embed_text_batched, #inspect, #instrument_embed, #validate_response!
Constructor Details
#initialize(api_key:, model: DEFAULT_MODEL, dimensions: nil, base_url: DEFAULT_BASE_URL, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_faraday_proxy: false, allow_insecure_base_url: false, connection: nil) ⇒ Cohere
Returns a new instance of Cohere.
# File 'lib/parse/embeddings/cohere.rb', line 138
def initialize(
  api_key:,
  model: DEFAULT_MODEL,
  dimensions: nil,
  base_url: DEFAULT_BASE_URL,
  timeout: DEFAULT_TIMEOUT,
  open_timeout: DEFAULT_OPEN_TIMEOUT,
  max_retries: DEFAULT_MAX_RETRIES,
  embed_batch_size: DEFAULT_BATCH_SIZE,
  allow_faraday_proxy: false,
  allow_insecure_base_url: false,
  connection: nil
)
  validate_api_key!(api_key)
  validate_model!(model)
  validate_dimensions!(model, dimensions)
  sanitized_base_url = validate_base_url!(base_url, allow_insecure_base_url)
  validate_positive_integer!(:timeout, timeout)
  validate_positive_integer!(:open_timeout, open_timeout)
  validate_non_negative_integer!(:max_retries, max_retries)
  validate_positive_integer!(:embed_batch_size, embed_batch_size)
  if embed_batch_size > 96
    raise ArgumentError,
          "Parse::Embeddings::Cohere: embed_batch_size #{embed_batch_size} exceeds Cohere's per-request cap (96)."
  end

  @api_key = api_key
  @model = model
  @dimensions = dimensions || MODEL_DEFAULT_DIMENSIONS.fetch(model)
  @base_url = sanitized_base_url
  @timeout = timeout
  @open_timeout = open_timeout
  @max_retries = max_retries
  @embed_batch_size = embed_batch_size
  @allow_faraday_proxy = allow_faraday_proxy
  @connection = connection || build_connection
end
Instance Method Details
#backoff_seconds(attempt) ⇒ Object (protected)
# File 'lib/parse/embeddings/cohere.rb', line 510
def backoff_seconds(attempt)
  [0.5 * (2**(attempt - 1)), 30.0].min
end
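To make the schedule concrete, the formula above can be evaluated for the first few attempts. The method body is copied from the listing; the `schedule` variable and the attempt range are illustrative only.

```ruby
# Same formula as the listing above: exponential growth from 0.5s, capped at 30s.
def backoff_seconds(attempt)
  [0.5 * (2**(attempt - 1)), 30.0].min
end

# Delay before the retry that follows each failed attempt.
schedule = (1..8).map { |attempt| backoff_seconds(attempt) }
# schedule => [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

With DEFAULT_MAX_RETRIES of 3, #post_embeddings sleeps after attempts 1, 2, and 3 and raises on the fourth failure, so backoff alone adds at most 0.5 + 1.0 + 2.0 = 3.5 seconds (a 429 with a Retry-After header can wait longer).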
#build_connection ⇒ Object (protected)
# File 'lib/parse/embeddings/cohere.rb', line 362
def build_connection
  headers = {
    "Authorization" => "Bearer #{@api_key}",
    "Content-Type" => "application/json",
    "Accept" => "application/json",
    "User-Agent" => "parse-stack-embeddings/#{user_agent_version}",
  }
  faraday_opts = { url: @base_url, headers: headers }
  faraday_opts[:proxy] = nil unless @allow_faraday_proxy
  conn = Faraday.new(**faraday_opts) do |f|
    f.options.timeout = @timeout
    f.options.open_timeout = @open_timeout
    f.adapter Faraday.default_adapter
  end
  conn.proxy = nil if !@allow_faraday_proxy && conn.respond_to?(:proxy=)
  conn
end
#dimensions ⇒ Object
# File 'lib/parse/embeddings/cohere.rb', line 176
def dimensions
  @dimensions
end
#embed_batch_size ⇒ Object
# File 'lib/parse/embeddings/cohere.rb', line 184
def embed_batch_size
  @embed_batch_size
end
#embed_image(sources, input_type: :search_document, allow_insecure: false) ⇒ Array<Array<Float>>
Embed a batch of image URLs through Cohere's /v2/embed
multimodal endpoint. v5.1 ships URL-only — the provider
receives a public URL and issues its own fetch. The SDK does
NOT download the image; it validates the URL through
Parse::Embeddings.validate_image_url! (sentinel-gated egress
opt-in, CIDR / port / host allowlist) and forwards the
canonicalized URL string in the { type: "image_url",
image_url: { url: ... } } content row.
Multimodal model required. Cohere's v3 models do not accept
image inputs; calling embed_image on a v3-configured provider
raises BadRequestError before any network call.
Wire shape differs from Voyage#embed_image. Voyage uses
{ type: "image_url", image_url: "<url>" } (flat String); Cohere
v2 uses { type: "image_url", image_url: { url: "<url>" } }
(nested object), matching the OpenAI chat-completions content
convention. The high-level SDK contract is identical — callers
pass an Array<String> of URLs.
# File 'lib/parse/embeddings/cohere.rb', line 292
def embed_image(sources, input_type: :search_document, allow_insecure: false)
  unless MULTIMODAL_MODELS.include?(@model)
    raise BadRequestError,
          "Parse::Embeddings::Cohere#embed_image: model #{@model.inspect} does not " \
          "accept image inputs. Configure the provider with a multimodal model " \
          "(supported: #{MULTIMODAL_MODELS.inspect})."
  end
  unless sources.is_a?(Array)
    raise ArgumentError,
          "Parse::Embeddings::Cohere#embed_image expects Array of image URLs " \
          "(got #{sources.class})."
  end
  return [] if sources.empty?

  wire_input_type = INPUT_TYPE_WIRE_VALUES[input_type]
  unless wire_input_type
    raise ArgumentError,
          "Parse::Embeddings::Cohere#embed_image input_type #{input_type.inspect} not in " \
          "#{INPUT_TYPE_WIRE_VALUES.keys.inspect}."
  end

  # Cohere caps `/v2/embed` at the same 96-input per-call limit
  # as `/v1/embed`. Guard direct-API callers against a silent
  # 400 — the DSL passes a single URL per directive.
  if sources.length > @embed_batch_size
    raise ArgumentError,
          "Parse::Embeddings::Cohere#embed_image: batch size #{sources.length} exceeds " \
          "the configured cap #{@embed_batch_size} (Cohere per-request max: 96). " \
          "Split the input and call embed_image once per chunk."
  end

  # Validate every URL up-front so a malformed entry in slot N
  # does not slip through after slots 0..N-1 are already in the
  # wire body. Forward the canonicalized URL the validator
  # returned — not the caller's raw input.
  canonical_urls = sources.each_with_index.map do |url, i|
    unless url.is_a?(String)
      raise ArgumentError,
            "Parse::Embeddings::Cohere#embed_image sources[#{i}] is not a String " \
            "(#{url.class}). v5.1 ships URL-only — bytes/IO support is v5.3."
    end
    Parse::Embeddings.validate_image_url!(url, allow_insecure: allow_insecure)
  end

  body = {
    model: @model,
    input_type: wire_input_type,
    embedding_types: ["float"],
    inputs: canonical_urls.map { |u| { content: [{ type: "image_url", image_url: { url: u } }] } },
  }

  instrument_embed(sources.length, input_type, modality: :image) do |emit_payload|
    payload = post_embeddings(body, path: v2_embed_path)
    if payload.is_a?(Hash) && payload["meta"].is_a?(Hash) && payload["meta"]["billed_units"].is_a?(Hash)
      tt = payload["meta"]["billed_units"]["input_tokens"]
      emit_payload[:total_tokens] = tt if tt.is_a?(Integer) && tt >= 0
    end
    vectors = extract_vectors!(payload, sources.length)
    validate_response!(sources.length, vectors)
  end
end
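The difference between the two wire shapes described above is easier to see side by side. The following standalone sketch builds a Cohere /v2/embed request body for two already-validated URLs and, for contrast, a Voyage-style flat row. The helper names `cohere_image_inputs` and `voyage_image_row` and the example.com URLs are hypothetical; no network call or URL validation happens here.

```ruby
require "json"

# Cohere v2: one input per image, each with a nested { url: ... } object.
def cohere_image_inputs(urls)
  urls.map { |u| { content: [{ type: "image_url", image_url: { url: u } }] } }
end

# Voyage-style row for comparison: image_url is a flat String.
def voyage_image_row(url)
  { type: "image_url", image_url: url }
end

urls = ["https://example.com/a.png", "https://example.com/b.png"]
body = {
  model: "embed-v4.0",
  input_type: "search_document",
  embedding_types: ["float"],
  inputs: cohere_image_inputs(urls),
}

# Pretty-print the JSON that would be sent as the request body.
puts JSON.pretty_generate(body)
```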
#embed_text(strings, input_type: :search_document) ⇒ Array<Array<Float>>
Returns vectors aligned 1:1 with strings.
# File 'lib/parse/embeddings/cohere.rb', line 204
def embed_text(strings, input_type: :search_document)
  unless strings.is_a?(Array)
    raise ArgumentError,
          "Parse::Embeddings::Cohere#embed_text expects Array<String> (got #{strings.class})."
  end
  return [] if strings.empty?

  strings.each_with_index do |s, i|
    unless s.is_a?(String)
      raise ArgumentError,
            "Parse::Embeddings::Cohere#embed_text strings[#{i}] is not a String (#{s.class})."
    end
    if s.empty?
      raise ArgumentError,
            "Parse::Embeddings::Cohere#embed_text strings[#{i}] is empty; Cohere rejects empty inputs."
    end
  end

  wire_input_type = INPUT_TYPE_WIRE_VALUES[input_type]
  unless wire_input_type
    raise ArgumentError,
          "Parse::Embeddings::Cohere#embed_text input_type #{input_type.inspect} not in " \
          "#{INPUT_TYPE_WIRE_VALUES.keys.inspect}."
  end

  body = {
    texts: strings,
    model: @model,
    input_type: wire_input_type,
    embedding_types: ["float"],
  }
  # Forward `output_dimension` only for Matryoshka-capable models
  # whose active width differs from native. Sending it to a v3
  # row would yield a 400 from Cohere.
  if MATRYOSHKA_MODELS.include?(@model) && @dimensions != MODEL_DEFAULT_DIMENSIONS.fetch(@model)
    body[:output_dimension] = @dimensions
  end

  instrument_embed(strings.length, input_type) do |emit_payload|
    payload = post_embeddings(body)
    # Cohere's response carries `meta.billed_units.input_tokens`
    # (and `output_tokens`, though for embeddings it's 0). Forward
    # input_tokens as the operator-facing cost number on the AS::N
    # payload so cost subscribers can budget across providers.
    if payload.is_a?(Hash) && payload["meta"].is_a?(Hash) && payload["meta"]["billed_units"].is_a?(Hash)
      tt = payload["meta"]["billed_units"]["input_tokens"]
      emit_payload[:total_tokens] = tt if tt.is_a?(Integer) && tt >= 0
    end
    vectors = extract_vectors!(payload, strings.length)
    validate_response!(strings.length, vectors)
  end
end
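The conditional `output_dimension` rule in the listing above can be isolated into a small body-builder to show when the field is and is not sent. This is a sketch, not the provider's code: `text_embed_body` and the two constants are hypothetical, and the tables are trimmed copies of the ones in the Constant Summary.

```ruby
# Trimmed, illustrative copies of the model tables.
BODY_NATIVE_DIMENSIONS = {
  "embed-v4.0" => 1536,
  "embed-english-v3.0" => 1024,
}.freeze
BODY_MATRYOSHKA_MODELS = %w[embed-v4.0].freeze

# Hypothetical helper: builds the /v1/embed request body as a Hash.
def text_embed_body(strings, model:, dimensions:, input_type: "search_document")
  body = {
    texts: strings,
    model: model,
    input_type: input_type,
    embedding_types: ["float"],
  }
  # Only Matryoshka-capable models at a non-native width get the field.
  if BODY_MATRYOSHKA_MODELS.include?(model) && dimensions != BODY_NATIVE_DIMENSIONS.fetch(model)
    body[:output_dimension] = dimensions
  end
  body
end

truncated = text_embed_body(["hello"], model: "embed-v4.0", dimensions: 512)
native = text_embed_body(["hello"], model: "embed-v4.0", dimensions: 1536)
v3 = text_embed_body(["hello"], model: "embed-english-v3.0", dimensions: 1024)
```

Here only `truncated` carries `output_dimension`; `native` and `v3` omit the key entirely, which is what keeps v3 requests from being rejected with a 400.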
#extract_vectors!(payload, input_count) ⇒ Object (protected)
Cohere's v1 /embed response shape:
{
  "id": "...",
  "embeddings": { "float": [[...], [...]] },  # when embedding_types=["float"]
  "texts": [...],
  "meta": { "billed_units": { "input_tokens": N } }
}
A legacy/no-embedding_types call returns embeddings: [[...]]
as a bare Array. We accept both shapes — the request always
sends embedding_types: ["float"], but proxies / Cohere's
versioned endpoints may strip it.
# File 'lib/parse/embeddings/cohere.rb', line 482
def extract_vectors!(payload, input_count)
  unless payload.is_a?(Hash)
    raise InvalidResponseError,
          "Parse::Embeddings::Cohere: response body is not a JSON object."
  end
  embeddings = payload["embeddings"]
  vectors = case embeddings
            when Hash
              f = embeddings["float"]
              unless f.is_a?(Array)
                raise InvalidResponseError,
                      "Parse::Embeddings::Cohere: response.embeddings.float is not an Array."
              end
              f
            when Array
              embeddings
            else
              raise InvalidResponseError,
                    "Parse::Embeddings::Cohere: response.embeddings is neither Hash nor Array."
            end
  if vectors.length != input_count
    raise InvalidResponseError,
          "Parse::Embeddings::Cohere: response embeddings count #{vectors.length} != input count #{input_count}."
  end
  vectors
end
#inspect_attrs ⇒ Object
# File 'lib/parse/embeddings/cohere.rb', line 356
def inspect_attrs
  super.merge(base: safe_base_host, retries: @max_retries)
end
#max_input_tokens ⇒ Object
# File 'lib/parse/embeddings/cohere.rb', line 188
def max_input_tokens
  MODEL_MAX_INPUT_TOKENS[@model]
end
#modalities ⇒ Array<Symbol>
Returns [:text, :image] for embed-v4.0,
[:text] for v3 models.
# File 'lib/parse/embeddings/cohere.rb', line 259
def modalities
  MULTIMODAL_MODELS.include?(@model) ? %i[text image] : [:text]
end
#model_name ⇒ Object
# File 'lib/parse/embeddings/cohere.rb', line 180
def model_name
  @model
end
#normalize? ⇒ Boolean
# File 'lib/parse/embeddings/cohere.rb', line 192
def normalize?
  # Cohere v3 embeddings are documented unit-normalized.
  true
end
#parse_json_body!(body) ⇒ Object (protected)
# File 'lib/parse/embeddings/cohere.rb', line 456
def parse_json_body!(body)
  s = body.to_s
  if s.bytesize > MAX_RESPONSE_BYTES
    raise InvalidResponseError,
          "Parse::Embeddings::Cohere: response body exceeds #{MAX_RESPONSE_BYTES} bytes " \
          "(#{s.bytesize}). Refusing to parse."
  end
  JSON.parse(s, max_nesting: 32)
rescue JSON::ParserError => e
  raise InvalidResponseError,
        "Parse::Embeddings::Cohere: response is not valid JSON (#{e.message})."
end
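The size guard and nesting limit above can be tried in isolation. In this sketch `SketchInvalidResponse` and `parse_bounded_json` are hypothetical stand-ins for the library's InvalidResponseError and #parse_json_body!, and the byte limit is a keyword argument so a small value can be used for demonstration.

```ruby
require "json"

# Hypothetical stand-in for the library's InvalidResponseError.
class SketchInvalidResponse < StandardError; end

# Reject oversized bodies before parsing, then parse with a nesting cap.
def parse_bounded_json(body, max_bytes: 16 * 1024 * 1024)
  s = body.to_s
  if s.bytesize > max_bytes
    raise SketchInvalidResponse, "body exceeds #{max_bytes} bytes (#{s.bytesize})"
  end
  JSON.parse(s, max_nesting: 32)
rescue JSON::ParserError => e
  # JSON::NestingError is a JSON::ParserError subclass, so it lands here too.
  raise SketchInvalidResponse, "response is not valid JSON (#{e.message})"
end

parsed = parse_bounded_json('{"embeddings": {"float": [[0.1, 0.2]]}}')
```

Checking `bytesize` first means an oversized or hostile body is refused before the parser allocates anything for it.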
#post_embeddings(body, path: "embed") ⇒ Object (protected)
path: accepts either a Faraday-relative segment (default
"embed", which resolves under the configured /v1/ base) or
an absolute path ("/v2/embed") for endpoints outside the
configured base — used by #embed_image to reach /v2/embed.
# File 'lib/parse/embeddings/cohere.rb', line 412
def post_embeddings(body, path: "embed")
  attempts = 0
  loop do
    attempts += 1
    begin
      response = @connection.post(path) do |req|
        req.body = body.to_json
      end
    rescue Faraday::TimeoutError, Faraday::ConnectionFailed => e
      if attempts > @max_retries
        raise TransientError,
              "Parse::Embeddings::Cohere: #{e.class} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end

    status = response.status
    return parse_json_body!(response.body) if status >= 200 && status < 300

    if status == 401
      raise AuthenticationError,
            "Parse::Embeddings::Cohere: 401 Unauthorized — check api_key."
    end
    if status == 429
      if attempts > @max_retries
        raise RateLimitError,
              "Parse::Embeddings::Cohere: 429 rate limited after #{attempts} attempt(s)."
      end
      sleep(retry_after_seconds(response) || backoff_seconds(attempts))
      next
    end
    if status >= 500
      if attempts > @max_retries
        raise TransientError,
              "Parse::Embeddings::Cohere: #{status} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end
    raise BadRequestError,
          "Parse::Embeddings::Cohere: #{status} from POST #{path.start_with?('/') ? path : "/#{path}"}."
  end
end
#retry_after_seconds(response) ⇒ Object (protected)
# File 'lib/parse/embeddings/cohere.rb', line 514
def retry_after_seconds(response)
  ra = response.respond_to?(:headers) ? response.headers["retry-after"] || response.headers["Retry-After"] : nil
  return nil unless ra
  v = ra.to_f
  v.positive? ? [v, 60.0].min : nil
end
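The clamping behavior above is easiest to see on raw header values. The sketch below applies the same arithmetic to a String directly; `clamp_retry_after` is a hypothetical name and takes the header value rather than a response object.

```ruby
# Same arithmetic as the listing above, applied to a raw header value.
def clamp_retry_after(raw)
  return nil if raw.nil?

  seconds = raw.to_f
  # Non-positive or non-numeric values yield nil so the caller falls back
  # to exponential backoff; large values are capped at 60 seconds.
  seconds.positive? ? [seconds, 60.0].min : nil
end

clamp_retry_after("5")     # => 5.0
clamp_retry_after("120")   # => 60.0
clamp_retry_after("0")     # => nil
clamp_retry_after("Wed, 21 Oct 2015 07:28:00 GMT") # => nil
```

Because `String#to_f` returns 0.0 for text that does not start with a number, the HTTP-date form of Retry-After is treated as absent and the caller's backoff schedule applies instead.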
#supports_input_type? ⇒ Boolean
# File 'lib/parse/embeddings/cohere.rb', line 197
def supports_input_type?
  true
end
#v2_embed_path ⇒ Object (protected)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Compute the v2/embed path relative to the configured base_url's
path component. For the default base https://api.cohere.com/v1
this produces /v2/embed; for a custom-proxy base like
https://corp-proxy.example.com/cohere/v1 it produces
/cohere/v2/embed — so the operator's proxy / egress-logging
/ API-key custody layer is NOT silently bypassed by image
embedding calls. The substitution targets the trailing /v1
segment specifically; bases without that segment fall back to
appending /v2/embed to the host root with a warning so the
caller sees the asymmetry rather than discovering it via a
404 from a misrouted request.
# File 'lib/parse/embeddings/cohere.rb', line 394
def v2_embed_path
  uri = URI.parse(@base_url)
  path = uri.path.to_s
  if path =~ %r{/v1/?\z}i
    # Replace `/v1` (with optional trailing slash) with `/v2/embed`.
    path.sub(%r{/v1/?\z}i, "/v2/embed")
  else
    warn "[Parse::Embeddings::Cohere] base_url path #{path.inspect} does not end " \
         "in `/v1` — embed_image will POST to host-root `/v2/embed`, which may " \
         "bypass a configured proxy path. Configure base_url to end with `/v1`."
    "/v2/embed"
  end
end
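The path substitution can be exercised as a free function against the base URLs discussed above. This sketch takes the base URL as an argument and omits the warning; `v2_path_for` is a hypothetical name, and the `/gateway` URL is an invented example of a base without a trailing /v1 segment.

```ruby
require "uri"

# Same substitution as the listing above, minus the warning.
def v2_path_for(base_url)
  path = URI.parse(base_url).path.to_s
  if path.match?(%r{/v1/?\z}i)
    # Swap the trailing /v1 (optional slash) for /v2/embed, keeping any prefix.
    path.sub(%r{/v1/?\z}i, "/v2/embed")
  else
    # No trailing /v1 segment: fall back to the host root.
    "/v2/embed"
  end
end

v2_path_for("https://api.cohere.com/v1")                 # => "/v2/embed"
v2_path_for("https://corp-proxy.example.com/cohere/v1")  # => "/cohere/v2/embed"
v2_path_for("https://corp-proxy.example.com/cohere/v1/") # => "/cohere/v2/embed"
v2_path_for("https://example.com/gateway")               # => "/v2/embed"
```

The last case is the one the warning exists for: the `/gateway` prefix is dropped, so the request would leave the proxy path the operator configured.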