Class: Parse::Embeddings::OpenAI

Inherits: Provider

Inheritance chain: Parse::Embeddings::OpenAI < Provider < Object

Defined in: lib/parse/embeddings/openai.rb

Overview

OpenAI embeddings provider. Wraps POST /v1/embeddings and the text-embedding-3-small, text-embedding-3-large, and legacy text-embedding-ada-002 models.

Security

The Faraday connection refuses ssl: { verify: false } on the production HTTPS base URL and refuses proxy: unless the caller opts in via allow_faraday_proxy: true. Env-proxy autodiscovery (HTTPS_PROXY etc.) is suppressed by default — same model as Parse::Client.
#inspect (inherited from Provider) never surfaces @api_key.
Authorization, OpenAI-Organization, and OpenAI-Project headers are added to Middleware::BodyBuilder::REDACTED_HEADERS so Faraday logging cannot leak them.

Errors

All errors inherit from Error:

AuthenticationError — 401 from OpenAI.
RateLimitError — 429 from OpenAI (retried up to max_retries).
BadRequestError — 400/404 (not retried).
TransientError — 5xx or network/timeout (retried).
InvalidResponseError — response shape violates the contract.
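
The status handling listed above can be sketched as a standalone lookup. This is a minimal illustration rather than the library's code: `classify_status` is a hypothetical helper, and it reports the error name as a symbol instead of raising.

```ruby
# Hypothetical helper (not part of the library): maps an HTTP status to the
# error name documented above and says whether that condition is retried.
def classify_status(status)
  if status >= 200 && status < 300
    { error: nil, retried: false }
  elsif status == 401
    { error: :AuthenticationError, retried: false }
  elsif status == 429
    { error: :RateLimitError, retried: true }
  elsif status >= 500
    { error: :TransientError, retried: true }
  else
    # Any other non-2xx status, including 400 and 404, is not retried.
    { error: :BadRequestError, retried: false }
  end
end

[200, 400, 401, 404, 429, 503].each do |status|
  puts "#{status}: #{classify_status(status)}"
end
```

Network failures and timeouts never produce a status, so they fall outside this lookup; per the list above they surface as TransientError after the retries are exhausted.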

Examples:

registration

Parse::Embeddings.register(:openai,
  Parse::Embeddings::OpenAI.new(
    api_key: ENV.fetch("OPENAI_API_KEY"),
    model:   "text-embedding-3-small",
  ))

Defined Under Namespace

Classes: AuthenticationError, BadRequestError, RateLimitError, TransientError

Constant Summary

DEFAULT_BASE_URL =

"https://api.openai.com/v1"

DEFAULT_MODEL =

"text-embedding-3-small"

DEFAULT_TIMEOUT =

DEFAULT_OPEN_TIMEOUT =

DEFAULT_MAX_RETRIES =

DEFAULT_BATCH_SIZE =

MAX_RESPONSE_BYTES = Hard ceiling on the response body we'll parse. A legitimate OpenAI embeddings response for the worst-case configuration (100 inputs × text-embedding-3-large, 3072 floats × ~12 chars per encoded float) is ~3.6 MB. We allow 16 MB to leave generous headroom for usage telemetry and future fields, while still bounding the buffer an adversarial / misconfigured base_url could ship at us before the 30s timeout fires.

16 * 1024 * 1024
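
The sizing argument above is plain arithmetic, reproduced here as a quick check. The three input figures come from the description; the constant names other than MAX_RESPONSE_BYTES are made up for this sketch.

```ruby
# Worst-case figures quoted in the description above.
INPUTS_PER_REQUEST = 100   # inputs in one request
FLOATS_PER_VECTOR  = 3072  # native width of text-embedding-3-large
CHARS_PER_FLOAT    = 12    # approximate JSON-encoded size of one float

ESTIMATED_BYTES    = INPUTS_PER_REQUEST * FLOATS_PER_VECTOR * CHARS_PER_FLOAT
MAX_RESPONSE_BYTES = 16 * 1024 * 1024
HEADROOM_FACTOR    = MAX_RESPONSE_BYTES.to_f / ESTIMATED_BYTES

puts "estimated worst case: #{ESTIMATED_BYTES} bytes"
puts "ceiling:              #{MAX_RESPONSE_BYTES} bytes"
puts "headroom factor:      #{HEADROOM_FACTOR.round(2)}"
```

The estimate works out to 3,686,400 bytes, so the 16 MB ceiling leaves a bit more than four times the worst-case payload.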

MODEL_DEFAULT_DIMENSIONS = Native vector widths for each supported model. text-embedding-3-* also accept a dimensions: parameter that truncates the output (Matryoshka-style) — when set, it overrides the native width.

{
  "text-embedding-3-small" => 1536,
  "text-embedding-3-large" => 3072,
  "text-embedding-ada-002" => 1536,
}.freeze
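
To picture what a dimensions: override means for a vector, here is a client-side analogue. It assumes that shortening keeps a prefix of the native vector and rescales it to unit length; that is an assumption about the server's behavior which this page does not confirm, and `truncate_and_normalize` is a hypothetical helper, not a library method.

```ruby
# Hypothetical client-side analogue of a shortened embedding: keep the first
# `width` components, then rescale the result to unit length.
def truncate_and_normalize(vector, width)
  unless width.between?(1, vector.length)
    raise ArgumentError, "width must be between 1 and #{vector.length}"
  end

  head = vector.first(width)
  norm = Math.sqrt(head.sum { |x| x * x })
  return head if norm.zero? # an all-zero prefix cannot be rescaled

  head.map { |x| x / norm }
end

p truncate_and_normalize([3.0, 4.0, 12.0], 2)
```

In the provider itself no such step is needed: the override is sent in the request body and the server returns vectors of the requested width.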

MODEL_MAX_INPUT_TOKENS = Max input tokens per item for the supported models. Provided as a chunker hint via #max_input_tokens.

{
  "text-embedding-3-small" => 8191,
  "text-embedding-3-large" => 8191,
  "text-embedding-ada-002" => 8191,
}.freeze

Constants inherited from Provider

Provider::AS_NOTIFICATION_NAME

Instance Method Summary

#backoff_seconds(attempt) ⇒ Object protected
Exponential backoff with deterministic ceiling.
#build_connection ⇒ Object protected
Subclass extension points.
#dimensions ⇒ Object
#embed_batch_size ⇒ Object
#embed_text(strings, input_type: :search_document) ⇒ Array<Array<Float>>
Vectors aligned 1:1 with strings.
#extract_vectors!(payload, input_count) ⇒ Object protected
#initialize(api_key:, model: DEFAULT_MODEL, dimensions: nil, base_url: DEFAULT_BASE_URL, organization: nil, project: nil, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_faraday_proxy: false, allow_insecure_base_url: false, connection: nil) ⇒ OpenAI constructor
A new instance of OpenAI.
#inspect_attrs ⇒ Object
Override the Provider's safe inspect to add OpenAI-specific non-sensitive attrs.
#max_input_tokens ⇒ Object
#model_name ⇒ Object
#normalize? ⇒ Boolean
#parse_json_body!(body) ⇒ Object protected
#post_embeddings(body) ⇒ Object protected
Single POST with bounded retry.
#retry_after_seconds(response) ⇒ Object protected
#supports_input_type? ⇒ Boolean

Methods inherited from Provider

#embed_image, #embed_text_batched, #inspect, #instrument_embed, #modalities, #validate_response!

Constructor Details

#initialize(api_key:, model: DEFAULT_MODEL, dimensions: nil, base_url: DEFAULT_BASE_URL, organization: nil, project: nil, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_faraday_proxy: false, allow_insecure_base_url: false, connection: nil) ⇒ `OpenAI`

Returns a new instance of OpenAI.

Parameters:

api_key (String) —
required. Sent as Authorization: Bearer ….
model (String) (defaults to: DEFAULT_MODEL) —
one of MODEL_DEFAULT_DIMENSIONS's keys.
dimensions (Integer, nil) (defaults to: nil) —
override output width (3-series only). When nil, uses the model's native dimensions.
base_url (String) (defaults to: DEFAULT_BASE_URL) —
override (Azure / proxy). Must be HTTPS unless allow_insecure_base_url: true.
organization (String, nil) (defaults to: nil) —
sent as OpenAI-Organization.
project (String, nil) (defaults to: nil) —
sent as OpenAI-Project.
timeout (Integer) (defaults to: DEFAULT_TIMEOUT) —
read timeout, seconds.
open_timeout (Integer) (defaults to: DEFAULT_OPEN_TIMEOUT) —
connect timeout, seconds.
max_retries (Integer) (defaults to: DEFAULT_MAX_RETRIES) —
retry attempts on 429/5xx/timeouts.
embed_batch_size (Integer) (defaults to: DEFAULT_BATCH_SIZE) —
inputs per request.
allow_faraday_proxy (Boolean) (defaults to: false) —
opt in to proxy / env-proxy autodiscovery. Defaults false — matches Parse::Client.
allow_insecure_base_url (Boolean) (defaults to: false) —
permit http:// base (local Ollama-shaped proxies). Defaults false.
connection (Faraday::Connection, nil) (defaults to: nil) —
injection seam for tests. When nil, a connection is built from the other options.

# File 'lib/parse/embeddings/openai.rb', line 103

def initialize(
  api_key:,
  model: DEFAULT_MODEL,
  dimensions: nil,
  base_url: DEFAULT_BASE_URL,
  organization: nil,
  project: nil,
  timeout: DEFAULT_TIMEOUT,
  open_timeout: DEFAULT_OPEN_TIMEOUT,
  max_retries: DEFAULT_MAX_RETRIES,
  embed_batch_size: DEFAULT_BATCH_SIZE,
  allow_faraday_proxy: false,
  allow_insecure_base_url: false,
  connection: nil
)
  validate_api_key!(api_key)
  validate_model!(model)
  validate_dimensions!(model, dimensions)
  sanitized_base_url = validate_base_url!(base_url, allow_insecure_base_url)
  validate_positive_integer!(:timeout, timeout)
  validate_positive_integer!(:open_timeout, open_timeout)
  validate_non_negative_integer!(:max_retries, max_retries)
  validate_positive_integer!(:embed_batch_size, embed_batch_size)

  @api_key = api_key
  @model = model
  @dimensions = dimensions || MODEL_DEFAULT_DIMENSIONS.fetch(model)
  @base_url = sanitized_base_url
  @organization = organization
  @project = project
  @timeout = timeout
  @open_timeout = open_timeout
  @max_retries = max_retries
  @embed_batch_size = embed_batch_size
  @allow_faraday_proxy = allow_faraday_proxy
  @connection = connection || build_connection
end

Instance Method Details

#backoff_seconds(attempt) ⇒ `Object` (protected)

Exponential backoff with deterministic ceiling.

NOTE: no jitter. Client#request (lib/parse/client.rb) multiplies its sleep by 0.75 + rand * 0.5 to de-correlate fleet-wide retries. We deliberately omit that here: this provider is intended to be driven by a single rate-limited job runner (Sidekiq throttler, AS::Worker bucket, etc.) that already paces concurrent requests against OpenAI's rate limits. Per-call jitter on top of an external limiter only masks coordination bugs. Operators driving this provider from an unbounded worker pool should add their own jitter (subclass and override backoff_seconds); otherwise, after a fleet-wide 429, every worker sleeps for the same deterministic intervals and retries in lockstep, so each retry wave reaches OpenAI at the same moment.

# File 'lib/parse/embeddings/openai.rb', line 398

def backoff_seconds(attempt)
  # 0.5, 1.0, 2.0, 4.0, 8.0 …  capped at 30s
  [0.5 * (2**(attempt - 1)), 30.0].min
end
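
For operators who do need jitter, the note above suggests an override. The sketch below reproduces the documented schedule as a free-standing method and adds a hypothetical jittered variant that uses the 0.75 + rand * 0.5 multiplier the note attributes to Client#request. `jittered_backoff_seconds` and its `rng:` parameter are illustrative names, not library API.

```ruby
# Same schedule as the documented method: 0.5 * 2**(attempt - 1), capped at 30s.
def backoff_seconds(attempt)
  [0.5 * (2**(attempt - 1)), 30.0].min
end

# Hypothetical jittered variant. The multiplier lies in [0.75, 1.25), so the
# result can exceed the 30s cap by up to 25%. `rng` is injectable for tests.
def jittered_backoff_seconds(attempt, rng: Random.new)
  backoff_seconds(attempt) * (0.75 + rng.rand * 0.5)
end

(1..8).each do |attempt|
  puts format("attempt %d: base %.1fs, jittered %.2fs",
              attempt, backoff_seconds(attempt), jittered_backoff_seconds(attempt))
end
```

Inside a subclass the same idea would be an override of backoff_seconds that multiplies the value returned by super.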

#build_connection ⇒ `Object` (protected)

Subclass extension points. Azure/Ollama/Voyage adapters can override these to swap the auth header shape, the URL path, the JSON envelope, or the retry policy without re-implementing the validation layer above.

build_connection — Faraday wiring (override for Azure api-key: header form).
post_embeddings — request + retry loop.
parse_json_body! — JSON parse + bounded-size check.
extract_vectors! — response envelope shape.
backoff_seconds — sleep schedule between retries.
retry_after_seconds — Retry-After header interpretation.

# File 'lib/parse/embeddings/openai.rb', line 243

def build_connection
  headers = {
    "Authorization" => "Bearer #{@api_key}",
    "Content-Type" => "application/json",
    "Accept" => "application/json",
    "User-Agent" => "parse-stack-embeddings/#{user_agent_version}",
  }
  headers["OpenAI-Organization"] = @organization if @organization
  headers["OpenAI-Project"] = @project if @project

  # Mirror Parse::Client: when proxy is NOT explicitly opted in,
  # pass `proxy: nil` to suppress Faraday's automatic discovery of
  # HTTPS_PROXY / HTTP_PROXY env vars. When opted in, omit the
  # key entirely so Faraday's normal env-discovery runs.
  faraday_opts = { url: @base_url, headers: headers }
  faraday_opts[:proxy] = nil unless @allow_faraday_proxy

  conn = Faraday.new(**faraday_opts) do |f|
    f.options.timeout = @timeout
    f.options.open_timeout = @open_timeout
    f.adapter Faraday.default_adapter
  end
  # Belt-and-suspenders mirroring Parse::Client (see client.rb): Faraday may
  # still synthesise a ProxyOptions from env regardless of the `proxy: nil`
  # we passed in opts, so we re-assert post-construction.
  conn.proxy = nil if !@allow_faraday_proxy && conn.respond_to?(:proxy=)
  conn
end

#dimensions ⇒ `Object`

# File 'lib/parse/embeddings/openai.rb', line 141

def dimensions
  @dimensions
end

#embed_batch_size ⇒ `Object`

# File 'lib/parse/embeddings/openai.rb', line 149

def embed_batch_size
  @embed_batch_size
end

#embed_text(strings, input_type: :search_document) ⇒ `Array<Array<Float>>`

Returns vectors aligned 1:1 with strings.

Parameters:

strings (Array<String>) —
inputs.
input_type (Symbol) (defaults to: :search_document) —
accepted for forward compatibility but ignored at the wire level: OpenAI's embeddings API does not distinguish query inputs from document inputs. The base Provider#embed_text_batched threads the value through; this implementation drops it.

Returns:

(Array<Array<Float>>) —
vectors aligned 1:1 with strings.

# File 'lib/parse/embeddings/openai.rb', line 176

def embed_text(strings, input_type: :search_document)
  unless strings.is_a?(Array)
    raise ArgumentError,
          "Parse::Embeddings::OpenAI#embed_text expects Array<String> (got #{strings.class})."
  end
  return [] if strings.empty?
  strings.each_with_index do |s, i|
    unless s.is_a?(String)
      raise ArgumentError,
            "Parse::Embeddings::OpenAI#embed_text strings[#{i}] is not a String (#{s.class})."
    end
    if s.empty?
      raise ArgumentError,
            "Parse::Embeddings::OpenAI#embed_text strings[#{i}] is empty; OpenAI rejects empty inputs."
    end
  end

  body = { input: strings, model: @model }
  # `dimensions:` is only valid for text-embedding-3-*. Sending it
  # to ada-002 yields a 400. When the caller specified an override
  # we always forward it; when the model is 3-series and we're
  # using the default, we still forward to make the contract
  # explicit (and to assert the server returns what we expect).
  body[:dimensions] = @dimensions if @model.start_with?("text-embedding-3-")

  instrument_embed(strings.length, input_type) do |emit_payload|
    payload = post_embeddings(body)
    # OpenAI's response envelope carries `usage: { prompt_tokens,
    # total_tokens }`. Forward total_tokens (the operator-facing
    # cost number) into the AS::N payload so cost subscribers can
    # budget embedding spend on the same footing as
    # `parse.agent.tool_call` token cost. Defensive on shape — a
    # mock / proxy that strips the usage block must not crash the
    # request path.
    if payload.is_a?(Hash) && payload["usage"].is_a?(Hash)
      tt = payload["usage"]["total_tokens"]
      emit_payload[:total_tokens] = tt if tt.is_a?(Integer) && tt >= 0
    end
    vectors = extract_vectors!(payload, strings.length)
    validate_response!(strings.length, vectors)
  end
end
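
The request-body rule in the listing above (always send input and model, send dimensions only for the 3-series) can be isolated as follows. `embeddings_request_body` is a hypothetical free-standing function written for this sketch; the library builds the body inline.

```ruby
require "json"

# Hypothetical standalone version of the body construction shown above.
def embeddings_request_body(strings, model:, dimensions:)
  body = { input: strings, model: model }
  # Only text-embedding-3-* accepts `dimensions`; ada-002 rejects it with a 400.
  body[:dimensions] = dimensions if model.start_with?("text-embedding-3-")
  body
end

puts embeddings_request_body(["hello"], model: "text-embedding-3-small", dimensions: 1536).to_json
puts embeddings_request_body(["hello"], model: "text-embedding-ada-002", dimensions: 1536).to_json
```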

#extract_vectors!(payload, input_count) ⇒ `Object` (protected)

# File 'lib/parse/embeddings/openai.rb', line 349

def extract_vectors!(payload, input_count)
  unless payload.is_a?(Hash)
    raise InvalidResponseError,
          "Parse::Embeddings::OpenAI: response body is not a JSON object."
  end
  data = payload["data"]
  unless data.is_a?(Array)
    raise InvalidResponseError,
          "Parse::Embeddings::OpenAI: response.data is not an Array."
  end
  if data.length != input_count
    raise InvalidResponseError,
          "Parse::Embeddings::OpenAI: response.data.length #{data.length} != input count #{input_count}."
  end
  # OpenAI documents that `data[].index` reflects request order,
  # but the API spec allows out-of-order responses. Sort defensively.
  sorted = data.each_with_index.map do |entry, i|
    unless entry.is_a?(Hash)
      raise InvalidResponseError,
            "Parse::Embeddings::OpenAI: response.data[#{i}] is not a JSON object."
    end
    idx = entry["index"]
    unless idx.is_a?(Integer) && idx >= 0 && idx < input_count
      raise InvalidResponseError,
            "Parse::Embeddings::OpenAI: response.data[#{i}].index #{idx.inspect} out of range."
    end
    [idx, entry["embedding"]]
  end
  indices = sorted.map(&:first)
  if indices.uniq.length != indices.length
    raise InvalidResponseError,
          "Parse::Embeddings::OpenAI: duplicate index in response.data."
  end
  sorted.sort_by(&:first).map(&:last)
end
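
A trimmed-down version of the reordering step makes the effect on an out-of-order response easy to see. This is a sketch under stated assumptions: `reorder_by_index` is a hypothetical helper, it raises ArgumentError where the library raises InvalidResponseError, and the payload is invented sample data.

```ruby
# Hypothetical, simplified reordering: place each embedding at the position
# named by its `index`, rejecting out-of-range or duplicate indices.
def reorder_by_index(data, input_count)
  unless data.length == input_count
    raise ArgumentError, "expected #{input_count} entries, got #{data.length}"
  end

  seen = {}
  data.each do |entry|
    idx = entry["index"]
    unless idx.is_a?(Integer) && idx.between?(0, input_count - 1)
      raise ArgumentError, "index #{idx.inspect} out of range"
    end
    raise ArgumentError, "duplicate index #{idx}" if seen.key?(idx)

    seen[idx] = entry["embedding"]
  end
  (0...input_count).map { |i| seen.fetch(i) }
end

# Invented sample: entries arrive in the order 2, 0, 1.
sample = [
  { "index" => 2, "embedding" => [0.3] },
  { "index" => 0, "embedding" => [0.1] },
  { "index" => 1, "embedding" => [0.2] },
]
p reorder_by_index(sample, 3)
```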

#inspect_attrs ⇒ `Object`

Override the Provider's safe inspect to add OpenAI-specific non-sensitive attrs. @base_url is redacted to host-only because operators may point this provider at an Azure / Ollama endpoint they consider sensitive — the same policy post_embeddings applies when raising on transient errors.

# File 'lib/parse/embeddings/openai.rb', line 224

def inspect_attrs
  super.merge(base: safe_base_host, retries: @max_retries)
end
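
The helper safe_base_host is not listed on this page. As a rough idea of what host-only redaction could look like, the sketch below uses the standard URI library. `host_only` and the example URL are hypothetical and may differ from the library's actual behavior.

```ruby
require "uri"

# Hypothetical stand-in for host-only redaction: drop scheme, path, query,
# and any userinfo, keeping just the host name.
def host_only(url)
  URI.parse(url).host
rescue URI::InvalidURIError
  nil
end

# Invented example endpoint; the path and query are what redaction hides.
puts host_only("https://example-resource.openai.azure.com/openai/deployments/demo?api-version=1")
```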

#max_input_tokens ⇒ `Object`

# File 'lib/parse/embeddings/openai.rb', line 153

def max_input_tokens
  MODEL_MAX_INPUT_TOKENS[@model]
end

#model_name ⇒ `Object`

# File 'lib/parse/embeddings/openai.rb', line 145

def model_name
  @model
end

#normalize? ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/parse/embeddings/openai.rb', line 157

def normalize?
  # OpenAI's text-embedding-3-* and ada-002 all return
  # unit-normalized vectors. Documented in the API reference.
  true
end

#parse_json_body!(body) ⇒ `Object` (protected)

# File 'lib/parse/embeddings/openai.rb', line 327

def parse_json_body!(body)
  # NOTE: we no longer short-circuit on Hash. A pre-parsed Hash
  # from a test adapter bypassed the MAX_RESPONSE_BYTES check
  # AND the max_nesting cap — both defenses against a misbehaving
  # adapter or operator-configured base_url. Tests that want to
  # inject a parsed hash should do so via the `connection:` seam
  # which still runs through Faraday and emits a String body.
  s = body.to_s
  if s.bytesize > MAX_RESPONSE_BYTES
    raise InvalidResponseError,
          "Parse::Embeddings::OpenAI: response body exceeds #{MAX_RESPONSE_BYTES} bytes " \
          "(#{s.bytesize}). Refusing to parse."
  end
  # `max_nesting:` caps JSON's recursion depth to defend against
  # adversarial payloads on a customer-configured base_url. A
  # well-formed OpenAI response is at most ~5 levels deep.
  JSON.parse(s, max_nesting: 32)
rescue JSON::ParserError => e
  raise InvalidResponseError,
        "Parse::Embeddings::OpenAI: response is not valid JSON (#{e.message})."
end
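
The two guards in this method can be exercised in isolation with the standard json library. `parse_bounded` below is a hypothetical cut-down version that raises ArgumentError for oversized bodies and lets the JSON errors propagate, so the nesting failure is visible directly. JSON::NestingError is a subclass of JSON::ParserError, which is why the single rescue in the listing above covers both malformed and over-nested bodies.

```ruby
require "json"

# Hypothetical cut-down version of the size and nesting guards shown above.
def parse_bounded(text, max_bytes: 16 * 1024 * 1024, max_nesting: 32)
  raise ArgumentError, "body exceeds #{max_bytes} bytes" if text.bytesize > max_bytes

  JSON.parse(text, max_nesting: max_nesting)
end

shallow = '{"data":[{"index":0,"embedding":[0.1,0.2]}]}'
deep    = ("[" * 40) + ("]" * 40) # 40 levels of arrays, beyond the cap of 32

p parse_bounded(shallow)

begin
  parse_bounded(deep)
rescue JSON::ParserError => e
  puts "rejected: #{e.class}"
end
```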

#post_embeddings(body) ⇒ `Object` (protected)

Single POST with bounded retry. Inline implementation — we don't depend on faraday-retry (not in the runtime gemspec) and the logic is small enough to audit in place.

# File 'lib/parse/embeddings/openai.rb', line 275

def post_embeddings(body)
  attempts = 0
  loop do
    attempts += 1
    begin
      response = @connection.post("embeddings") do |req|
        req.body = body.to_json
      end
    rescue Faraday::TimeoutError, Faraday::ConnectionFailed => e
      # Surface e.class only — Faraday's message often contains
      # the full URL (which may be a customer Azure/Ollama base)
      # and we don't want that flowing into error trackers.
      if attempts > @max_retries
        raise TransientError, "Parse::Embeddings::OpenAI: #{e.class} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end

    status = response.status
    return parse_json_body!(response.body) if status >= 200 && status < 300

    if status == 401
      raise AuthenticationError,
            "Parse::Embeddings::OpenAI: 401 Unauthorized — check api_key."
    end
    if status == 429
      if attempts > @max_retries
        raise RateLimitError,
              "Parse::Embeddings::OpenAI: 429 rate limited after #{attempts} attempt(s)."
      end
      sleep(retry_after_seconds(response) || backoff_seconds(attempts))
      next
    end
    if status >= 500
      if attempts > @max_retries
        raise TransientError,
              "Parse::Embeddings::OpenAI: #{status} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end
    # 4xx other than 401/429 — don't retry. Surface the error
    # without the response body (which may echo input we don't
    # want in error tracking) and without @base_url (which may be
    # a customer-configured Azure/Ollama URL captured by error
    # trackers).
    raise BadRequestError,
          "Parse::Embeddings::OpenAI: #{status} from POST /embeddings."
  end
end
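
The shape of the loop above, stripped of HTTP details, is sketched below. It shows that max_retries counts retries, so the block runs at most max_retries + 1 times. `with_bounded_retry`, `RetryableFailure`, and the injectable `sleeper:` are hypothetical names introduced for this illustration.

```ruby
# Hypothetical marker for a failure that is worth retrying.
class RetryableFailure < StandardError; end

# Hypothetical generic form of the loop above: run the block, and on a
# retryable failure sleep and try again, up to max_retries retries.
def with_bounded_retry(max_retries:, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  loop do
    attempts += 1
    begin
      return yield(attempts)
    rescue RetryableFailure
      raise if attempts > max_retries # budget spent: re-raise the failure

      sleeper.call([0.5 * (2**(attempts - 1)), 30.0].min)
    end
  end
end

# Usage: fail twice, succeed on the third attempt, recording sleeps
# instead of actually waiting.
slept = []
result = with_bounded_retry(max_retries: 3, sleeper: ->(s) { slept << s }) do |attempt|
  raise RetryableFailure if attempt < 3

  "ok on attempt #{attempt}"
end
puts result
p slept
```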

#retry_after_seconds(response) ⇒ `Object` (protected)

# File 'lib/parse/embeddings/openai.rb', line 403

def retry_after_seconds(response)
  ra = response.respond_to?(:headers) ? response.headers["retry-after"] || response.headers["Retry-After"] : nil
  return nil unless ra
  v = ra.to_f
  v.positive? ? [v, 60.0].min : nil
end
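
The numeric handling here has a few edge cases that are easier to see on raw header values. `clamp_retry_after` is a hypothetical standalone version that takes the header string directly instead of a response object.

```ruby
# Hypothetical standalone version operating on the raw header value.
def clamp_retry_after(raw)
  return nil unless raw

  seconds = raw.to_f
  # Non-numeric strings convert to 0.0 and are treated as "no usable value".
  seconds.positive? ? [seconds, 60.0].min : nil
end

["2", "120", "0", "Wed, 21 Oct 2015 07:28:00 GMT", nil].each do |raw|
  puts "#{raw.inspect} => #{clamp_retry_after(raw).inspect}"
end
```

A Retry-After header in HTTP-date form therefore yields nil, and the caller falls back to backoff_seconds.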

#supports_input_type? ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/parse/embeddings/openai.rb', line 163

def supports_input_type?
  # OpenAI does NOT distinguish search_query vs search_document.
  # We accept the kwarg (for cache-key stability across providers)
  # but it does not affect the request payload. See {#embed_text}.
  false
end
