Class: Parse::Embeddings::OpenAI

Inherits:
Provider
  • Object
Defined in:
lib/parse/embeddings/openai.rb

Overview

OpenAI embeddings provider. Wraps POST /v1/embeddings and the text-embedding-3-small, text-embedding-3-large, and legacy text-embedding-ada-002 models.

== Security

  • The Faraday connection refuses ssl: { verify: false } on the production HTTPS base URL and refuses proxy: unless the caller opts in via allow_faraday_proxy: true. Env-proxy autodiscovery (HTTPS_PROXY etc.) is suppressed by default — same model as Parse::Client.
  • #inspect (inherited from Provider) never surfaces @api_key.
  • Authorization, OpenAI-Organization, and OpenAI-Project headers are added to Middleware::BodyBuilder::REDACTED_HEADERS so Faraday logging cannot leak them.
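The header redaction described in the last bullet can be pictured with a small standalone sketch. It does not reproduce Middleware::BodyBuilder; the constant REDACTED_HEADER_NAMES and the helper redact_headers are hypothetical names used only for illustration, and the "[REDACTED]" placeholder is an assumption.

```ruby
# Hypothetical stand-in for a redaction list; the real list lives in
# Middleware::BodyBuilder::REDACTED_HEADERS and may differ in shape.
REDACTED_HEADER_NAMES = %w[authorization openai-organization openai-project].freeze

# Returns a copy of the headers that is safe to log: sensitive values are
# replaced, everything else passes through unchanged.
def redact_headers(headers)
  headers.to_h do |name, value|
    sensitive = REDACTED_HEADER_NAMES.include?(name.to_s.downcase)
    [name, sensitive ? "[REDACTED]" : value]
  end
end

puts redact_headers(
  "Authorization" => "Bearer sk-example",
  "Accept" => "application/json"
).inspect
```

Matching on the lowercased name keeps the check independent of how a caller or adapter capitalizes the header.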

== Errors

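All errors inherit from Error:

  • AuthenticationError: raised on a 401 response; never retried.
  • RateLimitError: raised on a 429 response once max_retries is exhausted.
  • TransientError: raised on a 5xx response, timeout, or connection failure once max_retries is exhausted.
  • BadRequestError: raised on any other non-2xx status below 500 (typically a 4xx); never retried.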

Examples:

registration

Parse::Embeddings.register(:openai,
  Parse::Embeddings::OpenAI.new(
    api_key: ENV.fetch("OPENAI_API_KEY"),
    model:   "text-embedding-3-small",
  ))

Defined Under Namespace

Classes: AuthenticationError, BadRequestError, RateLimitError, TransientError

Constant Summary

DEFAULT_BASE_URL =
"https://api.openai.com/v1"
DEFAULT_MODEL =
"text-embedding-3-small"
DEFAULT_TIMEOUT =
30
DEFAULT_OPEN_TIMEOUT =
5
DEFAULT_MAX_RETRIES =
3
DEFAULT_BATCH_SIZE =
100
MAX_RESPONSE_BYTES =

Hard ceiling on the response body we'll parse. A legitimate OpenAI embeddings response for the worst-case configuration (100 inputs × text-embedding-3-large, 3072 floats × ~12 chars per encoded float) is ~3.6 MB. We allow 16 MB to leave generous headroom for usage telemetry and future fields, while still bounding the buffer an adversarial / misconfigured base_url could ship at us before the 30s timeout fires.

16 * 1024 * 1024
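The sizing argument above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 12 characters per encoded float is the rough figure quoted in the description, and the constant names other than MAX_RESPONSE_BYTES are made up for the example.

```ruby
# Figures taken from the description above.
WORST_CASE_INPUTS   = 100               # DEFAULT_BATCH_SIZE
WORST_CASE_WIDTH    = 3072              # text-embedding-3-large native width
APPROX_CHARS_PER_FLOAT = 12             # rough estimate, not a measured value
MAX_RESPONSE_BYTES  = 16 * 1024 * 1024  # the ceiling defined in this class

# Estimated bytes spent on embedding values in the largest legitimate response.
def worst_case_response_bytes
  WORST_CASE_INPUTS * WORST_CASE_WIDTH * APPROX_CHARS_PER_FLOAT
end

# How many times the worst case fits under the ceiling.
def headroom_factor
  MAX_RESPONSE_BYTES.to_f / worst_case_response_bytes
end

puts "worst case: #{worst_case_response_bytes} bytes"
puts "ceiling:    #{MAX_RESPONSE_BYTES} bytes"
puts format("headroom:   %.1fx", headroom_factor)
```

The estimate comes to a few megabytes, so the 16 MB ceiling leaves roughly a fourfold margin over the largest expected body.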
MODEL_DEFAULT_DIMENSIONS =

Native vector widths for each supported model. text-embedding-3-* also accept a dimensions: parameter that truncates the output (Matryoshka-style) — when set, it overrides the native width.

{
  "text-embedding-3-small" => 1536,
  "text-embedding-3-large" => 3072,
  "text-embedding-ada-002" => 1536,
}.freeze
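As a rough illustration of how the native width and the dimensions: override interact, the sketch below resolves the effective width for a model. The helper effective_dimensions is hypothetical, and rejecting an override for ada-002 on the client side is an assumption; the source only states that the override is valid for the 3-series.

```ruby
MODEL_DEFAULT_DIMENSIONS = {
  "text-embedding-3-small" => 1536,
  "text-embedding-3-large" => 3072,
  "text-embedding-ada-002" => 1536,
}.freeze

# Hypothetical helper: returns the width the caller should expect back.
def effective_dimensions(model, dimensions = nil)
  native = MODEL_DEFAULT_DIMENSIONS.fetch(model) # KeyError for unknown models
  return native if dimensions.nil?

  # Assumption: an override for a non-3-series model is rejected up front.
  unless model.start_with?("text-embedding-3-")
    raise ArgumentError, "dimensions: is only valid for text-embedding-3-* models"
  end
  dimensions
end

puts effective_dimensions("text-embedding-3-large")
puts effective_dimensions("text-embedding-3-large", 256)
```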
MODEL_MAX_INPUT_TOKENS =

Max input tokens per item for the supported models. Provided as a chunker hint via #max_input_tokens.

{
  "text-embedding-3-small" => 8191,
  "text-embedding-3-large" => 8191,
  "text-embedding-ada-002" => 8191,
}.freeze

Constants inherited from Provider

Provider::AS_NOTIFICATION_NAME

Instance Method Summary

Methods inherited from Provider

#embed_image, #embed_text_batched, #inspect, #instrument_embed, #modalities, #validate_response!

Constructor Details

#initialize(api_key:, model: DEFAULT_MODEL, dimensions: nil, base_url: DEFAULT_BASE_URL, organization: nil, project: nil, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_faraday_proxy: false, allow_insecure_base_url: false, connection: nil) ⇒ OpenAI

Returns a new instance of OpenAI.

Parameters:

  • api_key (String)

    required. Sent as Authorization: Bearer ….

  • model (String) (defaults to: DEFAULT_MODEL)

    one of MODEL_DEFAULT_DIMENSIONS's keys.

  • dimensions (Integer, nil) (defaults to: nil)

    override output width (3-series only). When nil, uses the model's native dimensions.

  • base_url (String) (defaults to: DEFAULT_BASE_URL)

    override (Azure / proxy). Must be HTTPS unless allow_insecure_base_url: true.

  • organization (String, nil) (defaults to: nil)

    sent as OpenAI-Organization.

  • project (String, nil) (defaults to: nil)

    sent as OpenAI-Project.

  • timeout (Integer) (defaults to: DEFAULT_TIMEOUT)

    read timeout, seconds.

  • open_timeout (Integer) (defaults to: DEFAULT_OPEN_TIMEOUT)

    connect timeout, seconds.

  • max_retries (Integer) (defaults to: DEFAULT_MAX_RETRIES)

    retry attempts on 429/5xx/timeouts.

  • embed_batch_size (Integer) (defaults to: DEFAULT_BATCH_SIZE)

    inputs per request.

  • allow_faraday_proxy (Boolean) (defaults to: false)

    opt in to proxy / env-proxy autodiscovery. Defaults false — matches Parse::Client.

  • allow_insecure_base_url (Boolean) (defaults to: false)

    permit http:// base (local Ollama-shaped proxies). Defaults false.

  • connection (Faraday::Connection, nil) (defaults to: nil)

    injection seam for tests. When nil, a connection is built from the other options.



# File 'lib/parse/embeddings/openai.rb', line 103

def initialize(
  api_key:,
  model: DEFAULT_MODEL,
  dimensions: nil,
  base_url: DEFAULT_BASE_URL,
  organization: nil,
  project: nil,
  timeout: DEFAULT_TIMEOUT,
  open_timeout: DEFAULT_OPEN_TIMEOUT,
  max_retries: DEFAULT_MAX_RETRIES,
  embed_batch_size: DEFAULT_BATCH_SIZE,
  allow_faraday_proxy: false,
  allow_insecure_base_url: false,
  connection: nil
)
  validate_api_key!(api_key)
  validate_model!(model)
  validate_dimensions!(model, dimensions)
  sanitized_base_url = validate_base_url!(base_url, allow_insecure_base_url)
  validate_positive_integer!(:timeout, timeout)
  validate_positive_integer!(:open_timeout, open_timeout)
  validate_non_negative_integer!(:max_retries, max_retries)
  validate_positive_integer!(:embed_batch_size, embed_batch_size)

  @api_key = api_key
  @model = model
  @dimensions = dimensions || MODEL_DEFAULT_DIMENSIONS.fetch(model)
  @base_url = sanitized_base_url
  @organization = organization
  @project = project
  @timeout = timeout
  @open_timeout = open_timeout
  @max_retries = max_retries
  @embed_batch_size = embed_batch_size
  @allow_faraday_proxy = allow_faraday_proxy
  @connection = connection || build_connection
end
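The embed_batch_size option caps how many inputs go into one request. The inherited Provider#embed_text_batched is not shown on this page, so the following is only a guess at the general shape of such batching: embed_in_batches is a hypothetical helper, and the block stands in for a call to embed_text.

```ruby
# Hypothetical batching helper: slices the inputs, hands each slice to the
# block, and concatenates the per-batch results in input order.
def embed_in_batches(strings, batch_size:)
  strings.each_slice(batch_size).flat_map { |batch| yield(batch) }
end

# Fake embedder for demonstration: one single-element vector per string.
vectors = embed_in_batches(%w[a b c d e], batch_size: 2) do |batch|
  batch.map { |s| [s.ord.to_f] }
end

puts vectors.length
```

Because each_slice preserves order and flat_map flattens only one level, the output stays aligned 1:1 with the input strings.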

Instance Method Details

#backoff_seconds(attempt) ⇒ Object (protected)

Exponential backoff with deterministic ceiling.

NOTE: no jitter. Client#request (lib/parse/client.rb) multiplies its sleep by 0.75 + rand * 0.5 to de-correlate fleet-wide retries. We deliberately omit that here: this provider is intended to be driven by a single rate-limited job runner (Sidekiq throttler, AS::Worker bucket, etc.) that already paces concurrent requests against OpenAI's rate limits. Per-call jitter on top of an external limiter only masks coordination bugs. Operators driving this provider from an unbounded worker pool should add their own jitter (subclass and override); otherwise, after a fleet-wide 429, every worker sleeps for the same interval and the retries arrive in synchronized waves.



# File 'lib/parse/embeddings/openai.rb', line 398

def backoff_seconds(attempt)
  # 0.5, 1.0, 2.0, 4.0, 8.0 …  capped at 30s
  [0.5 * (2**(attempt - 1)), 30.0].min
end
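For operators who do need jitter, the "subclass and override" advice might look like the sketch below. To stay runnable without the gem it uses a stand-in base class (BackoffBaseSketch, a hypothetical name) that carries only the schedule shown above; in the real class backoff_seconds is protected, while here it is left public so it can be exercised directly. The 0.75 + rand * 0.5 factor is the one the note attributes to Client#request.

```ruby
# Stand-in for the provider, reduced to the deterministic schedule above.
class BackoffBaseSketch
  def backoff_seconds(attempt)
    [0.5 * (2**(attempt - 1)), 30.0].min
  end
end

# Hypothetical subclass that scales each delay by a random factor in
# [0.75, 1.25) so workers that fail together do not retry together.
class JitteredBackoffSketch < BackoffBaseSketch
  def initialize(rng: Random.new)
    @rng = rng
  end

  def backoff_seconds(attempt)
    super * (0.75 + @rng.rand * 0.5)
  end
end

provider = JitteredBackoffSketch.new(rng: Random.new(42))
(1..4).each { |n| puts format("attempt %d: %.3fs", n, provider.backoff_seconds(n)) }
```

Note that the jitter is applied after the 30-second cap, so a jittered delay can reach 37.5 seconds; apply the cap last if a hard ceiling matters.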

#build_connection ⇒ Object (protected)

Subclass extension points. Azure/Ollama/Voyage adapters can override these to swap the auth header shape, the URL path, the JSON envelope, or the retry policy without re-implementing the validation layer above.

  • build_connection: Faraday wiring (override for the Azure api-key: header form).
  • post_embeddings: request + retry loop.
  • parse_json_body!: JSON parse + bounded-size check.
  • extract_vectors!: response envelope shape.
  • backoff_seconds: sleep schedule between retries.
  • retry_after_seconds: Retry-After header interpretation.



# File 'lib/parse/embeddings/openai.rb', line 243

def build_connection
  headers = {
    "Authorization" => "Bearer #{@api_key}",
    "Content-Type" => "application/json",
    "Accept" => "application/json",
    "User-Agent" => "parse-stack-embeddings/#{user_agent_version}",
  }
  headers["OpenAI-Organization"] = @organization if @organization
  headers["OpenAI-Project"] = @project if @project

  # Mirror Parse::Client: when proxy is NOT explicitly opted in,
  # pass `proxy: nil` to suppress Faraday's automatic discovery of
  # HTTPS_PROXY / HTTP_PROXY env vars. When opted in, omit the
  # key entirely so Faraday's normal env-discovery runs.
  faraday_opts = { url: @base_url, headers: headers }
  faraday_opts[:proxy] = nil unless @allow_faraday_proxy

  conn = Faraday.new(**faraday_opts) do |f|
    f.options.timeout = @timeout
    f.options.open_timeout = @open_timeout
    f.adapter Faraday.default_adapter
  end
  # Belt-and-suspenders mirroring Parse::Client (see client.rb): Faraday may
  # still synthesise a ProxyOptions from env regardless of the `proxy: nil`
  # we passed in opts, so we re-assert post-construction.
  conn.proxy = nil if !@allow_faraday_proxy && conn.respond_to?(:proxy=)
  conn
end

#dimensions ⇒ Object



# File 'lib/parse/embeddings/openai.rb', line 141

def dimensions
  @dimensions
end

#embed_batch_size ⇒ Object



# File 'lib/parse/embeddings/openai.rb', line 149

def embed_batch_size
  @embed_batch_size
end

#embed_text(strings, input_type: :search_document) ⇒ Array<Array<Float>>

Returns vectors aligned 1:1 with strings.

Parameters:

  • strings (Array<String>)

    inputs.

  • input_type (Symbol) (defaults to: :search_document)

    accepted for forward compatibility but ignored at the wire level: OpenAI does not distinguish query embeddings from document embeddings. The base Provider#embed_text_batched threads the value through; this implementation drops it.

Returns:

  • (Array<Array<Float>>)

    vectors aligned 1:1 with strings.



# File 'lib/parse/embeddings/openai.rb', line 176

def embed_text(strings, input_type: :search_document)
  unless strings.is_a?(Array)
    raise ArgumentError,
          "Parse::Embeddings::OpenAI#embed_text expects Array<String> (got #{strings.class})."
  end
  return [] if strings.empty?
  strings.each_with_index do |s, i|
    unless s.is_a?(String)
      raise ArgumentError,
            "Parse::Embeddings::OpenAI#embed_text strings[#{i}] is not a String (#{s.class})."
    end
    if s.empty?
      raise ArgumentError,
            "Parse::Embeddings::OpenAI#embed_text strings[#{i}] is empty; OpenAI rejects empty inputs."
    end
  end

  body = { input: strings, model: @model }
  # `dimensions:` is only valid for text-embedding-3-*. Sending it
  # to ada-002 yields a 400. When the caller specified an override
  # we always forward it; when the model is 3-series and we're
  # using the default, we still forward to make the contract
  # explicit (and to assert the server returns what we expect).
  body[:dimensions] = @dimensions if @model.start_with?("text-embedding-3-")

  instrument_embed(strings.length, input_type) do |emit_payload|
    payload = post_embeddings(body)
    # OpenAI's response envelope carries `usage: { prompt_tokens,
    # total_tokens }`. Forward total_tokens (the operator-facing
    # cost number) into the AS::N payload so cost subscribers can
    # budget embedding spend on the same footing as
    # `parse.agent.tool_call` token cost. Defensive on shape — a
    # mock / proxy that strips the usage block must not crash the
    # request path.
    if payload.is_a?(Hash) && payload["usage"].is_a?(Hash)
      tt = payload["usage"]["total_tokens"]
      emit_payload[:total_tokens] = tt if tt.is_a?(Integer) && tt >= 0
    end
    vectors = extract_vectors!(payload, strings.length)
    validate_response!(strings.length, vectors)
  end
end

#extract_vectors!(payload, input_count) ⇒ Object (protected)



# File 'lib/parse/embeddings/openai.rb', line 349

def extract_vectors!(payload, input_count)
  unless payload.is_a?(Hash)
    raise InvalidResponseError,
          "Parse::Embeddings::OpenAI: response body is not a JSON object."
  end
  data = payload["data"]
  unless data.is_a?(Array)
    raise InvalidResponseError,
          "Parse::Embeddings::OpenAI: response.data is not an Array."
  end
  if data.length != input_count
    raise InvalidResponseError,
          "Parse::Embeddings::OpenAI: response.data.length #{data.length} != input count #{input_count}."
  end
  # OpenAI documents that `data[].index` reflects request order,
  # but the API spec allows out-of-order responses. Sort defensively.
  sorted = data.each_with_index.map do |entry, i|
    unless entry.is_a?(Hash)
      raise InvalidResponseError,
            "Parse::Embeddings::OpenAI: response.data[#{i}] is not a JSON object."
    end
    idx = entry["index"]
    unless idx.is_a?(Integer) && idx >= 0 && idx < input_count
      raise InvalidResponseError,
            "Parse::Embeddings::OpenAI: response.data[#{i}].index #{idx.inspect} out of range."
    end
    [idx, entry["embedding"]]
  end
  indices = sorted.map(&:first)
  if indices.uniq.length != indices.length
    raise InvalidResponseError,
          "Parse::Embeddings::OpenAI: duplicate index in response.data."
  end
  sorted.sort_by(&:first).map(&:last)
end
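The defensive sort at the end of extract_vectors! can be seen in isolation with a hand-written payload. This is a simplified sketch that skips the validation steps above; reorder_by_index and SAMPLE_PAYLOAD are names invented for the example, and the payload is fabricated test data rather than a real API response.

```ruby
require "json"

# Sort entries by their declared index, then keep only the embeddings.
def reorder_by_index(data)
  data.sort_by { |entry| entry.fetch("index") }
      .map { |entry| entry.fetch("embedding") }
end

# Deliberately out-of-order entries, shaped like the `data` array.
SAMPLE_PAYLOAD = JSON.parse(<<~PAYLOAD)
  {
    "data": [
      { "index": 2, "embedding": [0.3] },
      { "index": 0, "embedding": [0.1] },
      { "index": 1, "embedding": [0.2] }
    ]
  }
PAYLOAD

puts reorder_by_index(SAMPLE_PAYLOAD["data"]).inspect
```

After the sort, position i in the result corresponds to strings[i] in the request regardless of the order in which the entries arrived.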

#inspect_attrs ⇒ Object

Override the Provider's safe inspect to add OpenAI-specific non-sensitive attrs. @base_url is redacted to host-only because operators may point this provider at an Azure / Ollama endpoint they consider sensitive — the same policy post_embeddings applies when raising on transient errors.



# File 'lib/parse/embeddings/openai.rb', line 224

def inspect_attrs
  super.merge(base: safe_base_host, retries: @max_retries)
end

#max_input_tokens ⇒ Object



# File 'lib/parse/embeddings/openai.rb', line 153

def max_input_tokens
  MODEL_MAX_INPUT_TOKENS[@model]
end

#model_name ⇒ Object



# File 'lib/parse/embeddings/openai.rb', line 145

def model_name
  @model
end

#normalize?Boolean

Returns:

  • (Boolean)


# File 'lib/parse/embeddings/openai.rb', line 157

def normalize?
  # OpenAI's text-embedding-3-* and ada-002 all return
  # unit-normalized vectors. Documented in the API reference.
  true
end
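One practical consequence of unit-normalized vectors is that cosine similarity reduces to a plain dot product. The sketch below demonstrates this on small hand-made vectors; the helpers dot, cosine, and unit are illustrative and not part of the provider.

```ruby
# Dot product of two equal-length vectors.
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Cosine similarity computed the general way, with explicit norms.
def cosine(a, b)
  dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)))
end

# Scale a vector to length 1, mimicking what a normalizing provider returns.
def unit(v)
  norm = Math.sqrt(dot(v, v))
  v.map { |x| x / norm }
end

a = unit([3.0, 4.0])
b = unit([1.0, 2.0])
puts format("dot: %.6f  cosine: %.6f", dot(a, b), cosine(a, b))
```

For vectors of length 1 both norms are 1, so the two values agree up to floating-point rounding and the division can be skipped.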

#parse_json_body!(body) ⇒ Object (protected)



# File 'lib/parse/embeddings/openai.rb', line 327

def parse_json_body!(body)
  # NOTE: we no longer short-circuit on Hash. A pre-parsed Hash
  # from a test adapter bypassed the MAX_RESPONSE_BYTES check
  # AND the max_nesting cap — both defenses against a misbehaving
  # adapter or operator-configured base_url. Tests that want to
  # inject a parsed hash should do so via the `connection:` seam
  # which still runs through Faraday and emits a String body.
  s = body.to_s
  if s.bytesize > MAX_RESPONSE_BYTES
    raise InvalidResponseError,
          "Parse::Embeddings::OpenAI: response body exceeds #{MAX_RESPONSE_BYTES} bytes " \
          "(#{s.bytesize}). Refusing to parse."
  end
  # `max_nesting:` caps JSON's recursion depth to defend against
  # adversarial payloads on a customer-configured base_url. A
  # well-formed OpenAI response is at most ~5 levels deep.
  JSON.parse(s, max_nesting: 32)
rescue JSON::ParserError => e
  raise InvalidResponseError,
        "Parse::Embeddings::OpenAI: response is not valid JSON (#{e.message})."
end
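The two guards in parse_json_body! can be tried out with the standard json library alone. The sketch below uses a hypothetical helper, parse_bounded, with both limits passed as arguments and ArgumentError standing in for InvalidResponseError. An over-deep document raises JSON::NestingError, which is a subclass of JSON::ParserError, so a rescue of JSON::ParserError like the one above also covers it.

```ruby
require "json"

# Hypothetical helper mirroring the size check and nesting cap shown above.
def parse_bounded(body, max_bytes:, max_nesting:)
  s = body.to_s
  raise ArgumentError, "body exceeds #{max_bytes} bytes" if s.bytesize > max_bytes
  JSON.parse(s, max_nesting: max_nesting)
end

# A shallow document parses normally.
puts parse_bounded('{"data":[{"index":0,"embedding":[0.1]}]}',
                   max_bytes: 1024, max_nesting: 32).inspect

# Forty nested arrays exceed a cap of 32 and are rejected before use.
begin
  parse_bounded("[" * 40 + "]" * 40, max_bytes: 1024, max_nesting: 32)
rescue JSON::NestingError => e
  puts "rejected: #{e.class}"
end
```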

#post_embeddings(body) ⇒ Object (protected)

Single POST with bounded retry. Inline implementation — we don't depend on faraday-retry (not in the runtime gemspec) and the logic is small enough to audit in place.



# File 'lib/parse/embeddings/openai.rb', line 275

def post_embeddings(body)
  attempts = 0
  loop do
    attempts += 1
    begin
      response = @connection.post("embeddings") do |req|
        req.body = body.to_json
      end
    rescue Faraday::TimeoutError, Faraday::ConnectionFailed => e
      # Surface e.class only — Faraday's message often contains
      # the full URL (which may be a customer Azure/Ollama base)
      # and we don't want that flowing into error trackers.
      if attempts > @max_retries
        raise TransientError, "Parse::Embeddings::OpenAI: #{e.class} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end

    status = response.status
    return parse_json_body!(response.body) if status >= 200 && status < 300

    if status == 401
      raise AuthenticationError,
            "Parse::Embeddings::OpenAI: 401 Unauthorized — check api_key."
    end
    if status == 429
      if attempts > @max_retries
        raise RateLimitError,
              "Parse::Embeddings::OpenAI: 429 rate limited after #{attempts} attempt(s)."
      end
      sleep(retry_after_seconds(response) || backoff_seconds(attempts))
      next
    end
    if status >= 500
      if attempts > @max_retries
        raise TransientError,
              "Parse::Embeddings::OpenAI: #{status} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end
    # 4xx other than 401/429 — don't retry. Surface the error
    # without the response body (which may echo input we don't
    # want in error tracking) and without @base_url (which may be
    # a customer-configured Azure/Ollama URL captured by error
    # trackers).
    raise BadRequestError,
          "Parse::Embeddings::OpenAI: #{status} from POST /embeddings."
  end
end
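To make the attempt accounting concrete, here is a reduced sketch of the same loop with the HTTP call and the sleep both injected. The names post_with_retry and RetryExhaustedSketch are hypothetical, the transport is any callable returning an integer status, and error classes are simplified to two. It shows that max_retries counts retries, so the total number of requests is max_retries + 1.

```ruby
class RetryExhaustedSketch < StandardError; end

# transport: callable returning an HTTP status (stand-in for the Faraday POST).
# sleeper:   callable receiving the delay; injectable so tests need not wait.
def post_with_retry(transport, max_retries:, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  loop do
    attempts += 1
    status = transport.call
    return status if status >= 200 && status < 300

    retryable = status == 429 || status >= 500
    raise ArgumentError, "non-retryable status #{status}" unless retryable
    if attempts > max_retries
      raise RetryExhaustedSketch, "#{status} after #{attempts} attempt(s)"
    end

    # Same schedule as backoff_seconds: 0.5, 1.0, 2.0, ... capped at 30s.
    sleeper.call([0.5 * (2**(attempts - 1)), 30.0].min)
  end
end

statuses = [503, 429, 200]
delays = []
result = post_with_retry(-> { statuses.shift }, max_retries: 3,
                         sleeper: ->(seconds) { delays << seconds })
puts "status=#{result} delays=#{delays.inspect}"
```

Injecting the sleeper keeps the retry policy testable without real delays, which is one reason an inline loop like this is easy to audit.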

#retry_after_seconds(response) ⇒ Object (protected)



# File 'lib/parse/embeddings/openai.rb', line 403

def retry_after_seconds(response)
  ra = response.respond_to?(:headers) ? response.headers["retry-after"] || response.headers["Retry-After"] : nil
  return nil unless ra
  v = ra.to_f
  v.positive? ? [v, 60.0].min : nil
end
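The header interpretation above relies on String#to_f, which returns 0.0 for text that does not start with a number. The standalone sketch below (parse_retry_after is a hypothetical name operating on the raw header value) shows the resulting behavior: numeric values are honored up to 60 seconds, while zero, negative, or HTTP-date values yield nil, in which case the caller falls back to backoff_seconds.

```ruby
# Same logic as above, applied to the raw header value.
def parse_retry_after(value)
  return nil unless value
  seconds = value.to_f
  seconds.positive? ? [seconds, 60.0].min : nil
end

[
  "2",                              # delay in seconds
  "120",                            # clamped to the 60s ceiling
  "0",                              # not positive, ignored
  "Wed, 21 Oct 2015 07:28:00 GMT",  # HTTP-date form, to_f gives 0.0
  nil,                              # header absent
].each do |raw|
  puts "#{raw.inspect} => #{parse_retry_after(raw).inspect}"
end
```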

#supports_input_type?Boolean

Returns:

  • (Boolean)


# File 'lib/parse/embeddings/openai.rb', line 163

def supports_input_type?
  # OpenAI does NOT distinguish search_query vs search_document.
  # We accept the kwarg (for cache-key stability across providers)
  # but it does not affect the request payload. See {#embed_text}.
  false
end