Class: Parse::Embeddings::OpenAI
- Defined in:
- lib/parse/embeddings/openai.rb
Overview
OpenAI embeddings provider. Wraps POST /v1/embeddings and the
text-embedding-3-small, text-embedding-3-large, and legacy
text-embedding-ada-002 models.
== Security
- The Faraday connection refuses ssl: { verify: false } on the production HTTPS base URL and refuses proxy: unless the caller opts in via allow_faraday_proxy: true. Env-proxy autodiscovery (HTTPS_PROXY etc.) is suppressed by default, the same model as Parse::Client.
- #inspect (inherited from Provider) never surfaces @api_key.
- Authorization, OpenAI-Organization, and OpenAI-Project headers are added to Middleware::BodyBuilder::REDACTED_HEADERS so Faraday logging cannot leak them.
== Errors
All errors inherit from Error:
- AuthenticationError — 401 from OpenAI.
- RateLimitError — 429 from OpenAI (retried up to max_retries).
- BadRequestError — 400/404 (not retried).
- TransientError — 5xx or network/timeout (retried).
- InvalidResponseError — response shape violates the contract.
Defined Under Namespace
Classes: AuthenticationError, BadRequestError, RateLimitError, TransientError
Constant Summary
- DEFAULT_BASE_URL =
"https://api.openai.com/v1"
- DEFAULT_MODEL =
"text-embedding-3-small"
- DEFAULT_TIMEOUT =
30
- DEFAULT_OPEN_TIMEOUT =
5
- DEFAULT_MAX_RETRIES =
3
- DEFAULT_BATCH_SIZE =
100
- MAX_RESPONSE_BYTES =
Hard ceiling on the response body we'll parse. A legitimate OpenAI embeddings response for the worst-case configuration (100 inputs × text-embedding-3-large, 3072 floats × ~12 chars per encoded float) is ~3.6 MB. We allow 16 MB to leave generous headroom for usage telemetry and future fields, while still bounding the buffer an adversarial / misconfigured base_url could ship at us before the 30s timeout fires.
16 * 1024 * 1024
- MODEL_DEFAULT_DIMENSIONS =
Native vector widths for each supported model. text-embedding-3-* also accept a dimensions: parameter that truncates the output (Matryoshka-style); when set, it overrides the native width.
{
  "text-embedding-3-small" => 1536,
  "text-embedding-3-large" => 3072,
  "text-embedding-ada-002" => 1536,
}.freeze
- MODEL_MAX_INPUT_TOKENS =
Max input tokens per item for the supported models. Provided as a chunker hint via #max_input_tokens.
{
  "text-embedding-3-small" => 8191,
  "text-embedding-3-large" => 8191,
  "text-embedding-ada-002" => 8191,
}.freeze
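The sizing argument behind MAX_RESPONSE_BYTES can be checked with a few lines of arithmetic. The sketch below only restates the figures quoted in the description (100 inputs, 3072 floats, roughly 12 characters per encoded float); the variable names are illustrative and are not part of the library.

```ruby
# Worst-case payload estimate, using the figures quoted in the description.
inputs          = 100   # DEFAULT_BATCH_SIZE
floats_per_vec  = 3072  # native width of text-embedding-3-large
chars_per_float = 12    # rough size of one JSON-encoded float (assumption from the docs)

estimated_bytes    = inputs * floats_per_vec * chars_per_float
max_response_bytes = 16 * 1024 * 1024

# Ratio between the cap and the estimated worst case.
headroom = max_response_bytes.to_f / estimated_bytes
puts "estimate: #{estimated_bytes} bytes, cap: #{max_response_bytes} bytes, headroom: #{headroom.round(1)}x"
```

The estimate covers only the encoded floats; JSON punctuation, the usage block, and any future fields come out of the remaining headroom.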
Constants inherited from Provider
Provider::AS_NOTIFICATION_NAME
Instance Method Summary
- #backoff_seconds(attempt) ⇒ Object (protected)
  Exponential backoff with deterministic ceiling.
- #build_connection ⇒ Object (protected)
  Subclass extension points.
- #dimensions ⇒ Object
- #embed_batch_size ⇒ Object
- #embed_text(strings, input_type: :search_document) ⇒ Array<Array<Float>>
  Vectors aligned 1:1 with strings.
- #extract_vectors!(payload, input_count) ⇒ Object (protected)
- #initialize(api_key:, model: DEFAULT_MODEL, dimensions: nil, base_url: DEFAULT_BASE_URL, organization: nil, project: nil, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_faraday_proxy: false, allow_insecure_base_url: false, connection: nil) ⇒ OpenAI (constructor)
  A new instance of OpenAI.
- #inspect_attrs ⇒ Object
  Override the Provider's safe inspect to add OpenAI-specific non-sensitive attrs.
- #max_input_tokens ⇒ Object
- #model_name ⇒ Object
- #normalize? ⇒ Boolean
- #parse_json_body!(body) ⇒ Object (protected)
- #post_embeddings(body) ⇒ Object (protected)
  Single POST with bounded retry.
- #retry_after_seconds(response) ⇒ Object (protected)
- #supports_input_type? ⇒ Boolean
Methods inherited from Provider
#embed_image, #embed_text_batched, #inspect, #instrument_embed, #modalities, #validate_response!
Constructor Details
#initialize(api_key:, model: DEFAULT_MODEL, dimensions: nil, base_url: DEFAULT_BASE_URL, organization: nil, project: nil, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_faraday_proxy: false, allow_insecure_base_url: false, connection: nil) ⇒ OpenAI
Returns a new instance of OpenAI.
# File 'lib/parse/embeddings/openai.rb', line 103
def initialize(
  api_key:,
  model: DEFAULT_MODEL,
  dimensions: nil,
  base_url: DEFAULT_BASE_URL,
  organization: nil,
  project: nil,
  timeout: DEFAULT_TIMEOUT,
  open_timeout: DEFAULT_OPEN_TIMEOUT,
  max_retries: DEFAULT_MAX_RETRIES,
  embed_batch_size: DEFAULT_BATCH_SIZE,
  allow_faraday_proxy: false,
  allow_insecure_base_url: false,
  connection: nil
)
  validate_api_key!(api_key)
  validate_model!(model)
  validate_dimensions!(model, dimensions)
  sanitized_base_url = validate_base_url!(base_url, allow_insecure_base_url)
  validate_positive_integer!(:timeout, timeout)
  validate_positive_integer!(:open_timeout, open_timeout)
  validate_non_negative_integer!(:max_retries, max_retries)
  validate_positive_integer!(:embed_batch_size, embed_batch_size)

  @api_key = api_key
  @model = model
  @dimensions = dimensions || MODEL_DEFAULT_DIMENSIONS.fetch(model)
  @base_url = sanitized_base_url
  @organization = organization
  @project = project
  @timeout = timeout
  @open_timeout = open_timeout
  @max_retries = max_retries
  @embed_batch_size = embed_batch_size
  @allow_faraday_proxy = allow_faraday_proxy
  @connection = connection || build_connection
end
Instance Method Details
#backoff_seconds(attempt) ⇒ Object (protected)
Exponential backoff with deterministic ceiling.
NOTE: no jitter. Client#request (lib/parse/client.rb)
multiplies its sleep by 0.75 + rand * 0.5 to de-correlate
fleet-wide retries. We deliberately omit that here: this
provider is intended to be driven by a single rate-limited
job runner (Sidekiq throttler, AS::Worker bucket, etc.) that
already paces concurrent requests against OpenAI's rate
limits. Per-call jitter on top of an external limiter only
masks coordination bugs. Operators driving this provider from
an unbounded worker pool should add their own jitter
(subclass and override) — otherwise a fleet-wide 429 will
synchronize the retry storm exponentially.
# File 'lib/parse/embeddings/openai.rb', line 398
def backoff_seconds(attempt)
  # 0.5, 1.0, 2.0, 4.0, 8.0 … capped at 30s
  [0.5 * (2**(attempt - 1)), 30.0].min
end
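For operators who do run this provider from an unbounded worker pool, the note above suggests layering jitter on top of the schedule. The following is a minimal standalone sketch, not the library's implementation: backoff_seconds is copied from the listing, while jittered_backoff_seconds and its rng: parameter are hypothetical names, and the 0.75 + rand * 0.5 multiplier is the one the note attributes to Client#request. In a real subclass the same multiplication would go inside an overridden backoff_seconds.

```ruby
# Copy of the documented schedule: 0.5, 1.0, 2.0, ... capped at 30 seconds.
def backoff_seconds(attempt)
  [0.5 * (2**(attempt - 1)), 30.0].min
end

# Hypothetical jittered variant: scales the deterministic delay by a factor
# drawn uniformly from 0.75...1.25 so that workers do not retry in lockstep.
def jittered_backoff_seconds(attempt, rng: Random.new)
  backoff_seconds(attempt) * (0.75 + rng.rand * 0.5)
end

(1..8).each do |attempt|
  puts format("attempt %d: base %.1fs, jittered %.2fs",
              attempt, backoff_seconds(attempt), jittered_backoff_seconds(attempt))
end
```

Because the jitter is applied after the cap, the jittered delay can exceed 30 seconds by up to 25 percent; clamp again if a hard ceiling matters.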
#build_connection ⇒ Object (protected)
Subclass extension points. Azure/Ollama/Voyage adapters can override these to swap the auth header shape, the URL path, the JSON envelope, or the retry policy without re-implementing the validation layer above.
- build_connection — Faraday wiring (override for Azure api-key: header form).
- post_embeddings — request + retry loop.
- parse_json_body! — JSON parse + bounded-size check.
- extract_vectors! — response envelope shape.
- backoff_seconds — sleep schedule between retries.
- retry_after_seconds — Retry-After header interpretation.
# File 'lib/parse/embeddings/openai.rb', line 243
def build_connection
  headers = {
    "Authorization" => "Bearer #{@api_key}",
    "Content-Type" => "application/json",
    "Accept" => "application/json",
    "User-Agent" => "parse-stack-embeddings/#{user_agent_version}",
  }
  headers["OpenAI-Organization"] = @organization if @organization
  headers["OpenAI-Project"] = @project if @project

  # Mirror Parse::Client: when proxy is NOT explicitly opted in,
  # pass `proxy: nil` to suppress Faraday's automatic discovery of
  # HTTPS_PROXY / HTTP_PROXY env vars. When opted in, omit the
  # key entirely so Faraday's normal env-discovery runs.
  faraday_opts = { url: @base_url, headers: headers }
  faraday_opts[:proxy] = nil unless @allow_faraday_proxy

  conn = Faraday.new(**faraday_opts) do |f|
    f.options.timeout = @timeout
    f.options.open_timeout = @open_timeout
    f.adapter Faraday.default_adapter
  end
  # Belt-and-suspenders mirroring Parse::Client (see client.rb): Faraday may
  # still synthesise a ProxyOptions from env regardless of the `proxy: nil`
  # we passed in opts, so we re-assert post-construction.
  conn.proxy = nil if !@allow_faraday_proxy && conn.respond_to?(:proxy=)
  conn
end
#dimensions ⇒ Object
# File 'lib/parse/embeddings/openai.rb', line 141
def dimensions
  @dimensions
end
#embed_batch_size ⇒ Object
# File 'lib/parse/embeddings/openai.rb', line 149
def embed_batch_size
  @embed_batch_size
end
#embed_text(strings, input_type: :search_document) ⇒ Array<Array<Float>>
Returns vectors aligned 1:1 with strings.
# File 'lib/parse/embeddings/openai.rb', line 176
def embed_text(strings, input_type: :search_document)
  unless strings.is_a?(Array)
    raise ArgumentError, "Parse::Embeddings::OpenAI#embed_text expects Array<String> (got #{strings.class})."
  end
  return [] if strings.empty?
  strings.each_with_index do |s, i|
    unless s.is_a?(String)
      raise ArgumentError, "Parse::Embeddings::OpenAI#embed_text strings[#{i}] is not a String (#{s.class})."
    end
    if s.empty?
      raise ArgumentError, "Parse::Embeddings::OpenAI#embed_text strings[#{i}] is empty; OpenAI rejects empty inputs."
    end
  end

  body = { input: strings, model: @model }
  # `dimensions:` is only valid for text-embedding-3-*. Sending it
  # to ada-002 yields a 400. When the caller specified an override
  # we always forward it; when the model is 3-series and we're
  # using the default, we still forward to make the contract
  # explicit (and to assert the server returns what we expect).
  body[:dimensions] = @dimensions if @model.start_with?("text-embedding-3-")

  instrument_embed(strings.length, input_type) do |emit_payload|
    payload = post_embeddings(body)
    # OpenAI's response envelope carries `usage: { prompt_tokens,
    # total_tokens }`. Forward total_tokens (the operator-facing
    # cost number) into the AS::N payload so cost subscribers can
    # budget embedding spend on the same footing as
    # `parse.agent.tool_call` token cost. Defensive on shape — a
    # mock / proxy that strips the usage block must not crash the
    # request path.
    if payload.is_a?(Hash) && payload["usage"].is_a?(Hash)
      tt = payload["usage"]["total_tokens"]
      emit_payload[:total_tokens] = tt if tt.is_a?(Integer) && tt >= 0
    end
    vectors = extract_vectors!(payload, strings.length)
    validate_response!(strings.length, vectors)
  end
end
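The request body assembled in embed_text depends on the model family. A small standalone sketch of that rule follows; build_embeddings_body is a hypothetical helper written for illustration and does not exist in the library.

```ruby
# Hypothetical helper mirroring how embed_text assembles the request body.
def build_embeddings_body(strings, model:, dimensions:)
  body = { input: strings, model: model }
  # Only the 3-series models accept `dimensions`; per the source comment,
  # sending it to ada-002 yields a 400, so it is left out there.
  body[:dimensions] = dimensions if model.start_with?("text-embedding-3-")
  body
end

small = build_embeddings_body(["hello"], model: "text-embedding-3-small", dimensions: 512)
ada   = build_embeddings_body(["hello"], model: "text-embedding-ada-002", dimensions: 1536)

puts small.inspect
puts ada.inspect
```

Note that input_type never appears in the body, which matches #supports_input_type? returning false.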
#extract_vectors!(payload, input_count) ⇒ Object (protected)
# File 'lib/parse/embeddings/openai.rb', line 349
def extract_vectors!(payload, input_count)
  unless payload.is_a?(Hash)
    raise InvalidResponseError, "Parse::Embeddings::OpenAI: response body is not a JSON object."
  end
  data = payload["data"]
  unless data.is_a?(Array)
    raise InvalidResponseError, "Parse::Embeddings::OpenAI: response.data is not an Array."
  end
  if data.length != input_count
    raise InvalidResponseError,
          "Parse::Embeddings::OpenAI: response.data.length #{data.length} != input count #{input_count}."
  end
  # OpenAI documents that `data[].index` reflects request order,
  # but the API spec allows out-of-order responses. Sort defensively.
  sorted = data.each_with_index.map do |entry, i|
    unless entry.is_a?(Hash)
      raise InvalidResponseError,
            "Parse::Embeddings::OpenAI: response.data[#{i}] is not a JSON object."
    end
    idx = entry["index"]
    unless idx.is_a?(Integer) && idx >= 0 && idx < input_count
      raise InvalidResponseError,
            "Parse::Embeddings::OpenAI: response.data[#{i}].index #{idx.inspect} out of range."
    end
    [idx, entry["embedding"]]
  end
  indices = sorted.map(&:first)
  if indices.uniq.length != indices.length
    raise InvalidResponseError,
          "Parse::Embeddings::OpenAI: duplicate index in response.data."
  end
  sorted.sort_by(&:first).map(&:last)
end
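The effect of the defensive sort is easiest to see on a payload whose entries arrive out of request order. The sketch below is a simplified stand-in: reorder_by_index is a hypothetical name, the payload is made up, and the type, range, and duplicate checks from extract_vectors! are deliberately omitted.

```ruby
# Simplified stand-in for the reordering step in extract_vectors!.
# Validation of entry types, index ranges, and duplicates is omitted here.
def reorder_by_index(data)
  data.map { |entry| [entry["index"], entry["embedding"]] }
      .sort_by(&:first)
      .map(&:last)
end

# Made-up response fragment with entries in reverse order.
payload = {
  "data" => [
    { "index" => 1, "embedding" => [0.3, 0.4] },
    { "index" => 0, "embedding" => [0.1, 0.2] },
  ],
}

vectors = reorder_by_index(payload["data"])
puts vectors.inspect
```

After reordering, vectors[i] corresponds to the i-th input string, which is the 1:1 alignment that #embed_text promises.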
#inspect_attrs ⇒ Object
Override the Provider's safe inspect to add OpenAI-specific
non-sensitive attrs. @base_url is redacted to host-only
because operators may point this provider at an Azure / Ollama
endpoint they consider sensitive — the same policy
post_embeddings applies when raising on transient errors.
# File 'lib/parse/embeddings/openai.rb', line 224
def inspect_attrs
  super.merge(base: safe_base_host, retries: @max_retries)
end
#max_input_tokens ⇒ Object
# File 'lib/parse/embeddings/openai.rb', line 153
def max_input_tokens
  MODEL_MAX_INPUT_TOKENS[@model]
end
#model_name ⇒ Object
# File 'lib/parse/embeddings/openai.rb', line 145
def model_name
  @model
end
#normalize? ⇒ Boolean
# File 'lib/parse/embeddings/openai.rb', line 157
def normalize?
  # OpenAI's text-embedding-3-* and ada-002 all return
  # unit-normalized vectors. Documented in the API reference.
  true
end
#parse_json_body!(body) ⇒ Object (protected)
# File 'lib/parse/embeddings/openai.rb', line 327
def parse_json_body!(body)
  # NOTE: we no longer short-circuit on Hash. A pre-parsed Hash
  # from a test adapter bypassed the MAX_RESPONSE_BYTES check
  # AND the max_nesting cap — both defenses against a misbehaving
  # adapter or operator-configured base_url. Tests that want to
  # inject a parsed hash should do so via the `connection:` seam
  # which still runs through Faraday and emits a String body.
  s = body.to_s
  if s.bytesize > MAX_RESPONSE_BYTES
    raise InvalidResponseError,
          "Parse::Embeddings::OpenAI: response body exceeds #{MAX_RESPONSE_BYTES} bytes " \
          "(#{s.bytesize}). Refusing to parse."
  end
  # `max_nesting:` caps JSON's recursion depth to defend against
  # adversarial payloads on a customer-configured base_url. A
  # well-formed OpenAI response is at most ~5 levels deep.
  JSON.parse(s, max_nesting: 32)
rescue JSON::ParserError => e
  raise InvalidResponseError,
        "Parse::Embeddings::OpenAI: response is not valid JSON (#{e.message})."
end
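The two defenses in parse_json_body! (a byte cap checked before parsing, and a nesting cap enforced by the parser) can be exercised in isolation. This is a dependency-free sketch under stated assumptions: parse_bounded_json and SKETCH_MAX_BYTES are hypothetical names, the cap is shrunk to 1 KB for demonstration, and ArgumentError stands in for the provider's InvalidResponseError.

```ruby
require "json"

# Illustrative cap only; the provider uses MAX_RESPONSE_BYTES (16 MB).
SKETCH_MAX_BYTES = 1024

# Hypothetical stand-in for parse_json_body!. Raises ArgumentError where the
# provider would raise InvalidResponseError.
def parse_bounded_json(body, max_bytes: SKETCH_MAX_BYTES, max_nesting: 32)
  s = body.to_s
  # Size is checked on the raw bytes, before any parsing work is done.
  raise ArgumentError, "body exceeds #{max_bytes} bytes (#{s.bytesize})" if s.bytesize > max_bytes

  # JSON::NestingError is a subclass of JSON::ParserError, so the single
  # rescue below covers both malformed and too-deeply-nested input.
  JSON.parse(s, max_nesting: max_nesting)
rescue JSON::ParserError => e
  raise ArgumentError, "not valid JSON (#{e.class})"
end

puts parse_bounded_json('{"data": []}').inspect

["x" * 2048, "[" * 40 + "]" * 40, "{not json"].each do |bad|
  begin
    parse_bounded_json(bad)
  rescue ArgumentError => e
    puts "rejected: #{e.message}"
  end
end
```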
#post_embeddings(body) ⇒ Object (protected)
Single POST with bounded retry. Inline implementation — we don't depend on faraday-retry (not in the runtime gemspec) and the logic is small enough to audit in place.
# File 'lib/parse/embeddings/openai.rb', line 275
def post_embeddings(body)
  attempts = 0
  loop do
    attempts += 1
    begin
      response = @connection.post("embeddings") do |req|
        req.body = body.to_json
      end
    rescue Faraday::TimeoutError, Faraday::ConnectionFailed => e
      # Surface e.class only — Faraday's message often contains
      # the full URL (which may be a customer Azure/Ollama base)
      # and we don't want that flowing into error trackers.
      if attempts > @max_retries
        raise TransientError,
              "Parse::Embeddings::OpenAI: #{e.class} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end

    status = response.status
    return parse_json_body!(response.body) if status >= 200 && status < 300

    if status == 401
      raise AuthenticationError,
            "Parse::Embeddings::OpenAI: 401 Unauthorized — check api_key."
    end
    if status == 429
      if attempts > @max_retries
        raise RateLimitError,
              "Parse::Embeddings::OpenAI: 429 rate limited after #{attempts} attempt(s)."
      end
      sleep(retry_after_seconds(response) || backoff_seconds(attempts))
      next
    end
    if status >= 500
      if attempts > @max_retries
        raise TransientError,
              "Parse::Embeddings::OpenAI: #{status} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end
    # 4xx other than 401/429 — don't retry. Surface the error
    # without the response body (which may echo input we don't
    # want in error tracking) and without @base_url (which may be
    # a customer-configured Azure/Ollama URL captured by error
    # trackers).
    raise BadRequestError,
          "Parse::Embeddings::OpenAI: #{status} from POST /embeddings."
  end
end
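The status handling in the loop above reduces to a small decision table. The sketch below restates it as a pure function so the branches can be checked without a network; classify_status, RETRYABLE, and the symbol names are hypothetical and chosen only for illustration.

```ruby
# Hypothetical classifier summarising the branch order in post_embeddings.
def classify_status(status)
  return :success        if status >= 200 && status < 300
  return :authentication if status == 401  # AuthenticationError, never retried
  return :rate_limited   if status == 429  # RateLimitError once retries are exhausted
  return :transient      if status >= 500  # TransientError once retries are exhausted
  :bad_request                             # BadRequestError, never retried
end

# Outcomes for which the loop sleeps and tries again.
RETRYABLE = %i[rate_limited transient].freeze

[200, 401, 404, 429, 503].each do |status|
  kind = classify_status(status)
  puts "#{status}: #{kind}#{RETRYABLE.include?(kind) ? ' (retried)' : ''}"
end
```

Because the listing raises only when attempts > @max_retries, the default max_retries of 3 allows up to four requests in total: the initial attempt plus three retries.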
#retry_after_seconds(response) ⇒ Object (protected)
# File 'lib/parse/embeddings/openai.rb', line 403
def retry_after_seconds(response)
  ra = response.respond_to?(:headers) ? response.headers["retry-after"] || response.headers["Retry-After"] : nil
  return nil unless ra
  v = ra.to_f
  v.positive? ? [v, 60.0].min : nil
end
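A few sample header values show how this interpretation behaves. The sketch takes the raw header value instead of a Faraday response so it runs on its own; retry_after_from is a hypothetical name, and the sample values are made up.

```ruby
# Hypothetical stand-in for retry_after_seconds that takes the raw header
# value rather than a response object.
def retry_after_from(value)
  return nil unless value

  seconds = value.to_f
  # Non-positive or unparseable values yield nil; large values clamp to 60s.
  seconds.positive? ? [seconds, 60.0].min : nil
end

["2", "120", "0", "Wed, 21 Oct 2015 07:28:00 GMT", nil].each do |value|
  puts "#{value.inspect} -> #{retry_after_from(value).inspect}"
end
```

Since String#to_f returns 0.0 for text that does not start with a number, the HTTP-date form of Retry-After yields nil here, and the caller falls back to backoff_seconds.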
#supports_input_type? ⇒ Boolean
# File 'lib/parse/embeddings/openai.rb', line 163
def supports_input_type?
  # OpenAI does NOT distinguish search_query vs search_document.
  # We accept the kwarg (for cache-key stability across providers)
  # but it does not affect the request payload. See {#embed_text}.
  false
end