Class: Parse::Embeddings::LocalHTTP

Inherits:
Provider
  • Object
Defined in:
lib/parse/embeddings/local_http.rb

Overview

Generic OpenAI-compatible local embedding provider. Talks to any server that exposes POST <base_url>/embeddings with the OpenAI request/response shape — covers Ollama (/v1), LM Studio (/v1), vLLM, llama.cpp's server, and any reverse-proxy that translates to a local model runner.

== SSRF gate

The base_url is resolved at construction time and the resolved addresses are checked against File::BLOCKED_CIDRS (loopback, RFC1918, link-local, cloud-metadata, CGNAT, IPv6 ULA, …). When ANY resolved address falls in a private/internal range, the constructor refuses unless the caller opts in via allow_private_endpoint: true.

The opt-in is a deliberate, auditable gate. Parse::Embeddings registration is configuration code, not user input, so opting in to "yes, this base_url really is my Ollama on localhost" is a one-line decision by the operator at boot time. A Kernel#warn fires when the opt-in is taken so the choice shows up in operator logs / bundle exec rake about output.

http:// base URLs are accepted when allow_private_endpoint: true is set (the typical local-runner deployment). For any other host they are refused unless the caller passes allow_insecure_base_url: true (escape hatch for self-signed internal HTTPS proxies fronted by http://).
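The gate described above can be sketched with Ruby's standard ipaddr library. This is a minimal illustration, not the library's implementation: BLOCKED_CIDRS_SKETCH, private_address?, and gate! are hypothetical names, the CIDR list is only a plausible subset of the ranges named above, and DNS resolution is left out (the sketch takes already-resolved address strings).

```ruby
require "ipaddr"

# Hypothetical subset of blocked ranges (assumption: the real list is
# longer and lives in the library, not here).
BLOCKED_CIDRS_SKETCH = %w[
  127.0.0.0/8
  10.0.0.0/8
  172.16.0.0/12
  192.168.0.0/16
  169.254.0.0/16
  100.64.0.0/10
  ::1/128
  fc00::/7
  fe80::/10
].map { |cidr| IPAddr.new(cidr) }.freeze

# True when the address falls inside any blocked range.
def private_address?(ip_string)
  ip = IPAddr.new(ip_string)
  BLOCKED_CIDRS_SKETCH.any? { |range| range.include?(ip) }
end

# Refuses when ANY resolved address is private and the caller has not
# opted in; otherwise returns whether a private address was seen.
def gate!(resolved_addrs, allow_private_endpoint: false)
  is_private = resolved_addrs.any? { |addr| private_address?(addr) }
  if is_private && !allow_private_endpoint
    raise ArgumentError,
          "base_url resolves to a private address; pass allow_private_endpoint: true"
  end
  is_private
end
```

Checking every resolved address, rather than only the first, matters because a hostname can resolve to a mix of public and internal addresses.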

== Why no fixed model whitelist

Ollama, LM Studio, and vLLM all serve operator-chosen models — we cannot enumerate "supported" models the way OpenAI can. The constructor instead takes the dimensions: explicitly, and the provider's Provider#validate_response! (inherited) enforces that every returned vector matches that width. Mis-specified dimensions surface as InvalidResponseError on the first embed call.
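A width check of the kind described above might look like the following. This is a sketch under assumptions: check_widths! and InvalidResponseErrorSketch are hypothetical stand-ins, and the inherited Provider#validate_response! may check more than this.

```ruby
# Hypothetical stand-in for the library's InvalidResponseError.
class InvalidResponseErrorSketch < StandardError; end

# Raises unless every vector is an Array of exactly `dimensions` entries.
def check_widths!(vectors, dimensions)
  vectors.each_with_index do |vec, i|
    next if vec.is_a?(Array) && vec.length == dimensions

    got = vec.is_a?(Array) ? vec.length : vec.class
    raise InvalidResponseErrorSketch,
          "vector #{i}: expected width #{dimensions}, got #{got}"
  end
  vectors
end
```

With dimensions: 768 configured against a model that actually emits 384-wide vectors, a check like this fails on the first embed call instead of silently storing mismatched vectors.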

== Security

  • Configure-time SSRF gate (above).
  • The Faraday connection refuses proxy: unless the caller opts in via allow_faraday_proxy: true. Env-proxy autodiscovery is suppressed by default — same model as OpenAI.
  • #inspect (inherited from Provider) never surfaces @api_key.

Examples:

Ollama on the same host

Parse::Embeddings.register(:ollama,
  Parse::Embeddings::LocalHTTP.new(
    base_url: "http://localhost:11434/v1",
    model: "nomic-embed-text",
    dimensions: 768,
    allow_private_endpoint: true,
  ))

Public OpenAI-compatible proxy (e.g. an internal gateway on a public DNS name)

Parse::Embeddings.register(:gateway,
  Parse::Embeddings::LocalHTTP.new(
    base_url: "https://embeddings.example.com/v1",
    api_key:  ENV.fetch("GATEWAY_API_KEY"),
    model:    "bge-small-en-v1.5",
    dimensions: 384,
  ))

Defined Under Namespace

Classes: AuthenticationError, BadRequestError, RateLimitError, TransientError

Constant Summary

DEFAULT_TIMEOUT = 30
DEFAULT_OPEN_TIMEOUT = 5
DEFAULT_MAX_RETRIES = 3
DEFAULT_BATCH_SIZE = 32
MAX_RESPONSE_BYTES = 16 * 1024 * 1024

Constants inherited from Provider

Provider::AS_NOTIFICATION_NAME

Instance Method Summary

Methods inherited from Provider

#embed_image, #embed_text_batched, #inspect, #instrument_embed, #max_input_tokens, #modalities, #validate_response!

Constructor Details

#initialize(base_url:, model:, dimensions:, api_key: nil, normalize: false, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_private_endpoint: false, allow_insecure_base_url: false, allow_faraday_proxy: false, connection: nil) ⇒ LocalHTTP

Returns a new instance of LocalHTTP.

Parameters:

  • base_url (String)

    required. Must be http(s):// with a host.

  • model (String)

    required. Identifier the local server expects in the model request field. Persisted to embedding_meta.

  • dimensions (Integer)

    required. Width of vectors the local model produces. Enforced by Provider#validate_response!.

  • api_key (String, nil) (defaults to: nil)

    optional. When present, sent as Authorization: Bearer …. Local runners typically accept any value or no header.

  • normalize (Boolean) (defaults to: false)

    whether the local model returns unit-normalized vectors. Defaults to false (Ollama and most local models do NOT normalize; bge-* and OpenAI do). Affects similarity metric selection downstream.

  • timeout (Integer) (defaults to: DEFAULT_TIMEOUT)

    read timeout, seconds.

  • open_timeout (Integer) (defaults to: DEFAULT_OPEN_TIMEOUT)

    connect timeout, seconds.

  • max_retries (Integer) (defaults to: DEFAULT_MAX_RETRIES)

    retry attempts on 429/5xx/timeouts.

  • embed_batch_size (Integer) (defaults to: DEFAULT_BATCH_SIZE)

    inputs per request.

  • allow_private_endpoint (Boolean) (defaults to: false)

    required when base_url resolves to a private/internal/loopback address. Defaults false; opting in emits a one-time warning per provider instance.

  • allow_insecure_base_url (Boolean) (defaults to: false)

    permit http:// for PUBLIC base URLs. Defaults false. Independent of allow_private_endpoint (which already implies http:// is fine for the local case).

  • allow_faraday_proxy (Boolean) (defaults to: false)

    opt in to proxy / env-proxy autodiscovery. Defaults false.

  • connection (Faraday::Connection, nil) (defaults to: nil)

    injection seam.
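When it is unclear whether a given local model returns unit-normalized vectors, the normalize: setting can be checked empirically against a sample response. The helpers below are a hypothetical sketch (l2_norm and unit_normalized? are not part of the library), and the tolerance is an arbitrary assumption.

```ruby
# Euclidean (L2) length of a vector.
def l2_norm(vec)
  Math.sqrt(vec.sum { |x| x * x })
end

# True when the vector's length is within `tolerance` of 1.0.
def unit_normalized?(vec, tolerance: 1e-3)
  (l2_norm(vec) - 1.0).abs <= tolerance
end
```

If sample vectors from the model pass this check, normalize: true is likely the right setting; otherwise leave the default.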



# File 'lib/parse/embeddings/local_http.rb', line 114

def initialize(
  base_url:,
  model:,
  dimensions:,
  api_key: nil,
  normalize: false,
  timeout: DEFAULT_TIMEOUT,
  open_timeout: DEFAULT_OPEN_TIMEOUT,
  max_retries: DEFAULT_MAX_RETRIES,
  embed_batch_size: DEFAULT_BATCH_SIZE,
  allow_private_endpoint: false,
  allow_insecure_base_url: false,
  allow_faraday_proxy: false,
  connection: nil
)
  validate_model!(model)
  validate_dimensions!(dimensions)
  validate_optional_api_key!(api_key)
  unless [true, false].include?(normalize)
    raise ArgumentError,
          "Parse::Embeddings::LocalHTTP: normalize must be true or false (got #{normalize.inspect})."
  end
  validate_positive_integer!(:timeout, timeout)
  validate_positive_integer!(:open_timeout, open_timeout)
  validate_non_negative_integer!(:max_retries, max_retries)
  validate_positive_integer!(:embed_batch_size, embed_batch_size)

  sanitized_base_url, resolved_addrs, is_private =
    validate_base_url_and_gate_ssrf!(base_url,
                                     allow_private_endpoint: allow_private_endpoint,
                                     allow_insecure_base_url: allow_insecure_base_url)
  if is_private
    # Audit log. Emits once per instance — Kernel#warn so it lands
    # on stderr and any logger that captures it. Operators running
    # a hardened environment can grep this to confirm every
    # private-endpoint opt-in was intentional.
    warn "Parse::Embeddings::LocalHTTP: allow_private_endpoint=true for #{sanitized_base_url} " \
         "resolved to private address(es) #{resolved_addrs.map(&:to_s).inspect}."
  end

  @base_url = sanitized_base_url
  @model = model
  @dimensions = dimensions
  @api_key = api_key
  @normalize = normalize
  @timeout = timeout
  @open_timeout = open_timeout
  @max_retries = max_retries
  @embed_batch_size = embed_batch_size
  @allow_faraday_proxy = allow_faraday_proxy
  @connection = connection || build_connection
end

Instance Method Details

#backoff_seconds(attempt) ⇒ Object (protected)



# File 'lib/parse/embeddings/local_http.rb', line 355

def backoff_seconds(attempt)
  [0.5 * (2**(attempt - 1)), 30.0].min
end
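To make the schedule concrete, the method above can be evaluated for the first few attempts. The method body is copied from the listing; backoff_schedule is a hypothetical helper added only for illustration.

```ruby
# Copied from the listing above: exponential backoff capped at 30 seconds.
def backoff_seconds(attempt)
  [0.5 * (2**(attempt - 1)), 30.0].min
end

# Hypothetical helper: delays for attempts 1..count.
def backoff_schedule(count)
  (1..count).map { |attempt| backoff_seconds(attempt) }
end

backoff_schedule(8)
# => [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

With the default max_retries of 3, the sleeps before the fourth and final attempt add up to 0.5 + 1.0 + 2.0 = 3.5 seconds.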

#build_connection ⇒ Object (protected)



# File 'lib/parse/embeddings/local_http.rb', line 234

def build_connection
  headers = {
    "Content-Type" => "application/json",
    "Accept" => "application/json",
    "User-Agent" => "parse-stack-embeddings/#{user_agent_version}",
  }
  headers["Authorization"] = "Bearer #{@api_key}" if @api_key

  faraday_opts = { url: @base_url, headers: headers }
  faraday_opts[:proxy] = nil unless @allow_faraday_proxy

  conn = Faraday.new(**faraday_opts) do |f|
    f.options.timeout = @timeout
    f.options.open_timeout = @open_timeout
    f.adapter Faraday.default_adapter
  end
  conn.proxy = nil if !@allow_faraday_proxy && conn.respond_to?(:proxy=)
  conn
end

#dimensions ⇒ Object



# File 'lib/parse/embeddings/local_http.rb', line 167

def dimensions
  @dimensions
end

#embed_batch_size ⇒ Object



# File 'lib/parse/embeddings/local_http.rb', line 175

def embed_batch_size
  @embed_batch_size
end

#embed_text(strings, input_type: :search_document) ⇒ Array<Array<Float>>

Returns vectors aligned 1:1 with strings.

Parameters:

  • strings (Array<String>)

    inputs.

  • input_type (Symbol) (defaults to: :search_document)

    accepted for forward compatibility, ignored at the wire level.

Returns:

  • (Array<Array<Float>>)

    vectors aligned 1:1 with strings.



# File 'lib/parse/embeddings/local_http.rb', line 196

def embed_text(strings, input_type: :search_document)
  unless strings.is_a?(Array)
    raise ArgumentError,
          "Parse::Embeddings::LocalHTTP#embed_text expects Array<String> (got #{strings.class})."
  end
  return [] if strings.empty?
  strings.each_with_index do |s, i|
    unless s.is_a?(String)
      raise ArgumentError,
            "Parse::Embeddings::LocalHTTP#embed_text strings[#{i}] is not a String (#{s.class})."
    end
    if s.empty?
      raise ArgumentError,
            "Parse::Embeddings::LocalHTTP#embed_text strings[#{i}] is empty; local runners typically reject empty inputs."
    end
  end

  body = { input: strings, model: @model }

  instrument_embed(strings.length, input_type) do |emit_payload|
    payload = post_embeddings(body)
    # Local runners may or may not include `usage`. When present,
    # forward total_tokens to the AS::N payload.
    if payload.is_a?(Hash) && payload["usage"].is_a?(Hash)
      tt = payload["usage"]["total_tokens"]
      emit_payload[:total_tokens] = tt if tt.is_a?(Integer) && tt >= 0
    end
    vectors = extract_vectors!(payload, strings.length)
    validate_response!(strings.length, vectors)
  end
end

#extract_vectors!(payload, input_count) ⇒ Object (protected)

Accept the OpenAI-compatible shape. Some local runners omit index or return data in request order without it; tolerate both forms by falling back to positional alignment when the field is missing across the entire response.



# File 'lib/parse/embeddings/local_http.rb', line 315

def extract_vectors!(payload, input_count)
  unless payload.is_a?(Hash)
    raise InvalidResponseError,
          "Parse::Embeddings::LocalHTTP: response body is not a JSON object."
  end
  data = payload["data"]
  unless data.is_a?(Array)
    raise InvalidResponseError,
          "Parse::Embeddings::LocalHTTP: response.data is not an Array."
  end
  if data.length != input_count
    raise InvalidResponseError,
          "Parse::Embeddings::LocalHTTP: response.data.length #{data.length} != input count #{input_count}."
  end
  all_have_index = data.all? { |e| e.is_a?(Hash) && e["index"].is_a?(Integer) }
  if all_have_index
    sorted = data.map do |entry|
      idx = entry["index"]
      unless idx >= 0 && idx < input_count
        raise InvalidResponseError,
              "Parse::Embeddings::LocalHTTP: response.data entry index #{idx} out of range."
      end
      [idx, entry["embedding"]]
    end
    if sorted.map(&:first).uniq.length != sorted.length
      raise InvalidResponseError,
            "Parse::Embeddings::LocalHTTP: duplicate index in response.data."
    end
    sorted.sort_by(&:first).map(&:last)
  else
    data.each_with_index.map do |entry, i|
      unless entry.is_a?(Hash)
        raise InvalidResponseError,
              "Parse::Embeddings::LocalHTTP: response.data[#{i}] is not a JSON object."
      end
      entry["embedding"]
    end
  end
end
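The two alignment paths can be seen in isolation with a reduced sketch. align_by_index is a hypothetical helper that keeps only the ordering logic and drops the range, duplicate, and type checks performed by the method above. As in the listing, the positional fallback applies whenever any entry lacks an Integer index.

```ruby
# Hypothetical reduction of the ordering logic only (no validation).
def align_by_index(data)
  if data.all? { |e| e.is_a?(Hash) && e["index"].is_a?(Integer) }
    # Every entry carries an index: reorder to match request order.
    data.sort_by { |e| e["index"] }.map { |e| e["embedding"] }
  else
    # At least one entry has no index: trust response order as-is.
    data.map { |e| e["embedding"] }
  end
end

out_of_order = [
  { "index" => 1, "embedding" => [0.2] },
  { "index" => 0, "embedding" => [0.1] },
]
align_by_index(out_of_order)
# => [[0.1], [0.2]]
```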

#inspect_attrs ⇒ Object



# File 'lib/parse/embeddings/local_http.rb', line 228

def inspect_attrs
  super.merge(base: safe_base_host, retries: @max_retries)
end

#model_name ⇒ Object



# File 'lib/parse/embeddings/local_http.rb', line 171

def model_name
  @model
end

#normalize? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/parse/embeddings/local_http.rb', line 179

def normalize?
  @normalize
end

#parse_json_body!(body) ⇒ Object (protected)



# File 'lib/parse/embeddings/local_http.rb', line 298

def parse_json_body!(body)
  s = body.to_s
  if s.bytesize > MAX_RESPONSE_BYTES
    raise InvalidResponseError,
          "Parse::Embeddings::LocalHTTP: response body exceeds #{MAX_RESPONSE_BYTES} bytes " \
          "(#{s.bytesize}). Refusing to parse."
  end
  JSON.parse(s, max_nesting: 32)
rescue JSON::ParserError => e
  raise InvalidResponseError,
        "Parse::Embeddings::LocalHTTP: response is not valid JSON (#{e.message})."
end
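The two limits in the method above (a byte cap and a nesting cap) can be exercised with a standalone sketch. parse_capped and MAX_BYTES_SKETCH are hypothetical, and the cap is deliberately tiny for demonstration. As far as I know, the json gem defines JSON::NestingError as a subclass of JSON::ParserError, which is why a single rescue clause can cover both malformed and over-nested bodies.

```ruby
require "json"

# Hypothetical small cap for demonstration; the class uses MAX_RESPONSE_BYTES.
MAX_BYTES_SKETCH = 1024

# Rejects oversized bodies before parsing, then parses with a nesting cap.
def parse_capped(body)
  s = body.to_s
  if s.bytesize > MAX_BYTES_SKETCH
    raise ArgumentError, "body too large (#{s.bytesize} bytes)"
  end
  JSON.parse(s, max_nesting: 32)
end

parse_capped('{"data": []}')
# => {"data"=>[]}
```

Checking bytesize before calling JSON.parse keeps the parser from ever seeing an oversized body.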

#post_embeddings(body) ⇒ Object (protected)



# File 'lib/parse/embeddings/local_http.rb', line 254

def post_embeddings(body)
  attempts = 0
  loop do
    attempts += 1
    begin
      response = @connection.post("embeddings") do |req|
        req.body = body.to_json
      end
    rescue Faraday::TimeoutError, Faraday::ConnectionFailed => e
      if attempts > @max_retries
        raise TransientError, "Parse::Embeddings::LocalHTTP: #{e.class} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end

    status = response.status
    return parse_json_body!(response.body) if status >= 200 && status < 300

    if status == 401
      raise AuthenticationError,
            "Parse::Embeddings::LocalHTTP: 401 Unauthorized — check api_key."
    end
    if status == 429
      if attempts > @max_retries
        raise RateLimitError,
              "Parse::Embeddings::LocalHTTP: 429 rate limited after #{attempts} attempt(s)."
      end
      sleep(retry_after_seconds(response) || backoff_seconds(attempts))
      next
    end
    if status >= 500
      if attempts > @max_retries
        raise TransientError,
              "Parse::Embeddings::LocalHTTP: #{status} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end
    raise BadRequestError,
          "Parse::Embeddings::LocalHTTP: #{status} from POST /embeddings."
  end
end
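The status handling in the loop above reduces to a small decision table. classify_status is a hypothetical helper that mirrors the branch order in the listing and returns a symbol instead of raising or retrying.

```ruby
# Hypothetical mirror of the branch order in post_embeddings.
def classify_status(status)
  return :success              if status >= 200 && status < 300
  return :authentication_error if status == 401
  return :retry_rate_limited   if status == 429
  return :retry_transient      if status >= 500
  :bad_request_error # everything else, including 3xx, 403, and 404
end
```

Only 429 and 5xx responses (plus Faraday timeouts and connection failures) are retried; any other non-2xx status fails immediately.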

#retry_after_seconds(response) ⇒ Object (protected)



# File 'lib/parse/embeddings/local_http.rb', line 359

def retry_after_seconds(response)
  ra = response.respond_to?(:headers) ? response.headers["retry-after"] || response.headers["Retry-After"] : nil
  return nil unless ra
  v = ra.to_f
  v.positive? ? [v, 60.0].min : nil
end
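A few inputs show how the method above treats different Retry-After values. The method body is copied from the listing; FakeResponse is a hypothetical stand-in for a Faraday response and uses a plain Hash for headers.

```ruby
# Copied from the listing above.
def retry_after_seconds(response)
  ra = response.respond_to?(:headers) ? response.headers["retry-after"] || response.headers["Retry-After"] : nil
  return nil unless ra
  v = ra.to_f
  v.positive? ? [v, 60.0].min : nil
end

# Hypothetical stand-in for a response object.
FakeResponse = Struct.new(:headers)

retry_after_seconds(FakeResponse.new({ "retry-after" => "5" }))    # => 5.0
retry_after_seconds(FakeResponse.new({ "Retry-After" => "120" }))  # => 60.0 (capped)
retry_after_seconds(FakeResponse.new({ "retry-after" => "Wed, 21 Oct 2015 07:28:00 GMT" })) # => nil
retry_after_seconds(FakeResponse.new({}))                          # => nil
```

An HTTP-date value converts to 0.0 under String#to_f, so the method returns nil and the caller falls back to backoff_seconds.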

#supports_input_type? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/parse/embeddings/local_http.rb', line 183

def supports_input_type?
  # The OpenAI-compatible local runners do not asymmetrize. Some
  # models (bge-*) have a documented query prefix, but the local
  # server itself doesn't expose `input_type:` — callers wrap the
  # query text instead. We accept the kwarg for cache-key stability
  # but drop it at the wire level.
  false
end