Class: Parse::Embeddings::Qwen

Inherits:
Provider
  • Object
Defined in:
lib/parse/embeddings/qwen.rb

Overview

Qwen 3 embeddings provider. Targets Alibaba Cloud DashScope's OpenAI-compatible endpoint (/compatible-mode/v1/embeddings), which mirrors the OpenAI request envelope but speaks the qwen3-embedding-* model family.

Supported models. All three are Matryoshka-capable, so the dimensions: constructor kwarg truncates the returned vector to any width ≤ native, and all three accept roughly 32k input tokens (see MODEL_MAX_INPUT_TOKENS):

  • qwen3-embedding-0.6b: 1024 dim native.
  • qwen3-embedding-4b: 2560 dim native.
  • qwen3-embedding-8b: 4096 dim native.
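
As a rough illustration of what Matryoshka truncation means, the sketch below keeps the first components of an already-computed vector and rescales the result to unit length. It is for intuition only: with this provider the narrower width is requested through dimensions: and produced by the endpoint. The helper name truncate_embedding is hypothetical and not part of Parse::Embeddings.

```ruby
# Hypothetical helper, not part of Parse::Embeddings: illustrates
# Matryoshka-style truncation on a vector you already hold.
def truncate_embedding(vector, width)
  unless width.between?(1, vector.length)
    raise ArgumentError, "width must be between 1 and #{vector.length}"
  end

  head = vector.first(width)                 # keep the leading components
  norm = Math.sqrt(head.sum { |x| x * x })   # Euclidean length of the kept part
  return head if norm.zero?                  # nothing to rescale

  head.map { |x| x / norm }                  # rescale back to unit length
end

full  = [0.6, 0.0, 0.8, 0.0]          # unit-length toy vector
short = truncate_embedding(full, 2)   # first two components, rescaled
```

Rescaling matters because a truncated prefix of a unit vector is generally shorter than 1, which would skew dot-product comparisons.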

The same three checkpoints are published open-weight on Hugging Face under Apache 2.0 (Qwen/Qwen3-Embedding-0.6B, etc.) — for self-hosted inference behind vLLM / Text Embeddings Inference / llama.cpp, use LocalHTTP instead and point it at your gateway.

Asymmetric input types

Qwen3-Embedding is instruction-tuned, but the DashScope compatible-mode endpoint does not currently accept an input_type / task request field. This provider therefore returns false from supports_input_type? and drops the SDK-canonical input_type: kwarg before the request is sent, the same posture as OpenAI and LocalHTTP. Callers who want query/passage asymmetry must prepend an explicit instruction to their text client-side; the ActiveSupport::Notifications (AS::N) event still carries the requested input_type so cache keys remain stable.
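
A minimal sketch of client-side instruction wrapping follows. Everything in it is an assumption rather than part of this provider: the helper name with_instruction, the :search_query symbol, the default task text, and the "Instruct: ...\nQuery:..." layout, which is modeled on the Qwen3-Embedding model card and should be confirmed against the card for your checkpoint.

```ruby
# Hypothetical helper: wraps a query with a task instruction before it is
# passed to embed_text. The prompt layout is an assumption; verify it
# against the Qwen3-Embedding model card.
DEFAULT_TASK = "Given a web search query, retrieve relevant passages that answer the query"

def with_instruction(text, input_type:, task: DEFAULT_TASK)
  case input_type
  when :search_query    then "Instruct: #{task}\nQuery:#{text}"
  when :search_document then text # passages are embedded unchanged
  else raise ArgumentError, "unknown input_type: #{input_type.inspect}"
  end
end

queries = ["what is matryoshka truncation?"].map do |q|
  with_instruction(q, input_type: :search_query)
end
```

The wrapped strings would then be passed to embed_text as ordinary input; only the query side changes, so previously embedded documents stay valid.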

Examples:

registration (DashScope International endpoint)

Parse::Embeddings.register(:qwen,
  Parse::Embeddings::Qwen.new(
    api_key: ENV.fetch("DASHSCOPE_API_KEY"),
    model:   "qwen3-embedding-8b",
  ))

Matryoshka truncation

Parse::Embeddings::Qwen.new(
  api_key:    ENV.fetch("DASHSCOPE_API_KEY"),
  model:      "qwen3-embedding-8b",
  dimensions: 1024,  # truncate from 4096 → 1024
)

Defined Under Namespace

Classes: AuthenticationError, BadRequestError, RateLimitError, TransientError

Constant Summary

DEFAULT_BASE_URL =

Default to the international compatible-mode host. Operators in mainland China should override to https://dashscope.aliyuncs.com/compatible-mode/v1.

"https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
DEFAULT_MODEL =
"qwen3-embedding-8b"
DEFAULT_TIMEOUT =
30
DEFAULT_OPEN_TIMEOUT =
5
DEFAULT_MAX_RETRIES =
3
DEFAULT_BATCH_SIZE =

DashScope's compatible endpoint caps embedding requests at 25 inputs per call (smaller than OpenAI's 2048). Default below the cap so callers don't have to tune.

10
MAX_RESPONSE_BYTES =
16 * 1024 * 1024
MODEL_DEFAULT_DIMENSIONS =
{
  "qwen3-embedding-0.6b" => 1024,
  "qwen3-embedding-4b"   => 2560,
  "qwen3-embedding-8b"   => 4096,
}.freeze
MODEL_MAX_INPUT_TOKENS =
{
  "qwen3-embedding-0.6b" => 32_000,
  "qwen3-embedding-4b"   => 32_000,
  "qwen3-embedding-8b"   => 32_000,
}.freeze
MATRYOSHKA_MODELS =

Every Qwen3-Embedding row is Matryoshka-capable. Kept as an explicit allowlist so future non-Matryoshka additions (e.g. qwen-text-embedding-v3) don't silently inherit the behaviour.

%w[
  qwen3-embedding-0.6b
  qwen3-embedding-4b
  qwen3-embedding-8b
].freeze

Constants inherited from Provider

Provider::AS_NOTIFICATION_NAME

Instance Method Summary

Methods inherited from Provider

#embed_image, #embed_text_batched, #inspect, #instrument_embed, #modalities, #validate_response!

Constructor Details

#initialize(api_key:, model: DEFAULT_MODEL, dimensions: nil, base_url: DEFAULT_BASE_URL, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_faraday_proxy: false, allow_insecure_base_url: false, connection: nil) ⇒ Qwen

Returns a new instance of Qwen.

Parameters:

  • api_key (String)

    required. Sent as Authorization: Bearer ….

  • model (String) (defaults to: DEFAULT_MODEL)

    one of MODEL_DEFAULT_DIMENSIONS's keys.

  • dimensions (Integer, nil) (defaults to: nil)

    Matryoshka truncation. Must be ≤ the model's native width.

  • base_url (String) (defaults to: DEFAULT_BASE_URL)

    override (mainland-China host or a private gateway). Must be HTTPS unless allow_insecure_base_url: true.

  • timeout (Integer) (defaults to: DEFAULT_TIMEOUT)

    read timeout, seconds.

  • open_timeout (Integer) (defaults to: DEFAULT_OPEN_TIMEOUT)

    connect timeout, seconds.

  • max_retries (Integer) (defaults to: DEFAULT_MAX_RETRIES)

    retry attempts on 429, 5xx, timeouts, and connection failures.

  • embed_batch_size (Integer) (defaults to: DEFAULT_BATCH_SIZE)

    inputs per request (DashScope compatible-mode caps at 25).

  • allow_faraday_proxy (Boolean) (defaults to: false)

    opt in to proxy / env-proxy autodiscovery. Defaults false.

  • allow_insecure_base_url (Boolean) (defaults to: false)

    permit http:// base.

  • connection (Faraday::Connection, nil) (defaults to: nil)

    injection seam. When given, it is used instead of the connection built by build_connection.



# File 'lib/parse/embeddings/qwen.rb', line 111

def initialize(
  api_key:,
  model: DEFAULT_MODEL,
  dimensions: nil,
  base_url: DEFAULT_BASE_URL,
  timeout: DEFAULT_TIMEOUT,
  open_timeout: DEFAULT_OPEN_TIMEOUT,
  max_retries: DEFAULT_MAX_RETRIES,
  embed_batch_size: DEFAULT_BATCH_SIZE,
  allow_faraday_proxy: false,
  allow_insecure_base_url: false,
  connection: nil
)
  validate_api_key!(api_key)
  validate_model!(model)
  validate_dimensions!(model, dimensions)
  sanitized_base_url = validate_base_url!(base_url, allow_insecure_base_url)
  validate_positive_integer!(:timeout, timeout)
  validate_positive_integer!(:open_timeout, open_timeout)
  validate_non_negative_integer!(:max_retries, max_retries)
  validate_positive_integer!(:embed_batch_size, embed_batch_size)

  @api_key = api_key
  @model = model
  @dimensions = dimensions || MODEL_DEFAULT_DIMENSIONS.fetch(model)
  @base_url = sanitized_base_url
  @timeout = timeout
  @open_timeout = open_timeout
  @max_retries = max_retries
  @embed_batch_size = embed_batch_size
  @allow_faraday_proxy = allow_faraday_proxy
  @connection = connection || build_connection
end
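
The HTTPS-only rule for base_url can be pictured with the sketch below. The real validate_base_url! is not shown on this page and may differ, so check_base_url is a hypothetical stand-in built only on Ruby's standard URI library.

```ruby
require "uri"

# Hypothetical stand-in for the rule "must be HTTPS unless
# allow_insecure_base_url: true". Not the library's implementation.
def check_base_url(base_url, allow_insecure: false)
  uri = URI.parse(base_url)
  # URI::HTTPS subclasses URI::HTTP, so this accepts both schemes.
  unless uri.is_a?(URI::HTTP) && uri.host
    raise ArgumentError, "base_url must be an http(s) URL with a host"
  end
  if uri.scheme != "https" && !allow_insecure
    raise ArgumentError, "base_url must be HTTPS unless allow_insecure_base_url: true"
  end
  base_url
end

check_base_url("https://dashscope.aliyuncs.com/compatible-mode/v1") # accepted
check_base_url("http://localhost:8080/v1", allow_insecure: true)    # accepted only with the opt-in
```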

Instance Method Details

#backoff_seconds(attempt) ⇒ Object (protected)



# File 'lib/parse/embeddings/qwen.rb', line 331

def backoff_seconds(attempt)
  [0.5 * (2**(attempt - 1)), 30.0].min
end
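
To make the schedule concrete, the snippet below copies the formula so it runs standalone and tabulates the delay for the first eight attempts.

```ruby
# Same formula as backoff_seconds above, copied so the snippet is standalone.
def backoff_seconds(attempt)
  [0.5 * (2**(attempt - 1)), 30.0].min
end

schedule = (1..8).map { |attempt| backoff_seconds(attempt) }
# schedule => [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]

# With the default max_retries of 3, only the first three delays are used.
default_total = schedule.first(3).sum
# default_total => 3.5
```

The delay doubles from half a second and is capped at 30 seconds from the seventh attempt onward.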

#build_connection ⇒ Object (protected)



# File 'lib/parse/embeddings/qwen.rb', line 225

def build_connection
  headers = {
    "Authorization" => "Bearer #{@api_key}",
    "Content-Type" => "application/json",
    "Accept" => "application/json",
    "User-Agent" => "parse-stack-embeddings/#{user_agent_version}",
  }

  faraday_opts = { url: @base_url, headers: headers }
  faraday_opts[:proxy] = nil unless @allow_faraday_proxy

  conn = Faraday.new(**faraday_opts) do |f|
    f.options.timeout = @timeout
    f.options.open_timeout = @open_timeout
    f.adapter Faraday.default_adapter
  end
  conn.proxy = nil if !@allow_faraday_proxy && conn.respond_to?(:proxy=)
  conn
end

#dimensions ⇒ Object



# File 'lib/parse/embeddings/qwen.rb', line 145

def dimensions
  @dimensions
end

#embed_batch_size ⇒ Object



# File 'lib/parse/embeddings/qwen.rb', line 153

def embed_batch_size
  @embed_batch_size
end
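
The effect of the batch size can be sketched as simple slicing. The actual batching lives in the inherited Provider#embed_text_batched, whose implementation is not shown on this page; the batches helper below is hypothetical and only illustrates how a list of inputs would divide.

```ruby
# Illustrative only: splits inputs into request-sized slices.
def batches(strings, batch_size)
  strings.each_slice(batch_size).to_a
end

inputs = (1..23).map { |i| "doc #{i}" }
sizes  = batches(inputs, 10).map(&:length)
# sizes => [10, 10, 3]
```

With the default of 10, 23 inputs would need three requests, each well under the documented cap of 25.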

#embed_text(strings, input_type: :search_document) ⇒ Array<Array<Float>>

Returns vectors aligned 1:1 with strings.

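Parameters:

  • strings (Array<String>)

    non-empty strings to embed. An empty Array returns [].

  • input_type (Symbol) (defaults to: :search_document)

    passed to instrumentation only; not sent to the endpoint.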

Returns:

  • (Array<Array<Float>>)

    vectors aligned 1:1 with strings.



# File 'lib/parse/embeddings/qwen.rb', line 177

def embed_text(strings, input_type: :search_document)
  unless strings.is_a?(Array)
    raise ArgumentError,
          "Parse::Embeddings::Qwen#embed_text expects Array<String> (got #{strings.class})."
  end
  return [] if strings.empty?
  strings.each_with_index do |s, i|
    unless s.is_a?(String)
      raise ArgumentError,
            "Parse::Embeddings::Qwen#embed_text strings[#{i}] is not a String (#{s.class})."
    end
    if s.empty?
      raise ArgumentError,
            "Parse::Embeddings::Qwen#embed_text strings[#{i}] is empty; Qwen rejects empty inputs."
    end
  end

  body = {
    model: @model,
    input: strings,
    encoding_format: "float",
  }
  # Forward `dimensions` only when active width differs from
  # native. Sending native width is a no-op on DashScope but
  # we keep the wire minimal to avoid drift across future
  # endpoint revisions.
  if MATRYOSHKA_MODELS.include?(@model) &&
     @dimensions != MODEL_DEFAULT_DIMENSIONS.fetch(@model)
    body[:dimensions] = @dimensions
  end

  instrument_embed(strings.length, input_type) do |emit_payload|
    payload = post_embeddings(body)
    if payload.is_a?(Hash) && payload["usage"].is_a?(Hash)
      tt = payload["usage"]["total_tokens"]
      emit_payload[:total_tokens] = tt if tt.is_a?(Integer) && tt >= 0
    end
    vectors = extract_vectors!(payload, strings.length)
    validate_response!(strings.length, vectors)
  end
end
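
The request body built above can be reproduced in isolation. The sketch below mirrors the "send dimensions only when it differs from native" rule; request_body and NATIVE_DIMENSIONS are illustrative names, with the widths copied from MODEL_DEFAULT_DIMENSIONS.

```ruby
require "json"

# Widths copied from MODEL_DEFAULT_DIMENSIONS for a standalone example.
NATIVE_DIMENSIONS = {
  "qwen3-embedding-0.6b" => 1024,
  "qwen3-embedding-4b"   => 2560,
  "qwen3-embedding-8b"   => 4096,
}.freeze

# Hypothetical helper mirroring the body construction in embed_text.
def request_body(model, strings, dimensions)
  body = { model: model, input: strings, encoding_format: "float" }
  body[:dimensions] = dimensions if dimensions != NATIVE_DIMENSIONS.fetch(model)
  body
end

truncated = JSON.generate(request_body("qwen3-embedding-8b", ["hello"], 1024))
native    = JSON.generate(request_body("qwen3-embedding-8b", ["hello"], 4096))
# truncated includes "dimensions":1024; native omits the key entirely.
```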

#extract_vectors!(payload, input_count) ⇒ Object (protected)



# File 'lib/parse/embeddings/qwen.rb', line 298

def extract_vectors!(payload, input_count)
  unless payload.is_a?(Hash)
    raise InvalidResponseError,
          "Parse::Embeddings::Qwen: response body is not a JSON object."
  end
  data = payload["data"]
  unless data.is_a?(Array)
    raise InvalidResponseError,
          "Parse::Embeddings::Qwen: response.data is not an Array."
  end
  if data.length != input_count
    raise InvalidResponseError,
          "Parse::Embeddings::Qwen: response.data.length #{data.length} != input count #{input_count}."
  end
  sorted = data.each_with_index.map do |entry, i|
    unless entry.is_a?(Hash)
      raise InvalidResponseError,
            "Parse::Embeddings::Qwen: response.data[#{i}] is not a JSON object."
    end
    idx = entry["index"]
    unless idx.is_a?(Integer) && idx >= 0 && idx < input_count
      raise InvalidResponseError,
            "Parse::Embeddings::Qwen: response.data[#{i}].index #{idx.inspect} out of range."
    end
    [idx, entry["embedding"]]
  end
  indices = sorted.map(&:first)
  if indices.uniq.length != indices.length
    raise InvalidResponseError, "Parse::Embeddings::Qwen: duplicate index in response.data."
  end
  sorted.sort_by(&:first).map(&:last)
end
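
The final step, restoring input order from the index field, can be seen with a toy payload whose entries arrive out of order. The payload values are made up for illustration.

```ruby
# Made-up response payload with entries deliberately out of order.
payload = {
  "data" => [
    { "index" => 2, "embedding" => [0.3] },
    { "index" => 0, "embedding" => [0.1] },
    { "index" => 1, "embedding" => [0.2] },
  ],
}

# Pair each vector with its index, then sort by index and drop the index.
pairs   = payload["data"].map { |entry| [entry["index"], entry["embedding"]] }
vectors = pairs.sort_by(&:first).map(&:last)
# vectors => [[0.1], [0.2], [0.3]]
```

Sorting by index rather than trusting array position is what keeps the returned vectors aligned 1:1 with the input strings.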

#inspect_attrs ⇒ Object



# File 'lib/parse/embeddings/qwen.rb', line 219

def inspect_attrs
  super.merge(base: safe_base_host, retries: @max_retries)
end

#max_input_tokens ⇒ Object



# File 'lib/parse/embeddings/qwen.rb', line 157

def max_input_tokens
  MODEL_MAX_INPUT_TOKENS[@model]
end

#model_name ⇒ Object



# File 'lib/parse/embeddings/qwen.rb', line 149

def model_name
  @model
end

#normalize? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/parse/embeddings/qwen.rb', line 161

def normalize?
  # Qwen3-Embedding is documented unit-normalized at the head.
  true
end

#parse_json_body!(body) ⇒ Object (protected)



# File 'lib/parse/embeddings/qwen.rb', line 285

def parse_json_body!(body)
  s = body.to_s
  if s.bytesize > MAX_RESPONSE_BYTES
    raise InvalidResponseError,
          "Parse::Embeddings::Qwen: response body exceeds #{MAX_RESPONSE_BYTES} bytes " \
          "(#{s.bytesize}). Refusing to parse."
  end
  JSON.parse(s, max_nesting: 32)
rescue JSON::ParserError => e
  raise InvalidResponseError,
        "Parse::Embeddings::Qwen: response is not valid JSON (#{e.message})."
end
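
The max_nesting: 32 guard can be exercised on its own with Ruby's standard json library. In the sketch below, a document nested 40 levels deep is refused with an error that the rescue JSON::ParserError clause above would also catch.

```ruby
require "json"

deep = "[" * 40 + "]" * 40 # 40 levels of nested arrays

outcome =
  begin
    JSON.parse(deep, max_nesting: 32)
    :parsed
  rescue JSON::ParserError => e
    e.class # a JSON::ParserError subclass (JSON::NestingError in the json gem)
  end

shallow = JSON.parse("[[1]]", max_nesting: 32) # well under the limit
```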

#post_embeddings(body) ⇒ Object (protected)



# File 'lib/parse/embeddings/qwen.rb', line 245

def post_embeddings(body)
  attempts = 0
  loop do
    attempts += 1
    begin
      response = @connection.post("embeddings") do |req|
        req.body = body.to_json
      end
    rescue Faraday::TimeoutError, Faraday::ConnectionFailed => e
      if attempts > @max_retries
        raise TransientError, "Parse::Embeddings::Qwen: #{e.class} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end

    status = response.status
    return parse_json_body!(response.body) if status >= 200 && status < 300

    if status == 401
      raise AuthenticationError, "Parse::Embeddings::Qwen: 401 Unauthorized — check api_key."
    end
    if status == 429
      if attempts > @max_retries
        raise RateLimitError, "Parse::Embeddings::Qwen: 429 rate limited after #{attempts} attempt(s)."
      end
      sleep(retry_after_seconds(response) || backoff_seconds(attempts))
      next
    end
    if status >= 500
      if attempts > @max_retries
        raise TransientError, "Parse::Embeddings::Qwen: #{status} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end
    raise BadRequestError, "Parse::Embeddings::Qwen: #{status} from POST /embeddings."
  end
end
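
The branching above reduces to a small status table. The sketch below is a hypothetical summary, not library code: classify_status and its symbols are illustrative names that map each HTTP status to the outcome post_embeddings would produce.

```ruby
# Hypothetical summary of the status handling in post_embeddings.
def classify_status(status)
  if status >= 200 && status < 300
    :success
  elsif status == 401
    :authentication_error # raised immediately, never retried
  elsif status == 429
    :retry_rate_limited   # retried; RateLimitError once attempts run out
  elsif status >= 500
    :retry_transient      # retried; TransientError once attempts run out
  else
    :bad_request_error    # everything else, e.g. 400, 403, 404
  end
end

outcomes = [200, 401, 403, 429, 503].to_h { |s| [s, classify_status(s)] }
```

Because the retry check is attempts > @max_retries, the default max_retries of 3 allows up to four requests in total before a retryable failure is raised.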

#retry_after_seconds(response) ⇒ Object (protected)



# File 'lib/parse/embeddings/qwen.rb', line 335

def retry_after_seconds(response)
  ra = response.respond_to?(:headers) ? response.headers["retry-after"] || response.headers["Retry-After"] : nil
  return nil unless ra
  v = ra.to_f
  v.positive? ? [v, 60.0].min : nil
end
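
The header handling can be tried against a stub. In the sketch below, FakeResponse is a hypothetical stand-in for the Faraday response, and the method body is copied from above so the snippet runs standalone.

```ruby
# Hypothetical stub exposing only the headers reader the method needs.
FakeResponse = Struct.new(:headers)

def retry_after_seconds(response)
  ra = response.respond_to?(:headers) ? response.headers["retry-after"] || response.headers["Retry-After"] : nil
  return nil unless ra
  v = ra.to_f
  v.positive? ? [v, 60.0].min : nil
end

short  = retry_after_seconds(FakeResponse.new({ "retry-after" => "2" }))   # 2.0
capped = retry_after_seconds(FakeResponse.new({ "Retry-After" => "120" })) # 60.0, the cap
date   = retry_after_seconds(FakeResponse.new({ "retry-after" => "Wed, 21 Oct 2015 07:28:00 GMT" })) # nil
absent = retry_after_seconds(FakeResponse.new({}))                         # nil
```

An HTTP-date value converts to 0.0 under to_f and is therefore ignored, in which case the caller falls back to backoff_seconds.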

#supports_input_type? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/parse/embeddings/qwen.rb', line 166

def supports_input_type?
  # DashScope compatible-mode does not accept a wire-level
  # input_type / task field. The kwarg threads through for
  # cache-key stability but is dropped at the request.
  false
end