Class: Parse::Embeddings::Qwen

Inherits: Provider

Hierarchy: Object > Provider > Parse::Embeddings::Qwen

Defined in: lib/parse/embeddings/qwen.rb

Overview

Qwen 3 embeddings provider. Targets Alibaba Cloud DashScope's OpenAI-compatible endpoint (/compatible-mode/v1/embeddings), which mirrors the OpenAI request envelope but serves the qwen3-embedding-* model family.

Supported models — all three are Matryoshka-capable, so the dimensions: constructor kwarg truncates the returned vector to any width ≤ native:

qwen3-embedding-0.6b — 1024 dim native, ~32k input tokens.
qwen3-embedding-4b — 2560 dim native.
qwen3-embedding-8b — 4096 dim native.
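To make the truncation concrete, here is a minimal client-side sketch of what a Matryoshka cut amounts to: keep the leading components, then rescale to unit length. The provider itself asks DashScope to do this server-side via the dimensions field; truncate_embedding is a hypothetical helper, and the rescaling step is an assumption about how truncated vectors are normally consumed, not a statement about DashScope's internals.

```ruby
# Hypothetical illustration only: not part of Parse::Embeddings.
# Keeps the first `width` components and rescales the result to unit length.
def truncate_embedding(vector, width)
  unless width.between?(1, vector.length)
    raise ArgumentError, "width must be between 1 and #{vector.length}"
  end

  head = vector.first(width)
  norm = Math.sqrt(head.sum { |x| x * x })
  return head if norm.zero? # nothing to rescale

  head.map { |x| x / norm }
end

full  = [0.6, 0.0, 0.8, 0.0]          # toy 4-dim unit vector
short = truncate_embedding(full, 2)   # roughly [1.0, 0.0]
```

A width larger than the vector raises ArgumentError, mirroring the constructor rule that dimensions must be ≤ the model's native width.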

The same three checkpoints are published open-weight on Hugging Face under Apache 2.0 (Qwen/Qwen3-Embedding-0.6B, etc.) — for self-hosted inference behind vLLM / Text Embeddings Inference / llama.cpp, use LocalHTTP instead and point it at your gateway.

Asymmetric input types

Qwen3-Embedding is trained with an instruction-tuned head, but the DashScope compatible-mode endpoint does not currently accept an input_type / task request field. We therefore set supports_input_type? to false and drop the SDK-canonical input_type: kwarg before the request goes on the wire, the same posture as OpenAI and LocalHTTP. Callers who want query/passage asymmetry must wrap their text with an explicit instruction prefix client-side; the ActiveSupport::Notifications (AS::N) event still carries the requested input_type so cache keys remain stable.
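A minimal sketch of the client-side wrapping described above. with_query_instruction and DEFAULT_QUERY_TASK are hypothetical names, and the "Instruct: ... Query: ..." layout (including its exact whitespace) is an assumption modelled on the Qwen3-Embedding model card; check it against the checkpoint you actually call before relying on it.

```ruby
# Hypothetical helper: not part of Parse::Embeddings.
# ASSUMPTION: the prefix layout below follows the Qwen3-Embedding model card.
DEFAULT_QUERY_TASK =
  "Given a web search query, retrieve relevant passages that answer the query"

def with_query_instruction(text, task: DEFAULT_QUERY_TASK)
  "Instruct: #{task}\nQuery: #{text}"
end

queries = ["how do matryoshka embeddings work?"]
wrapped = queries.map { |q| with_query_instruction(q) }

# Passages are embedded unchanged; only the query side carries the instruction.
passages = ["Matryoshka representation learning nests small vectors inside larger ones."]
```

The wrapped strings would then go to #embed_text like any other input, with passages passed through as-is.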

Examples:

registration (DashScope International endpoint)

Parse::Embeddings.register(:qwen,
  Parse::Embeddings::Qwen.new(
    api_key: ENV.fetch("DASHSCOPE_API_KEY"),
    model:   "qwen3-embedding-8b",
  ))

Matryoshka truncation

Parse::Embeddings::Qwen.new(
  api_key: ENV.fetch("DASHSCOPE_API_KEY"),
  model:      "qwen3-embedding-8b",
  dimensions: 1024,  # truncate from 4096 → 1024
)

Defined Under Namespace

Classes: AuthenticationError, BadRequestError, RateLimitError, TransientError

Constant Summary

DEFAULT_BASE_URL = Defaults to the international compatible-mode host. Operators in mainland China should override it with https://dashscope.aliyuncs.com/compatible-mode/v1.

"https://dashscope-intl.aliyuncs.com/compatible-mode/v1"

DEFAULT_MODEL =

"qwen3-embedding-8b"

DEFAULT_TIMEOUT =

DEFAULT_OPEN_TIMEOUT =

DEFAULT_MAX_RETRIES =

DEFAULT_BATCH_SIZE = DashScope's compatible endpoint caps embedding requests at 25 inputs per call (smaller than OpenAI's 2048). Default below the cap so callers don't have to tune.

MAX_RESPONSE_BYTES =

16 * 1024 * 1024

MODEL_DEFAULT_DIMENSIONS =

{
  "qwen3-embedding-0.6b" => 1024,
  "qwen3-embedding-4b"   => 2560,
  "qwen3-embedding-8b"   => 4096,
}.freeze

MODEL_MAX_INPUT_TOKENS =

{
  "qwen3-embedding-0.6b" => 32_000,
  "qwen3-embedding-4b"   => 32_000,
  "qwen3-embedding-8b"   => 32_000,
}.freeze

MATRYOSHKA_MODELS = Every Qwen3-Embedding row is Matryoshka-capable. Kept as an explicit allowlist so future non-Matryoshka additions (e.g. qwen-text-embedding-v3) don't silently inherit the behaviour.

%w[
  qwen3-embedding-0.6b
  qwen3-embedding-4b
  qwen3-embedding-8b
].freeze

Constants inherited from Provider

Provider::AS_NOTIFICATION_NAME

Instance Method Summary

#backoff_seconds(attempt) ⇒ Object protected
#build_connection ⇒ Object protected
#dimensions ⇒ Object
#embed_batch_size ⇒ Object
#embed_text(strings, input_type: :search_document) ⇒ Array<Array<Float>>
Vectors aligned 1:1 with strings.
#extract_vectors!(payload, input_count) ⇒ Object protected
#initialize(api_key:, model: DEFAULT_MODEL, dimensions: nil, base_url: DEFAULT_BASE_URL, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_faraday_proxy: false, allow_insecure_base_url: false, connection: nil) ⇒ Qwen constructor
A new instance of Qwen.
#inspect_attrs ⇒ Object
#max_input_tokens ⇒ Object
#model_name ⇒ Object
#normalize? ⇒ Boolean
#parse_json_body!(body) ⇒ Object protected
#post_embeddings(body) ⇒ Object protected
#retry_after_seconds(response) ⇒ Object protected
#supports_input_type? ⇒ Boolean

Methods inherited from Provider

#embed_image, #embed_text_batched, #inspect, #instrument_embed, #modalities, #validate_response!

Constructor Details

#initialize(api_key:, model: DEFAULT_MODEL, dimensions: nil, base_url: DEFAULT_BASE_URL, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_faraday_proxy: false, allow_insecure_base_url: false, connection: nil) ⇒ `Qwen`

Returns a new instance of Qwen.

Parameters:

api_key (String) —
required. Sent as Authorization: Bearer ….
model (String) (defaults to: DEFAULT_MODEL) —
one of MODEL_DEFAULT_DIMENSIONS's keys.
dimensions (Integer, nil) (defaults to: nil) —
Matryoshka truncation. Must be ≤ the model's native width.
base_url (String) (defaults to: DEFAULT_BASE_URL) —
override (mainland-China host or a private gateway). Must be HTTPS unless allow_insecure_base_url: true.
timeout (Integer) (defaults to: DEFAULT_TIMEOUT) —
read timeout, seconds.
open_timeout (Integer) (defaults to: DEFAULT_OPEN_TIMEOUT) —
connect timeout, seconds.
max_retries (Integer) (defaults to: DEFAULT_MAX_RETRIES) —
retry attempts on 429/5xx/timeouts.
embed_batch_size (Integer) (defaults to: DEFAULT_BATCH_SIZE) —
inputs per request (DashScope compatible-mode caps at 25).
allow_faraday_proxy (Boolean) (defaults to: false) —
opt in to proxy / env-proxy autodiscovery. Defaults false.
allow_insecure_base_url (Boolean) (defaults to: false) —
permit http:// base.
connection (Faraday::Connection, nil) (defaults to: nil) —
injection seam: a pre-built connection (for example a test double) used instead of the one from #build_connection.

# File 'lib/parse/embeddings/qwen.rb', line 111

def initialize(
  api_key:,
  model: DEFAULT_MODEL,
  dimensions: nil,
  base_url: DEFAULT_BASE_URL,
  timeout: DEFAULT_TIMEOUT,
  open_timeout: DEFAULT_OPEN_TIMEOUT,
  max_retries: DEFAULT_MAX_RETRIES,
  embed_batch_size: DEFAULT_BATCH_SIZE,
  allow_faraday_proxy: false,
  allow_insecure_base_url: false,
  connection: nil
)
  validate_api_key!(api_key)
  validate_model!(model)
  validate_dimensions!(model, dimensions)
  sanitized_base_url = validate_base_url!(base_url, allow_insecure_base_url)
  validate_positive_integer!(:timeout, timeout)
  validate_positive_integer!(:open_timeout, open_timeout)
  validate_non_negative_integer!(:max_retries, max_retries)
  validate_positive_integer!(:embed_batch_size, embed_batch_size)

  @api_key = api_key
  @model = model
  @dimensions = dimensions || MODEL_DEFAULT_DIMENSIONS.fetch(model)
  @base_url = sanitized_base_url
  @timeout = timeout
  @open_timeout = open_timeout
  @max_retries = max_retries
  @embed_batch_size = embed_batch_size
  @allow_faraday_proxy = allow_faraday_proxy
  @connection = connection || build_connection
end
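The constructor delegates the HTTPS rule to validate_base_url!, whose source is not shown on this page. The following is a rough standalone sketch of such a check, using only Ruby's URI library; check_base_url is a hypothetical stand-in and may differ from the real helper in what it accepts or how it sanitizes the URL.

```ruby
require "uri"

# Hypothetical stand-in for the private validate_base_url! helper.
# Accepts absolute http(s) URLs and insists on HTTPS unless explicitly allowed.
def check_base_url(base_url, allow_insecure: false)
  uri =
    begin
      URI.parse(base_url.to_s)
    rescue URI::InvalidURIError
      nil
    end

  unless uri.is_a?(URI::HTTP) && uri.host && !uri.host.empty?
    raise ArgumentError, "base_url must be an absolute http(s) URL"
  end
  if uri.scheme != "https" && !allow_insecure
    raise ArgumentError, "base_url must be HTTPS unless allow_insecure is true"
  end

  uri.to_s
end

mainland = check_base_url("https://dashscope.aliyuncs.com/compatible-mode/v1")
gateway  = check_base_url("http://localhost:8080/v1", allow_insecure: true)
```

Calling check_base_url("http://localhost:8080/v1") without the opt-in raises ArgumentError, which is the behaviour the allow_insecure_base_url parameter documents.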

Instance Method Details

#backoff_seconds(attempt) ⇒ `Object` (protected)

# File 'lib/parse/embeddings/qwen.rb', line 331

def backoff_seconds(attempt)
  [0.5 * (2**(attempt - 1)), 30.0].min
end
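For reference, the schedule this formula produces can be tabulated with a standalone copy of the method (copied here only so the snippet runs on its own). Delays double from 0.5 s and are capped at 30 s.

```ruby
# Standalone copy of the formula above, for illustration.
def backoff_seconds(attempt)
  [0.5 * (2**(attempt - 1)), 30.0].min
end

schedule = (1..8).map { |attempt| backoff_seconds(attempt) }
# => [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

The cap first applies on the seventh attempt, where the uncapped value would be 32 s.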

#build_connection ⇒ `Object` (protected)

# File 'lib/parse/embeddings/qwen.rb', line 225

def build_connection
  headers = {
    "Authorization" => "Bearer #{@api_key}",
    "Content-Type" => "application/json",
    "Accept" => "application/json",
    "User-Agent" => "parse-stack-embeddings/#{user_agent_version}",
  }

  faraday_opts = { url: @base_url, headers: headers }
  faraday_opts[:proxy] = nil unless @allow_faraday_proxy

  conn = Faraday.new(**faraday_opts) do |f|
    f.options.timeout = @timeout
    f.options.open_timeout = @open_timeout
    f.adapter Faraday.default_adapter
  end
  conn.proxy = nil if !@allow_faraday_proxy && conn.respond_to?(:proxy=)
  conn
end

#dimensions ⇒ `Object`

# File 'lib/parse/embeddings/qwen.rb', line 145

def dimensions
  @dimensions
end

#embed_batch_size ⇒ `Object`

# File 'lib/parse/embeddings/qwen.rb', line 153

def embed_batch_size
  @embed_batch_size
end
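The batch size matters because DashScope's compatible endpoint caps a request at 25 inputs. Below is a small sketch of how a caller-side splitter could honour that cap. embedding_batches is a hypothetical helper and the batch size of 10 is an arbitrary illustrative value; the provider's real batching lives in Provider#embed_text_batched, whose source is not shown here.

```ruby
# Cap taken from the DEFAULT_BATCH_SIZE note in the constant summary.
DASHSCOPE_MAX_INPUTS = 25

# Hypothetical helper: splits inputs into request-sized groups.
def embedding_batches(strings, batch_size: 10)
  unless batch_size.between?(1, DASHSCOPE_MAX_INPUTS)
    raise ArgumentError, "batch_size must be between 1 and #{DASHSCOPE_MAX_INPUTS}"
  end

  strings.each_slice(batch_size).to_a
end

docs    = (1..23).map { |i| "doc #{i}" }
batches = embedding_batches(docs, batch_size: 10)
sizes   = batches.map(&:length) # => [10, 10, 3]
```

Each batch would then be sent as one #embed_text call, and the resulting vectors concatenated in order.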

#embed_text(strings, input_type: :search_document) ⇒ `Array<Array<Float>>`

Returns vectors aligned 1:1 with strings.

Parameters:

strings (Array<String>) —
inputs.
input_type (Symbol) (defaults to: :search_document) —
accepted for forward compatibility, dropped at the wire (see #supports_input_type?).

Returns:

(Array<Array<Float>>) —
vectors aligned 1:1 with strings.

# File 'lib/parse/embeddings/qwen.rb', line 177

def embed_text(strings, input_type: :search_document)
  unless strings.is_a?(Array)
    raise ArgumentError,
          "Parse::Embeddings::Qwen#embed_text expects Array<String> (got #{strings.class})."
  end
  return [] if strings.empty?
  strings.each_with_index do |s, i|
    unless s.is_a?(String)
      raise ArgumentError,
            "Parse::Embeddings::Qwen#embed_text strings[#{i}] is not a String (#{s.class})."
    end
    if s.empty?
      raise ArgumentError,
            "Parse::Embeddings::Qwen#embed_text strings[#{i}] is empty; Qwen rejects empty inputs."
    end
  end

  body = {
    model: @model,
    input: strings,
    encoding_format: "float",
  }
  # Forward `dimensions` only when active width differs from
  # native. Sending native width is a no-op on DashScope but
  # we keep the wire minimal to avoid drift across future
  # endpoint revisions.
  if MATRYOSHKA_MODELS.include?(@model) &&
     @dimensions != MODEL_DEFAULT_DIMENSIONS.fetch(@model)
    body[:dimensions] = @dimensions
  end

  instrument_embed(strings.length, input_type) do |emit_payload|
    payload = post_embeddings(body)
    if payload.is_a?(Hash) && payload["usage"].is_a?(Hash)
      tt = payload["usage"]["total_tokens"]
      emit_payload[:total_tokens] = tt if tt.is_a?(Integer) && tt >= 0
    end
    vectors = extract_vectors!(payload, strings.length)
    validate_response!(strings.length, vectors)
  end
end
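The request-body rule in #embed_text (send dimensions only when the active width differs from the native one) can be seen in isolation with the sketch below. request_body and NATIVE_DIMENSIONS are hypothetical names for this example; the widths are copied from MODEL_DEFAULT_DIMENSIONS.

```ruby
require "json"

# Copied from MODEL_DEFAULT_DIMENSIONS so the snippet runs standalone.
NATIVE_DIMENSIONS = {
  "qwen3-embedding-0.6b" => 1024,
  "qwen3-embedding-4b"   => 2560,
  "qwen3-embedding-8b"   => 4096,
}.freeze

# Hypothetical helper mirroring the body construction in #embed_text.
def request_body(model, strings, dimensions)
  body = { model: model, input: strings, encoding_format: "float" }
  # Only forward `dimensions` when it actually changes the output width.
  body[:dimensions] = dimensions if dimensions != NATIVE_DIMENSIONS.fetch(model)
  body
end

native    = request_body("qwen3-embedding-8b", ["hello"], 4096)
truncated = request_body("qwen3-embedding-8b", ["hello"], 1024)
wire_json = JSON.generate(truncated)
```

Here native has no :dimensions key, while truncated carries dimensions: 1024 and serializes to the JSON that would be posted.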

#extract_vectors!(payload, input_count) ⇒ `Object` (protected)

# File 'lib/parse/embeddings/qwen.rb', line 298

def extract_vectors!(payload, input_count)
  unless payload.is_a?(Hash)
    raise InvalidResponseError,
          "Parse::Embeddings::Qwen: response body is not a JSON object."
  end
  data = payload["data"]
  unless data.is_a?(Array)
    raise InvalidResponseError,
          "Parse::Embeddings::Qwen: response.data is not an Array."
  end
  if data.length != input_count
    raise InvalidResponseError,
          "Parse::Embeddings::Qwen: response.data.length #{data.length} != input count #{input_count}."
  end
  sorted = data.each_with_index.map do |entry, i|
    unless entry.is_a?(Hash)
      raise InvalidResponseError,
            "Parse::Embeddings::Qwen: response.data[#{i}] is not a JSON object."
    end
    idx = entry["index"]
    unless idx.is_a?(Integer) && idx >= 0 && idx < input_count
      raise InvalidResponseError,
            "Parse::Embeddings::Qwen: response.data[#{i}].index #{idx.inspect} out of range."
    end
    [idx, entry["embedding"]]
  end
  indices = sorted.map(&:first)
  if indices.uniq.length != indices.length
    raise InvalidResponseError, "Parse::Embeddings::Qwen: duplicate index in response.data."
  end
  sorted.sort_by(&:first).map(&:last)
end
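The final sort_by step is what restores input order when the endpoint returns entries out of sequence. A toy payload (invented for this illustration, not a captured DashScope response) shows the effect:

```ruby
# Invented sample payload: entries arrive in reverse index order.
payload = {
  "data" => [
    { "index" => 1, "embedding" => [0.0, 1.0] },
    { "index" => 0, "embedding" => [1.0, 0.0] },
  ],
}

pairs   = payload["data"].map { |entry| [entry["index"], entry["embedding"]] }
vectors = pairs.sort_by(&:first).map(&:last)
# => [[1.0, 0.0], [0.0, 1.0]]
```

After sorting, vectors[0] belongs to the first input string and vectors[1] to the second, regardless of the order on the wire.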

#inspect_attrs ⇒ `Object`

# File 'lib/parse/embeddings/qwen.rb', line 219

def inspect_attrs
  super.merge(base: safe_base_host, retries: @max_retries)
end

#max_input_tokens ⇒ `Object`

# File 'lib/parse/embeddings/qwen.rb', line 157

def max_input_tokens
  MODEL_MAX_INPUT_TOKENS[@model]
end

#model_name ⇒ `Object`

# File 'lib/parse/embeddings/qwen.rb', line 149

def model_name
  @model
end

#normalize? ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/parse/embeddings/qwen.rb', line 161

def normalize?
  # Qwen3-Embedding is documented unit-normalized at the head.
  true
end
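One practical consequence of unit-normalized output is that cosine similarity reduces to a plain dot product. The sketch below uses hypothetical helpers (dot, unit_norm?) and toy vectors to show how a caller might sanity-check that property; the tolerance is an arbitrary choice.

```ruby
# Hypothetical helpers for illustration.
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def unit_norm?(vector, tolerance: 1e-6)
  (Math.sqrt(dot(vector, vector)) - 1.0).abs <= tolerance
end

a = [0.6, 0.8]
b = [1.0, 0.0]

both_unit  = unit_norm?(a) && unit_norm?(b)
similarity = dot(a, b) # equals cosine similarity when both are unit length
```

For these toy vectors the similarity is 0.6, the first component of a.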

#parse_json_body!(body) ⇒ `Object` (protected)

# File 'lib/parse/embeddings/qwen.rb', line 285

def parse_json_body!(body)
  s = body.to_s
  if s.bytesize > MAX_RESPONSE_BYTES
    raise InvalidResponseError,
          "Parse::Embeddings::Qwen: response body exceeds #{MAX_RESPONSE_BYTES} bytes " \
          "(#{s.bytesize}). Refusing to parse."
  end
  JSON.parse(s, max_nesting: 32)
rescue JSON::ParserError => e
  raise InvalidResponseError,
        "Parse::Embeddings::Qwen: response is not valid JSON (#{e.message})."
end
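The max_nesting: 32 guard can be exercised directly with Ruby's json library. In this sketch a deliberately over-nested document is rejected, and the error is a JSON::NestingError, which is a subclass of JSON::ParserError and therefore caught by the rescue clause shown above.

```ruby
require "json"

too_deep = ("[" * 40) + ("]" * 40) # 40 levels of nested arrays

rejected =
  begin
    JSON.parse(too_deep, max_nesting: 32)
  rescue JSON::ParserError => e
    e.class
  end

accepted = JSON.parse("[[1]]", max_nesting: 32)
```

Shallow documents such as [[1]] parse normally; only input nested deeper than the limit is refused.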

#post_embeddings(body) ⇒ `Object` (protected)

# File 'lib/parse/embeddings/qwen.rb', line 245

def post_embeddings(body)
  attempts = 0
  loop do
    attempts += 1
    begin
      response = @connection.post("embeddings") do |req|
        req.body = body.to_json
      end
    rescue Faraday::TimeoutError, Faraday::ConnectionFailed => e
      if attempts > @max_retries
        raise TransientError, "Parse::Embeddings::Qwen: #{e.class} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end

    status = response.status
    return parse_json_body!(response.body) if status >= 200 && status < 300

    if status == 401
      raise AuthenticationError, "Parse::Embeddings::Qwen: 401 Unauthorized — check api_key."
    end
    if status == 429
      if attempts > @max_retries
        raise RateLimitError, "Parse::Embeddings::Qwen: 429 rate limited after #{attempts} attempt(s)."
      end
      sleep(retry_after_seconds(response) || backoff_seconds(attempts))
      next
    end
    if status >= 500
      if attempts > @max_retries
        raise TransientError, "Parse::Embeddings::Qwen: #{status} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end
    raise BadRequestError, "Parse::Embeddings::Qwen: #{status} from POST /embeddings."
  end
end
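Note that the loop compares attempts > @max_retries, so max_retries counts retries rather than total attempts: max_retries: 3 allows up to four requests. A stripped-down sketch of the same control flow, with a hypothetical with_retries helper, a stand-in error class, and an injectable sleeper so it runs without real delays:

```ruby
# Stand-in for a retryable failure (timeout, 429, 5xx) in this illustration.
class FakeTransientError < StandardError; end

# Hypothetical helper mirroring the retry loop's shape.
def with_retries(max_retries:, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  loop do
    attempts += 1
    begin
      return yield(attempts)
    rescue FakeTransientError
      raise if attempts > max_retries # retries exhausted: surface the error
      sleeper.call([0.5 * (2**(attempts - 1)), 30.0].min)
    end
  end
end

slept  = []
result = with_retries(max_retries: 3, sleeper: ->(s) { slept << s }) do |attempt|
  raise FakeTransientError, "simulated 503" if attempt < 3
  "ok on attempt #{attempt}"
end
```

Two simulated failures lead to two recorded delays (0.5 s, then 1.0 s) before the third attempt succeeds.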

#retry_after_seconds(response) ⇒ `Object` (protected)

# File 'lib/parse/embeddings/qwen.rb', line 335

def retry_after_seconds(response)
  ra = response.respond_to?(:headers) ? response.headers["retry-after"] || response.headers["Retry-After"] : nil
  return nil unless ra
  v = ra.to_f
  v.positive? ? [v, 60.0].min : nil
end
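A few sample header values show how this parsing behaves. The snippet is a standalone variant that takes a plain Hash of headers instead of a response object; that signature is a simplification for illustration.

```ruby
# Standalone variant of the logic above, taking a headers Hash directly.
def retry_after_seconds(headers)
  raw = headers["retry-after"] || headers["Retry-After"]
  return nil unless raw

  value = raw.to_f
  value.positive? ? [value, 60.0].min : nil
end

short_wait = retry_after_seconds({ "Retry-After" => "2" })    # => 2.0
capped     = retry_after_seconds({ "retry-after" => "120" })  # => 60.0
http_date  = retry_after_seconds({ "Retry-After" => "Wed, 21 Oct 2015 07:28:00 GMT" }) # => nil
missing    = retry_after_seconds({})                          # => nil
```

An HTTP-date value converts to 0.0 under String#to_f, so it yields nil and the caller falls back to backoff_seconds.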

#supports_input_type? ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/parse/embeddings/qwen.rb', line 166

def supports_input_type?
  # DashScope compatible-mode does not accept a wire-level
  # input_type / task field. The kwarg threads through for
  # cache-key stability but is dropped at the request.
  false
end
