Class: Parse::Embeddings::Qwen
Defined in: lib/parse/embeddings/qwen.rb
Overview
Qwen 3 embeddings provider. Targets Alibaba Cloud DashScope's
OpenAI-compatible endpoint (/compatible-mode/v1/embeddings),
which mirrors the OpenAI request envelope but serves the
qwen3-embedding-* model family.
Supported models (all three are Matryoshka-capable, so the
dimensions: constructor kwarg truncates the returned vector
to any width ≤ native):
- qwen3-embedding-0.6b: 1024 dim native, ~32k input tokens.
- qwen3-embedding-4b: 2560 dim native.
- qwen3-embedding-8b: 4096 dim native.
The same three checkpoints are published open-weight on Hugging
Face under Apache 2.0 (Qwen/Qwen3-Embedding-0.6B, etc.) — for
self-hosted inference behind vLLM / Text Embeddings Inference /
llama.cpp, use LocalHTTP instead and point it at your gateway.
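To make the "any width ≤ native" idea concrete, here is a minimal client-side sketch of Matryoshka-style truncation: keep the leading components, then rescale to unit length. It assumes the usual truncate-and-renormalize convention and is not DashScope's server-side code; truncate_matryoshka is a hypothetical helper that does not exist in Parse::Embeddings.

```ruby
# Hypothetical helper (not part of the library): keep the first `width`
# components of a vector and rescale the result to unit L2 norm.
def truncate_matryoshka(vector, width)
  unless width.between?(1, vector.length)
    raise ArgumentError, "width must be between 1 and #{vector.length}"
  end

  head = vector.first(width)
  norm = Math.sqrt(head.sum { |x| x * x })
  return head if norm.zero? # an all-zero prefix cannot be rescaled

  head.map { |x| x / norm }
end

full = [0.6, 0.0, 0.8, 0.0]           # toy 4-dim "native" vector
short = truncate_matryoshka(full, 2)  # 2-dim vector with unit length
```

When the provider is configured with dimensions:, the endpoint is expected to return vectors of that width already, so no client-side step like this should be needed.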
Asymmetric input types
Qwen3-Embedding is trained with an instruction-tuned head, but
the DashScope compatible-mode endpoint does not currently accept
an input_type / task request field. We therefore set
supports_input_type? to false and drop the SDK-canonical
input_type: kwarg at the wire — same posture as OpenAI and
LocalHTTP. Callers who want query/passage asymmetry must wrap
their text with an explicit instruction prefix client-side; the
ActiveSupport::Notifications (AS::N) event still carries the requested input_type so cache
keys remain stable.
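A minimal sketch of the client-side wrapping described above. The "Instruct: ...\nQuery:..." layout is an assumption based on common Qwen3-Embedding usage and should be checked against the model card; with_instruction and the task string are hypothetical and not part of this class.

```ruby
# Hypothetical helper: prefix a query with a task instruction before embedding.
# The exact prompt layout is an assumption; verify it against the model card.
def with_instruction(task, query)
  "Instruct: #{task}\nQuery:#{query}"
end

task = "Given a web search query, retrieve relevant passages that answer the query"
queries = ["how do vector indexes work?"].map { |q| with_instruction(task, q) }

# Passages are left unwrapped in this sketch; only the query side is prefixed.
passages = ["A vector index organizes embeddings for nearest-neighbour search."]
```

The wrapped strings would then be passed to embed_text as ordinary input.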
Defined Under Namespace
Classes: AuthenticationError, BadRequestError, RateLimitError, TransientError
Constant Summary
- DEFAULT_BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
  Defaults to the international compatible-mode host. Operators in mainland China should override to https://dashscope.aliyuncs.com/compatible-mode/v1.
- DEFAULT_MODEL = "qwen3-embedding-8b"
- DEFAULT_TIMEOUT = 30
- DEFAULT_OPEN_TIMEOUT = 5
- DEFAULT_MAX_RETRIES = 3
- DEFAULT_BATCH_SIZE = 10
  DashScope's compatible endpoint caps embedding requests at 25 inputs per call (smaller than OpenAI's 2048). Default below the cap so callers don't have to tune.
- MAX_RESPONSE_BYTES = 16 * 1024 * 1024
- MODEL_DEFAULT_DIMENSIONS = { "qwen3-embedding-0.6b" => 1024, "qwen3-embedding-4b" => 2560, "qwen3-embedding-8b" => 4096, }.freeze
- MODEL_MAX_INPUT_TOKENS = { "qwen3-embedding-0.6b" => 32_000, "qwen3-embedding-4b" => 32_000, "qwen3-embedding-8b" => 32_000, }.freeze
- MATRYOSHKA_MODELS = %w[ qwen3-embedding-0.6b qwen3-embedding-4b qwen3-embedding-8b ].freeze
  Every Qwen3-Embedding row is Matryoshka-capable. Kept as an explicit allowlist so future non-Matryoshka additions (e.g. qwen-text-embedding-v3) don't silently inherit the behaviour.
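To illustrate how a batch size of 10 relates to the 25-input cap, the sketch below slices an input list into request-sized groups. How the inherited embed_text_batched actually slices is not shown on this page, so slice_for_embedding is a hypothetical stand-in.

```ruby
BATCH_SIZE = 10 # mirrors DEFAULT_BATCH_SIZE from the constant list

# Hypothetical stand-in for batching: split inputs into groups of at most
# `batch_size`, preserving order.
def slice_for_embedding(strings, batch_size = BATCH_SIZE)
  strings.each_slice(batch_size).to_a
end

inputs = Array.new(25) { |i| "doc #{i}" }
batches = slice_for_embedding(inputs)
batch_lengths = batches.map(&:length) # => [10, 10, 5]
```

Under this assumption, 25 inputs cost three requests, each comfortably below the per-call cap.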
Constants inherited from Provider
Provider::AS_NOTIFICATION_NAME
Instance Method Summary
- #backoff_seconds(attempt) ⇒ Object protected
- #build_connection ⇒ Object protected
- #dimensions ⇒ Object
- #embed_batch_size ⇒ Object
- #embed_text(strings, input_type: :search_document) ⇒ Array<Array<Float>>
  Vectors aligned 1:1 with strings.
- #extract_vectors!(payload, input_count) ⇒ Object protected
- #initialize(api_key:, model: DEFAULT_MODEL, dimensions: nil, base_url: DEFAULT_BASE_URL, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_faraday_proxy: false, allow_insecure_base_url: false, connection: nil) ⇒ Qwen constructor
  A new instance of Qwen.
- #inspect_attrs ⇒ Object
- #max_input_tokens ⇒ Object
- #model_name ⇒ Object
- #normalize? ⇒ Boolean
- #parse_json_body!(body) ⇒ Object protected
- #post_embeddings(body) ⇒ Object protected
- #retry_after_seconds(response) ⇒ Object protected
- #supports_input_type? ⇒ Boolean
Methods inherited from Provider
#embed_image, #embed_text_batched, #inspect, #instrument_embed, #modalities, #validate_response!
Constructor Details
#initialize(api_key:, model: DEFAULT_MODEL, dimensions: nil, base_url: DEFAULT_BASE_URL, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_faraday_proxy: false, allow_insecure_base_url: false, connection: nil) ⇒ Qwen
Returns a new instance of Qwen.
# File 'lib/parse/embeddings/qwen.rb', line 111
def initialize(
  api_key:,
  model: DEFAULT_MODEL,
  dimensions: nil,
  base_url: DEFAULT_BASE_URL,
  timeout: DEFAULT_TIMEOUT,
  open_timeout: DEFAULT_OPEN_TIMEOUT,
  max_retries: DEFAULT_MAX_RETRIES,
  embed_batch_size: DEFAULT_BATCH_SIZE,
  allow_faraday_proxy: false,
  allow_insecure_base_url: false,
  connection: nil
)
  validate_api_key!(api_key)
  validate_model!(model)
  validate_dimensions!(model, dimensions)
  sanitized_base_url = validate_base_url!(base_url, allow_insecure_base_url)
  validate_positive_integer!(:timeout, timeout)
  validate_positive_integer!(:open_timeout, open_timeout)
  validate_non_negative_integer!(:max_retries, max_retries)
  validate_positive_integer!(:embed_batch_size, embed_batch_size)

  @api_key = api_key
  @model = model
  @dimensions = dimensions || MODEL_DEFAULT_DIMENSIONS.fetch(model)
  @base_url = sanitized_base_url
  @timeout = timeout
  @open_timeout = open_timeout
  @max_retries = max_retries
  @embed_batch_size = embed_batch_size
  @allow_faraday_proxy = allow_faraday_proxy
  @connection = connection || build_connection
end
Instance Method Details
#backoff_seconds(attempt) ⇒ Object (protected)
# File 'lib/parse/embeddings/qwen.rb', line 331
def backoff_seconds(attempt)
  [0.5 * (2**(attempt - 1)), 30.0].min
end
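As a worked example, the snippet below copies the formula into a standalone method and prints the delay schedule for the first eight attempts. The method body is taken from the listing; the surrounding script is only illustrative.

```ruby
# Standalone copy of the backoff formula: 0.5s doubling per attempt, capped at 30s.
def backoff_seconds(attempt)
  [0.5 * (2**(attempt - 1)), 30.0].min
end

schedule = (1..8).map { |attempt| backoff_seconds(attempt) }
# => [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]

# With DEFAULT_MAX_RETRIES = 3 the retry loop sleeps after attempts 1, 2 and 3.
worst_case_sleep = (1..3).sum { |attempt| backoff_seconds(attempt) } # => 3.5
```

The cap takes effect from the seventh attempt, where the uncapped value would be 32 seconds.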
#build_connection ⇒ Object (protected)
# File 'lib/parse/embeddings/qwen.rb', line 225
def build_connection
  headers = {
    "Authorization" => "Bearer #{@api_key}",
    "Content-Type" => "application/json",
    "Accept" => "application/json",
    "User-Agent" => "parse-stack-embeddings/#{user_agent_version}",
  }
  faraday_opts = { url: @base_url, headers: headers }
  faraday_opts[:proxy] = nil unless @allow_faraday_proxy

  conn = Faraday.new(**faraday_opts) do |f|
    f.options.timeout = @timeout
    f.options.open_timeout = @open_timeout
    f.adapter Faraday.default_adapter
  end
  conn.proxy = nil if !@allow_faraday_proxy && conn.respond_to?(:proxy=)
  conn
end
#dimensions ⇒ Object
# File 'lib/parse/embeddings/qwen.rb', line 145
def dimensions
  @dimensions
end
#embed_batch_size ⇒ Object
# File 'lib/parse/embeddings/qwen.rb', line 153
def embed_batch_size
  @embed_batch_size
end
#embed_text(strings, input_type: :search_document) ⇒ Array<Array<Float>>
Returns vectors aligned 1:1 with strings.
# File 'lib/parse/embeddings/qwen.rb', line 177
def embed_text(strings, input_type: :search_document)
  unless strings.is_a?(Array)
    raise ArgumentError,
          "Parse::Embeddings::Qwen#embed_text expects Array<String> (got #{strings.class})."
  end
  return [] if strings.empty?
  strings.each_with_index do |s, i|
    unless s.is_a?(String)
      raise ArgumentError,
            "Parse::Embeddings::Qwen#embed_text strings[#{i}] is not a String (#{s.class})."
    end
    if s.empty?
      raise ArgumentError,
            "Parse::Embeddings::Qwen#embed_text strings[#{i}] is empty; Qwen rejects empty inputs."
    end
  end

  body = {
    model: @model,
    input: strings,
    encoding_format: "float",
  }
  # Forward `dimensions` only when active width differs from
  # native. Sending native width is a no-op on DashScope but
  # we keep the wire minimal to avoid drift across future
  # endpoint revisions.
  if MATRYOSHKA_MODELS.include?(@model) && @dimensions != MODEL_DEFAULT_DIMENSIONS.fetch(@model)
    body[:dimensions] = @dimensions
  end

  instrument_embed(strings.length, input_type) do |emit_payload|
    payload = post_embeddings(body)
    if payload.is_a?(Hash) && payload["usage"].is_a?(Hash)
      tt = payload["usage"]["total_tokens"]
      emit_payload[:total_tokens] = tt if tt.is_a?(Integer) && tt >= 0
    end
    vectors = extract_vectors!(payload, strings.length)
    validate_response!(strings.length, vectors)
  end
end
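The dimensions-forwarding rule in the listing can be isolated as a small standalone sketch. build_embedding_body is a hypothetical helper that restates the logic outside the class; the two constants are copied from the constant summary.

```ruby
NATIVE_DIMENSIONS = {
  "qwen3-embedding-0.6b" => 1024,
  "qwen3-embedding-4b" => 2560,
  "qwen3-embedding-8b" => 4096,
}.freeze
MATRYOSHKA = NATIVE_DIMENSIONS.keys.freeze

# Hypothetical helper: build the request body, adding `dimensions` only when
# the requested width differs from the model's native width.
def build_embedding_body(model, dimensions, strings)
  body = { model: model, input: strings, encoding_format: "float" }
  if MATRYOSHKA.include?(model) && dimensions != NATIVE_DIMENSIONS.fetch(model)
    body[:dimensions] = dimensions
  end
  body
end

native  = build_embedding_body("qwen3-embedding-8b", 4096, ["hello"])
reduced = build_embedding_body("qwen3-embedding-8b", 1024, ["hello"])
```

Here native carries no dimensions key, while reduced carries dimensions: 1024.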
#extract_vectors!(payload, input_count) ⇒ Object (protected)
# File 'lib/parse/embeddings/qwen.rb', line 298
def extract_vectors!(payload, input_count)
  unless payload.is_a?(Hash)
    raise InvalidResponseError, "Parse::Embeddings::Qwen: response body is not a JSON object."
  end
  data = payload["data"]
  unless data.is_a?(Array)
    raise InvalidResponseError, "Parse::Embeddings::Qwen: response.data is not an Array."
  end
  if data.length != input_count
    raise InvalidResponseError,
          "Parse::Embeddings::Qwen: response.data.length #{data.length} != input count #{input_count}."
  end
  sorted = data.each_with_index.map do |entry, i|
    unless entry.is_a?(Hash)
      raise InvalidResponseError, "Parse::Embeddings::Qwen: response.data[#{i}] is not a JSON object."
    end
    idx = entry["index"]
    unless idx.is_a?(Integer) && idx >= 0 && idx < input_count
      raise InvalidResponseError,
            "Parse::Embeddings::Qwen: response.data[#{i}].index #{idx.inspect} out of range."
    end
    [idx, entry["embedding"]]
  end
  indices = sorted.map(&:first)
  if indices.uniq.length != indices.length
    raise InvalidResponseError, "Parse::Embeddings::Qwen: duplicate index in response.data."
  end
  sorted.sort_by(&:first).map(&:last)
end
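The reordering step matters when the endpoint returns entries out of order. A minimal sketch with a hand-written payload (the payload values are made up for illustration, and the validation from the listing is omitted):

```ruby
# Toy response body: entries arrive in reverse order relative to the inputs.
payload = {
  "data" => [
    { "index" => 1, "embedding" => [0.0, 1.0] },
    { "index" => 0, "embedding" => [1.0, 0.0] },
  ],
}

# Pair each vector with its declared index, sort by index, keep the vectors.
pairs = payload["data"].map { |entry| [entry["index"], entry["embedding"]] }
ordered = pairs.sort_by(&:first).map(&:last)
# => [[1.0, 0.0], [0.0, 1.0]]
```

After sorting, ordered[i] is the vector for the i-th input string, which is the 1:1 alignment that embed_text promises.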
#inspect_attrs ⇒ Object
# File 'lib/parse/embeddings/qwen.rb', line 219
def inspect_attrs
  super.merge(base: safe_base_host, retries: @max_retries)
end
#max_input_tokens ⇒ Object
# File 'lib/parse/embeddings/qwen.rb', line 157
def max_input_tokens
  MODEL_MAX_INPUT_TOKENS[@model]
end
#model_name ⇒ Object
# File 'lib/parse/embeddings/qwen.rb', line 149
def model_name
  @model
end
#normalize? ⇒ Boolean
# File 'lib/parse/embeddings/qwen.rb', line 161
def normalize?
  # Qwen3-Embedding is documented unit-normalized at the head.
  true
end
#parse_json_body!(body) ⇒ Object (protected)
# File 'lib/parse/embeddings/qwen.rb', line 285
def parse_json_body!(body)
  s = body.to_s
  if s.bytesize > MAX_RESPONSE_BYTES
    raise InvalidResponseError,
          "Parse::Embeddings::Qwen: response body exceeds #{MAX_RESPONSE_BYTES} bytes " \
          "(#{s.bytesize}). Refusing to parse."
  end
  JSON.parse(s, max_nesting: 32)
rescue JSON::ParserError => e
  raise InvalidResponseError, "Parse::Embeddings::Qwen: response is not valid JSON (#{e.message})."
end
#post_embeddings(body) ⇒ Object (protected)
# File 'lib/parse/embeddings/qwen.rb', line 245
def post_embeddings(body)
  attempts = 0
  loop do
    attempts += 1
    begin
      response = @connection.post("embeddings") do |req|
        req.body = body.to_json
      end
    rescue Faraday::TimeoutError, Faraday::ConnectionFailed => e
      if attempts > @max_retries
        raise TransientError, "Parse::Embeddings::Qwen: #{e.class} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end

    status = response.status
    return parse_json_body!(response.body) if status >= 200 && status < 300

    if status == 401
      raise AuthenticationError, "Parse::Embeddings::Qwen: 401 Unauthorized — check api_key."
    end
    if status == 429
      if attempts > @max_retries
        raise RateLimitError, "Parse::Embeddings::Qwen: 429 rate limited after #{attempts} attempt(s)."
      end
      sleep(retry_after_seconds(response) || backoff_seconds(attempts))
      next
    end
    if status >= 500
      if attempts > @max_retries
        raise TransientError, "Parse::Embeddings::Qwen: #{status} after #{attempts} attempt(s)."
      end
      sleep(backoff_seconds(attempts))
      next
    end
    raise BadRequestError, "Parse::Embeddings::Qwen: #{status} from POST /embeddings."
  end
end
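The status handling in the loop reduces to a small decision table. The sketch below restates it as a pure function so the branches are easy to see; classify_status and its symbols are hypothetical names and do not exist in the library.

```ruby
# Hypothetical restatement of the status branches in post_embeddings.
def classify_status(status)
  return :success if status >= 200 && status < 300
  return :authentication_error if status == 401 # raised immediately, never retried
  return :retry_rate_limited if status == 429   # honours Retry-After, else backoff
  return :retry_transient if status >= 500      # exponential backoff
  :bad_request                                  # everything else, including 403 and 3xx
end

outcomes = [200, 401, 403, 429, 503].map { |s| [s, classify_status(s)] }.to_h
```

Only 429 and 5xx responses (plus Faraday timeouts and connection failures) consume retry attempts; any other non-2xx status fails on the first response.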
#retry_after_seconds(response) ⇒ Object (protected)
# File 'lib/parse/embeddings/qwen.rb', line 335
def retry_after_seconds(response)
  ra = response.respond_to?(:headers) ? response.headers["retry-after"] || response.headers["Retry-After"] : nil
  return nil unless ra
  v = ra.to_f
  v.positive? ? [v, 60.0].min : nil
end
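A few sample inputs show how the header parsing behaves. The method body is copied from the listing; FakeResponse is a hypothetical stand-in for a Faraday response, used only so the snippet runs on its own.

```ruby
# Hypothetical stand-in for a response object exposing #headers.
FakeResponse = Struct.new(:headers)

def retry_after_seconds(response)
  ra = response.respond_to?(:headers) ? response.headers["retry-after"] || response.headers["Retry-After"] : nil
  return nil unless ra
  v = ra.to_f
  v.positive? ? [v, 60.0].min : nil
end

short_wait = retry_after_seconds(FakeResponse.new({ "retry-after" => "5" }))   # => 5.0
capped     = retry_after_seconds(FakeResponse.new({ "Retry-After" => "120" })) # => 60.0
http_date  = retry_after_seconds(FakeResponse.new({ "retry-after" => "Wed, 21 Oct 2015 07:28:00 GMT" })) # => nil
missing    = retry_after_seconds(FakeResponse.new({}))                          # => nil
```

A Retry-After value in HTTP-date form converts to 0.0 under to_f, so the method returns nil and the caller falls back to backoff_seconds.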
#supports_input_type? ⇒ Boolean
# File 'lib/parse/embeddings/qwen.rb', line 166
def supports_input_type?
  # DashScope compatible-mode does not accept a wire-level
  # input_type / task field. The kwarg threads through for
  # cache-key stability but is dropped at the request.
  false
end