Module: Rubino::LLM::ErrorClassifier

Defined in:
lib/rubino/llm/error_classifier.rb

Overview

Centralized API-error classifier: the single source of truth for “is this error worth a retry?”, replacing the adapter’s boolean transient_error?. It ports the reference classify_api_error, reduced to the structural signals ruby_llm actually surfaces: a typed error class and the wrapped HTTP status. We do NOT port the giant message-pattern tables (billing/rate-limit/context phrase lists); ruby_llm raises typed classes, so status + class carry the same information without the brittle matching. The central message-based branch kept is the MiniMax “unknown error” (code 999/1000) blip, which arrives statusless and must stay in the retryable `unknown` bucket. The remaining phrase lists below (credentials, media, transport drops, and the 400/404 refinements for context overflow and model-not-found) are deliberately short.

Constant Summary

STREAM_DROP_ERRORS =

Transport-level drops that surface mid-request and never reach an HTTP status — always retryable. faraday-net_http re-raises IOError/EOFError (and friends) as Faraday::ConnectionFailed, the type we actually see for an upstream socket close; the rest are defensive.

[
  Faraday::ConnectionFailed, Faraday::TimeoutError,
  Net::OpenTimeout, Net::ReadTimeout,
  EOFError, IOError, Errno::ECONNRESET, Errno::EPIPE
].freeze
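Membership is tested with `is_a?`, so each entry also covers its subclasses. The sketch below uses only the stdlib subset of the list (the Faraday classes are left out so it runs without the faraday gem); `DROP_ERRORS` and `transport_drop?` are hypothetical names. It shows that EOFError is already a subclass of IOError in Ruby's core hierarchy, so its entry is redundant but harmless.

```ruby
require "net/http" # defines Net::OpenTimeout and Net::ReadTimeout

# Stdlib-only subset of STREAM_DROP_ERRORS (Faraday classes omitted).
DROP_ERRORS = [
  Net::OpenTimeout, Net::ReadTimeout,
  EOFError, IOError, Errno::ECONNRESET, Errno::EPIPE
].freeze

# Hypothetical helper mirroring classify_transport's class test.
def transport_drop?(error)
  DROP_ERRORS.any? { |klass| error.is_a?(klass) }
end

# is_a? also matches subclasses: EOFError < IOError.
puts EOFError.ancestors.include?(IOError)       # prints true
puts transport_drop?(Errno::ECONNRESET.new)     # prints true
puts transport_drop?(Net::ReadTimeout.new)     # prints true
puts transport_drop?(RuntimeError.new("boom")) # prints false
```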
RETRYABLE_HTTP =

Predicate for HTTP statuses worth retrying: any 5xx, or 429. ruby_llm 1.15 raises a typed error per HTTP status; the classes we can name directly are mapped in classify_typed, and everything else falls through to status-based, then unknown, classification.

->(status) { status && (status >= 500 || status == 429) }.freeze
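The predicate is truthy for any 5xx or for 429 and falsy otherwise. A nil status short-circuits to nil rather than false, which still reads as "not retryable" in a conditional. This sketch copies the lambda under the hypothetical name `RETRYABLE_HTTP_SKETCH` and evaluates it over a few statuses.

```ruby
# Same lambda as RETRYABLE_HTTP, copied so the sketch is self-contained.
RETRYABLE_HTTP_SKETCH = ->(status) { status && (status >= 500 || status == 429) }.freeze

[nil, 200, 400, 404, 429, 500, 503, 529].each do |status|
  # `!!` turns the nil returned for a missing status into false for display.
  puts "#{status.inspect} => #{!!RETRYABLE_HTTP_SKETCH.call(status)}"
end
# nil => false
# 200 => false
# 400 => false
# 404 => false
# 429 => true
# 500 => true
# 503 => true
# 529 => true
```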
UNKNOWN_PROVIDER_ERROR_PATTERNS =

Body/message fragments identifying a transient provider “unknown error” (MiniMax api_error 999/1000 on the Anthropic-compatible endpoint). Kept narrow and provider-blip-specific. Moved here from the adapter so the classifier is the single source of truth (folds Slice 0(b)).

[
  "unknown error",
  "api_error 999",
  "api_error 1000",
  "\"code\":999",
  "\"code\": 999",
  "\"code\":1000",
  "\"code\": 1000",
  "code 999",
  "code 1000"
].freeze
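All of the phrase tables are applied the same way: the error message is downcased and checked with `String#include?` against each (already lowercase) fragment. Because `include?` is an exact substring test, spacing matters, which is why both the `"code":999` and `"code": 999` spellings are listed. A minimal sketch, assuming a hypothetical `matches_any?` helper and made-up messages:

```ruby
UNKNOWN_PROVIDER_PATTERNS = [
  "unknown error", "api_error 999", "api_error 1000",
  "\"code\":999", "\"code\": 999", "\"code\":1000", "\"code\": 1000",
  "code 999", "code 1000"
].freeze

# Hypothetical helper: case-insensitive substring match, as the classifier does.
def matches_any?(message, patterns)
  msg = message.to_s.downcase
  patterns.any? { |p| msg.include?(p) }
end

# Made-up messages, for illustration only.
puts matches_any?("Provider returned Unknown Error", UNKNOWN_PROVIDER_PATTERNS) # true (case-folded)
puts matches_any?('{"code": 1000}', UNKNOWN_PROVIDER_PATTERNS)                 # true
puts matches_any?('{"code":  1000}', UNKNOWN_PROVIDER_PATTERNS)                # false (two spaces)
```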
TRANSIENT_TRANSPORT_PATTERNS =

Last-resort transport-drop phrases for statusless errors that never surfaced as a typed transport class.

[
  "timeout", "timed out", "connection reset",
  "connection refused", "broken pipe", "end of file reached"
].freeze
LOCAL_PROGRAMMING_ERRORS =

Local Ruby PROGRAMMING errors: unambiguous bugs in our own code (or a caller’s), not provider/API blips. These must NEVER be retried; a retry storm would mask the bug behind backoff (exactly what turned a mid-turn `NoMethodError` from the UI into three `llm.retry` warnings).

They reach `classify` only because ModelCallRunner rescues StandardError broadly around the boundary call; the reference classify_api_error never sees them because it only ever runs at the API layer. So we short-circuit them to NON-retryable (reason stays :unknown) BEFORE the unknown→retryable fallback, surfacing the bug immediately.

The set is curated by CLASS, not message: every entry is a clear local bug. RuntimeError is deliberately EXCLUDED: it is too generic (ruby_llm/providers raise it for transient conditions), so it stays on the message-based path and keeps its provider-blip retryability.

[
  NoMethodError, NameError, NoMatchingPatternError, NoMatchingPatternKeyError,
  ArgumentError, TypeError, NotImplementedError, FrozenError,
  LocalJumpError, ThreadError, FiberError
].freeze
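Because the test is `is_a?`, excluding RuntimeError still works even though FrozenError is itself a RuntimeError subclass: a FrozenError matches its own entry, while a plain RuntimeError matches nothing. Two related facts about Ruby's core hierarchy are worth knowing here: NoMethodError is a subclass of NameError, and NotImplementedError descends from ScriptError rather than StandardError, so a `rescue StandardError` boundary never hands it to `classify`. The sketch below uses a subset of the list (the NoMatchingPattern* classes are omitted only so it also runs on Rubies older than 3.1); `LOCAL_BUGS` is a hypothetical name.

```ruby
# Subset of LOCAL_PROGRAMMING_ERRORS (NoMatchingPattern* omitted for old Rubies).
LOCAL_BUGS = [
  NoMethodError, NameError, ArgumentError, TypeError, NotImplementedError,
  FrozenError, LocalJumpError, ThreadError, FiberError
].freeze

def local_programming_error?(error)
  LOCAL_BUGS.any? { |klass| error.is_a?(klass) }
end

puts FrozenError.ancestors.include?(RuntimeError)          # true
puts local_programming_error?(FrozenError.new("frozen"))   # true: its own entry
puts local_programming_error?(RuntimeError.new("blip"))    # false: stays retryable
puts NotImplementedError.ancestors.include?(StandardError) # false (ScriptError branch)
```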
MISSING_CREDENTIAL_PATTERNS =

A missing / unconfigured credential — raised BEFORE any HTTP call, so it carries no status and would otherwise fall through to the unknown→retryable default and trigger an ~80s retry storm that exits empty (#93). ruby_llm raises RubyLLM::ConfigurationError (“Missing configuration for OpenRouter: openrouter_api_key”) when a provider’s key is unset; our own adapter raises Rubino::Error (“Missing API key for provider …”). A missing key is a credential problem the user must fix — classify it as a NON-retryable AUTH error so the runner surfaces it immediately.

[
  "missing configuration for",
  "missing api key",
  "no api key",
  "api key is not set",
  "_api_key"
].freeze
INVALID_CREDENTIAL_PATTERNS =

A PRESENT but INVALID credential rejected by the provider via a statusless / untyped error body (MiniMax’s Anthropic-compatible endpoint says “login fail” with no 401), which used to fall through to the unknown→retryable default and burn ~60-90s of silent retries on a deterministic auth failure (#126). Same deal as a typed 401/403: NON-retryable AUTH, surfaced immediately. Patterns are the literal provider phrasings, kept narrow.

[
  "login fail",
  "invalid api key",
  "incorrect api key",
  "invalid x-api-key",
  "authentication_error",
  "authentication failed"
].freeze
INVALID_MEDIA_PATTERNS =

Provider media/image validation rejections — a PERMANENT 4xx-class complaint about the attachment itself, which some providers (MiniMax Anthropic-compat) surface statusless so it used to fall through to the unknown→retryable default and burn the whole retry budget (~80s) on a bad image (#98). The same attachment fails identically on every retry, so fail fast. Patterns are the literal provider phrasings, kept narrow.

[
  "media exceeds size limit",
  "invalid image content",
  "image: unknown format",
  "could not process image"
].freeze
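The missing-credential, invalid-credential, and invalid-media tables above are consulted at the front of `classify` in that order (with the transport-class check sitting between the credential and media checks), and the first match decides the reason. The table-driven sketch below leaves out the transport check and the RubyLLM::ConfigurationError class test. `EARLY_CHECKS`, `early_reason`, and the `:auth` / `:format_error` symbols are hypothetical stand-ins for the FailoverReason constants. The first message is the ruby_llm phrasing quoted in the MISSING_CREDENTIAL_PATTERNS note.

```ruby
MISSING_CREDENTIAL = ["missing configuration for", "missing api key", "no api key",
                      "api key is not set", "_api_key"].freeze
INVALID_CREDENTIAL = ["login fail", "invalid api key", "incorrect api key",
                      "invalid x-api-key", "authentication_error",
                      "authentication failed"].freeze
INVALID_MEDIA = ["media exceeds size limit", "invalid image content",
                 "image: unknown format", "could not process image"].freeze

# Checked in the same relative order as classify; the first matching table wins.
EARLY_CHECKS = [
  [:auth, MISSING_CREDENTIAL],
  [:auth, INVALID_CREDENTIAL],
  [:format_error, INVALID_MEDIA]
].freeze

# Hypothetical helper: returns a reason symbol, or nil to fall through
# to the later typed / status / statusless checks.
def early_reason(message)
  msg = message.to_s.downcase
  EARLY_CHECKS.each do |reason, patterns|
    return reason if patterns.any? { |p| msg.include?(p) }
  end
  nil
end

p early_reason("Missing configuration for OpenRouter: openrouter_api_key") # => :auth
p early_reason("login fail")                                               # => :auth
p early_reason("Invalid image content")                                    # => :format_error
p early_reason("upstream connect error")                                   # => nil
```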
CONTEXT_OVERFLOW_PATTERNS =
[
  "context length", "context window", "maximum context",
  "token limit", "too many tokens", "prompt is too long", "max_tokens"
].freeze
MODEL_NOT_FOUND_PATTERNS =
[
  "is not a valid model", "invalid model", "model not found",
  "model_not_found", "does not exist", "no such model", "unknown model"
].freeze

Class Method Summary

Class Method Details

.classify(error) ⇒ Object

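Classify an error into a ClassifiedError with reason + recovery hints. Priority mirrors the reference pipeline, extended with the narrow message checks: missing credential → invalid credential → transport class → invalid media → typed class → HTTP status → statusless provider-unknown / transport phrases → local programming error (non-retryable) → unknown (retryable default).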



# File 'lib/rubino/llm/error_classifier.rb', line 124

def classify(error)
  status = http_status(error)

  result = classify_missing_credential(error) ||
           classify_invalid_credential(error) ||
           classify_transport(error) ||
           classify_invalid_media(error) ||
           classify_typed(error) ||
           (status && classify_by_status(status, error)) ||
           classify_statusless(error)
  return result if result

  # A genuine local Ruby bug (NoMethodError, ArgumentError, …) is NOT a
  # retryable provider blip — propagate it immediately instead of letting
  # the unknown→retryable default mask it behind a backoff storm.
  return result_for(FailoverReason::UNKNOWN, status, error, retryable: false) if local_programming_error?(error)

  result_for(FailoverReason::UNKNOWN, status, error, retryable: true)
end
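The `a || b || c` chain works because each classify_* helper returns either a result or nil, so the first non-nil result wins and the remaining helpers never run. This minimal sketch keeps only two illustrative steps (the real chain has seven) and a two-class subset of the local-bug list. `Classified`, `STEPS`, `BUG_CLASSES`, and `classify_sketch` are hypothetical names, not the module's API.

```ruby
# Hypothetical stand-in for ClassifiedError, keeping only two fields.
Classified = Struct.new(:reason, :retryable, keyword_init: true)

# Each step returns a Classified or nil, like the classify_* helpers.
STEPS = [
  ->(e) { Classified.new(reason: :auth, retryable: false) if e.message.downcase.include?("login fail") },
  ->(e) { Classified.new(reason: :timeout, retryable: true) if e.is_a?(IOError) }
].freeze

# Illustrative subset of LOCAL_PROGRAMMING_ERRORS.
BUG_CLASSES = [NoMethodError, ArgumentError].freeze

def classify_sketch(error)
  # Try each step in order; the first non-nil result wins, like the `||` chain.
  STEPS.each do |step|
    result = step.call(error)
    return result if result
  end

  # Local bugs are checked only after every step has declined.
  return Classified.new(reason: :unknown, retryable: false) if BUG_CLASSES.any? { |k| error.is_a?(k) }

  Classified.new(reason: :unknown, retryable: true)
end

r = classify_sketch(NoMethodError.new("undefined method"))
puts "#{r.reason} retryable=#{r.retryable}" # prints "unknown retryable=false"
r = classify_sketch(RuntimeError.new("???"))
puts "#{r.reason} retryable=#{r.retryable}" # prints "unknown retryable=true"
```

One consequence of this ordering: a local bug whose message happens to match an earlier phrase table takes that earlier classification instead of the local-bug branch.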

.classify_by_status(status, error) ⇒ Object

HTTP status classification with message-aware refinement, mirroring _classify_by_status (error_classifier.py:725) for the CORE reasons.



# File 'lib/rubino/llm/error_classifier.rb', line 255

def classify_by_status(status, error)
  case status
  when 401, 403
    result_for(FailoverReason::AUTH, status, error,
               retryable: false, should_rotate_credential: true, should_fallback: true)
  when 402
    result_for(FailoverReason::BILLING, status, error,
               retryable: false, should_rotate_credential: true, should_fallback: true)
  when 404
    # Generic 404 with no "model not found" signal is treated as unknown
    # (retryable) per the reference: a misconfigured
    # endpoint or proxy glitch shouldn't masquerade as a missing model.
    if model_not_found?(error)
      result_for(FailoverReason::MODEL_NOT_FOUND, status, error,
                 retryable: false, should_fallback: true)
    else
      result_for(FailoverReason::UNKNOWN, status, error, retryable: true)
    end
  when 429
    result_for(FailoverReason::RATE_LIMIT, status, error,
               retryable: true, should_rotate_credential: true, should_fallback: true)
  when 503, 529
    result_for(FailoverReason::OVERLOADED, status, error, retryable: true)
  when 400
    if context_overflow?(error)
      result_for(FailoverReason::CONTEXT_OVERFLOW, status, error,
                 retryable: false, should_compress: true)
    elsif model_not_found?(error)
      result_for(FailoverReason::MODEL_NOT_FOUND, status, error,
                 retryable: false, should_fallback: true)
    else
      result_for(FailoverReason::FORMAT_ERROR, status, error,
                 retryable: false, should_fallback: true)
    end
  else
    if status >= 500
      result_for(FailoverReason::SERVER_ERROR, status, error, retryable: true)
    elsif status >= 400
      result_for(FailoverReason::FORMAT_ERROR, status, error,
                 retryable: false, should_fallback: true)
    end
  end
end
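The status table condenses to a few rules. A 404 is only a missing model when the message says so, otherwise it is a retryable unknown. A 400 splits three ways on the message, any other 4xx is a non-retryable format error, and a status below 400 yields nil so `classify` moves on. The sketch below returns hypothetical `[reason, retryable]` pairs and uses short subsets of the two phrase lists. `status_reason`, `hint?`, `CONTEXT_HINTS`, and `MODEL_HINTS` are illustrative names, and the messages are made up.

```ruby
CONTEXT_HINTS = ["context length", "prompt is too long", "too many tokens"].freeze # subset
MODEL_HINTS   = ["model not found", "invalid model", "does not exist"].freeze      # subset

def hint?(message, hints)
  msg = message.to_s.downcase
  hints.any? { |h| msg.include?(h) }
end

# Returns [reason, retryable] or nil, following the branches of classify_by_status.
def status_reason(status, message)
  case status
  when 401, 403 then [:auth, false]
  when 402      then [:billing, false]
  when 404      then hint?(message, MODEL_HINTS) ? [:model_not_found, false] : [:unknown, true]
  when 429      then [:rate_limit, true]
  when 503, 529 then [:overloaded, true]
  when 400
    if hint?(message, CONTEXT_HINTS)
      [:context_overflow, false]
    elsif hint?(message, MODEL_HINTS)
      [:model_not_found, false]
    else
      [:format_error, false]
    end
  else
    if status >= 500
      [:server_error, true]
    elsif status >= 400
      [:format_error, false]
    end
  end
end

p status_reason(404, "Not Found")                         # => [:unknown, true]
p status_reason(400, "Prompt is too long: 210000 tokens") # => [:context_overflow, false]
p status_reason(418, "")                                  # => [:format_error, false]
p status_reason(302, "")                                  # => nil
```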

.classify_invalid_credential(error) ⇒ Object



# File 'lib/rubino/llm/error_classifier.rb', line 193

def classify_invalid_credential(error)
  msg = error.message.to_s.downcase
  return unless INVALID_CREDENTIAL_PATTERNS.any? { |p| msg.include?(p) }

  result_for(FailoverReason::AUTH, http_status(error), error,
             retryable: false, should_rotate_credential: true, should_fallback: true)
end

.classify_invalid_media(error) ⇒ Object



# File 'lib/rubino/llm/error_classifier.rb', line 223

def classify_invalid_media(error)
  msg = error.message.to_s.downcase
  return unless INVALID_MEDIA_PATTERNS.any? { |p| msg.include?(p) }

  result_for(FailoverReason::FORMAT_ERROR, http_status(error), error,
             retryable: false, should_fallback: true)
end

.classify_missing_credential(error) ⇒ Object



# File 'lib/rubino/llm/error_classifier.rb', line 167

def classify_missing_credential(error)
  is_config_error =
    defined?(RubyLLM::ConfigurationError) && error.is_a?(RubyLLM::ConfigurationError)
  msg = error.message.to_s.downcase
  return unless is_config_error || MISSING_CREDENTIAL_PATTERNS.any? { |p| msg.include?(p) }

  result_for(FailoverReason::AUTH, http_status(error), error,
             retryable: false, should_rotate_credential: true, should_fallback: true)
end

.classify_statusless(error) ⇒ Object

No decisive status: the MiniMax “unknown error” blip and bare transport drops. A permanent 4xx never reaches here (returned above), so the provider-unknown net stays narrow — mirrors the reference unknown→retryable.



# File 'lib/rubino/llm/error_classifier.rb', line 302

def classify_statusless(error)
  msg = error.message.to_s.downcase
  if UNKNOWN_PROVIDER_ERROR_PATTERNS.any? { |p| msg.include?(p) }
    return result_for(FailoverReason::UNKNOWN, nil, error, retryable: true)
  end
  if TRANSIENT_TRANSPORT_PATTERNS.any? { |p| msg.include?(p) }
    return result_for(FailoverReason::TIMEOUT, nil, error, retryable: true)
  end

  nil
end

.classify_transport(error) ⇒ Object

Transport drops (Faraday::ConnectionFailed for the MiniMax EOF, read/connect timeouts, …) are retryable regardless of message: they never reach an HTTP status. The class list is this module’s STREAM_DROP_ERRORS constant.



# File 'lib/rubino/llm/error_classifier.rb', line 204

def classify_transport(error)
  return unless STREAM_DROP_ERRORS.any? { |klass| error.is_a?(klass) }

  result_for(FailoverReason::TIMEOUT, nil, error, retryable: true)
end

.classify_typed(error) ⇒ Object

Typed ruby_llm errors we can name without a status lookup.



# File 'lib/rubino/llm/error_classifier.rb', line 232

def classify_typed(error)
  case error
  when RubyLLM::ContextLengthExceededError
    result_for(FailoverReason::CONTEXT_OVERFLOW, http_status(error), error,
               retryable: false, should_compress: true)
  when RubyLLM::UnauthorizedError, RubyLLM::ForbiddenError
    result_for(FailoverReason::AUTH, http_status(error), error,
               retryable: false, should_rotate_credential: true, should_fallback: true)
  when RubyLLM::PaymentRequiredError
    result_for(FailoverReason::BILLING, http_status(error), error,
               retryable: false, should_rotate_credential: true, should_fallback: true)
  when RubyLLM::RateLimitError
    result_for(FailoverReason::RATE_LIMIT, http_status(error) || 429, error,
               retryable: true, should_rotate_credential: true, should_fallback: true)
  when RubyLLM::OverloadedError, RubyLLM::ServiceUnavailableError
    result_for(FailoverReason::OVERLOADED, http_status(error), error, retryable: true)
  when RubyLLM::ServerError
    result_for(FailoverReason::SERVER_ERROR, http_status(error), error, retryable: true)
  end
end
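`case error` / `when Klass` tests `Klass === error`, which for a class is the same as `error.is_a?(Klass)`. Branches are tried top-down, so if one typed class subclassed another, the subclass would have to come first. The `|| 429` default on the rate-limit branch covers a RateLimitError that carries no wrapped status. The sketch below uses a made-up error hierarchy (`ApiError`, `RateLimited`, `ServerFailure`, `Overloaded`, and `typed_reason` are hypothetical) and does not reproduce ruby_llm's real class tree.

```ruby
# Hypothetical error hierarchy; it does not reproduce ruby_llm's real one.
class ApiError < StandardError
  attr_reader :status

  def initialize(msg = nil, status: nil)
    super(msg)
    @status = status
  end
end
class RateLimited < ApiError; end
class ServerFailure < ApiError; end
class Overloaded < ServerFailure; end

# `when Klass` means `Klass === error`, checked top-down, so the subclass
# (Overloaded) must precede its superclass (ServerFailure).
def typed_reason(error)
  case error
  when RateLimited   then [:rate_limit, error.status || 429] # default when no status wrapped
  when Overloaded    then [:overloaded, error.status]
  when ServerFailure then [:server_error, error.status]
  end
end

p typed_reason(RateLimited.new("slow down"))           # => [:rate_limit, 429]
p typed_reason(Overloaded.new("busy", status: 529))    # => [:overloaded, 529]
p typed_reason(ServerFailure.new("oops", status: 500)) # => [:server_error, 500]
p typed_reason(ArgumentError.new("bad"))               # => nil
```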

.context_overflow?(error) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/rubino/llm/error_classifier.rb', line 347

def context_overflow?(error)
  return true if error.is_a?(RubyLLM::ContextLengthExceededError)

  msg = error.message.to_s.downcase
  CONTEXT_OVERFLOW_PATTERNS.any? { |p| msg.include?(p) }
end

.http_status(error) ⇒ Object

HTTP status from a typed RubyLLM::Error’s wrapped Faraday response, or nil.



# File 'lib/rubino/llm/error_classifier.rb', line 330

def http_status(error)
  return unless error.respond_to?(:response) && error.response.respond_to?(:status)

  status = error.response.status
  status if status.is_a?(Integer)
end
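`http_status` is defensive duck typing. It returns nil when the error has no `response`, when the response is nil (nil does not respond to `status`), or when the status is not an Integer. The sketch below exercises those paths with Struct stand-ins for a Faraday-style response. `FakeResponse`, `FakeError`, and `status_of` are hypothetical names.

```ruby
# Hypothetical stand-ins for an error wrapping a Faraday-style response.
FakeResponse = Struct.new(:status)
FakeError = Struct.new(:message, :response)

# Same logic as http_status.
def status_of(error)
  return unless error.respond_to?(:response) && error.response.respond_to?(:status)

  status = error.response.status
  status if status.is_a?(Integer)
end

p status_of(FakeError.new("boom", FakeResponse.new(503)))   # => 503
p status_of(FakeError.new("boom", FakeResponse.new("503"))) # => nil (not an Integer)
p status_of(FakeError.new("boom", nil))                     # => nil (nil has no #status)
p status_of(RuntimeError.new("boom"))                       # => nil (no #response)
```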

.local_programming_error?(error) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/rubino/llm/error_classifier.rb', line 359

def local_programming_error?(error)
  LOCAL_PROGRAMMING_ERRORS.any? { |klass| error.is_a?(klass) }
end

.model_not_found?(error) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/rubino/llm/error_classifier.rb', line 354

def model_not_found?(error)
  msg = error.message.to_s.downcase
  MODEL_NOT_FOUND_PATTERNS.any? { |p| msg.include?(p) }
end

.result_for(reason, status, error, retryable:, should_compress: false, should_rotate_credential: false, should_fallback: false) ⇒ Object

Builds the ClassifiedError that every classify_* branch returns, truncating the error message to its first 500 characters.
# File 'lib/rubino/llm/error_classifier.rb', line 316

def result_for(reason, status, error, retryable:, should_compress: false,
               should_rotate_credential: false, should_fallback: false)
  ClassifiedError.new(
    reason: reason,
    status_code: status,
    message: error.respond_to?(:message) ? error.message.to_s[0, 500] : error.to_s[0, 500],
    retryable: retryable,
    should_compress: should_compress,
    should_rotate_credential: should_rotate_credential,
    should_fallback: should_fallback
  )
end

.retryable?(error) ⇒ Boolean

Convenience: just the boolean the adapter’s retry loop needs.

Returns:

  • (Boolean)


# File 'lib/rubino/llm/error_classifier.rb', line 145

def retryable?(error)
  classify(error).retryable
end
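`retryable?` is described as the boolean the adapter's retry loop needs. That loop is not shown on this page, so the sketch below shows one hypothetical way such a loop could consume a `retryable?`-style predicate. The method name `with_retries`, the attempt cap, and the exponential backoff schedule are illustrative assumptions, not Rubino's own.

```ruby
# Hypothetical retry loop driven by a retryable?-style predicate.
def with_retries(max_attempts: 3, base_delay: 0.5, retryable: ->(_e) { true })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError => e
    # Non-retryable errors and an exhausted budget re-raise immediately.
    raise if attempt >= max_attempts || !retryable.call(e)

    sleep(base_delay * (2**(attempt - 1))) # exponential backoff: 0.5s, 1s, 2s, ...
    retry
  end
end

transient = ->(e) { e.is_a?(IOError) } # stand-in for ErrorClassifier.retryable?
calls = 0
result = with_retries(base_delay: 0, retryable: transient) do |n|
  calls += 1
  raise IOError, "socket closed" if n < 3
  "ok after #{n}"
end
puts result # prints "ok after 3"

begin
  with_retries(base_delay: 0, retryable: transient) { raise ArgumentError, "bug" }
rescue ArgumentError => e
  puts "surfaced immediately: #{e.message}"
end
```

Keeping the policy in the predicate means the loop never inspects messages or statuses itself; swapping in a different classifier changes what gets retried without touching the loop.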