Module: Legion::LLM::Inference::Executor::Escalation
Included in: Legion::LLM::Inference::Executor
Defined in: lib/legion/llm/inference/executor/escalation.rb
Overview
Escalation-area methods extracted from Executor verbatim (P4b §1.5, refactor-under-green). Owns the provider-call lifecycle (single + escalating, sync + stream + responses-API), error/retry classification, and the corresponding audit/metering emission.
Instance Method Summary
- #account_specific_error?(error) ⇒ Boolean
  Errors scoped to a single account/instance (its credentials, billing, or quota) rather than to the model itself.
- #authentication_error?(err) ⇒ Boolean
- #build_routing_payload_from_resolved ⇒ Object
  Build a minimal routing_payload from the already-resolved routing state.
- #classify_and_accumulate_exclusions(error:, lane:, payload:) ⇒ Object
  Called after each failed dispatch in the while remaining.positive? loop.
- #classify_error(error:) ⇒ Object
  Classify error for the while remaining.positive? loop (G26 / D-C).
- #client_stream_error?(err) ⇒ Boolean
  Detect client-side stream errors (disconnects, broken pipes, socket timeouts) that originate from writing back to the HTTP client, not from the provider itself.
- #config_error?(err) ⇒ Boolean
- #context_overflow_error?(err) ⇒ Boolean
- #emit_error_audit(error, status:, provider: @resolved_provider, model: @resolved_model) ⇒ Object
- #emit_escalation_attempt_audit(provider:, model:, outcome:, duration_ms:, error: nil, attempt: 1) ⇒ Object
- #emit_escalation_attempt_metering(provider:, model:, duration_ms:, attempt: 1, status: 'success', error: nil, provider_submitted: true) ⇒ Object
- #error_metadata(err) ⇒ Object
- #escalation_attempt_hash(resolution, outcome:, failures:, duration_ms:) ⇒ Object
- #execute_provider_request ⇒ Object
- #execute_provider_request_native ⇒ Object
- #execute_provider_request_stream ⇒ Object
- #execute_provider_request_stream_native ⇒ Object
- #extract_retry_after(error) ⇒ Object
- #internal_error?(err) ⇒ Boolean
  G25 / B-H / PR #152 C5/C6: Internal errors (daemon NoMethodError/ArgumentError) come from shared daemon code — retrying on a different lane guarantees the same crash.
- #pipeline_escalation_enabled? ⇒ Boolean
- #pipeline_escalation_max_attempts ⇒ Object
- #record_escalation_failure(err, resolution, start_time, outcome:, operation:, handled: false) ⇒ Object
- #record_provider_response ⇒ Object
- #report_provider_failure(error, provider: @resolved_provider, instance: @resolved_instance, model: @resolved_model, offering_id: @resolved_offering_id, status: 'provider_error') ⇒ Object
- #report_provider_health(signal, duration_ms, metadata: {}) ⇒ Object
- #request_payload_error?(err) ⇒ Boolean
- #run_provider_call_single ⇒ Object
- #run_provider_call_with_attempts(routing_payload:, assembler: nil, stream_block: nil) ⇒ Object
  G26 / D-C: The bounded while remaining.positive? loop.
- #select_next_lane(routing_payload:, attempt_idx:) ⇒ Object
  Select the next lane to dispatch to.
- #step_provider_call ⇒ Object
- #step_provider_call_stream(&block) ⇒ Object
Instance Method Details
#account_specific_error?(error) ⇒ Boolean
Errors scoped to a single account/instance (its credentials, billing, or quota) rather than to the model itself. A sibling instance of the same provider+model can still succeed, so these must not skip the whole model.
# File 'lib/legion/llm/inference/executor/escalation.rb', line 87
def account_specific_error?(error)
  message = error.respond_to?(:message) ? error.message.to_s : error.to_s
  message.match?(/credit balance|insufficient (?:credit|funds|quota|balance)|payment required|billing|quota (?:exceeded|exhausted)|over quota/i)
end
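The check above is a message-pattern match, so it can be tried in isolation. The following is a minimal sketch, assuming a standalone copy of the pattern; the constant name `ACCOUNT_SCOPED_PATTERN` and the helper name `account_scoped?` are hypothetical and exist only for this illustration.

```ruby
# Hypothetical standalone copy of the documented pattern (illustration only).
ACCOUNT_SCOPED_PATTERN =
  /credit balance|insufficient (?:credit|funds|quota|balance)|payment required|billing|quota (?:exceeded|exhausted)|over quota/i

# Mirrors the documented logic: prefer #message when the object has one,
# otherwise fall back to #to_s (which also covers plain strings).
def account_scoped?(error)
  text = error.respond_to?(:message) ? error.message.to_s : error.to_s
  text.match?(ACCOUNT_SCOPED_PATTERN)
end

billing = StandardError.new('Your credit balance is too low')
timeout = StandardError.new('execution expired')

puts account_scoped?(billing)                      # true
puts account_scoped?(timeout)                      # false
puts account_scoped?('Quota exceeded for project') # true (String has no #message)
```

Because the match is on message text only, any error whose message happens to contain a word such as "billing" is treated as account-scoped, whatever its class.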
#authentication_error?(err) ⇒ Boolean
# File 'lib/legion/llm/inference/executor/escalation.rb', line 383
def authentication_error?(err)
  err.is_a?(Legion::LLM::AuthError) ||
    err.is_a?(Faraday::UnauthorizedError) ||
    err.is_a?(Faraday::ForbiddenError)
end
#build_routing_payload_from_resolved ⇒ Object
Build a minimal routing_payload from the already-resolved routing state. C2 (PayloadBuilder) will replace this with a proper single-ingress build from headers+body. This bridge keeps the loop working in C1 while the old chain machinery still exists.
IMPORTANT: providers/instances/tiers are intentionally NOT pinned here. Pinning providers: [:bedrock] would prevent cross-provider failover — the router returns nil after bedrock lanes are exhausted, raising EscalationExhausted prematurely. The preferred provider is expressed via the model filter (models: [model]) and by the routing payload built in C2 from explicit x-legion-* headers (hard filters) vs body hints. For the while remaining.positive? loop, tried_lanes exclusion handles exhaustion naturally: after all bedrock lanes are in tried_lanes, the router selects the next-best provider.
# File 'lib/legion/llm/inference/executor/escalation.rb', line 34
def build_routing_payload_from_resolved
  estimated = estimate_request_tokens
  # Include the resolved model as a soft filter so request_lane picks the requested model
  # from whichever provider has it. No provider/tier/instance pinning — that would prevent
  # cross-provider failover and break the while remaining.positive? escalation loop.
  models = @resolved_model ? [@resolved_model.to_s] : []
  {
    type: :inference,
    tiers: [],
    providers: [],
    instances: [],
    models: models,
    capabilities: chain_required_capabilities,
    privacy: (@proactive_tier_assignment&.dig(:privacy) || :normal).to_sym,
    estimated_context: estimated.positive? ? estimated : nil,
    tried_lanes: [],
    max_attempts: pipeline_escalation_max_attempts
  }
end
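The reasoning about pinning versus tried_lanes exclusion can be illustrated with a toy router. This is only a sketch under stated assumptions: the `LANES` table, its `:weight` field, and this simplified `request_lane` are hypothetical stand-ins, not the real Legion::LLM::Router API.

```ruby
# Hypothetical lane inventory: two bedrock lanes and one fallback provider.
LANES = [
  { id: 'bedrock/a',   provider: :bedrock,   weight: 10 },
  { id: 'bedrock/b',   provider: :bedrock,   weight: 8 },
  { id: 'anthropic/a', provider: :anthropic, weight: 5 }
].freeze

# Toy selector: an empty providers list means "no provider filter".
# Lanes already in tried_lanes are excluded; the heaviest remaining lane wins.
def request_lane(providers:, tried_lanes:)
  LANES
    .reject { |lane| tried_lanes.include?(lane[:id]) }
    .select { |lane| providers.empty? || providers.include?(lane[:provider]) }
    .max_by { |lane| lane[:weight] }
end

tried = ['bedrock/a', 'bedrock/b'] # both bedrock lanes have already failed

unpinned = request_lane(providers: [], tried_lanes: tried)
pinned   = request_lane(providers: [:bedrock], tried_lanes: tried)

puts unpinned[:id]  # anthropic/a (failover reaches the next provider)
puts pinned.inspect # nil (pinning leaves nothing to select)
```

With an empty providers filter, exhausting the bedrock lanes simply moves selection to the next-best provider; with the filter pinned, the selector returns nil as soon as those lanes are used up.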
#classify_and_accumulate_exclusions(error:, lane:, payload:) ⇒ Object
Called after each failed dispatch in the while remaining.positive? loop. Updates the routing payload’s tried_lanes and/or trips the health circuit. Terminal errors re-raise immediately so the caller’s while loop stops.
# File 'lib/legion/llm/inference/executor/escalation.rb', line 440
def classify_and_accumulate_exclusions(error:, lane:, payload:, **)
  case classify_error(error: error)
  when :internal_error, :context_overflow, :payload_error, :policy_denied
    raise error
  when :account_specific
    # Account/instance-scoped failure: trip the per-instance circuit.
    # All lanes for this instance go lane_weight ≤ 0; sibling instances stay eligible.
    log.warn("[llm][escalation] action=account_specific_error lane=#{lane[:id]} " \
             "provider=#{lane[:provider_family]} instance=#{lane[:instance_id]} " \
             "error=#{error.class}: #{error.message.to_s[0, 200]}")
    Legion::LLM::Router.health_tracker.trip_circuit( # allowlist:write-side
      provider: lane[:provider_family], instance: lane[:instance_id], reason: error.message
    )
  else
    payload[:tried_lanes] << lane[:id]
  end
rescue Legion::LLM::ModelNotAllowed, ::NoMethodError, ::ArgumentError
  raise
rescue StandardError => e
  raise if request_payload_error?(e)
  raise if context_overflow_error?(e)

  handle_exception(e, level: :warn, operation: 'llm.pipeline.classify_and_accumulate_exclusions', lane: lane[:id])
  payload[:tried_lanes] << lane[:id]
end
#classify_error(error:) ⇒ Object
Classify error for the while remaining.positive? loop (G26 / D-C). internal_error MUST be checked BEFORE account_specific (G25 / B-H): a daemon NoMethodError must never be treated as a retriable account-scoped failure.
# File 'lib/legion/llm/inference/executor/escalation.rb', line 425
def classify_error(error:, **)
  return :context_overflow if context_overflow_error?(error)
  return :payload_error if request_payload_error?(error)
  return :policy_denied if error.is_a?(Legion::LLM::ModelNotAllowed)
  return :internal_error if internal_error?(error) # terminal before account_specific
  return :account_specific if authentication_error?(error) ||
                              config_error?(error) ||
                              account_specific_error?(error)

  :transient
end
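The ordering constraint matters because account-scoped detection is message-based while internal-error detection is class-based. A minimal sketch, assuming simplified stand-in predicates (the real ones consult more classes and pattern tables) and a hypothetical `classify` helper:

```ruby
# Simplified stand-ins for the documented predicates (assumptions, not the real code).
def internal_error?(err)
  err.is_a?(NoMethodError) || err.is_a?(ArgumentError)
end

def account_specific_error?(err)
  err.message.to_s.match?(/quota (?:exceeded|exhausted)|billing/i)
end

# Class-based terminal check first, message-based retriable check second.
def classify(err)
  return :internal_error if internal_error?(err)
  return :account_specific if account_specific_error?(err)

  :transient
end

puts classify(ArgumentError.new('quota exceeded'))  # internal_error
puts classify(RuntimeError.new('quota exceeded'))   # account_specific
puts classify(RuntimeError.new('connection reset')) # transient
```

If the two checks were swapped, the first call would be classified as account_specific and retried on another lane, which is the outcome the G25 / B-H rule is meant to prevent.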
#client_stream_error?(err) ⇒ Boolean
Detect client-side stream errors (disconnects, broken pipes, socket timeouts) that originate from writing back to the HTTP client, not from the provider itself.
# File 'lib/legion/llm/inference/executor/escalation.rb', line 403
def client_stream_error?(err)
  name = err.class.name.to_s
  msg = err.message.to_s
  name.include?('Puma::ConnectionError') ||
    name.include?('Errno::EPIPE') ||
    (name.include?('IOError') && msg.include?('closed')) ||
    (name.include?('IOError') && msg.include?('already closed')) ||
    name.include?('EOFError') ||
    name.include?('Errno::ECONNRESET') ||
    name.include?('Errno::ECONNABORTED')
end
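Detection here works on the error's class name and message rather than on `is_a?`, so it needs no reference to Puma's classes. A small sketch of how standard-library errors fall out, assuming a condensed copy of the check under the hypothetical name `client_side_stream_error?`:

```ruby
# Condensed, hypothetical copy of the documented check (illustration only).
def client_side_stream_error?(err)
  name = err.class.name.to_s
  msg = err.message.to_s
  name.include?('Puma::ConnectionError') ||
    name.include?('Errno::EPIPE') ||
    (name.include?('IOError') && msg.include?('closed')) ||
    name.include?('EOFError') ||
    name.include?('Errno::ECONNRESET') ||
    name.include?('Errno::ECONNABORTED')
end

puts client_side_stream_error?(Errno::EPIPE.new)                       # true
puts client_side_stream_error?(IOError.new('closed stream'))           # true
puts client_side_stream_error?(IOError.new('not opened for reading'))  # false
puts client_side_stream_error?(RuntimeError.new('upstream returned 503')) # false
```

In report_provider_failure, errors that match are audited but not reported to the health tracker, since a disconnected client says nothing about provider health.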
#config_error?(err) ⇒ Boolean
# File 'lib/legion/llm/inference/executor/escalation.rb', line 377
def config_error?(err)
  name = err.class.name.to_s
  msg = err.message.to_s
  CONFIG_ERROR_PATTERNS.any? { |pat| pat.match?(name) || pat.match?(msg) }
end
#context_overflow_error?(err) ⇒ Boolean
# File 'lib/legion/llm/inference/executor/escalation.rb', line 395
def context_overflow_error?(err)
  err.is_a?(Legion::LLM::ContextOverflow) ||
    err.class.name.to_s.include?('ContextLength') ||
    CONTEXT_OVERFLOW_ERROR_PATTERNS.any? { |pat| pat.match?(err.message.to_s) }
end
#emit_error_audit(error, status:, provider: @resolved_provider, model: @resolved_model) ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 270
def emit_error_audit(error, status:, provider: @resolved_provider, model: @resolved_model)
  routing = { provider: provider, model: model }
  routing[:offering_id] = @resolved_offering_id if @resolved_offering_id
  routing[:offering_metadata] = @resolved_offering_metadata if @resolved_offering_metadata&.any?
  Legion::LLM::Audit.emit_prompt(
    request_id: @request.id,
    conversation_id: @request.conversation_id,
    caller: @request.caller,
    routing: routing,
    tokens: {},
    status: status,
    error: { class: error.class.name, message: error.message },
    tracing: @tracing,
    timestamp: Time.now,
    request_type: 'chat',
    messages: @request.messages
  )
rescue StandardError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.emit_error_audit')
end
#emit_escalation_attempt_audit(provider:, model:, outcome:, duration_ms:, error: nil, attempt: 1) ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 292
def emit_escalation_attempt_audit(provider:, model:, outcome:, duration_ms:, error: nil, attempt: 1)
  routing = { provider: provider, model: model }
  routing[:offering_id] = @resolved_offering_id if @resolved_offering_id
  routing[:offering_metadata] = @resolved_offering_metadata if @resolved_offering_metadata&.any?

  tokens = {}
  if @extracted_tokens
    input_tokens = @extracted_tokens.respond_to?(:input_tokens) ? @extracted_tokens.input_tokens.to_i : 0
    output_tokens = @extracted_tokens.respond_to?(:output_tokens) ? @extracted_tokens.output_tokens.to_i : 0
    thinking = @extracted_tokens.respond_to?(:thinking_tokens) ? @extracted_tokens.thinking_tokens.to_i : 0
    tokens = { input_tokens: input_tokens, output_tokens: output_tokens, thinking_tokens: thinking }.compact
  end

  content = extract_response_content
  thinking_response = extract_thinking

  Legion::LLM::Audit.emit_prompt(
    request_id: @request.id,
    conversation_id: @request.conversation_id,
    caller: @request.caller,
    routing: routing,
    tokens: tokens,
    status: outcome == :success ? 'success' : 'error',
    provider_response_ref: "#{@request.id}:attempt:#{attempt}",
    latency_ms: duration_ms,
    response_content: content,
    response_thinking: thinking_response,
    error: error ? { class: error.class.name, message: error.message } : nil,
    tracing: @tracing,
    timestamp: Time.now,
    request_type: 'chat',
    messages: @request.messages
  )
rescue StandardError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.emit_escalation_attempt_audit')
end
#emit_escalation_attempt_metering(provider:, model:, duration_ms:, attempt: 1, status: 'success', error: nil, provider_submitted: true) ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 329
def emit_escalation_attempt_metering(provider:, model:, duration_ms:, attempt: 1, status: 'success', error: nil,
                                     provider_submitted: true)
  @extracted_tokens ||= extract_tokens
  input_tokens = @extracted_tokens.respond_to?(:input_tokens) ? @extracted_tokens.input_tokens.to_i : 0
  output_tokens = @extracted_tokens.respond_to?(:output_tokens) ? @extracted_tokens.output_tokens.to_i : 0
  cost_usd = estimate_cost(input_tokens, output_tokens)

  event = Legion::LLM::Inference::Steps::Metering.build_event(
    provider: provider,
    model_id: model,
    offering_id: @resolved_offering_id,
    offering_metadata: @resolved_offering_metadata,
    tier: @resolved_tier,
    request_type: if @request.respond_to?(:request_type)
                    @request.request_type
                  else
                    'chat'
                  end,
    input_tokens: input_tokens,
    output_tokens: output_tokens,
    latency_ms: duration_ms,
    wall_clock_ms: duration_ms,
    cost_usd: cost_usd,
    request_id: @request.id,
    conversation_id: @request.conversation_id,
    correlation_id: @tracing&.dig(:correlation_id),
    caller: @request.caller,
    identity: metering_identity,
    billing: @request.billing,
    routing_reason: "escalation_attempt:#{attempt}",
    messages: @request.messages,
    response_content: extract_response_content,
    response_thinking: extract_thinking,
    status: status,
    error: error_metadata(error),
    provider_submitted: provider_submitted
  )
  Legion::LLM::Inference::Steps::Metering.publish_or_spool(event)
rescue StandardError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.emit_escalation_attempt_metering')
end
#error_metadata(err) ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 389
def error_metadata(err)
  return nil unless err

  { class: err.class.name, message: err.message.to_s }
end
#escalation_attempt_hash(resolution, outcome:, failures:, duration_ms:) ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 164
def escalation_attempt_hash(resolution, outcome:, failures:, duration_ms:)
  attempt = { model: resolution.model, provider: resolution.provider, tier: resolution.tier,
              outcome: outcome, failures: failures, duration_ms: duration_ms }
  attempt[:offering_id] = resolution.offering_id if resolution.offering_id
  attempt[:offering_metadata] = resolution. unless resolution..empty?
  attempt
end
#execute_provider_request ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 182
def execute_provider_request
  @timestamps[:provider_start] = Time.now
  @timeline.record(
    category: :provider, key: 'provider:request_sent',
    exchange_id: @exchange_id, direction: :outbound,
    detail: "calling #{@resolved_provider}",
    from: 'pipeline', to: "provider:#{@resolved_provider}"
  )

  raise Legion::LLM::ProviderError, "Native provider not registered: #{@resolved_provider}" unless fleet_dispatch? || use_native_dispatch?(@resolved_provider)

  execute_provider_request_native

  @timestamps[:provider_end] = Time.now
  record_provider_response
end
#execute_provider_request_native ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 199
def execute_provider_request_native
  result = execute_native_tool_loop
  (result.) if result.respond_to?(:metadata)
  @raw_response = result
  @tool_loop_messages = @last_tool_loop_messages if @last_tool_loop_messages
end
#execute_provider_request_stream ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 650
def execute_provider_request_stream(&)
  @timestamps[:provider_start] = Time.now
  @timeline.record(
    category: :provider, key: 'provider:request_sent',
    exchange_id: @exchange_id, direction: :outbound,
    detail: "streaming from #{@resolved_provider}",
    from: 'pipeline', to: "provider:#{@resolved_provider}"
  )

  raise Legion::LLM::ProviderError, "Native provider not registered: #{@resolved_provider}" unless fleet_dispatch? || use_native_dispatch?(@resolved_provider)

  execute_provider_request_stream_native(&)

  @timestamps[:provider_end] = Time.now
  record_provider_response
end
#execute_provider_request_stream_native ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 667
def execute_provider_request_stream_native(&)
  result = execute_native_streaming_tool_loop(&)
  (result.) if result.respond_to?(:metadata)
  @raw_response = result
end
#extract_retry_after(error) ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 264
def extract_retry_after(error)
  return nil unless error.respond_to?(:response) && error.response.is_a?(Hash)

  error.response[:headers]&.fetch('retry-after', nil)&.to_i
end
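To make the nil-handling concrete, here is a minimal sketch that exercises the same logic against a stand-in error. The class `FakeRateLimitError` is hypothetical and only mimics an error that exposes a `response` hash with a `:headers` entry.

```ruby
# Hypothetical error type exposing a response hash, for illustration only.
class FakeRateLimitError < StandardError
  attr_reader :response

  def initialize(message, response: nil)
    super(message)
    @response = response
  end
end

# Same logic as the documented method, reproduced so the sketch is self-contained.
def extract_retry_after(error)
  return nil unless error.respond_to?(:response) && error.response.is_a?(Hash)

  error.response[:headers]&.fetch('retry-after', nil)&.to_i
end

with_header = FakeRateLimitError.new('429', response: { headers: { 'retry-after' => '30' } })
no_headers  = FakeRateLimitError.new('429', response: { status: 429 })
plain       = StandardError.new('429')

puts extract_retry_after(with_header).inspect # 30
puts extract_retry_after(no_headers).inspect  # nil
puts extract_retry_after(plain).inspect       # nil
```

With a plain Hash the lookup is case-sensitive, so a header stored as 'Retry-After' would yield nil in this sketch. The value is also converted with `to_i`, so an HTTP-date form of the header would not produce a usable delay.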
#internal_error?(err) ⇒ Boolean
G25 / B-H / PR #152 C5/C6: Internal errors (daemon NoMethodError/ArgumentError) come from shared daemon code — retrying on a different lane guarantees the same crash. Classified as terminal: raise immediately, never retry, never trip circuits, never push to tried_lanes.
# File 'lib/legion/llm/inference/executor/escalation.rb', line 418
def internal_error?(err)
  err.is_a?(::NoMethodError) || err.is_a?(::ArgumentError)
end
#pipeline_escalation_enabled? ⇒ Boolean
# File 'lib/legion/llm/inference/executor/escalation.rb', line 172
def pipeline_escalation_enabled?
  esc = Legion::Settings[:llm].dig(:routing, :escalation) || {}
  esc[:enabled] == true && esc[:pipeline_enabled] == true
end
#pipeline_escalation_max_attempts ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 177
def pipeline_escalation_max_attempts
  esc = Legion::Settings[:llm].dig(:routing, :escalation) || {}
  esc[:max_attempts] || 3
end
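Both settings readers above look under routing.escalation. The sketch below shows the shape they expect, using a plain Hash in place of Legion::Settings[:llm]; the helper names `escalation_enabled?` and `escalation_max_attempts` are hypothetical, taking the settings hash as an argument so the example runs on its own.

```ruby
# Hypothetical helpers mirroring the documented lookups, parameterised on a plain Hash.
def escalation_enabled?(llm_settings)
  esc = llm_settings.dig(:routing, :escalation) || {}
  esc[:enabled] == true && esc[:pipeline_enabled] == true
end

def escalation_max_attempts(llm_settings)
  esc = llm_settings.dig(:routing, :escalation) || {}
  esc[:max_attempts] || 3
end

full     = { routing: { escalation: { enabled: true, pipeline_enabled: true, max_attempts: 5 } } }
half     = { routing: { escalation: { enabled: true } } }
stringly = { routing: { escalation: { enabled: 'true', pipeline_enabled: 'true' } } }

puts escalation_enabled?(full)       # true
puts escalation_enabled?(half)       # false (pipeline_enabled is missing)
puts escalation_enabled?(stringly)   # false (strict == true comparison)
puts escalation_max_attempts(full)   # 5
puts escalation_max_attempts({})     # 3
```

The strict `== true` comparison means both flags must be real booleans; truthy values such as the string 'true' leave escalation off.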
#record_escalation_failure(err, resolution, start_time, outcome:, operation:, handled: false) ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 92
def record_escalation_failure(err, resolution, start_time, outcome:, operation:, handled: false)
  @last_escalation_error = err
  duration_ms = ((Time.now - start_time) * 1000).round
  handle_exception(err, level: :warn, handled: handled, operation: operation,
                        provider: resolution.provider, model: resolution.model, duration_ms: duration_ms)
  if request_payload_error?(err)
    log.error "[llm][escalation] action=request_payload_error provider=#{resolution.provider} " \
              "instance=#{resolution.instance || 'default'} model=#{resolution.model} " \
              "error=#{err.message.to_s[0, 500]} daemon_side_payload_bug=true provider_health=false"
  elsif account_specific_error?(err)
    # Account-scoped failure (credit balance, payment, quota). It is
    # deterministic — it will fail every call until the operator tops up —
    # so DEPRIORITIZE this instance immediately by tripping its per-instance
    # circuit, to stop wasting calls on an account that cannot work. The
    # circuit is per (provider, instance), so healthy sibling instances and
    # the provider globally are unaffected, and its cooldown -> half_open
    # re-probe auto-recovers the instance once credits return. Do NOT
    # deny_model — the model is fine on other instances/accounts.
    log.warn "[llm][escalation] action=account_scoped_error provider=#{resolution.provider} " \
             "instance=#{resolution.instance || 'default'} model=#{resolution.model} " \
             "error=#{err.message.to_s[0, 300]} deprioritized=true model_denied=false"
    Legion::LLM::Router.health_tracker.trip_circuit( # allowlist:write-side
      provider: resolution.provider, instance: resolution.instance, reason: err.message
    )
  elsif authentication_error?(err) || config_error?(err)
    Legion::LLM::Router.health_tracker.deny_model( # allowlist:write-side
      provider: resolution.provider, model: resolution.model,
      instance: resolution.instance, reason: err.message
    )
    Legion::LLM::Router.health_tracker.trip_circuit( # allowlist:write-side
      provider: resolution.provider, instance: resolution.instance, reason: err.message
    )
  elsif !context_overflow_error?(err)
    Legion::LLM::Router.health_tracker.report(provider: resolution.provider, instance: resolution.instance, # allowlist:write-side
                                              offering_id: resolution.offering_id,
                                              signal: :error, value: 1,
                                              metadata: { reason: err.class.name, message: err.message.to_s[0, 500],
                                                          model: resolution.model })
  end

  @escalation_history << escalation_attempt_hash(
    resolution, outcome: outcome, failures: [err.class.name], duration_ms: duration_ms
  )
  @timeline.record(
    category: :provider, key: 'escalation:attempt',
    direction: :internal,
    detail: "attempt #{@escalation_history.size}: #{resolution.provider}:#{resolution.model} => #{outcome}",
    from: 'pipeline', to: "provider:#{resolution.provider}"
  )
  emit_escalation_attempt_metering(
    provider: resolution.provider,
    model: resolution.model,
    duration_ms: duration_ms,
    attempt: @escalation_history.size,
    status: 'error',
    error: err
  )
  emit_escalation_attempt_audit(
    provider: resolution.provider,
    model: resolution.model,
    outcome: outcome,
    duration_ms: duration_ms,
    error: err,
    attempt: @escalation_history.size
  )
end
#record_provider_response ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 206
def record_provider_response
  duration_ms = ((@timestamps[:provider_end] - @timestamps[:provider_start]) * 1000).to_i
  report_provider_health(:success, duration_ms) if @resolved_offering_id
  log.debug("[pipeline][provider] action=response_received provider=#{@resolved_provider} model=#{@resolved_model} duration_ms=#{duration_ms}")
  @timeline.record(
    category: :provider, key: 'provider:response_received',
    exchange_id: @exchange_id, direction: :inbound,
    detail: 'response received',
    from: "provider:#{@resolved_provider}", to: 'pipeline',
    duration_ms: duration_ms
  )
end
#report_provider_failure(error, provider: @resolved_provider, instance: @resolved_instance, model: @resolved_model, offering_id: @resolved_offering_id, status: 'provider_error') ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 219
def report_provider_failure(error, provider: @resolved_provider, instance: @resolved_instance,
                            model: @resolved_model, offering_id: @resolved_offering_id,
                            status: 'provider_error')
  emit_error_audit(error, status: status, provider: provider, model: model)
  return if request_payload_error?(error)
  return if context_overflow_error?(error)
  return if client_stream_error?(error)

  if authentication_error?(error) || config_error?(error)
    Legion::LLM::Router.health_tracker.deny_model( # allowlist:write-side
      provider: provider, model: model, instance: instance, reason: error.message
    )
    Legion::LLM::Router.health_tracker.trip_circuit( # allowlist:write-side
      provider: provider, instance: instance, reason: error.message
    )
    return
  end

  Legion::LLM::Router.health_tracker.report( # allowlist:write-side
    provider: provider,
    instance: instance,
    offering_id: offering_id,
    signal: :error,
    value: 1,
    metadata: { reason: error.class.name, message: error.message.to_s[0, 500], model: model }
  )
rescue StandardError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.report_provider_failure')
end
#report_provider_health(signal, duration_ms, metadata: {}) ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 251
def report_provider_health(signal, duration_ms, metadata: {})
  return unless defined?(Legion::LLM::Router) && Legion::LLM::Router.routing_enabled?

  Legion::LLM::Router.health_tracker.report(provider: @resolved_provider, instance: @resolved_instance, # allowlist:write-side
                                            offering_id: @resolved_offering_id,
                                            signal: signal, value: 1,
                                            metadata: metadata.merge(duration_ms: duration_ms))
  Legion::LLM::Router.health_tracker.report(provider: @resolved_provider, instance: @resolved_instance, # allowlist:write-side
                                            offering_id: @resolved_offering_id,
                                            signal: :latency, value: duration_ms, metadata: {})
rescue StandardError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.report_provider_health')
end
#request_payload_error?(err) ⇒ Boolean
# File 'lib/legion/llm/inference/executor/escalation.rb', line 371
def request_payload_error?(err)
  name = err.class.name.to_s
  msg = err.message.to_s
  REQUEST_PAYLOAD_ERROR_PATTERNS.any? { |pat| pat.match?(name) || pat.match?(msg) }
end
#run_provider_call_single ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 55
def run_provider_call_single
  execute_provider_request
rescue Legion::LLM::AuthError, Faraday::UnauthorizedError, Faraday::ForbiddenError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call.auth',
                      provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'auth_failed')
  raise e.is_a?(Legion::LLM::AuthError) ? e : Legion::LLM::AuthError.new(e.message)
rescue Legion::LLM::ContextOverflow => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call.context_overflow',
                      provider: @resolved_provider, model: @resolved_model)
  emit_error_audit(e, status: 'context_overflow')
  raise
rescue Legion::LLM::RateLimitError, Faraday::TooManyRequestsError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call.rate_limit',
                      provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'rate_limited')
  raise e.is_a?(Legion::LLM::RateLimitError) ? e : Legion::LLM::RateLimitError.new(e.message, retry_after: extract_retry_after(e))
rescue Legion::LLM::ProviderDown, Faraday::ConnectionFailed, Faraday::TimeoutError, Faraday::SSLError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call.provider_down',
                      provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'provider_down')
  raise e.is_a?(Legion::LLM::ProviderDown) ? e : Legion::LLM::ProviderDown.new(e.message)
rescue Legion::LLM::ProviderError, Faraday::ServerError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call.provider_error',
                      provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'provider_error')
  raise e.is_a?(Legion::LLM::ProviderError) ? e : Legion::LLM::ProviderError.new(e.message)
end
#run_provider_call_with_attempts(routing_payload:, assembler: nil, stream_block: nil) ⇒ Object
G26 / D-C: The bounded while remaining.positive? loop. Replaces the @escalation_chain iteration. Bounded by construction — remaining only decreases. No loop do, no retry, no redo inside this block.
G27 / D-G: Dual-error semantics, distinguished by attempt_idx == 0:
- NoLaneAvailable: request_lane returned nil on the first try (attempt_idx == 0)
- EscalationExhausted: at least one lane was tried (attempt_idx >= 1) before lanes ran out
First-attempt seeding: if step_routing already resolved a provider/model, we look up the corresponding Inventory lane for that (provider, instance, model) to use as the first lane. This preserves step_routing’s resolution while keeping the loop-based retry mechanism.
N×N: single canonical escalation loop with no provider-specific branches. The API namespace translator converts all client formats to canonical form; this loop dispatches only through execute_provider_request / execute_provider_request_stream.
# File 'lib/legion/llm/inference/executor/escalation.rb', line 481
def run_provider_call_with_attempts(routing_payload:, assembler: nil, stream_block: nil)
  remaining = routing_payload[:max_attempts] || Legion::Settings[:llm][:routing][:max_attempts]
  total_attempts = remaining
  attempt_idx = 0
  last_error = nil

  log.debug('[llm][executor] action=run_provider_call_with_attempts ' \
            "max_attempts=#{remaining} provider=#{@resolved_provider} model=#{@resolved_model}")

  while remaining.positive?
    lane = select_next_lane(routing_payload: routing_payload, attempt_idx: attempt_idx)

    if lane.nil?
      if attempt_idx.zero?
        err = Legion::LLM::Errors::NoLaneAvailable.new(
          filters: routing_payload.slice(:tiers, :providers, :instances, :models,
                                         :capabilities, :thinking, :privacy, :estimated_context)
        )
        log.warn("[llm][executor] action=no_lane_available filters=#{err.filters.inspect}")
        emit_error_audit(err, status: 'no_lane_available')
      else
        err = Legion::LLM::Errors::EscalationExhausted.new(
          attempts: attempt_idx,
          tried_lanes: routing_payload[:tried_lanes],
          last_error: last_error
        )
        log.warn("[llm][executor] action=escalation_exhausted_mid_loop attempts=#{attempt_idx} " \
                 "tried=#{routing_payload[:tried_lanes].size}")
        emit_error_audit(err, status: 'escalation_exhausted')
      end
      raise err
    end

    remaining -= 1
    attempt_idx += 1

    @resolved_provider = lane[:provider_family]
    @resolved_instance = lane[:instance_id]
    @resolved_model = lane[:model]
    @resolved_tier = lane[:tier]
    @resolved_offering_id = lane[:id]
    @resolved_offering_metadata = (
      lane.slice(:canonical_model_alias, :limits, :cost, :capabilities).compact
    )

    log.info("[llm][executor] action=dispatch_attempt attempt=#{attempt_idx}/#{total_attempts} " \
             "lane=#{lane[:id]} provider=#{@resolved_provider} model=#{@resolved_model}")

    assembler&.begin_dispatch_on(lane: lane) if assembler.respond_to?(:begin_dispatch_on)

    start_time = Time.now
    begin
      # N×N: single canonical dispatch path — no provider-specific branches.
      # The API namespace translator converts all client formats (OpenAI Chat,
      # OpenAI Responses, Anthropic Messages) to canonical before the executor
      # receives the request. The lex-llm-* provider adapter handles wire format.
      if stream_block
        execute_provider_request_stream(&stream_block)
      else
        execute_provider_request
      end

      duration_ms = ((Time.now - start_time) * 1000).round
      report_provider_health(:success, duration_ms) if @resolved_offering_id
      @timeline.record(
        category: :provider, key: 'escalation:attempt',
        direction: :internal,
        detail: "attempt #{attempt_idx}: #{@resolved_provider}:#{@resolved_model} => success",
        from: 'pipeline', to: "provider:#{@resolved_provider}"
      )
      emit_escalation_attempt_metering(provider: @resolved_provider, model: @resolved_model,
                                       duration_ms: duration_ms, attempt: attempt_idx)
      emit_escalation_attempt_audit(provider: @resolved_provider, model: @resolved_model,
                                    outcome: :success, duration_ms: duration_ms, attempt: attempt_idx)
      return
    rescue StandardError => e
      last_error = e
      duration_ms = ((Time.now - start_time) * 1000).round
      record_escalation_failure(e, Legion::LLM::Router::Resolution.new(
                                     tier: @resolved_tier,
                                     provider: @resolved_provider,
                                     instance: @resolved_instance,
                                     model: @resolved_model,
                                     offering_id: @resolved_offering_id
                                   ), start_time, outcome: :error, operation: 'llm.pipeline.attempts_loop', handled: true)
      assembler&.provider_failover_pending!(from: lane) if assembler.respond_to?(:provider_failover_pending!)
      classify_and_accumulate_exclusions(error: e, lane: lane, payload: routing_payload)
    end
  end

  err = Legion::LLM::Errors::EscalationExhausted.new(
    attempts: total_attempts,
    tried_lanes: routing_payload[:tried_lanes],
    last_error: last_error
  )
  log.warn("[llm][executor] action=escalation_exhausted_loop_complete attempts=#{total_attempts} " \
           "tried=#{routing_payload[:tried_lanes].size}")
  emit_error_audit(err, status: 'escalation_exhausted')
  raise err
end
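Stripped of logging, audit, and metering, the control flow of the loop can be sketched in a few lines. This is an illustrative reduction under stated assumptions, not the module's implementation: `run_with_attempts`, its `lanes:` and `dispatch:` parameters, and the two local error classes are hypothetical, and lane selection is reduced to "first lane not yet tried".

```ruby
# Hypothetical local error classes standing in for the Legion::LLM::Errors ones.
class NoLaneAvailable < StandardError; end
class EscalationExhausted < StandardError; end

# lanes:    ordered list of lane ids
# dispatch: callable that returns a result or raises on failure
def run_with_attempts(lanes:, max_attempts:, dispatch:)
  remaining = max_attempts # only ever decreases, so the loop is bounded
  attempt_idx = 0
  tried = []
  last_error = nil

  while remaining.positive?
    lane = (lanes - tried).first
    if lane.nil?
      # Dual-error semantics: nothing tried yet vs. ran out after trying.
      raise NoLaneAvailable, 'no lane matched the filters' if attempt_idx.zero?

      raise EscalationExhausted, "ran out after #{attempt_idx} attempt(s): #{last_error&.message}"
    end

    remaining -= 1
    attempt_idx += 1
    begin
      return dispatch.call(lane)
    rescue StandardError => e
      last_error = e
      tried << lane # exclusion: this lane is not selected again
    end
  end

  raise EscalationExhausted, "all #{max_attempts} attempt(s) failed: #{last_error&.message}"
end

flaky = ->(lane) { lane == :a ? raise('boom') : "ok:#{lane}" }
puts run_with_attempts(lanes: %i[a b], max_attempts: 3, dispatch: flaky) # ok:b
```

The counter is decremented before dispatch and nothing inside the loop increments it, which is what makes the bound hold without `retry` or `redo`.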
#select_next_lane(routing_payload:, attempt_idx:) ⇒ Object
Selects the next lane to dispatch to. On the first attempt (attempt_idx == 0), prefers the Inventory lane matching the provider, instance, and model already resolved by step_routing, which preserves the routing-phase resolution; the lane must not already be in tried_lanes and must have a positive lane_weight. If no such lane exists, falls through to request_lane. On subsequent attempts, delegates entirely to request_lane (the new SSOT, per the G26/SSOT design).
# File 'lib/legion/llm/inference/executor/escalation.rb', line 590

def select_next_lane(routing_payload:, attempt_idx:, **)
  if attempt_idx.zero? && @resolved_provider && @resolved_model
    # First attempt: look up the Inventory lane for the routing-resolved (provider, instance, model).
    # If found and not already tried/blocked, use it. Otherwise fall through to request_lane.
    provider_lanes = Legion::LLM::Inventory.lanes_for(
      provider: @resolved_provider, instance: @resolved_instance
    )
    preferred = provider_lanes.find do |l|
      l[:model] == @resolved_model &&
        !routing_payload[:tried_lanes].include?(l[:id]) &&
        l[:lane_weight].to_i.positive?
    end
    return preferred if preferred
  end

  Legion::LLM::Router.request_lane(**routing_payload)
rescue StandardError => e
  handle_exception(e, level: :warn, handled: true, operation: 'llm.pipeline.select_next_lane')
  Legion::LLM::Router.request_lane(**routing_payload)
end
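The first-attempt preference can be tried out in isolation. This is a small sketch under stated assumptions: LANES is made-up data that only mirrors the keys the method reads (:id, :model, :lane_weight), pick_lane is a hypothetical helper, and the fallback lambda stands in for Legion::LLM::Router.request_lane.

```ruby
# Made-up lane records; only the keys mirror what select_next_lane reads.
LANES = [
  { id: 'lane-1', model: 'model-a', lane_weight: 0 }, # matching model, but weight 0
  { id: 'lane-2', model: 'model-a', lane_weight: 5 },
  { id: 'lane-3', model: 'model-b', lane_weight: 5 }
].freeze

# Hypothetical helper: fallback is a callable standing in for the router.
def pick_lane(lanes:, resolved_model:, tried_lanes:, attempt_idx:, fallback:)
  if attempt_idx.zero? && resolved_model
    preferred = lanes.find do |l|
      l[:model] == resolved_model &&
        !tried_lanes.include?(l[:id]) &&
        l[:lane_weight].to_i.positive?
    end
    return preferred if preferred
  end

  fallback.call
end

router = -> { :from_router }

first = pick_lane(lanes: LANES, resolved_model: 'model-a', tried_lanes: [],
                  attempt_idx: 0, fallback: router)
later = pick_lane(lanes: LANES, resolved_model: 'model-a', tried_lanes: [],
                  attempt_idx: 1, fallback: router)
```

Here first is the 'lane-2' record, because 'lane-1' is skipped for its zero weight, and later is :from_router, because the preference applies only when attempt_idx is zero.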
#step_provider_call ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 13

def step_provider_call
  escalation = pipeline_escalation_enabled?
  log.debug "[llm][executor] action=step_provider_call provider=#{@resolved_provider} model=#{@resolved_model} escalation=#{escalation}"
  if escalation
    run_provider_call_with_attempts(routing_payload: build_routing_payload_from_resolved)
  else
    run_provider_call_single
  end
end
#step_provider_call_stream(&block) ⇒ Object
# File 'lib/legion/llm/inference/executor/escalation.rb', line 610

def step_provider_call_stream(&block)
  if pipeline_escalation_enabled?
    run_provider_call_with_attempts(routing_payload: build_routing_payload_from_resolved, stream_block: block)
    return
  end

  execute_provider_request_stream(&block)
rescue Legion::LLM::AuthError, Faraday::UnauthorizedError, Faraday::ForbiddenError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call_stream.auth',
                      provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'auth_failed')
  raise e.is_a?(Legion::LLM::AuthError) ? e : Legion::LLM::AuthError.new(e.)
rescue Legion::LLM::ContextOverflow => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call_stream.context_overflow',
                      provider: @resolved_provider, model: @resolved_model)
  emit_error_audit(e, status: 'context_overflow')
  raise
rescue Legion::LLM::RateLimitError, Faraday::TooManyRequestsError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call_stream.rate_limit',
                      provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'rate_limited')
  raise e.is_a?(Legion::LLM::RateLimitError) ? e : Legion::LLM::RateLimitError.new(e.,
                                                                                   retry_after: extract_retry_after(e))
rescue Legion::LLM::ProviderDown, Faraday::ConnectionFailed, Faraday::TimeoutError, Faraday::SSLError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call_stream.provider_down',
                      provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'provider_down')
  raise e.is_a?(Legion::LLM::ProviderDown) ? e : Legion::LLM::ProviderDown.new(e.)
rescue Legion::LLM::ProviderError, Faraday::ServerError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call_stream.provider_error',
                      provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'provider_error')
  raise e.is_a?(Legion::LLM::ProviderError) ? e : Legion::LLM::ProviderError.new(e.)
rescue StandardError => e
  raise if client_stream_error?(e)

  report_provider_failure(e, status: 'provider_error')
  raise
end
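Each rescue clause in the non-escalating path follows the same pattern: a domain error is re-raised unchanged, while a transport-level error of the same kind is wrapped in the matching domain error. The sketch below illustrates that pattern with hypothetical Transport and Domain classes standing in for the Faraday and Legion::LLM errors. Passing the original message to the wrapper is an assumption, since the listing is cut off at that argument, and the retry_after parameter replaces extract_retry_after for the sake of a self-contained example.

```ruby
# Hypothetical error classes; they stand in for Faraday::* and Legion::LLM::*.
module Transport
  class Unauthorized < StandardError; end
  class TooManyRequests < StandardError; end
end

module Domain
  class AuthError < StandardError; end

  class RateLimitError < StandardError
    attr_reader :retry_after

    def initialize(message = nil, retry_after: nil)
      super(message)
      @retry_after = retry_after
    end
  end
end

# Runs the block and normalizes transport errors into domain errors.
# retry_after is supplied by the caller here (assumption for illustration).
def normalize_errors(retry_after: nil)
  yield
rescue Domain::AuthError, Transport::Unauthorized => e
  # Domain errors pass through untouched; transport errors are wrapped.
  raise e.is_a?(Domain::AuthError) ? e : Domain::AuthError.new(e.message)
rescue Domain::RateLimitError, Transport::TooManyRequests => e
  raise e.is_a?(Domain::RateLimitError) ? e : Domain::RateLimitError.new(e.message, retry_after: retry_after)
end
```

Callers then only need to rescue the domain classes, for example `rescue Domain::RateLimitError => e` followed by a wait of `e.retry_after` seconds, regardless of which layer raised first.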