Module: Legion::LLM::Inference::Executor::Escalation

Included in:
Legion::LLM::Inference::Executor
Defined in:
lib/legion/llm/inference/executor/escalation.rb

Overview

Escalation-area methods extracted from Executor verbatim (P4b §1.5, refactor-under-green). Owns the provider-call lifecycle (single + escalating, sync + stream + responses-API), error/retry classification, and the corresponding audit/metering emission.

Instance Method Summary collapse

Instance Method Details

#account_specific_error?(error) ⇒ Boolean

Errors scoped to a single account/instance (its credentials, billing, or quota) rather than to the model itself. A sibling instance of the same provider+model can still succeed, so these must not skip the whole model.

Returns:

  • (Boolean)


87
88
89
90
# File 'lib/legion/llm/inference/executor/escalation.rb', line 87

def (error)
  message = error.respond_to?(:message) ? error.message.to_s : error.to_s
  message.match?(/credit balance|insufficient (?:credit|funds|quota|balance)|payment required|billing|quota (?:exceeded|exhausted)|over quota/i)
end

#authentication_error?(err) ⇒ Boolean

Returns:

  • (Boolean)


383
384
385
386
387
# File 'lib/legion/llm/inference/executor/escalation.rb', line 383

def authentication_error?(err)
  err.is_a?(Legion::LLM::AuthError) ||
    err.is_a?(Faraday::UnauthorizedError) ||
    err.is_a?(Faraday::ForbiddenError)
end

#build_routing_payload_from_resolvedObject

Build a minimal routing_payload from the already-resolved routing state. C2 (PayloadBuilder) will replace this with a proper single-ingress build from headers+body. This bridge keeps the loop working in C1 while the old chain machinery still exists.

IMPORTANT: providers/instances/tiers are intentionally NOT pinned here. Pinning providers: [:bedrock] would prevent cross-provider failover — the router returns nil after bedrock lanes are exhausted, raising EscalationExhausted prematurely. The preferred provider is expressed via the model filter (models: [model]) and by the routing payload built in C2 from explicit x-legion-* headers (hard filters) vs body hints. For the while remaining.positive? loop, tried_lanes exclusion handles exhaustion naturally: after all bedrock lanes are in tried_lanes, the router selects the next-best provider.



34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
# File 'lib/legion/llm/inference/executor/escalation.rb', line 34

def build_routing_payload_from_resolved
  estimated = estimate_request_tokens
  # Include the resolved model as a soft filter so request_lane picks the requested model
  # from whichever provider has it. No provider/tier/instance pinning — that would prevent
  # cross-provider failover and break the while remaining.positive? escalation loop.
  models = @resolved_model ? [@resolved_model.to_s] : []

  {
    type:              :inference,
    tiers:             [],
    providers:         [],
    instances:         [],
    models:            models,
    capabilities:      chain_required_capabilities,
    privacy:           (@proactive_tier_assignment&.dig(:privacy) || :normal).to_sym,
    estimated_context: estimated.positive? ? estimated : nil,
    tried_lanes:       [],
    max_attempts:      pipeline_escalation_max_attempts
  }
end

#classify_and_accumulate_exclusions(error:, lane:, payload:) ⇒ Object

Called after each failed dispatch in the while remaining.positive? loop. Updates the routing payload’s tried_lanes and/or trips the health circuit. Terminal errors re-raise immediately so the caller’s while loop stops.



440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
# File 'lib/legion/llm/inference/executor/escalation.rb', line 440

def classify_and_accumulate_exclusions(error:, lane:, payload:, **)
  case classify_error(error: error)
  when :internal_error, :context_overflow, :payload_error, :policy_denied
    raise error
  when :account_specific
    # Account/instance-scoped failure: trip the per-instance circuit.
    # All lanes for this instance go lane_weight ≤ 0; sibling instances stay eligible.
    log.warn("[llm][escalation] action=account_specific_error lane=#{lane[:id]} " \
             "provider=#{lane[:provider_family]} instance=#{lane[:instance_id]} " \
             "error=#{error.class}: #{error.message.to_s[0, 200]}")
    Legion::LLM::Router.health_tracker.trip_circuit( # allowlist:write-side
      provider: lane[:provider_family], instance: lane[:instance_id], reason: error.message
    )
  else
    payload[:tried_lanes] << lane[:id]
  end
rescue Legion::LLM::ModelNotAllowed, ::NoMethodError, ::ArgumentError
  raise
rescue StandardError => e
  raise if request_payload_error?(e)
  raise if context_overflow_error?(e)

  handle_exception(e, level: :warn, operation: 'llm.pipeline.classify_and_accumulate_exclusions',
                      lane: lane[:id])
  payload[:tried_lanes] << lane[:id]
end

#classify_error(error:) ⇒ Object

Classify error for the while remaining.positive? loop (G26 / D-C). internal_error MUST be checked BEFORE account_specific (G25 / B-H): a daemon NoMethodError must never be treated as a retriable account-scoped failure.



425
426
427
428
429
430
431
432
433
434
435
# File 'lib/legion/llm/inference/executor/escalation.rb', line 425

def classify_error(error:, **)
  return :context_overflow  if context_overflow_error?(error)
  return :payload_error     if request_payload_error?(error)
  return :policy_denied     if error.is_a?(Legion::LLM::ModelNotAllowed)
  return :internal_error    if internal_error?(error) # terminal before account_specific
  return :account_specific  if authentication_error?(error) ||
                               config_error?(error) ||
                               (error)

  :transient
end

#client_stream_error?(err) ⇒ Boolean

Detect client-side stream errors (disconnects, broken pipes, socket timeouts) that originate from writing back to the HTTP client, not from the provider itself.

Returns:

  • (Boolean)


403
404
405
406
407
408
409
410
411
412
413
# File 'lib/legion/llm/inference/executor/escalation.rb', line 403

def client_stream_error?(err)
  name = err.class.name.to_s
  msg  = err.message.to_s
  name.include?('Puma::ConnectionError') ||
    name.include?('Errno::EPIPE') ||
    (name.include?('IOError') && msg.include?('closed')) ||
    (name.include?('IOError') && msg.include?('already closed')) ||
    name.include?('EOFError') ||
    name.include?('Errno::ECONNRESET') ||
    name.include?('Errno::ECONNABORTED')
end

#config_error?(err) ⇒ Boolean

Returns:

  • (Boolean)


377
378
379
380
381
# File 'lib/legion/llm/inference/executor/escalation.rb', line 377

def config_error?(err)
  name = err.class.name.to_s
  msg = err.message.to_s
  CONFIG_ERROR_PATTERNS.any? { |pat| pat.match?(name) || pat.match?(msg) }
end

#context_overflow_error?(err) ⇒ Boolean

Returns:

  • (Boolean)


395
396
397
398
399
# File 'lib/legion/llm/inference/executor/escalation.rb', line 395

def context_overflow_error?(err)
  err.is_a?(Legion::LLM::ContextOverflow) ||
    err.class.name.to_s.include?('ContextLength') ||
    CONTEXT_OVERFLOW_ERROR_PATTERNS.any? { |pat| pat.match?(err.message.to_s) }
end

#emit_error_audit(error, status:, provider: @resolved_provider, model: @resolved_model) ⇒ Object



270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
# File 'lib/legion/llm/inference/executor/escalation.rb', line 270

def emit_error_audit(error, status:, provider: @resolved_provider, model: @resolved_model)
  routing = { provider: provider, model: model }
  routing[:offering_id] = @resolved_offering_id if @resolved_offering_id
  routing[:offering_metadata] = @resolved_offering_metadata if @resolved_offering_metadata&.any?

  Legion::LLM::Audit.emit_prompt(
    request_id:      @request.id,
    conversation_id: @request.conversation_id,
    caller:          @request.caller,
    routing:         routing,
    tokens:          {},
    status:          status,
    error:           { class: error.class.name, message: error.message },
    tracing:         @tracing,
    timestamp:       Time.now,
    request_type:    'chat',
    messages:        @request.messages
  )
rescue StandardError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.emit_error_audit')
end

#emit_escalation_attempt_audit(provider:, model:, outcome:, duration_ms:, error: nil, attempt: 1) ⇒ Object



292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
# File 'lib/legion/llm/inference/executor/escalation.rb', line 292

def emit_escalation_attempt_audit(provider:, model:, outcome:, duration_ms:, error: nil, attempt: 1)
  routing = { provider: provider, model: model }
  routing[:offering_id] = @resolved_offering_id if @resolved_offering_id
  routing[:offering_metadata] = @resolved_offering_metadata if @resolved_offering_metadata&.any?

  tokens = {}
  if @extracted_tokens
    input_tokens  = @extracted_tokens.respond_to?(:input_tokens)  ? @extracted_tokens.input_tokens.to_i  : 0
    output_tokens = @extracted_tokens.respond_to?(:output_tokens) ? @extracted_tokens.output_tokens.to_i : 0
    thinking      = @extracted_tokens.respond_to?(:thinking_tokens) ? @extracted_tokens.thinking_tokens.to_i : 0
    tokens = { input_tokens: input_tokens, output_tokens: output_tokens, thinking_tokens: thinking }.compact
  end

  content = extract_response_content
  thinking_response = extract_thinking

  Legion::LLM::Audit.emit_prompt(
    request_id:            @request.id,
    conversation_id:       @request.conversation_id,
    caller:                @request.caller,
    routing:               routing,
    tokens:                tokens,
    status:                outcome == :success ? 'success' : 'error',
    provider_response_ref: "#{@request.id}:attempt:#{attempt}",
    latency_ms:            duration_ms,
    response_content:      content,
    response_thinking:     thinking_response,
    error:                 error ? { class: error.class.name, message: error.message } : nil,
    tracing:               @tracing,
    timestamp:             Time.now,
    request_type:          'chat',
    messages:              @request.messages
  )
rescue StandardError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.emit_escalation_attempt_audit')
end

#emit_escalation_attempt_metering(provider:, model:, duration_ms:, attempt: 1, status: 'success', error: nil, provider_submitted: true) ⇒ Object



329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
# File 'lib/legion/llm/inference/executor/escalation.rb', line 329

def emit_escalation_attempt_metering(provider:, model:, duration_ms:, attempt: 1, status: 'success',
                                     error: nil, provider_submitted: true)
  @extracted_tokens ||= extract_tokens
  input_tokens  = @extracted_tokens.respond_to?(:input_tokens)  ? @extracted_tokens.input_tokens.to_i  : 0
  output_tokens = @extracted_tokens.respond_to?(:output_tokens) ? @extracted_tokens.output_tokens.to_i : 0
  cost_usd = estimate_cost(input_tokens, output_tokens)

  event = Legion::LLM::Inference::Steps::Metering.build_event(
    provider:           provider,
    model_id:           model,
    offering_id:        @resolved_offering_id,
    offering_metadata:  @resolved_offering_metadata,
    tier:               @resolved_tier,
    request_type:       if @request.respond_to?(:request_type)
                          @request.request_type
                        else
                          'chat'
                        end,
    input_tokens:       input_tokens,
    output_tokens:      output_tokens,
    latency_ms:         duration_ms,
    wall_clock_ms:      duration_ms,
    cost_usd:           cost_usd,
    request_id:         @request.id,
    conversation_id:    @request.conversation_id,
    correlation_id:     @tracing&.dig(:correlation_id),
    caller:             @request.caller,
    identity:           metering_identity,
    billing:            @request.billing,
    routing_reason:     "escalation_attempt:#{attempt}",
    messages:           @request.messages,
    response_content:   extract_response_content,
    response_thinking:  extract_thinking,
    status:             status,
    error:              (error),
    provider_submitted: 
  )
  Legion::LLM::Inference::Steps::Metering.publish_or_spool(event)
rescue StandardError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.emit_escalation_attempt_metering')
end

#error_metadata(err) ⇒ Object



389
390
391
392
393
# File 'lib/legion/llm/inference/executor/escalation.rb', line 389

def (err)
  return nil unless err

  { class: err.class.name, message: err.message.to_s }
end

#escalation_attempt_hash(resolution, outcome:, failures:, duration_ms:) ⇒ Object



164
165
166
167
168
169
170
# File 'lib/legion/llm/inference/executor/escalation.rb', line 164

def escalation_attempt_hash(resolution, outcome:, failures:, duration_ms:)
  attempt = { model: resolution.model, provider: resolution.provider, tier: resolution.tier,
              outcome: outcome, failures: failures, duration_ms: duration_ms }
  attempt[:offering_id] = resolution.offering_id if resolution.offering_id
  attempt[:offering_metadata] = resolution. unless resolution..empty?
  attempt
end

#execute_provider_requestObject



182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
# File 'lib/legion/llm/inference/executor/escalation.rb', line 182

def execute_provider_request
  @timestamps[:provider_start] = Time.now
  @timeline.record(
    category: :provider, key: 'provider:request_sent',
    exchange_id: @exchange_id, direction: :outbound,
    detail: "calling #{@resolved_provider}",
    from: 'pipeline', to: "provider:#{@resolved_provider}"
  )

  raise Legion::LLM::ProviderError, "Native provider not registered: #{@resolved_provider}" unless fleet_dispatch? || use_native_dispatch?(@resolved_provider)

  execute_provider_request_native

  @timestamps[:provider_end] = Time.now
  record_provider_response
end

#execute_provider_request_nativeObject



199
200
201
202
203
204
# File 'lib/legion/llm/inference/executor/escalation.rb', line 199

def execute_provider_request_native
  result = execute_native_tool_loop
  (result.) if result.respond_to?(:metadata)
  @raw_response = result
  @tool_loop_messages = @last_tool_loop_messages if @last_tool_loop_messages
end

#execute_provider_request_streamObject



650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
# File 'lib/legion/llm/inference/executor/escalation.rb', line 650

def execute_provider_request_stream(&)
  @timestamps[:provider_start] = Time.now
  @timeline.record(
    category: :provider, key: 'provider:request_sent',
    exchange_id: @exchange_id, direction: :outbound,
    detail: "streaming from #{@resolved_provider}",
    from: 'pipeline', to: "provider:#{@resolved_provider}"
  )

  raise Legion::LLM::ProviderError, "Native provider not registered: #{@resolved_provider}" unless fleet_dispatch? || use_native_dispatch?(@resolved_provider)

  execute_provider_request_stream_native(&)

  @timestamps[:provider_end] = Time.now
  record_provider_response
end

#execute_provider_request_stream_nativeObject



667
668
669
670
671
# File 'lib/legion/llm/inference/executor/escalation.rb', line 667

def execute_provider_request_stream_native(&)
  result = execute_native_streaming_tool_loop(&)
  (result.) if result.respond_to?(:metadata)
  @raw_response = result
end

#extract_retry_after(error) ⇒ Object



264
265
266
267
268
# File 'lib/legion/llm/inference/executor/escalation.rb', line 264

def extract_retry_after(error)
  return nil unless error.respond_to?(:response) && error.response.is_a?(Hash)

  error.response[:headers]&.fetch('retry-after', nil)&.to_i
end

#internal_error?(err) ⇒ Boolean

G25 / B-H / PR #152 C5/C6: Internal errors (daemon NoMethodError/ArgumentError) come from shared daemon code — retrying on a different lane guarantees the same crash. Classified as terminal: raise immediately, never retry, never trip circuits, never push to tried_lanes.

Returns:

  • (Boolean)


418
419
420
# File 'lib/legion/llm/inference/executor/escalation.rb', line 418

def internal_error?(err)
  err.is_a?(::NoMethodError) || err.is_a?(::ArgumentError)
end

#pipeline_escalation_enabled?Boolean

Returns:

  • (Boolean)


172
173
174
175
# File 'lib/legion/llm/inference/executor/escalation.rb', line 172

def pipeline_escalation_enabled?
  esc = Legion::Settings[:llm].dig(:routing, :escalation) || {}
  esc[:enabled] == true && esc[:pipeline_enabled] == true
end

#pipeline_escalation_max_attemptsObject



177
178
179
180
# File 'lib/legion/llm/inference/executor/escalation.rb', line 177

def pipeline_escalation_max_attempts
  esc = Legion::Settings[:llm].dig(:routing, :escalation) || {}
  esc[:max_attempts] || 3
end

#record_escalation_failure(err, resolution, start_time, outcome:, operation:, handled: false) ⇒ Object



92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
# File 'lib/legion/llm/inference/executor/escalation.rb', line 92

def record_escalation_failure(err, resolution, start_time, outcome:, operation:, handled: false)
  @last_escalation_error = err
  duration_ms = ((Time.now - start_time) * 1000).round
  handle_exception(err, level: :warn, handled: handled, operation: operation,
                       provider: resolution.provider, model: resolution.model, duration_ms: duration_ms)
  if request_payload_error?(err)
    log.error "[llm][escalation] action=request_payload_error provider=#{resolution.provider} " \
              "instance=#{resolution.instance || 'default'} model=#{resolution.model} " \
              "error=#{err.message.to_s[0, 500]} daemon_side_payload_bug=true provider_health=false"
  elsif (err)
    # Account-scoped failure (credit balance, payment, quota). It is
    # deterministic — it will fail every call until the operator tops up —
    # so DEPRIORITIZE this instance immediately by tripping its per-instance
    # circuit, to stop wasting calls on an account that cannot work. The
    # circuit is per (provider, instance), so healthy sibling instances and
    # the provider globally are unaffected, and its cooldown -> half_open
    # re-probe auto-recovers the instance once credits return. Do NOT
    # deny_model — the model is fine on other instances/accounts.
    log.warn "[llm][escalation] action=account_scoped_error provider=#{resolution.provider} " \
             "instance=#{resolution.instance || 'default'} model=#{resolution.model} " \
             "error=#{err.message.to_s[0, 300]} deprioritized=true model_denied=false"
    Legion::LLM::Router.health_tracker.trip_circuit( # allowlist:write-side
      provider: resolution.provider, instance: resolution.instance, reason: err.message
    )
  elsif authentication_error?(err) || config_error?(err)
    Legion::LLM::Router.health_tracker.deny_model( # allowlist:write-side
      provider: resolution.provider,
      model:    resolution.model,
      instance: resolution.instance,
      reason:   err.message
    )
    Legion::LLM::Router.health_tracker.trip_circuit( # allowlist:write-side
      provider: resolution.provider,
      instance: resolution.instance,
      reason:   err.message
    )
  elsif !context_overflow_error?(err)
    Legion::LLM::Router.health_tracker.report(provider: resolution.provider, instance: resolution.instance, # allowlist:write-side
                                              offering_id: resolution.offering_id,
                                              signal: :error, value: 1,
                                              metadata: { reason: err.class.name, message: err.message.to_s[0, 500],
                                                          model: resolution.model })
  end
  @escalation_history << escalation_attempt_hash(
    resolution,
    outcome:     outcome,
    failures:    [err.class.name],
    duration_ms: duration_ms
  )
  @timeline.record(
    category: :provider, key: 'escalation:attempt', direction: :internal,
    detail: "attempt #{@escalation_history.size}: #{resolution.provider}:#{resolution.model} => #{outcome}",
    from: 'pipeline', to: "provider:#{resolution.provider}"
  )
  emit_escalation_attempt_metering(
    provider:    resolution.provider,
    model:       resolution.model,
    duration_ms: duration_ms,
    attempt:     @escalation_history.size,
    status:      'error',
    error:       err
  )
  emit_escalation_attempt_audit(
    provider:    resolution.provider,
    model:       resolution.model,
    outcome:     outcome,
    duration_ms: duration_ms,
    error:       err,
    attempt:     @escalation_history.size
  )
end

#record_provider_responseObject



206
207
208
209
210
211
212
213
214
215
216
217
# File 'lib/legion/llm/inference/executor/escalation.rb', line 206

def record_provider_response
  duration_ms = ((@timestamps[:provider_end] - @timestamps[:provider_start]) * 1000).to_i
  report_provider_health(:success, duration_ms) if @resolved_offering_id
  log.debug("[pipeline][provider] action=response_received provider=#{@resolved_provider} model=#{@resolved_model} duration_ms=#{duration_ms}")
  @timeline.record(
    category: :provider, key: 'provider:response_received',
    exchange_id: @exchange_id, direction: :inbound,
    detail: 'response received',
    from: "provider:#{@resolved_provider}", to: 'pipeline',
    duration_ms: duration_ms
  )
end

#report_provider_failure(error, provider: @resolved_provider, instance: @resolved_instance, model: @resolved_model, offering_id: @resolved_offering_id, status: 'provider_error') ⇒ Object



219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
# File 'lib/legion/llm/inference/executor/escalation.rb', line 219

def report_provider_failure(error, provider: @resolved_provider, instance: @resolved_instance,
                            model: @resolved_model, offering_id: @resolved_offering_id,
                            status: 'provider_error')
  emit_error_audit(error, status: status, provider: provider, model: model)
  return if request_payload_error?(error)
  return if context_overflow_error?(error)
  return if client_stream_error?(error)

  if authentication_error?(error) || config_error?(error)
    Legion::LLM::Router.health_tracker.deny_model( # allowlist:write-side
      provider: provider,
      model:    model,
      instance: instance,
      reason:   error.message
    )
    Legion::LLM::Router.health_tracker.trip_circuit( # allowlist:write-side
      provider: provider,
      instance: instance,
      reason:   error.message
    )
    return
  end

  Legion::LLM::Router.health_tracker.report( # allowlist:write-side
    provider: provider, instance: instance,
    offering_id: offering_id, signal: :error, value: 1,
    metadata: { reason: error.class.name, message: error.message.to_s[0, 500], model: model }
  )
rescue StandardError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.report_provider_failure')
end

#report_provider_health(signal, duration_ms, metadata: {}) ⇒ Object



251
252
253
254
255
256
257
258
259
260
261
262
# File 'lib/legion/llm/inference/executor/escalation.rb', line 251

def report_provider_health(signal, duration_ms, metadata: {})
  return unless defined?(Legion::LLM::Router) && Legion::LLM::Router.routing_enabled?

  Legion::LLM::Router.health_tracker.report(provider: @resolved_provider, instance: @resolved_instance, # allowlist:write-side
                                            offering_id: @resolved_offering_id,
                                            signal: signal, value: 1, metadata: .merge(duration_ms: duration_ms))
  Legion::LLM::Router.health_tracker.report(provider: @resolved_provider, instance: @resolved_instance, # allowlist:write-side
                                            offering_id: @resolved_offering_id,
                                            signal: :latency, value: duration_ms, metadata: {})
rescue StandardError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.report_provider_health')
end

#request_payload_error?(err) ⇒ Boolean

Returns:

  • (Boolean)


371
372
373
374
375
# File 'lib/legion/llm/inference/executor/escalation.rb', line 371

def request_payload_error?(err)
  name = err.class.name.to_s
  msg = err.message.to_s
  REQUEST_PAYLOAD_ERROR_PATTERNS.any? { |pat| pat.match?(name) || pat.match?(msg) }
end

#run_provider_call_singleObject



55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
# File 'lib/legion/llm/inference/executor/escalation.rb', line 55

def run_provider_call_single
  execute_provider_request
rescue Legion::LLM::AuthError, Faraday::UnauthorizedError, Faraday::ForbiddenError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call.auth',
                   provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'auth_failed')
  raise e.is_a?(Legion::LLM::AuthError) ? e : Legion::LLM::AuthError.new(e.message)
rescue Legion::LLM::ContextOverflow => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call.context_overflow',
                   provider: @resolved_provider, model: @resolved_model)
  emit_error_audit(e, status: 'context_overflow')
  raise
rescue Legion::LLM::RateLimitError, Faraday::TooManyRequestsError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call.rate_limit',
                   provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'rate_limited')
  raise e.is_a?(Legion::LLM::RateLimitError) ? e : Legion::LLM::RateLimitError.new(e.message, retry_after: extract_retry_after(e))
rescue Legion::LLM::ProviderDown, Faraday::ConnectionFailed, Faraday::TimeoutError, Faraday::SSLError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call.provider_down',
                   provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'provider_down')
  raise e.is_a?(Legion::LLM::ProviderDown) ? e : Legion::LLM::ProviderDown.new(e.message)
rescue Legion::LLM::ProviderError, Faraday::ServerError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call.provider_error',
                   provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'provider_error')
  raise e.is_a?(Legion::LLM::ProviderError) ? e : Legion::LLM::ProviderError.new(e.message)
end

#run_provider_call_with_attempts(routing_payload:, assembler: nil, stream_block: nil) ⇒ Object

G26 / D-C: The bounded while remaining.positive? loop. Replaces the @escalation_chain iteration. Bounded by construction — remaining only decreases. No loop do, no retry, no redo inside this block.

G27 / D-G: Dual-error semantics tracked by attempt_idx == 0:

NoLaneAvailable   — request_lane returned nil on first try (attempt_idx == 0)
EscalationExhausted — tried at least one lane (attempt_idx >= 1) then ran out

First-attempt seeding: if step_routing already resolved a provider/model, we look up the corresponding Inventory lane for that (provider, instance, model) to use as the first lane. This preserves step_routing’s resolution while keeping the loop-based retry mechanism. N×N: single canonical escalation loop — no provider-specific branches. The API namespace translator converts all client formats to canonical form; this loop dispatches only through execute_provider_request / execute_provider_request_stream.



481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
# File 'lib/legion/llm/inference/executor/escalation.rb', line 481

def run_provider_call_with_attempts(routing_payload:, assembler: nil, stream_block: nil)
  remaining      = routing_payload[:max_attempts] || Legion::Settings[:llm][:routing][:max_attempts]
  total_attempts = remaining
  attempt_idx    = 0
  last_error     = nil

  log.debug('[llm][executor] action=run_provider_call_with_attempts ' \
            "max_attempts=#{remaining} provider=#{@resolved_provider} model=#{@resolved_model}")

  while remaining.positive?
    lane = select_next_lane(routing_payload: routing_payload, attempt_idx: attempt_idx)

    if lane.nil?
      if attempt_idx.zero?
        err = Legion::LLM::Errors::NoLaneAvailable.new(
          filters: routing_payload.slice(:tiers, :providers, :instances, :models,
                                         :capabilities, :thinking, :privacy, :estimated_context)
        )
        log.warn("[llm][executor] action=no_lane_available filters=#{err.filters.inspect}")
        emit_error_audit(err, status: 'no_lane_available')
      else
        err = Legion::LLM::Errors::EscalationExhausted.new(
          attempts:    attempt_idx,
          tried_lanes: routing_payload[:tried_lanes],
          last_error:  last_error
        )
        log.warn("[llm][executor] action=escalation_exhausted_mid_loop attempts=#{attempt_idx} " \
                 "tried=#{routing_payload[:tried_lanes].size}")
        emit_error_audit(err, status: 'escalation_exhausted')
      end
      raise err
    end

    remaining   -= 1
    attempt_idx += 1

    @resolved_provider         = lane[:provider_family]
    @resolved_instance         = lane[:instance_id]
    @resolved_model            = lane[:model]
    @resolved_tier             = lane[:tier]
    @resolved_offering_id      = lane[:id]
    @resolved_offering_metadata = (
      lane.slice(:canonical_model_alias, :limits, :cost, :capabilities).compact
    )

    log.info("[llm][executor] action=dispatch_attempt attempt=#{attempt_idx}/#{total_attempts} " \
             "lane=#{lane[:id]} provider=#{@resolved_provider} model=#{@resolved_model}")

    assembler&.begin_dispatch_on(lane: lane) if assembler.respond_to?(:begin_dispatch_on)

    start_time = Time.now
    begin
      # N×N: single canonical dispatch path — no provider-specific branches.
      # The API namespace translator converts all client formats (OpenAI Chat,
      # OpenAI Responses, Anthropic Messages) to canonical before the executor
      # receives the request. The lex-llm-* provider adapter handles wire format.
      if stream_block
        execute_provider_request_stream(&stream_block)
      else
        execute_provider_request
      end
      duration_ms = ((Time.now - start_time) * 1000).round
      report_provider_health(:success, duration_ms) if @resolved_offering_id
      @timeline.record(
        category: :provider, key: 'escalation:attempt', direction: :internal,
        detail: "attempt #{attempt_idx}: #{@resolved_provider}:#{@resolved_model} => success",
        from: 'pipeline', to: "provider:#{@resolved_provider}"
      )
      emit_escalation_attempt_metering(provider: @resolved_provider, model: @resolved_model,
                                       duration_ms: duration_ms, attempt: attempt_idx)
      emit_escalation_attempt_audit(provider: @resolved_provider, model: @resolved_model,
                                    outcome: :success, duration_ms: duration_ms,
                                    attempt: attempt_idx)
      return
    rescue StandardError => e
      last_error  = e
      duration_ms = ((Time.now - start_time) * 1000).round
      record_escalation_failure(e,
                                Legion::LLM::Router::Resolution.new(
                                  tier: @resolved_tier, provider: @resolved_provider,
                                  instance: @resolved_instance, model: @resolved_model,
                                  offering_id: @resolved_offering_id
                                ),
                                start_time,
                                outcome:   :error,
                                operation: 'llm.pipeline.attempts_loop',
                                handled:   true)
      assembler&.provider_failover_pending!(from: lane) if assembler.respond_to?(:provider_failover_pending!)

      classify_and_accumulate_exclusions(error: e, lane: lane, payload: routing_payload)
    end
  end

  err = Legion::LLM::Errors::EscalationExhausted.new(
    attempts:    total_attempts,
    tried_lanes: routing_payload[:tried_lanes],
    last_error:  last_error
  )
  log.warn("[llm][executor] action=escalation_exhausted_loop_complete attempts=#{total_attempts} " \
           "tried=#{routing_payload[:tried_lanes].size}")
  emit_error_audit(err, status: 'escalation_exhausted')
  raise err
end

#select_next_lane(routing_payload:, attempt_idx:) ⇒ Object

Select the next lane to dispatch to. On the first attempt (attempt_idx == 0): prefer the Inventory lane matching the provider/model already resolved by step_routing (preserves routing phase resolution). Falls through to request_lane if no specific lane exists. On subsequent attempts: delegate entirely to request_lane (the new SSOT, per G26/SSOT design).



590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
# File 'lib/legion/llm/inference/executor/escalation.rb', line 590

def select_next_lane(routing_payload:, attempt_idx:, **)
  if attempt_idx.zero? && @resolved_provider && @resolved_model
    # First attempt: look up the Inventory lane for the routing-resolved (provider, instance, model).
    # If found and not already tried/blocked, use it. Otherwise fall through to request_lane.
    provider_lanes = Legion::LLM::Inventory.lanes_for(
      provider: @resolved_provider, instance: @resolved_instance
    )
    preferred = provider_lanes.find do |l|
      l[:model] == @resolved_model &&
        !routing_payload[:tried_lanes].include?(l[:id]) &&
        l[:lane_weight].to_i.positive?
    end
    return preferred if preferred
  end
  Legion::LLM::Router.request_lane(**routing_payload)
rescue StandardError => e
  handle_exception(e, level: :warn, handled: true, operation: 'llm.pipeline.select_next_lane')
  Legion::LLM::Router.request_lane(**routing_payload)
end

#step_provider_callObject



13
14
15
16
17
18
19
20
21
# File 'lib/legion/llm/inference/executor/escalation.rb', line 13

def step_provider_call
  escalation = pipeline_escalation_enabled?
  log.debug "[llm][executor] action=step_provider_call provider=#{@resolved_provider} model=#{@resolved_model} escalation=#{escalation}"
  if escalation
    run_provider_call_with_attempts(routing_payload: build_routing_payload_from_resolved)
  else
    run_provider_call_single
  end
end

#step_provider_call_stream(&block) ⇒ Object



610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
# File 'lib/legion/llm/inference/executor/escalation.rb', line 610

def step_provider_call_stream(&block)
  if pipeline_escalation_enabled?
    run_provider_call_with_attempts(routing_payload: build_routing_payload_from_resolved,
                                    stream_block:    block)
    return
  end

  execute_provider_request_stream(&block)
rescue Legion::LLM::AuthError, Faraday::UnauthorizedError, Faraday::ForbiddenError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call_stream.auth',
                   provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'auth_failed')
  raise e.is_a?(Legion::LLM::AuthError) ? e : Legion::LLM::AuthError.new(e.message)
rescue Legion::LLM::ContextOverflow => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call_stream.context_overflow',
                   provider: @resolved_provider, model: @resolved_model)
  emit_error_audit(e, status: 'context_overflow')
  raise
rescue Legion::LLM::RateLimitError, Faraday::TooManyRequestsError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call_stream.rate_limit',
                   provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'rate_limited')
  raise e.is_a?(Legion::LLM::RateLimitError) ? e : Legion::LLM::RateLimitError.new(e.message, retry_after: extract_retry_after(e))
rescue Legion::LLM::ProviderDown, Faraday::ConnectionFailed, Faraday::TimeoutError, Faraday::SSLError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call_stream.provider_down',
                   provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'provider_down')
  raise e.is_a?(Legion::LLM::ProviderDown) ? e : Legion::LLM::ProviderDown.new(e.message)
rescue Legion::LLM::ProviderError, Faraday::ServerError => e
  handle_exception(e, level: :warn, operation: 'llm.pipeline.provider_call_stream.provider_error',
                   provider: @resolved_provider, model: @resolved_model)
  report_provider_failure(e, status: 'provider_error')
  raise e.is_a?(Legion::LLM::ProviderError) ? e : Legion::LLM::ProviderError.new(e.message)
rescue StandardError => e
  raise if client_stream_error?(e)

  report_provider_failure(e, status: 'provider_error')
  raise
end