Class: Rubino::Agent::ModelCallRunner

Inherits: Object

Object → Rubino::Agent::ModelCallRunner

Defined in: lib/rubino/agent/model_call_runner.rb

Overview

The INNER retry loop of the conversation loop: a faithful port of the reference `while retry_count < max_retries` block (the invalid-response path and the error path).

ONE responsibility: issue a single model call against the LLM boundary and, when the call returns an unusable response or raises a transient error, retry it with backoff until it succeeds or the retry budget is exhausted. It OWNS the `retry_count`. The outer Loop hands it a built LLM::Request and gets back a validated AdapterResponse (or an exception).

Control flow per attempt:

call boundary
  → raises?  → ErrorClassifier.classify → retryable & budget left?
                 yes: backoff (error-path preset, honour Retry-After), retry
                 no : re-raise (permanent / budget exhausted)
  → returns? → ResponseValidator#valid?
                 valid          : return it
                 :empty_response: backoff (invalid-response preset), retry
                                  up to empty_response_max_retries, then
                                  raise EmptyModelResponseError
                 other invalid  : return as-is (nil / interrupted — the
                                  caller maps these to StreamInterruptedError;
                                  not the runner's job to retry)

TWO backoff sites, two budgets, exactly as the reference:

* invalid/empty response  → BackoffPolicy::INVALID_RESPONSE (5s/120s),
                            empty_response_max_retries (small, default 2)
* transient API error     → BackoffPolicy::ERROR_PATH (2s/60s),
                            agent.api_max_retries
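
To make the two presets concrete, here is a minimal sketch that reads "5s/120s" and "2s/60s" as base delay and cap. The `Preset` struct, its `delay_for` method, the doubling growth and the Retry-After handling are all assumptions made for illustration; the real BackoffPolicy interface is not shown on this page.

```ruby
# Hypothetical preset shape, not the real BackoffPolicy.
Preset = Struct.new(:base, :cap) do
  # Delay in seconds before retry number `attempt` (1-based).
  # Assumes the delay doubles per attempt and is clamped to the cap.
  def delay_for(attempt, retry_after: nil)
    computed = [base * (2**(attempt - 1)), cap].min
    # Assumed reading of "honour Retry-After": never wait less than the
    # server asked for.
    retry_after ? [computed, retry_after].max : computed
  end
end

INVALID_RESPONSE = Preset.new(5, 120) # stand-in for BackoffPolicy::INVALID_RESPONSE
ERROR_PATH       = Preset.new(2, 60)  # stand-in for BackoffPolicy::ERROR_PATH

error_delays   = (1..6).map { |n| ERROR_PATH.delay_for(n) }
invalid_delays = (1..6).map { |n| INVALID_RESPONSE.delay_for(n) }
```

Under these assumptions the error path waits 2, 4, 8, 16, 32, then 60 seconds, and the invalid-response path waits 5, 10, 20, 40, 80, then 120 seconds.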

The degenerate/empty-response path delegates to DegenerateResponseRecovery (Slice 5): the seven-rung ladder (partial-stream → prior-turn → post-tool nudge → thinking-only prefill ×2 → empty retry ×3 → fallback seam → terminal raise) ported from the reference conversation loop. See #apply_recovery!.

NOT in scope here (left as clear seams):

* eager fallback on an invalid response and fallback-on-max-retries
  (the reference _try_activate_fallback, which RESETS
  retry_count to 0) are Slice 7; see the `# SLICE-7` seam below. The
  counter is structured so a future fallback can reset it.

Instance Method Summary

#call!(request, iteration: nil) ⇒ Object

Run the inner retry loop for one model call.

#initialize(llm:, config:, ui:, event_bus:, cancel_token: nil, fallback_chain: nil, validator: ResponseValidator.new) ⇒ ModelCallRunner (constructor)

A new instance of ModelCallRunner.

Constructor Details

#initialize(llm:, config:, ui:, event_bus:, cancel_token: nil, fallback_chain: nil, validator: ResponseValidator.new) ⇒ `ModelCallRunner`

Returns a new instance of ModelCallRunner.

# File 'lib/rubino/agent/model_call_runner.rb', line 47

def initialize(llm:, config:, ui:, event_bus:, cancel_token: nil,
               fallback_chain: nil, validator: ResponseValidator.new)
  @llm = llm
  # SLICE-7: the provider/model fallback chain. When present, the live
  # adapter for each attempt is the chain's CURRENT adapter (so a rotation
  # takes effect on the very next call), and a fallback-worthy failure
  # rotates it. Nil in tests/one-shot callers → behave as a fixed @llm.
  @fallback_chain = fallback_chain
  @config       = config
  @ui           = ui
  @event_bus    = event_bus
  @cancel_token = cancel_token
  @validator    = validator
end

Instance Method Details

#call!(request, iteration: nil) ⇒ `Object`

Run the inner retry loop for one model call. `request` is a built LLM::Request; an optional block forwards stream chunks straight through to the boundary (matching `@llm.call(request) { |chunk| }`). Returns a validated AdapterResponse, or raises EmptyModelResponseError / the classified API error.

`iteration` is purely for the warning/telemetry text (which loop turn this call belongs to); it has no control-flow role.
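
The source listing wraps the caller's block so that streamed text is also collected for partial-stream recovery. A minimal sketch of that wrapping idea follows. The name `capture_into` and the fake stream are hypothetical; the real private `capture_streamed` helper is not shown on this page and may differ.

```ruby
# Hypothetical helper, not the real capture_streamed.
# Returns a proc that appends each chunk to `buffer`, then forwards the
# chunk to the caller's block when one was given.
def capture_into(buffer, &caller_block)
  proc do |chunk|
    buffer << chunk.to_s        # assumes chunks are, or convert to, text
    caller_block&.call(chunk)   # the caller still sees every chunk
  end
end

streamed = +""                  # unfrozen String, so it can be appended to
seen     = []
wrapped  = capture_into(streamed) { |chunk| seen << chunk }

# Stand-in for a streaming adapter: yields chunks to the block it is given.
fake_stream = ->(&blk) { %w[Hel lo].each(&blk) }
fake_stream.call(&wrapped)
```

After the fake stream drains, `streamed` holds the concatenated text and `seen` holds the individual chunks, so the runner can inspect the partial output without hiding anything from the caller.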

# File 'lib/rubino/agent/model_call_runner.rb', line 70

def call!(request, iteration: nil, &)
  # Error-path budget — distinct from the empty/degenerate budgets, which
  # the recovery ladder owns (see #recovery). Kept here so a transient API
  # error can't bleed into the empty-retry count.
  error_attempts = 0

  # The degenerate-response recovery ladder (Slice 5). Fresh per call! so
  # its per-turn counters (prefill ≤2, empty ≤3) reset exactly where the
  # reference zeroes them on a successful content turn.
  recovery = recovery_for(iteration)

  # The live request we (re)issue. Rungs 3/4 mutate it: a nudge appends to
  # request.messages in place; a prefill re-issues with the seed attached.
  current = request
  # Visible text streamed to the user this call — fuels rung 1
  # (partial-stream recovery). The caller's block still sees every chunk.
  streamed = +""
  wrapped  = capture_streamed(streamed, &)

  # :recovered is thrown by the ladder's rung-1/2 ":use" directive — the
  # recovered final content, wrapped as a synthetic text response.
  catch(:recovered) do
    loop do
      @cancel_token&.check!

      begin
        response = active_llm.call(current, &wrapped)
      rescue Rubino::Interrupted
        # User cancellation propagates immediately — never classified, never
        # retried (the reference treats interrupt as terminal at every backoff site).
        raise
      rescue StandardError => e
        error_attempts = handle_error!(e, error_attempts, iteration)
        next
      end

      # User cancellation that arrived MID-STREAM may not surface as a raise:
      # once a chunk has flowed the adapter RETURNS the buffered (possibly
      # empty) partial instead of raising, so a Ctrl+C right as the stream
      # drained lands here as an "empty" response. Re-check the cancel token
      # BEFORE validation so the interrupt is terminal — otherwise the empty
      # partial is classified :empty_response and the recovery ladder prints
      # a spurious "Empty response — retrying (1/2)" before the cancel is
      # acknowledged (D4). The interrupt is the correct terminal outcome.
      @cancel_token&.check!

      ok, reason = @validator.valid?(response)

      # Structurally invalid AND not an empty turn (nil / interrupted
      # truncated-stream partial). SLICE-7 eager fallback:
      # an invalid/malformed response is a common rate-limit symptom, so
      # rotate to the next provider immediately rather than surfacing it as
      # a failed turn. On a switch, reset the per-call counters and retry on
      # the new adapter; otherwise hand it back untouched — the Loop maps it
      # to StreamInterruptedError. Not the recovery ladder's job.
      if !ok && reason != :empty_response
        if activate_fallback!(iteration)
          error_attempts = 0
          recovery = recovery_for(iteration)
          streamed.clear # partial belongs to the failed provider, not the new one
          next
        end
        throw(:recovered, response)
      end

      # Usable iff structurally valid AND not degenerate (thinking-only /
      # blank-after-think). A degenerate response passes #valid? (its content
      # is non-empty <think> text) but carries no real answer — route it, and
      # any 200-OK-but-empty turn, through the ladder.
      throw(:recovered, response) if ok && !@validator.degenerate?(response)

      current, switched = apply_recovery!(recovery, response, current, streamed, iteration)
      # SLICE-7 rung 6: the ladder rotated to a fallback. Reset
      # the per-call counters (fresh recovery, zeroed error budget) and retry
      # on the new adapter — the reference zeroes _empty_content_retries here.
      next unless switched

      error_attempts = 0
      recovery = recovery_for(iteration)
      streamed.clear
    end
  end
end
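
The `catch(:recovered)` / `throw(:recovered, value)` pair is what lets the method leave the retry loop from any branch with a return value. The stand-alone sketch below shows only that mechanism. `first_usable` and the `:exhausted` marker are hypothetical and exist just to keep the example finite.

```ruby
# Stand-alone illustration of Ruby's catch/throw as a loop exit.
def first_usable(candidates)
  queue = candidates.dup
  catch(:recovered) do
    loop do
      candidate = queue.shift
      throw(:recovered, :exhausted) if candidate.nil? # sketch-only stop condition
      next if candidate.empty?                        # "retry": go round again
      throw(:recovered, candidate)                    # unwinds the loop; becomes catch's value
    end
  end
end

result = first_usable(["", "", "answer"])
```

Because `catch` evaluates to the thrown value and it is the last expression in the method, the thrown object is what the method returns. That is how `#call!` hands back either the validated response or an untouched invalid one.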