Class: Rubino::Agent::Loop

Inherits: Object

Defined in: lib/rubino/agent/loop.rb

Overview

The core agent loop that handles LLM calls and tool-execution cycles. It runs until the LLM produces a final text response or the budget is exhausted.
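The call-model / run-tools cycle can be sketched in a few lines of Ruby. This is a minimal, self-contained illustration, not the Rubino implementation: `Response`, `StubModel`, `run_loop`, and the hash-shaped messages are all hypothetical, and the budget is reduced to a plain iteration cap that returns `:budget_exhausted` instead of requesting a final summary.

```ruby
# Hypothetical response shape: final text, or a list of tool calls.
Response = Struct.new(:content, :tool_calls) do
  def text_only?
    tool_calls.empty? && !content.to_s.empty?
  end
end

# Hypothetical model stub: requests one tool call, then answers in text.
class StubModel
  def initialize
    @calls = 0
  end

  def call(messages)
    @calls += 1
    if @calls == 1
      Response.new("", [{ name: "read_file", args: { path: "README" } }])
    else
      tool_results = messages.count { |m| m[:role] == :tool }
      Response.new("Done: read #{tool_results} file(s).", [])
    end
  end
end

# Call the model; return on a text-only answer; otherwise run the requested
# tools, append their results, and go round again until the (simplified)
# iteration budget runs out.
def run_loop(model, messages, max_iterations: 5)
  iteration = 0
  loop do
    iteration += 1
    return :budget_exhausted if iteration > max_iterations

    response = model.call(messages)
    return response.content if response.text_only?
    # Neither text nor tool calls: fail loudly instead of ending silently.
    raise "empty model response" if response.tool_calls.empty?

    # The assistant's tool-use turn goes into history before its results.
    messages << { role: :assistant, tool_calls: response.tool_calls }
    response.tool_calls.each do |call|
      messages << { role: :tool, name: call[:name], content: "(result of #{call[:name]})" }
    end
  end
end

puts run_loop(StubModel.new, [{ role: :user, content: "Summarize the README" }])
# prints "Done: read 1 file(s)."
```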

Constant Summary

Nudge issued on the final, tool-less model call when the iteration/budget ceiling is hit. It mirrors the reference implementation's handle_max_iterations summary request, asking the model to wrap up in prose instead of ending the turn with nothing.

MAX_ITERATIONS_SUMMARY_NUDGE =
"You've reached the maximum number of tool-calling iterations allowed. " \
"Please provide a final response summarizing what you've found and " \
"accomplished so far, without calling any more tools."
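The nudge becomes the text of one last model call that is offered no tools, so the model can only answer in prose. A minimal sketch of that step, assuming a hash-based message list and a callable model; `force_summary` is a hypothetical helper, not Rubino's `handle_budget_exhausted`, which (per the `#run` source) may instead return `:continue` when the user interactively grants more budget.

```ruby
MAX_ITERATIONS_SUMMARY_NUDGE =
  "You've reached the maximum number of tool-calling iterations allowed. " \
  "Please provide a final response summarizing what you've found and " \
  "accomplished so far, without calling any more tools."

# Hypothetical final call: append the nudge as a user message, then call
# the model with an empty toolset so it cannot request another tool.
def force_summary(model, messages)
  messages << { role: :user, content: MAX_ITERATIONS_SUMMARY_NUDGE }
  model.call(messages, tools: [])
end

# Stub model that reports what it was given.
model = ->(messages, tools:) { "summary (#{tools.size} tools offered, #{messages.size} messages)" }
history = [{ role: :user, content: "Fix the tests" }]
puts force_summary(model, history)
# prints "summary (0 tools offered, 2 messages)"
```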

Framing for turn-start background notices (#148): it tells the model that the notices are secondary to the user message that follows them.

NOTICES_PREAMBLE =
"[background notices — acknowledge briefly; the user's message AFTER " \
"these notices is the instruction to act on]"
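A hypothetical illustration of how the preamble could frame parked notices ahead of the user's own message. The `fold_notices` helper and its layout (preamble, one notice per line, a blank line, then the user text) are assumptions; the source only states that the notices are framed as secondary to the user message that follows them.

```ruby
NOTICES_PREAMBLE =
  "[background notices — acknowledge briefly; the user's message AFTER " \
  "these notices is the instruction to act on]"

# Hypothetical helper: frame parked notices and place them BEFORE the user's
# own text, so the model reads them as context rather than as the task.
def fold_notices(notices, user_text)
  return user_text if notices.empty?

  framed = [NOTICES_PREAMBLE, *notices.map { |n| "- #{n}" }].join("\n")
  "#{framed}\n\n#{user_text}"
end

puts fold_notices(["background job 42 finished"], "Now run the linter")
```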

Instance Method Summary

#initialize(session:, llm_adapter:, tool_executor:, message_store:, budget:, ui:, event_bus:, config:, cancel_token: nil, initial_image_paths: [], input_queue: nil) ⇒ Loop (constructor)

A new instance of Loop.
#run(messages:, tools:) ⇒ Object

Runs the agent loop, returning the final assistant response content.

Constructor Details

#initialize(session:, llm_adapter:, tool_executor:, message_store:, budget:, ui:, event_bus:, config:, cancel_token: nil, initial_image_paths: [], input_queue: nil) ⇒ `Loop`

Returns a new instance of Loop.

# File 'lib/rubino/agent/loop.rb', line 23

def initialize(session:, llm_adapter:, tool_executor:, message_store:,
               budget:, ui:, event_bus:, config:, cancel_token: nil,
               initial_image_paths: [], input_queue: nil)
  @session             = session
  @llm                 = llm_adapter
  @tool_executor       = tool_executor
  @message_store       = message_store
  @budget              = budget
  @ui                  = ui
  @event_bus           = event_bus
  @config              = config
  @cancel_token        = cancel_token
  # Optional steering hand-off (Interaction::InputQueue). When present,
  # text the user typed mid-turn is drained at the top of each loop
  # iteration and injected as a user message. Nil for the API/server path
  # and nested subagent runs — they get no injection and behave exactly
  # as before.
  @input_queue         = input_queue
  # Consumed once on the first iteration. After the first model call
  # subsequent iterations are tool-result follow-ups — no user input,
  # nothing to re-attach.
  @pending_image_paths = Array(initial_image_paths)
  # Provider/model fallback chain (Slice 7). Primary at index 0; rotates to
  # the next configured backend when the primary keeps failing, and is
  # restored at the top of each turn (#run). With no agent.fallback_models
  # configured the chain holds only the primary and is an inert pass-through,
  # so single-provider setups behave exactly as before.
  @fallback_chain      = FallbackChain.new(
    primary_adapter: llm_adapter,
    config: config,
    ui: ui,
    event_bus: event_bus,
    tool_executor: tool_executor,
    cancel_token: cancel_token
  )
  # Owns the inner retry loop (call → validate → classify → backoff →
  # return/raise). The Loop builds each LLM::Request and hands it to the
  # runner, which returns a validated response or raises (empty-exhausted →
  # EmptyModelResponseError; transient-exhausted/permanent → the classified
  # error). The error-classification + backoff retries that used to live in
  # the adapter's with_retries now live here — single owner, no double-retry.
  # The runner issues calls against the chain's CURRENT adapter and can
  # rotate it via the chain on a fallback-worthy failure.
  @model_call_runner = ModelCallRunner.new(
    llm: llm_adapter,
    fallback_chain: @fallback_chain,
    config: config,
    ui: ui,
    event_bus: event_bus,
    cancel_token: cancel_token
  )
  # Single count + persist sink for tool results. The executor invokes it
  # for every tool on BOTH paths: the streaming path (ruby_llm runs the
  # tool mid-stream via ToolBridge → ToolExecutor#execute, never returning
  # through #execute_tool_calls) and the non-streaming path. Registered
  # here rather than passed at construction because the executor is built
  # before the Loop (the adapter/ToolBridge share the same executor).
  @tool_executor.on_result = method(:handle_tool_result) if @tool_executor.respond_to?(:on_result=)
end
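The fallback-chain behavior described in the constructor comments (primary at index 0, rotation to the next configured backend, reset at the top of each turn, inert when no fallbacks are configured) can be modeled as a small index over an adapter list. `MiniFallbackChain`, `current`, and `rotate!` are hypothetical; only `restore_primary!` is a name taken from the source. Rubino's `FallbackChain` also receives the UI, event bus, tool executor, and cancel token, and decides which failures are fallback-worthy; this sketch omits all of that.

```ruby
# Hypothetical stand-in for a provider fallback chain: index 0 is the
# primary; rotate! advances to the next backend; restore_primary! resets.
class MiniFallbackChain
  def initialize(primary, fallbacks = [])
    @adapters = [primary, *fallbacks]
    @index = 0
  end

  def current
    @adapters[@index]
  end

  # Move to the next backend; returns false when there is nothing left to
  # try (a chain holding only the primary never rotates).
  def rotate!
    return false if @index + 1 >= @adapters.size

    @index += 1
    true
  end

  def restore_primary!
    @index = 0
  end
end

chain = MiniFallbackChain.new("primary-model", ["backup-model"])
chain.rotate!                # the primary kept failing
puts chain.current           # prints "backup-model"
chain.restore_primary!       # top of the next turn
puts chain.current           # prints "primary-model"
```

With no fallbacks configured, `rotate!` always returns false and `current` stays on the primary, which matches the "inert pass-through" behavior the comments describe.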

Instance Method Details

#run(messages:, tools:) ⇒ `Object`

Runs the agent loop, returning the final assistant response content.
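On a text-only turn, `#run` returns `response.final_text_block` rather than `response.content` (see the `#core-F1` comment in the source), because on a streaming turn `content` joins pre-tool narration and the post-tool answer with no delimiter. The difference can be sketched with a hypothetical `StreamedResponse` that keeps each text block separately; the `text_blocks` field and both method bodies are assumptions, and the sketch simply treats the last text block as the post-tool answer.

```ruby
# Hypothetical response that keeps each streamed text block separately.
StreamedResponse = Struct.new(:text_blocks) do
  # Every text block of the turn, concatenated with no delimiter.
  def content
    text_blocks.join
  end

  # Only the last block: the answer written after the final tool call.
  def final_text_block
    text_blocks.last.to_s
  end
end

r = StreamedResponse.new(["Let me check the config.", "The timeout is 30s."])
puts r.content          # prints "Let me check the config.The timeout is 30s."
puts r.final_text_block # prints "The timeout is 30s."
```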

# File 'lib/rubino/agent/loop.rb', line 84

def run(messages:, tools:) # rubocop:disable Metrics/PerceivedComplexity,Metrics/CyclomaticComplexity
  # Stash the resolved toolset so #streaming? can decide, per run, whether
  # this turn might block on a human (clarify/approval). When it might, we
  # run NON-STREAMING so the LLM HTTP request completes and CLOSES before
  # any tool fires — leaving no upstream socket held open during the gate
  # wait (the wait can now be effectively unbounded; see ApprovalGate).
  @turn_tools     = Array(tools)
  iteration       = 0
  turn_started_at = monotonic_now

  # Reflect-guard against fabricated "done" (the #1 trust-killer): a
  # toolless turn whose prose claims an action it never carried out. Built
  # once per turn from the toolset actually on offer; counts its own
  # corrective re-prompts so it can stop honestly at the cap.
  @action_guard       = ActionClaimGuard.new(exposed_tool_names: @turn_tools.map { |t| tool_name_of(t) })
  @reflection_count   = 0
  # The user request driving this turn, captured from the OPENING transcript
  # (before any guard reflection note is appended) — the guard consults it
  # to skip challenging a NO-ACTION (plan/explain/"don't run tools") turn the
  # user explicitly asked for (#353a).
  @turn_user_request  = originating_user_request(messages)

  # If a previous turn rotated to a fallback, restore the primary backend
  # so this turn gets a fresh attempt with the preferred model
  # (conversation_loop.py:427). No-op when we never left the primary.
  @fallback_chain.restore_primary!

  # Mutated by the ToolExecutor's on_result sink (see #handle_tool_result),
  # which fires for EVERY tool regardless of streaming mode — including the
  # streaming path where ruby_llm runs the tool mid-stream via ToolBridge
  # and never returns through #execute_tool_calls below. Instance vars (not
  # locals) so the sink closure can update them.
  @tool_count     = 0
  @denied_count   = 0
  # Of the tools that RAN, how many were MUTATING (edit/write/patch). Lets
  # the pessimistic-summary reconciliation (#381) say "N tool calls (M edits
  # — review uncommitted changes)" so a developer is pointed at real,
  # possibly-uncommitted disk changes when the model claims it did nothing.
  @edit_count     = 0
  # Round-trips ruby_llm ran INSIDE a single streaming ask() this turn
  # (#355a). ruby_llm drives the whole model↔tool loop within one
  # chat.ask, so the outer `iteration` counter above stays at 1 for the
  # entire streaming turn and never re-consults the budget between the
  # intermediate round-trips. The adapter calls #note_stream_round_trip
  # once per round-trip (via on_round_trip), and #stream_budget_exhausted?
  # reads this count so ToolBridge can Halt the in-ask loop once the
  # iteration/time budget is spent. Reset per turn.
  @stream_round_trips = 0
  # Accumulates the content streamed to the screen this turn so that an
  # interrupt mid-stream can persist EXACTLY what the user saw, marked
  # interrupted (#338b). Reset per turn — a one-shot CancelToken plus a
  # fresh buffer means a stale partial can never attach to a later turn.
  @interrupt_partial = +""
  # True once any denial this turn was a headless fail-closed block ("needs
  # approval but no interactive session", #260) — lets the binding guard
  # point at `--yolo` (F2) instead of "approve it" in the honest message.
  @noninteractive_block = false
  token_total = 0

  loop do
    iteration += 1
    @cancel_token&.check!

    # Mid-turn steering boundary. SAFE point: the cancel check has passed
    # and any prior assistant(tool_use) + tool(result) messages from the
    # previous iteration are already appended, so adding a USER message
    # here can never split a tool_use from its results (no orphan pair on
    # strict providers). On iteration 1 the initial user input is already
    # the user turn, so only parked background NOTICES fold in (#13);
    # typed lines stay queued for their own turns.
    inject_steered_input(messages, iteration)

    unless @budget.can_continue?(iteration)
      @ui.warning("Iteration budget exhausted (#{iteration} turns)")
      outcome = handle_budget_exhausted(messages, iteration,
                                        turn_started_at, token_total)
      # :continue → the user (interactively) granted more budget; the
      # iteration cap was raised and we re-enter the SAME turn with full
      # context (no re-summary, no truncation). Anything else is the final
      # assistant text (force-summary / abort).
      next if outcome == :continue

      return outcome
    end

    @event_bus.emit(Interaction::Events::MODEL_CALL_STARTED, iteration: iteration)
    # Show a transient "thinking…" indicator during TTFB. The UI erases
    # it the moment the first chunk lands (any type). Skipped in
    # non-streaming mode — the response arrives in one shot, indicator
    # would flash uselessly.
    @ui.thinking_started if streaming?
    begin
      response = call_model(messages, tools, iteration)
    rescue Rubino::Interrupted
      # The streaming callback (or the per-iteration check above)
      # observed cancellation. Persist EXACTLY the partial that was shown
      # on screen — flagged interrupted in metadata — so storage matches
      # the screen and the transcript stays truthful & resumable (#338b).
      # Without this, the on-screen `⎿ interrupted` partial was absent from
      # the messages table and resume/compaction/memory diverged from what
      # the user saw. Then close any open stream box (commits the partial
      # answer streamed so far) and bail out — the standardized
      # `⎿ interrupted` marker is appended once by the Runner's rescue,
      # right after this kept partial. The upstream stream is already
      # cancelled: raising out of the per-chunk callback unwinds Faraday's
      # net-http read loop, which closes the socket (no drain) — verified
      # against ruby_llm 1.x's Streaming#stream_response, where the block
      # we raise from runs inside the on_data handler.
      persist_interrupted_partial
      @ui.stream_end if streaming?
      raise
    end
    @event_bus.emit(Interaction::Events::MODEL_CALL_FINISHED,
                    tokens: response.total_tokens,
                    input_tokens: response.input_tokens,
                    output_tokens: response.output_tokens,
                    stop_reason: response.stop_reason,
                    model_id: response.model_id,
                    has_tool_calls: response.has_tool_calls?)

    token_total += response.total_tokens.to_i

    # #355a: the streaming round-trip loop was cut short mid-flight because
    # this turn's iteration/time budget was spent (ToolBridge returned
    # Tool::Halt). ruby_llm already added a valid trailing tool message, so
    # the history is well-formed — hand off to the same budget-exhausted
    # summary the outer-loop cap uses. `iteration` is still 1 for a
    # streaming turn, so pass the round-trip count as the iteration reached.
    if response.halted?
      outcome = handle_budget_exhausted(messages, @stream_round_trips,
                                        turn_started_at, token_total)
      # :continue → budget extended; the next ask() picks up the
      # well-formed post-Halt history (ruby_llm already appended the
      # trailing tool message) and resumes the in-ask round-trip loop
      # against the now-larger budget. No tool_bridge change needed.
      next if outcome == :continue

      return outcome
    end

    if response.interrupted?
      # The upstream stream was cut before a clean completion (no
      # finish_reason / [DONE]); `response` carries only a buffered partial
      # with no tool call. Returning it would end the run as "completed"
      # with truncated/empty output — the silent-completion bug. Persist
      # whatever streamed so the transcript keeps it, close the stream box,
      # then raise: Lifecycle maps this to INTERACTION_FAILED → run.failed,
      # the same path every other turn error already takes.
      persist_assistant_message(response) unless response.content.to_s.empty?
      finalize_stream(response)
      emit_turn_summary(turn_started_at, token_total)
      raise Rubino::StreamInterruptedError,
            "stream ended before completion after " \
            "#{response.content.to_s.bytesize} buffered byte(s) with no finish signal — " \
            "the model did not finish (run marked failed, not completed). " \
            "Often caused by a very large context pushing time-to-first-token past the " \
            "provider's stream idle timeout."
    end

    if response.text_only?
      # Fabricated-"done" gate: the structured tool-call channel is the
      # ONLY thing that advances state. If this toolless turn's prose
      # asserts an action against a tool we expose (or claims a `cd` we
      # cannot do), DON'T let that reach the user as a completed answer.
      guard = guard_text_only_turn(response, messages)
      # A corrective user message was appended; loop again so the model
      # either calls the tool or owns up. iteration/token_total carry on.
      next if guard == :reflected

      # cd: the claim can never be true, so we replaced the fabricated
      # final answer with an honest message (how to actually change the
      # workspace). Surface that, not the model's no-op claim.
      final = guard.is_a?(String) ? guard : response.content

      persist_final_text(response, final)
      finalize_stream_text(response, final)
      emit_turn_summary(turn_started_at, token_total)

      # The ANSWER returned to the caller is the LAST text block only
      # (#core-F1): on a streaming turn whose final round-trip used a tool,
      # `response.content` is every text block of the turn concatenated
      # (pre-tool narration + post-tool answer, no delimiter), which a
      # headless `OUT=$(rubino prompt …)` would capture as one run-on string.
      # The full text was already streamed live and persisted via #final
      # above (transcript/render keep the narration, #261); the value we
      # HAND BACK is the post-final-tool answer in isolation. A guard
      # replacement is a synthesized string with no narration to strip, so it
      # passes through unchanged.
      return guard.is_a?(String) ? guard : response.final_text_block
    end

    if response.has_tool_calls?
      persist_assistant_message(response)
      close_intermediate_stream(response)

      # Bedrock (and other providers) require the assistant turn with the
      # toolUse block to appear in the conversation history before the
      # toolResult turn. Append it now so the next LLM call sees the
      # correct sequence: user → assistant(toolUse) → user(toolResult).
      messages << build_assistant_tool_use_message(response)

      # NOTE: counting and `tool` message persistence happen in the
      # ToolExecutor's on_result sink (#handle_tool_result), which fires
      # for BOTH this non-streaming path and the streaming path (where
      # ruby_llm runs tools mid-stream and never returns here). We only
      # build the conversation-history messages for the next iteration.
      execute_tool_calls(response.tool_calls).each { |result| messages << result }
    else
      # Unreachable in practice: the ModelCallRunner either returns a
      # response with text or tool calls, or raises EmptyModelResponseError.
      # Kept as a defensive backstop so a future response shape can never
      # silently complete an empty turn.
      emit_turn_summary(turn_started_at, token_total)
      raise Rubino::EmptyModelResponseError
    end
  end
end
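The streaming round-trip budget (#355a) amounts to a counter that the adapter bumps once per round-trip and a predicate that the tool bridge consults before running the next tool. A minimal sketch, assuming a round-trip cap plus a wall-clock cap: `StreamBudget` and its constructor parameters are hypothetical, and although `note_stream_round_trip` and `stream_budget_exhausted?` are the method names used in the `#run` comments, their bodies here are guesses. The injectable `clock` keeps the time check testable.

```ruby
# Hypothetical tracker for the in-ask round-trip budget.
class StreamBudget
  attr_reader :round_trips

  def initialize(max_round_trips:, max_seconds:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @max_round_trips = max_round_trips
    @max_seconds = max_seconds
    @clock = clock
    @started_at = clock.call
    @round_trips = 0
  end

  # Called once per model/tool round-trip inside a single streaming ask.
  def note_stream_round_trip
    @round_trips += 1
  end

  # True once either the round-trip cap or the wall-clock budget is spent.
  def stream_budget_exhausted?
    @round_trips >= @max_round_trips || (@clock.call - @started_at) >= @max_seconds
  end
end

# Simulated in-ask loop: check the budget before each tool runs and stop
# (standing in for ToolBridge returning a halt) once it is spent.
budget = StreamBudget.new(max_round_trips: 3, max_seconds: 60)
ran = 0
10.times do
  break if budget.stream_budget_exhausted?
  ran += 1                         # the tool runs
  budget.note_stream_round_trip    # the adapter reports the finished round-trip
end
puts "ran #{ran} tool(s), #{budget.round_trips} round-trip(s) recorded"
```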