Class: Rubino::Agent::TruncationContinuation
- Inherits: Object
  - Object
    - Rubino::Agent::TruncationContinuation
- Defined in: lib/rubino/agent/truncation_continuation.rb
Overview
Stitches a response truncated by the output-token limit back together.
Faithful port of the reference finish_reason == "length" continuation (the per-turn loop plus the boosted output budget). When a model call comes back with stop_reason == :length and no tool calls, the answer was cut off mid-sentence by max_tokens. Rather than surface the fragment as the final turn, we:
1. keep the interim partial as an assistant message in the history,
2. append a "[System: …continue exactly where you left off…]" user nudge,
3. re-issue the SAME request with a progressively BOOSTED output budget
(base × (retry+1), capped at 32 768), and
4. concatenate the partial pieces into the final answer.
Up to MAX_RETRIES (3, matching the reference `length_continue_retries < 3`) continuations are attempted; if the response is still truncated after that, the stitched-together partial is returned as-is (the reference returns it with partial=True / “remained truncated after 3 continuation attempts”).
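The budget rule in step 3 can be sketched as follows. This is a minimal illustration, not the class's actual private helper: the method name `boosted_budget` is hypothetical, and it assumes the retry counter is 1-based when the boost is computed (so the first continuation gets 2× the base).

```ruby
# Constants mirror the documented values; the helper name is hypothetical.
DEFAULT_BASE = 4096
BOOST_CAP    = 32_768

# attempt is assumed to be 1-based: 1 for the first continuation.
def boosted_budget(base_tokens, attempt)
  base = base_tokens || DEFAULT_BASE       # nil falls back to DEFAULT_BASE
  [base * (attempt + 1), BOOST_CAP].min    # base x (retry + 1), capped
end

default_budgets = (1..3).map { |attempt| boosted_budget(nil, attempt) }
# default_budgets == [8192, 12288, 16384]

capped_budgets = (1..3).map { |attempt| boosted_budget(16_384, attempt) }
# capped_budgets == [32768, 32768, 32768]
```

Under this assumption a default-configured agent never reaches the cap within three retries; the cap only matters when agent.max_tokens is already large.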
The class is transport-agnostic: it issues each continuation through a boundary callable (`boundary.call(request) -> AdapterResponse`) so it unit-tests against fixtures with no network. The caller (Loop) builds the first request and passes the first response in.
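A fixture-backed boundary might look like the sketch below. `FixtureResponse` and `ScriptedBoundary` are hypothetical test doubles, not part of Rubino; the only contract taken from this page is that a boundary responds to #call(request, &block) and returns a response object.

```ruby
# Hypothetical stand-in for AdapterResponse; the real class has more fields.
FixtureResponse = Struct.new(:content, :stop_reason, :tool_calls, keyword_init: true) do
  def has_tool_calls?
    !Array(tool_calls).empty?
  end
end

# Replays canned responses in order and records every request it receives.
class ScriptedBoundary
  attr_reader :requests

  def initialize(responses)
    @responses = responses.dup
    @requests = []
  end

  def call(request, &block)
    raise "no scripted response left" if @responses.empty?

    @requests << request
    response = @responses.shift
    block&.call(response.content) # simulate a single stream chunk
    response
  end
end

boundary = ScriptedBoundary.new(
  [FixtureResponse.new(content: "second half.", stop_reason: :stop, tool_calls: [])]
)
chunks = []
reply = boundary.call({ max_tokens: 8192 }) { |chunk| chunks << chunk }
```

Because the double records requests, a test can assert on the boosted budget and the appended messages without any network access.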
Constant Summary
- MAX_RETRIES = 3
  The `length_continue_retries < 3` ceiling.
- DEFAULT_BASE = 4096
  Fallback base when agent.max_tokens is unset.
- BOOST_CAP = 32_768
  Upper bound on the boosted output budget, in tokens.
- CONTINUATION_NUDGE =
  "[System: Your previous response was truncated by the output " \
  "length limit. Continue exactly where you left off. Do not " \
  "restart or repeat prior text. Finish the answer directly.]"
  The continuation nudge for an ordinary output-length truncation (the `else` branch of the continuation-prompt builder). The partial-stream-stub variants don't apply here: a dropped stream surfaces as AdapterResponse#interrupted?, handled separately by the Loop.
Instance Method Summary
- #applicable?(response) ⇒ Boolean
  True iff response is a length-truncated turn that warrants continuation: stopped on the output limit AND carries no tool calls (a truncated tool-call turn is a different repair path, out of scope here, as in the reference's separate truncated_tool_call branch).
- #continue(request, first_response) ⇒ Object
  Drive the continuation loop.
- #initialize(boundary:, base_tokens: nil, ui: nil) ⇒ TruncationContinuation (constructor)
  boundary: responds to #call(request, &block) → AdapterResponse.
Constructor Details
#initialize(boundary:, base_tokens: nil, ui: nil) ⇒ TruncationContinuation
- boundary: responds to #call(request, &block) → AdapterResponse.
- base_tokens: the configured agent.max_tokens (nil ⇒ DEFAULT_BASE).
- ui: optional, gets #note on each continuation attempt.
# File 'lib/rubino/agent/truncation_continuation.rb', line 47
def initialize(boundary:, base_tokens: nil, ui: nil)
  @boundary = boundary
  @base_tokens = base_tokens
  @ui = ui
end
Instance Method Details
#applicable?(response) ⇒ Boolean
True iff response is a length-truncated turn that warrants continuation: stopped on the output limit AND carries no tool calls (a truncated tool-call turn is a different repair path — out of scope here, as in the reference’s separate truncated_tool_call branch).
# File 'lib/rubino/agent/truncation_continuation.rb', line 57
def applicable?(response)
  response&.stop_reason == :length && !response.has_tool_calls?
end
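The cases the predicate distinguishes can be checked with a small stand-in. `StubResponse` is a hypothetical double that exposes only the two methods the predicate reads; the method body is copied from the listing above.

```ruby
# Hypothetical double exposing only what the predicate needs.
StubResponse = Struct.new(:stop_reason, :tool_calls) do
  def has_tool_calls?
    !Array(tool_calls).empty?
  end
end

def applicable?(response)
  response&.stop_reason == :length && !response.has_tool_calls?
end

plain_truncation = applicable?(StubResponse.new(:length, []))                  # true
truncated_tool   = applicable?(StubResponse.new(:length, [{ name: "search" }])) # false
clean_stop       = applicable?(StubResponse.new(:stop, []))                    # false
missing_response = applicable?(nil)                                            # false
```

The safe-navigation operator makes a nil response compare as `nil == :length`, which is false, so the right-hand side is never evaluated and no NoMethodError is raised.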
#continue(request, first_response) ⇒ Object
Drive the continuation loop. request is the LLM::Request that produced first_response; first_response is the truncated AdapterResponse. Re-issues with a boosted budget until the model stops cleanly or MAX_RETRIES is hit, then returns ONE AdapterResponse whose content is the stitched-together answer. A passed block forwards stream chunks straight through to the boundary on each continuation call.
If first_response is not applicable? this returns it untouched, so the Loop can call #continue unconditionally.
# File 'lib/rubino/agent/truncation_continuation.rb', line 70
def continue(request, first_response, &)
  return first_response unless applicable?(first_response)

  parts = collect_part(first_response)
  response = first_response
  retries = 0
  while applicable?(response) && retries < MAX_RETRIES
    retries += 1
    @ui&.note("↻ Requesting continuation (#{retries}/#{MAX_RETRIES})…")
    # Keep the interim partial in history, then nudge the model to resume.
    = request..dup <<
      { role: "assistant", content: response.content.to_s } <<
      { role: "user", content: CONTINUATION_NUDGE }
    request = reissue(request, , retries)
    response = @boundary.call(request, &)
    parts.concat(collect_part(response))
  end
  stitch(response, parts)
end
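The loop's overall behaviour can be traced end to end with a self-contained sketch. Everything here is an assumption made for illustration: requests are plain hashes with `:messages` and `:max_tokens` keys, `Reply` stands in for AdapterResponse, the nudge text is abbreviated, the pieces are joined with no separator, and the boost is `base × (retries + 1)` with a 1-based counter. The real class delegates these details to private helpers (`collect_part`, `reissue`, `stitch`) that are not shown on this page.

```ruby
# Hypothetical stand-in for AdapterResponse.
Reply = Struct.new(:content, :stop_reason, :tool_calls, keyword_init: true) do
  def has_tool_calls?
    !Array(tool_calls).empty?
  end
end

MAX_RETRIES = 3
NUDGE = "[System: continue exactly where you left off]" # abbreviated placeholder

def truncated?(reply)
  reply&.stop_reason == :length && !reply.has_tool_calls?
end

# Sketch of the loop; hash-based requests and plain joining are assumptions.
def continue_sketch(request, first_reply, boundary, base: 4096, cap: 32_768)
  return first_reply unless truncated?(first_reply)

  parts = [first_reply.content.to_s]
  reply = first_reply
  retries = 0
  while truncated?(reply) && retries < MAX_RETRIES
    retries += 1
    # Keep the partial in history, then ask the model to resume.
    messages = request[:messages].dup
    messages << { role: "assistant", content: reply.content.to_s }
    messages << { role: "user", content: NUDGE }
    budget = [base * (retries + 1), cap].min
    request = request.merge(messages: messages, max_tokens: budget)
    reply = boundary.call(request)
    parts << reply.content.to_s
  end
  Reply.new(content: parts.join, stop_reason: reply.stop_reason, tool_calls: [])
end

# Two scripted continuations: one still truncated, one that stops cleanly.
script = [
  Reply.new(content: "brown fox ", stop_reason: :length, tool_calls: []),
  Reply.new(content: "jumps.", stop_reason: :stop, tool_calls: [])
]
seen = []
boundary = ->(req) { seen << req; script.shift }

first = Reply.new(content: "The quick ", stop_reason: :length, tool_calls: [])
request = { messages: [{ role: "user", content: "Write a sentence." }], max_tokens: 4096 }
final = continue_sketch(request, first, boundary)
# final.content == "The quick brown fox jumps."
```

Each continuation request carries two more messages than the previous one (the partial plus the nudge), so the second request in this trace holds five messages and a budget of 12288 tokens.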