Class: Rubino::LLM::ReasoningManager

Inherits:
Object
  • Object
show all
Defined in:
lib/rubino/llm/reasoning_manager.rb

Overview

Renders the reasoning/thinking configuration to the Anthropic-compat wire params (manual mode). This is the Ruby port of the reference reasoning_config →‘thinking` mapping: on the manual path (MiniMax /anthropic, older Anthropic, bedrock) thinking is enabled with a token budget, which FORCES temperature=1 and bumps max_tokens so the budget fits under it with text headroom to still answer.

The numbers (budget 8000 “medium”, text headroom 4096, 16384 ceiling) are sourced from config (model.thinking_budget / model.max_tokens_text_headroom / model.max_tokens) by the adapter and passed in — this object holds no magic numbers of its own; it only mirrors the reference combination rules.

One source of truth: the adapter calls #render exactly once per chat build to derive the params, and applies them; the inline Slice 0© logic that used to live in RubyLLMAdapter#apply_generation_params now lives here.

Defined Under Namespace

Classes: Rendered

Instance Method Summary collapse

Instance Method Details

#carry(history) ⇒ Object

Echo-back seam (reference reapply_reasoning_echo_for_provider): in the reference, prior-turn assistant reasoning_content is re-padded onto the api copy for providers that REQUIRE it back (DeepSeek/Kimi/MiMo thinking mode) or it 400s.

DOCUMENTED NO-OP SEAM (per the boundary spike): the installed ruby_llm 1.15 RubyLLM::Message exposes thinking as a read-only attr (no setter, only an initializer keyword) and our anthropic-compat target (MiniMax /anthropic) does NOT require reasoning echo-back — only the OpenAI-compat require-side providers do, which we do not yet target. So there is no transport to carry reasoning back through cleanly today; fabricating one would be wrong. This stays a no-op until a require-side provider lands (Slice 7 fallback may surface one) and ruby_llm offers a reachable reasoning field on replayed messages.

Returns history unchanged.



74
75
76
77
78
79
# File 'lib/rubino/llm/reasoning_manager.rb', line 74

def carry(history)
  # TODO(slice-7+): when a require-side provider (DeepSeek/Kimi/MiMo) is
  # supported and ruby_llm exposes a settable reasoning field on replayed
  # assistant messages, fold prior-turn reasoning back here.
  history
end

#render(budget:, temperature: nil, max_tokens: nil, text_headroom: 4096, apply_max_tokens: true) ⇒ Object

Render the reasoning config to wire params.

budget : Integer — thinking token budget; 0/nil disables thinking temperature : Float|nil — configured sampling temperature (ignored when

thinking is enabled — Anthropic requires 1 then)

max_tokens : Integer|nil — configured output ceiling; nil ⇒ leave the

provider default UNLESS thinking forces a floor

text_headroom : Integer — visible-output tokens reserved on top of budget apply_max_tokens: Bool — only the anthropic-family path raises the ceiling;

openai/ollama/etc. leave token limits to the provider

Mirrors anthropic_adapter.py:2238–2241:

kwargs["thinking"]    = {type: enabled, budget_tokens: budget}
kwargs["temperature"] = 1
kwargs["max_tokens"]  = max(effective_max_tokens, budget + headroom)


46
47
48
49
50
51
52
53
54
55
56
# File 'lib/rubino/llm/reasoning_manager.rb', line 46

def render(budget:, temperature: nil, max_tokens: nil,
           text_headroom: 4096, apply_max_tokens: true)
  budget = budget.to_i
  enabled = budget.positive?

  Rendered.new(
    thinking: enabled ? { type: :enabled, budget_tokens: budget } : nil,
    temperature: render_temperature(enabled, temperature),
    max_tokens: apply_max_tokens ? render_max_tokens(enabled, budget, max_tokens, text_headroom) : nil
  )
end