Module: Rubino::LLM::ReasoningManager
- Defined in:
- lib/rubino/llm/reasoning_manager.rb
Overview
Renders the reasoning/thinking configuration to the Anthropic-compatible wire params (manual mode). This is the Ruby port of the reference `reasoning_config` → `thinking` mapping: on the manual path (MiniMax /anthropic, older Anthropic, Bedrock), thinking is enabled with a token budget, which forces temperature=1 and raises max_tokens so that the budget fits under it with enough text headroom left for the model to still answer.
The numbers (budget 8000 “medium”, text headroom 4096, 16384 ceiling) are sourced from config (model.thinking_budget / model.max_tokens_text_headroom / model.max_tokens) by the adapter and passed in — this object holds no magic numbers of its own; it only mirrors the reference combination rules.
One source of truth: the adapter calls #render exactly once per chat build to derive the params and then applies them; the inline Slice 0(c) logic that used to live in RubyLLMAdapter#apply_generation_params now lives here.
Defined Under Namespace
Classes: Rendered
Class Method Summary
- .render(budget:, temperature: nil, max_tokens: nil, text_headroom: 4096, apply_max_tokens: true) ⇒ Object
  Render the reasoning config to wire params.
- .render_max_tokens(enabled, budget, max_tokens, text_headroom) ⇒ Object
- .render_temperature(enabled, temperature) ⇒ Object
Class Method Details
.render(budget:, temperature: nil, max_tokens: nil, text_headroom: 4096, apply_max_tokens: true) ⇒ Object
Render the reasoning config to wire params.
budget : Integer - thinking token budget; 0/nil disables thinking
temperature : Float|nil - configured sampling temperature (ignored when thinking is enabled; Anthropic requires 1 then)
max_tokens : Integer|nil - configured output ceiling; nil ⇒ leave the provider default UNLESS thinking forces a floor
text_headroom : Integer - visible-output tokens reserved on top of budget
apply_max_tokens : Bool - only the anthropic-family path raises the ceiling; openai/ollama/etc. leave token limits to the provider
Mirrors anthropic_adapter.py:2238–2241:
kwargs["thinking"] = {type: enabled, budget_tokens: budget}
kwargs["temperature"] = 1
kwargs["max_tokens"] = max(effective_max_tokens, budget + headroom)
# File 'lib/rubino/llm/reasoning_manager.rb', line 48
def render(budget:, temperature: nil, max_tokens: nil, text_headroom: 4096, apply_max_tokens: true)
  budget = budget.to_i
  enabled = budget.positive?

  Rendered.new(
    thinking: enabled ? { type: :enabled, budget_tokens: budget } : nil,
    temperature: render_temperature(enabled, temperature),
    max_tokens: apply_max_tokens ? render_max_tokens(enabled, budget, max_tokens, text_headroom) : nil
  )
end
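The following is a minimal usage sketch, not the library's own code. It rebuilds the three methods in a stand-in module named ReasoningSketch (a hypothetical name) so that it runs without the gem, and it assumes Rendered behaves like a keyword-initialised Struct, which the source does not confirm. The budget and ceiling values are the "medium" numbers quoted in the overview.

```ruby
# Stand-in for Rubino::LLM::ReasoningManager so the example runs on its own.
# Assumption: Rendered is modelled as a keyword-initialised Struct.
module ReasoningSketch
  Rendered = Struct.new(:thinking, :temperature, :max_tokens, keyword_init: true)

  module_function

  def render(budget:, temperature: nil, max_tokens: nil, text_headroom: 4096, apply_max_tokens: true)
    budget = budget.to_i
    enabled = budget.positive?
    Rendered.new(
      thinking: enabled ? { type: :enabled, budget_tokens: budget } : nil,
      temperature: render_temperature(enabled, temperature),
      max_tokens: apply_max_tokens ? render_max_tokens(enabled, budget, max_tokens, text_headroom) : nil
    )
  end

  def render_temperature(enabled, temperature)
    enabled ? 1 : temperature
  end

  def render_max_tokens(enabled, budget, max_tokens, text_headroom)
    ceiling = max_tokens
    floor = budget + text_headroom.to_i
    ceiling = [ceiling.to_i, floor].max if enabled && ceiling
    ceiling = floor if enabled && ceiling.nil?
    ceiling&.positive? ? ceiling : nil
  end
end

# Thinking on: the configured temperature is overridden with 1, and the
# 16384 ceiling already exceeds 8000 + 4096, so it is kept.
on = ReasoningSketch.render(budget: 8000, temperature: 0.2, max_tokens: 16_384)

# Thinking off: no thinking block, configured values pass through unchanged.
off = ReasoningSketch.render(budget: 0, temperature: 0.2, max_tokens: 16_384)

# Non-anthropic path: thinking params are still rendered, but the token
# limit is left to the provider (max_tokens is nil).
other = ReasoningSketch.render(budget: 8000, max_tokens: 4096, apply_max_tokens: false)

p on.to_h, off.to_h, other.to_h
```

Returning nil for a field, rather than omitting it, lets the caller decide whether to send that parameter at all; how the adapter treats nil fields is not shown on this page.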
.render_max_tokens(enabled, budget, max_tokens, text_headroom) ⇒ Object
# File 'lib/rubino/llm/reasoning_manager.rb', line 66
def render_max_tokens(enabled, budget, max_tokens, text_headroom)
  ceiling = max_tokens
  floor = budget + text_headroom.to_i
  ceiling = [ceiling.to_i, floor].max if enabled && ceiling
  ceiling = floor if enabled && ceiling.nil?
  return nil unless ceiling&.positive?

  ceiling
end
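To make the floor/ceiling rule concrete, here is a small worked sketch of the cases this method distinguishes. The helper name max_tokens_for is hypothetical; it is a standalone copy of the logic listed above so the arithmetic can be checked in isolation.

```ruby
# Standalone copy of the ceiling rule under a hypothetical name, so it
# can run without the real module loaded.
def max_tokens_for(enabled, budget, max_tokens, text_headroom)
  floor = budget + text_headroom.to_i
  ceiling = max_tokens
  ceiling = [ceiling.to_i, floor].max if enabled && ceiling
  ceiling = floor if enabled && ceiling.nil?
  ceiling&.positive? ? ceiling : nil
end

cases = {
  # Thinking on, ceiling too small: raised to 8000 + 4096.
  raised_to_floor: max_tokens_for(true, 8000, 4096, 4096),    # => 12096
  # Thinking on, ceiling already large enough: kept as configured.
  ceiling_kept: max_tokens_for(true, 8000, 16_384, 4096),     # => 16384
  # Thinking on, nothing configured: the floor becomes the ceiling.
  floor_only: max_tokens_for(true, 8000, nil, 4096),          # => 12096
  # Thinking off, configured ceiling passes through untouched.
  passthrough: max_tokens_for(false, 0, 2048, 4096),          # => 2048
  # Thinking off, nothing configured: nil, so the provider default applies.
  provider_default: max_tokens_for(false, 0, nil, 4096),      # => nil
  # A non-positive ceiling is treated the same as "not configured".
  zero_ceiling: max_tokens_for(false, 0, 0, 4096)             # => nil
}

cases.each { |name, value| puts "#{name}: #{value.inspect}" }
```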
.render_temperature(enabled, temperature) ⇒ Object
# File 'lib/rubino/llm/reasoning_manager.rb', line 60
def render_temperature(enabled, temperature)
  return 1 if enabled

  temperature
end