Module: Rubino::LLM::ReasoningManager

Defined in: lib/rubino/llm/reasoning_manager.rb

Overview

Renders the reasoning/thinking configuration to the Anthropic-compat wire params (manual mode). This is the Ruby port of the reference `reasoning_config` → `thinking` mapping: on the manual path (MiniMax /anthropic, older Anthropic, Bedrock) thinking is enabled with a token budget, which forces temperature=1 and raises max_tokens so that the budget fits under it with enough text headroom left to still answer.

The numbers (budget 8000 “medium”, text headroom 4096, 16384 ceiling) are sourced from config (model.thinking_budget / model.max_tokens_text_headroom / model.max_tokens) by the adapter and passed in — this object holds no magic numbers of its own; it only mirrors the reference combination rules.
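As a worked example of the combination rule with the values quoted above, the following sketch computes the effective ceiling. The constant names and the helper `effective_max_tokens` are hypothetical and exist only for illustration; in the real code the values come from config and are passed in by the adapter.

```ruby
# Hypothetical constants mirroring the example values quoted above.
# The real values are read from config, not hard-coded.
THINKING_BUDGET = 8000
TEXT_HEADROOM   = 4096
MAX_TOKENS      = 16_384

# Hypothetical helper: the ceiling must be at least budget + headroom.
def effective_max_tokens(budget, headroom, ceiling)
  [ceiling, budget + headroom].max
end

# 8000 + 4096 = 12288, which already fits under 16384.
puts effective_max_tokens(THINKING_BUDGET, TEXT_HEADROOM, MAX_TOKENS) # 16384

# A larger budget pushes the floor past the configured ceiling.
puts effective_max_tokens(16_000, TEXT_HEADROOM, MAX_TOKENS)          # 20096
```

With the quoted defaults the configured ceiling is left unchanged; it is only raised when budget plus headroom exceeds it.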

One source of truth: the adapter calls #render exactly once per chat build to derive the params, and applies them; the inline Slice 0(c) logic that used to live in RubyLLMAdapter#apply_generation_params now lives here.
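How the adapter applies the rendered params is not shown on this page. One plausible shape, sketched under that assumption, is to merge only the non-nil fields into the request params. `RenderedSketch` and `apply_rendered` are hypothetical names, not part of the documented API.

```ruby
# Hypothetical stand-in for the Rendered value object.
RenderedSketch = Struct.new(:thinking, :temperature, :max_tokens, keyword_init: true)

# Hypothetical helper: nil fields mean "leave the existing value alone",
# so they are dropped before merging.
def apply_rendered(params, rendered)
  params.merge(rendered.to_h.compact)
end

rendered = RenderedSketch.new(
  thinking: { type: :enabled, budget_tokens: 8000 },
  temperature: 1,
  max_tokens: 16_384
)

# The configured temperature is overridden because thinking is enabled.
p apply_rendered({ model: "example-model", temperature: 0.7 }, rendered)
```

Deriving the values once and applying them in a single merge keeps the temperature and max_tokens overrides from being recomputed, or contradicted, elsewhere in the adapter.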

Defined Under Namespace

Classes: Rendered

Class Method Summary

.render(budget:, temperature: nil, max_tokens: nil, text_headroom: 4096, apply_max_tokens: true) ⇒ Object

Render the reasoning config to wire params.
.render_max_tokens(enabled, budget, max_tokens, text_headroom) ⇒ Object
.render_temperature(enabled, temperature) ⇒ Object

Class Method Details

.render(budget:, temperature: nil, max_tokens: nil, text_headroom: 4096, apply_max_tokens: true) ⇒ `Object`

Render the reasoning config to wire params.

budget (Integer): thinking token budget; 0/nil disables thinking

temperature (Float|nil): configured sampling temperature (ignored when thinking is enabled; Anthropic requires 1 then)

max_tokens (Integer|nil): configured output ceiling; nil ⇒ leave the provider default UNLESS thinking forces a floor

text_headroom (Integer): visible-output tokens reserved on top of budget

apply_max_tokens (Bool): only the anthropic-family path raises the ceiling; openai/ollama/etc. leave token limits to the provider

Mirrors anthropic_adapter.py:2238–2241:

kwargs["thinking"]    = {type: enabled, budget_tokens: budget}
kwargs["temperature"] = 1
kwargs["max_tokens"]  = max(effective_max_tokens, budget + headroom)

# File 'lib/rubino/llm/reasoning_manager.rb', line 48

def render(budget:, temperature: nil, max_tokens: nil,
           text_headroom: 4096, apply_max_tokens: true)
  budget = budget.to_i
  enabled = budget.positive?

  Rendered.new(
    thinking: enabled ? { type: :enabled, budget_tokens: budget } : nil,
    temperature: render_temperature(enabled, temperature),
    max_tokens: apply_max_tokens ? render_max_tokens(enabled, budget, max_tokens, text_headroom) : nil
  )
end
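The following usage sketch shows the three outcomes a caller can expect from the method above: thinking on, thinking off, and thinking on with `apply_max_tokens: false`. So that it runs without the gem loaded, it uses `ReasoningSketch`, a hypothetical standalone approximation of the module that folds the two helper methods into `render`; it is not the library's own code.

```ruby
# Hypothetical standalone approximation, for illustration only.
module ReasoningSketch
  Rendered = Struct.new(:thinking, :temperature, :max_tokens, keyword_init: true)

  module_function

  def render(budget:, temperature: nil, max_tokens: nil,
             text_headroom: 4096, apply_max_tokens: true)
    budget  = budget.to_i
    enabled = budget.positive?

    # Raise the ceiling to at least budget + headroom when thinking is on.
    ceiling = max_tokens
    ceiling = [ceiling.to_i, budget + text_headroom.to_i].max if enabled
    ceiling = nil unless apply_max_tokens && ceiling&.positive?

    Rendered.new(
      thinking: enabled ? { type: :enabled, budget_tokens: budget } : nil,
      temperature: enabled ? 1 : temperature,
      max_tokens: ceiling
    )
  end
end

# Thinking on: temperature is forced to 1, the ceiling already fits.
p ReasoningSketch.render(budget: 8000, temperature: 0.7, max_tokens: 16_384).to_h

# Thinking off: the configured temperature passes through untouched.
p ReasoningSketch.render(budget: nil, temperature: 0.7).to_h

# Non-anthropic path: thinking params are rendered, token limit is left alone.
p ReasoningSketch.render(budget: 8000, apply_max_tokens: false).to_h
```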

.render_max_tokens(enabled, budget, max_tokens, text_headroom) ⇒ `Object`

# File 'lib/rubino/llm/reasoning_manager.rb', line 66

def render_max_tokens(enabled, budget, max_tokens, text_headroom)
  ceiling = max_tokens
  floor   = budget + text_headroom.to_i
  ceiling = [ceiling.to_i, floor].max if enabled && ceiling
  ceiling = floor if enabled && ceiling.nil?
  return nil unless ceiling&.positive?

  ceiling
end

.render_temperature(enabled, temperature) ⇒ `Object`

# File 'lib/rubino/llm/reasoning_manager.rb', line 60

def render_temperature(enabled, temperature)
  return 1 if enabled

  temperature
end