Class: Rubino::Context::TokenBudget

Inherits: Object

Defined in: lib/rubino/context/token_budget.rb

Overview

Manages token budget calculations and determines when compaction is needed.

Constant Summary

CHARS_PER_TOKEN = Rough approximation of the number of characters per token, used by #estimate_tokens.

DEFAULT_CONTEXT_WINDOW = Fallback when the user hasn’t pinned `model.context_length` in config. Generous but safe; truncation kicks in via `needs_compaction?` long before the real provider limit would be hit.

128_000

MINIMUM_CONTEXT_LENGTH = Floor for the auto-compaction trigger (#410). Ported from Hermes `context_compressor.py` (MINIMUM_CONTEXT_LENGTH, model_metadata.py): never auto-compact below this many estimated tokens, even when the percentage threshold would suggest a lower value. Without it, a 32K model auto-compacts at 16K (half the window spent on a summary), while a large-window model still compacts at the configured ratio.

64_000

MAX_WINDOW_FRACTION = Fraction of the window that the auto-compaction threshold may never exceed (#410 follow-up). The bare MINIMUM_CONTEXT_LENGTH floor made compaction unreachable on sub-128k windows: a 64k window floored the threshold to 64k, the whole window, so auto-compact only fired at ~100% (too late), and a window below the floor never compacted at all. Capping the threshold at this fraction of the actual window guarantees it always fires before the window fills, at any size.

0.85

Instance Attribute Summary

#context_window ⇒ Object readonly

Returns the value of attribute context_window.

Instance Method Summary

#available_tokens ⇒ Object

Returns the max tokens available for conversation.

#compaction_target ⇒ Object

Returns the target token count after compaction.

#compaction_threshold ⇒ Object

The token count above which auto-compaction fires: the configured ratio of the window, floored at MINIMUM_CONTEXT_LENGTH (#410) so large windows do not compact over-eagerly, but clamped so it never exceeds MAX_WINDOW_FRACTION of the actual window. Without the clamp, the floor pushes the trigger past the end of a small window and auto-compaction never fires.

#estimate_tokens(messages) ⇒ Object

Estimates token count for a set of messages.

#initialize(model_id:, config:) ⇒ TokenBudget constructor

A new instance of TokenBudget.

#needs_compaction?(messages) ⇒ Boolean

Returns true if the messages exceed the compaction threshold.

Constructor Details

#initialize(model_id:, config:) ⇒ `TokenBudget`

Returns a new instance of TokenBudget.

# File 'lib/rubino/context/token_budget.rb', line 21

def initialize(model_id:, config:)
  @model_id = model_id
  @config = config
  @context_window = determine_context_window
end

Instance Attribute Details

#context_window ⇒ `Object` (readonly)

Returns the value of attribute context_window.



# File 'lib/rubino/context/token_budget.rb', line 27

def context_window
  @context_window
end

Instance Method Details

#available_tokens ⇒ `Object`

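Returns the maximum number of tokens available for the conversation: the `context.max_tokens` config override when it is set, otherwise the context window.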

# File 'lib/rubino/context/token_budget.rb', line 30

def available_tokens
  override = @config.dig("context", "max_tokens")
  override || @context_window
end
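
The fallback can be illustrated with a minimal sketch. It assumes the config behaves like a nested Hash with string keys, which is only what the `dig` call above suggests; the real config class is not shown on this page, and `available_tokens_for` is a hypothetical helper name.

```ruby
# Hypothetical stand-in for TokenBudget#available_tokens.
# Assumes `config` responds to #dig like a nested Hash with string keys.
def available_tokens_for(config, context_window)
  config.dig("context", "max_tokens") || context_window
end

# An explicit override wins over the detected window.
available_tokens_for({ "context" => { "max_tokens" => 32_000 } }, 128_000)

# With no override, the context window is used unchanged.
available_tokens_for({}, 128_000)
```

Because the fallback uses `||`, a missing or `nil` override is treated the same way: the context window is returned.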

#compaction_target ⇒ `Object`

Returns the target token count after compaction: available_tokens scaled by the configured compression_target_ratio and truncated to an integer.



# File 'lib/rubino/context/token_budget.rb', line 77

def compaction_target
  (available_tokens * @config.compression_target_ratio).to_i
end
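
As a rough illustration of the arithmetic, the sketch below uses a hypothetical helper, `compaction_target_for`, and an assumed target ratio of 0.25; the actual configured ratio is not stated on this page.

```ruby
# Hypothetical helper mirroring the arithmetic of compaction_target.
# `target_ratio` stands in for @config.compression_target_ratio.
def compaction_target_for(available_tokens, target_ratio)
  (available_tokens * target_ratio).to_i
end

# With an assumed ratio of 0.25, a 128_000-token budget targets 32_000 tokens.
compaction_target_for(128_000, 0.25)
```

Note that `to_i` truncates rather than rounds, so fractional results are always rounded down.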

#compaction_threshold ⇒ `Object`

The token count above which auto-compaction fires: the configured ratio of the window, floored at MINIMUM_CONTEXT_LENGTH (#410) so large windows do not compact over-eagerly, but clamped so it never exceeds MAX_WINDOW_FRACTION of the actual window. Without the clamp, the floor pushes the trigger past the end of a small window and auto-compaction never fires.

threshold = min( max(window·ratio, FLOOR), window·0.85 )

Here FLOOR is MINIMUM_CONTEXT_LENGTH and 0.85 is MAX_WINDOW_FRACTION. On a 128k+ window the floor/ratio still governs (0.85·window is larger); on an 8k window the 0.85 cap governs (~6.8k), so compaction stays reachable.

# File 'lib/rubino/context/token_budget.rb', line 70

def compaction_threshold
  floored = [(available_tokens * @config.compression_threshold).to_i, MINIMUM_CONTEXT_LENGTH].max
  ceiling = (available_tokens * MAX_WINDOW_FRACTION).to_i
  [floored, ceiling].min
end
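
To make the clamp concrete, here is a minimal free-function sketch of the same calculation. `threshold_for` is a hypothetical name, the floor and cap defaults are copied from the constants documented above, and the ratio of 0.5 is an assumption taken from the "32K model auto-compacts at 16K" example rather than a confirmed default.

```ruby
# Hypothetical standalone version of compaction_threshold.
# floor: mirrors MINIMUM_CONTEXT_LENGTH, cap: mirrors MAX_WINDOW_FRACTION.
def threshold_for(window, ratio, floor: 64_000, cap: 0.85)
  floored = [(window * ratio).to_i, floor].max # never below the floor
  ceiling = (window * cap).to_i                # never past 85% of the window
  [floored, ceiling].min
end

# Assumed ratio of 0.5 for illustration.
threshold_for(8_000, 0.5)   # cap governs:   6_800
threshold_for(64_000, 0.5)  # cap governs:   54_400
threshold_for(128_000, 0.5) # floor governs: 64_000
threshold_for(200_000, 0.5) # ratio governs: 100_000
```

Under these assumptions the cap decides the threshold for every window where 85% of the window is below the floor, which is exactly the range where the bare floor used to make compaction unreachable.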

#estimate_tokens(messages) ⇒ `Object`

Estimates token count for a set of messages. Routes through TokenEstimate so a Content::Raw system block (#311) is sized correctly instead of crashing on a missing #length.

# File 'lib/rubino/context/token_budget.rb', line 38

def estimate_tokens(messages)
  total_chars = messages.sum { |m| TokenEstimate.content_char_length(m[:content]) }
  (total_chars.to_f / CHARS_PER_TOKEN).ceil
end
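
The character-based estimate can be sketched as follows. Both pieces are assumptions for illustration: `content_char_length` below is a simplified stand-in for `TokenEstimate.content_char_length` (whose implementation is not shown on this page), and `chars_per_token: 4` is an assumed value, since the page does not state what CHARS_PER_TOKEN is.

```ruby
# Simplified stand-in for TokenEstimate.content_char_length:
# stringify the content and count its characters.
def content_char_length(content)
  content.to_s.length
end

# chars_per_token: 4 is an assumed value for illustration only.
def estimate_tokens_sketch(messages, chars_per_token: 4)
  total_chars = messages.sum { |m| content_char_length(m[:content]) }
  (total_chars.to_f / chars_per_token).ceil
end

messages = [
  { role: "user", content: "hello world" }, # 11 characters
  { role: "assistant", content: "hi" }      # 2 characters
]
estimate_tokens_sketch(messages) # 13 / 4 = 3.25, rounded up to 4
```

Rounding up with `ceil` keeps the estimate conservative: a partial token still counts as a whole one.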

#needs_compaction?(messages) ⇒ `Boolean`

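Returns true if the estimated token count of the messages exceeds compaction_threshold, and always returns false when compression is disabled in config. The threshold is floored at MINIMUM_CONTEXT_LENGTH (#410) so the percentage never drives premature compaction on small/mid windows, and capped at MAX_WINDOW_FRACTION of the window so compaction stays reachable.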

Returns:

(Boolean)

# File 'lib/rubino/context/token_budget.rb', line 46

def needs_compaction?(messages)
  return false unless @config.compression_enabled?

  estimated = estimate_tokens(messages)
  estimated > compaction_threshold
end
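
The decision itself reduces to a guarded strict comparison. The sketch below is a hypothetical standalone version: `needs_compaction_sketch?` and its keyword arguments are illustrative names, with `enabled` standing in for `@config.compression_enabled?` and `threshold` for the value returned by `compaction_threshold`.

```ruby
# Hypothetical standalone version of the needs_compaction? decision.
def needs_compaction_sketch?(estimated, threshold:, enabled: true)
  return false unless enabled # compression switched off: never compact

  estimated > threshold       # strictly greater than, so equality does not trigger
end

needs_compaction_sketch?(64_000, threshold: 64_000)                 # false, exactly at threshold
needs_compaction_sketch?(64_001, threshold: 64_000)                 # true
needs_compaction_sketch?(90_000, threshold: 64_000, enabled: false) # false, disabled
```

The comparison is strict, so a conversation sitting exactly at the threshold is not compacted until it grows by at least one more estimated token.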
