Class: Rubino::Context::TokenBudget

Inherits:
Object
  • Object
Defined in:
lib/rubino/context/token_budget.rb

Overview

Manages token budget calculations and determines when compaction is needed.

Constant Summary

CHARS_PER_TOKEN =

Rough approximation

4
DEFAULT_CONTEXT_WINDOW =

Fallback when the user hasn't pinned `model.context_length` in config. Generous-but-safe; truncation kicks in via `needs_compaction?` long before the real provider limit would be hit.

128_000
MINIMUM_CONTEXT_LENGTH =

Floor for the auto-compaction trigger (#410). Ported from Hermes `context_compressor.py` (MINIMUM_CONTEXT_LENGTH, model_metadata.py): never auto-compact below this many estimated tokens, even when the percentage threshold would suggest a lower value. Without the floor, a 32K model would auto-compact at 16K (for example, at a 50% ratio), spending half the window on a summary. A large-window model still compacts at the configured ratio.

64_000
MAX_WINDOW_FRACTION =

Fraction of the window the auto-compaction threshold may never exceed (#410 follow-up). The bare MINIMUM_CONTEXT_LENGTH floor made compaction UNREACHABLE on sub-128k windows: a 64k window floored the threshold to 64k — the WHOLE window — so auto-compact only fired at ~100% (too late), and a window below the floor NEVER compacted at all. Capping the threshold at this fraction of the ACTUAL window guarantees it always fires BEFORE the window fills, at any size.

0.85

Constructor Details

#initialize(model_id:, config:) ⇒ TokenBudget

Returns a new instance of TokenBudget.



# File 'lib/rubino/context/token_budget.rb', line 21

def initialize(model_id:, config:)
  @model_id = model_id
  @config = config
  @context_window = determine_context_window
end

Instance Attribute Details

#context_window ⇒ Object (readonly)

Returns the value of attribute context_window.



# File 'lib/rubino/context/token_budget.rb', line 27

def context_window
  @context_window
end

Instance Method Details

#available_tokens ⇒ Object

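Returns the maximum number of tokens available for the conversation: the `context.max_tokens` config override if one is set, otherwise the context window.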



# File 'lib/rubino/context/token_budget.rb', line 30

def available_tokens
  override = @config.dig("context", "max_tokens")
  override || @context_window
end

#compaction_target ⇒ Object

Returns the target token count after compaction: available_tokens scaled by the configured compression_target_ratio, truncated to an integer.



# File 'lib/rubino/context/token_budget.rb', line 77

def compaction_target
  (available_tokens * @config.compression_target_ratio).to_i
end

#compaction_threshold ⇒ Object

The token count above which auto-compaction fires: the configured ratio of the window, floored at MINIMUM_CONTEXT_LENGTH (#410) to stay anti-over-eager on LARGE windows, but CLAMPED so it never exceeds MAX_WINDOW_FRACTION of the actual window — otherwise the floor pushes the trigger past the end of a small window and auto-compaction never fires.

threshold = min( max(window·ratio, FLOOR), window·0.85 )

On a 128k+ window the floor/ratio still governs (0.85·window is larger); on an 8k window the 0.85 cap governs (~6.8k) so compaction stays reachable.



# File 'lib/rubino/context/token_budget.rb', line 70

def compaction_threshold
  floored = [(available_tokens * @config.compression_threshold).to_i, MINIMUM_CONTEXT_LENGTH].max
  ceiling = (available_tokens * MAX_WINDOW_FRACTION).to_i
  [floored, ceiling].min
end
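
The clamp described above can be checked with a small standalone sketch. It re-implements the formula outside the class and assumes a ratio of 0.5 for illustration; the real ratio comes from `@config.compression_threshold`, and the real method scales `available_tokens` rather than a bare window size. `ThresholdSketch` and `threshold_for` are hypothetical names, not part of Rubino.

```ruby
# Standalone sketch of the floor-then-cap rule; not the library's own code.
module ThresholdSketch
  MINIMUM_CONTEXT_LENGTH = 64_000 # same value as the documented constant
  MAX_WINDOW_FRACTION = 0.85      # same value as the documented constant

  module_function

  # window: token budget to scale; ratio: assumed compression threshold (0.5 here).
  def threshold_for(window, ratio: 0.5)
    floored = [(window * ratio).to_i, MINIMUM_CONTEXT_LENGTH].max
    ceiling = (window * MAX_WINDOW_FRACTION).to_i
    [floored, ceiling].min
  end
end

[8_000, 32_000, 64_000, 128_000, 200_000].each do |window|
  puts "#{window} -> #{ThresholdSketch.threshold_for(window)}"
end
# 8_000   -> 6_800    (cap governs)
# 32_000  -> 27_200   (cap governs)
# 64_000  -> 54_400   (cap governs)
# 128_000 -> 64_000   (floor and ratio agree)
# 200_000 -> 100_000  (ratio governs)
```

Under these assumptions the cap decides the threshold on every window below roughly 75k tokens (where 0.85 times the window drops under the 64k floor), which is the regime the bare floor previously made unreachable.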

#estimate_tokens(messages) ⇒ Object

Estimates token count for a set of messages. Routes through TokenEstimate so a Content::Raw system block (#311) is sized correctly instead of crashing on a missing #length.



# File 'lib/rubino/context/token_budget.rb', line 38

def estimate_tokens(messages)
  total_chars = messages.sum { |m| TokenEstimate.content_char_length(m[:content]) }
  (total_chars.to_f / CHARS_PER_TOKEN).ceil
end
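
As a rough illustration of the characters-per-token estimate, the sketch below reproduces the arithmetic with a stand-in for `TokenEstimate.content_char_length`. The stand-in only handles plain strings (and treats `nil` as empty); how the real helper sizes a `Content::Raw` block is not shown on this page, so that part is deliberately left out. `EstimateSketch` is a hypothetical name.

```ruby
# Standalone sketch of the chars / 4 estimate; not the library's own code.
module EstimateSketch
  CHARS_PER_TOKEN = 4 # same value as the documented constant

  module_function

  # Assumed stand-in for TokenEstimate.content_char_length: plain strings only.
  def content_char_length(content)
    content.to_s.length
  end

  # Sum the character lengths, divide by CHARS_PER_TOKEN, and round up.
  def estimate_tokens(messages)
    total_chars = messages.sum { |m| content_char_length(m[:content]) }
    (total_chars.to_f / CHARS_PER_TOKEN).ceil
  end
end

messages = [
  { role: "system", content: "You are helpful." }, # 16 characters
  { role: "user", content: "Hello" }               # 5 characters
]

puts EstimateSketch.estimate_tokens(messages) # 21 / 4 = 5.25, rounded up to 6
```

Rounding up means any non-empty conversation counts as at least one token, and the estimate errs slightly high rather than low.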

#needs_compaction?(messages) ⇒ Boolean

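Returns true if the estimated token count of the messages exceeds compaction_threshold; always returns false when compression is disabled in config. The threshold is floored at MINIMUM_CONTEXT_LENGTH (#410) so the percentage never drives premature compaction on mid-size windows, and capped at MAX_WINDOW_FRACTION of the window so it stays reachable on small ones.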

Returns:

  • (Boolean)


# File 'lib/rubino/context/token_budget.rb', line 46

def needs_compaction?(messages)
  return false unless @config.compression_enabled?

  estimated = estimate_tokens(messages)
  estimated > compaction_threshold
end