Class: Rubino::Context::TokenBudget
- Inherits: Object
  - Object
  - Rubino::Context::TokenBudget
- Defined in: lib/rubino/context/token_budget.rb
Overview
Manages token budget calculations and determines when compaction is needed.
Constant Summary
- CHARS_PER_TOKEN = 4
Rough approximation.
- DEFAULT_CONTEXT_WINDOW = 128_000
Fallback when the user hasn't pinned `model.context_length` in config. Generous-but-safe; truncation kicks in via `needs_compaction?` long before the real provider limit would be hit.
- MINIMUM_CONTEXT_LENGTH = 64_000
Floor for the auto-compaction trigger (#410). Ported from Hermes `context_compressor.py` (MINIMUM_CONTEXT_LENGTH, model_metadata.py): never auto-compact below this many estimated tokens even when the percentage threshold would suggest a lower value. Without it a 32K model auto-compacts at 16K (half the window spent on a summary), while a large-window model still compacts at the configured ratio.
- MAX_WINDOW_FRACTION = 0.85
Fraction of the window the auto-compaction threshold may never exceed (#410 follow-up). The bare MINIMUM_CONTEXT_LENGTH floor made compaction UNREACHABLE on sub-128k windows: a 64k window floored the threshold to 64k, the WHOLE window, so auto-compact only fired at ~100% (too late), and a window below the floor NEVER compacted at all. Capping the threshold at this fraction of the ACTUAL window guarantees it always fires BEFORE the window fills, at any size.
Instance Attribute Summary
-
#context_window ⇒ Object
readonly
Returns the value of attribute context_window.
Instance Method Summary
-
#available_tokens ⇒ Object
Returns the max tokens available for conversation.
-
#compaction_target ⇒ Object
Returns the target token count after compaction.
-
#compaction_threshold ⇒ Object
The token count above which auto-compaction fires: the configured ratio of the window, floored at MINIMUM_CONTEXT_LENGTH (#410) to stay anti-over-eager on LARGE windows, but CLAMPED so it never exceeds MAX_WINDOW_FRACTION of the actual window — otherwise the floor pushes the trigger past the end of a small window and auto-compaction never fires.
-
#estimate_tokens(messages) ⇒ Object
Estimates token count for a set of messages.
-
#initialize(model_id:, config:) ⇒ TokenBudget
constructor
A new instance of TokenBudget.
-
#needs_compaction?(messages) ⇒ Boolean
Returns true if the messages exceed the compaction threshold.
Constructor Details
#initialize(model_id:, config:) ⇒ TokenBudget
Returns a new instance of TokenBudget.
    # File 'lib/rubino/context/token_budget.rb', line 21
    def initialize(model_id:, config:)
      @model_id = model_id
      @config = config
      @context_window = determine_context_window
    end
Instance Attribute Details
#context_window ⇒ Object (readonly)
Returns the value of attribute context_window.
    # File 'lib/rubino/context/token_budget.rb', line 27
    def context_window
      @context_window
    end
Instance Method Details
#available_tokens ⇒ Object
Returns the max tokens available for conversation: the `context.max_tokens` config override when one is set, otherwise the context window.
    # File 'lib/rubino/context/token_budget.rb', line 30
    def available_tokens
      override = @config.dig("context", "max_tokens")
      override || @context_window
    end
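The override-or-window fallback can be tried in isolation. This is a minimal sketch, not the class itself: a plain Hash stands in for the real config object (an assumption; only `#dig` is needed here), and the top-level `available_tokens` helper and the 32_000 override value are illustrative.

```ruby
# Sketch only: a plain Hash stands in for the real config object,
# which is assumed to respond to #dig the same way.
def available_tokens(config, context_window)
  # A configured context.max_tokens wins; otherwise use the window.
  config.dig("context", "max_tokens") || context_window
end

with_override    = { "context" => { "max_tokens" => 32_000 } }
without_override = {}

puts available_tokens(with_override, 128_000)    # 32000
puts available_tokens(without_override, 128_000) # 128000
```

Because the fallback uses `||`, an explicit `nil` (or a missing key) behaves the same as no override at all.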
#compaction_target ⇒ Object
Returns the target token count after compaction: `available_tokens` scaled by the configured `compression_target_ratio`, truncated to an integer.
    # File 'lib/rubino/context/token_budget.rb', line 77
    def compaction_target
      (available_tokens * @config.compression_target_ratio).to_i
    end
#compaction_threshold ⇒ Object
The token count above which auto-compaction fires: the configured ratio of the window, floored at MINIMUM_CONTEXT_LENGTH (#410) to stay anti-over-eager on LARGE windows, but CLAMPED so it never exceeds MAX_WINDOW_FRACTION of the actual window — otherwise the floor pushes the trigger past the end of a small window and auto-compaction never fires.
threshold = min( max(window·ratio, FLOOR), window·0.85 )
On a 128k+ window the floor/ratio still governs (0.85·window is larger); on an 8k window the 0.85 cap governs (~6.8k) so compaction stays reachable.
    # File 'lib/rubino/context/token_budget.rb', line 70
    def compaction_threshold
      floored = [(available_tokens * @config.compression_threshold).to_i, MINIMUM_CONTEXT_LENGTH].max
      ceiling = (available_tokens * MAX_WINDOW_FRACTION).to_i
      [floored, ceiling].min
    end
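To see which term governs at different window sizes, the formula can be evaluated standalone. The sketch below copies the two constants from this page and takes the window and ratio as plain arguments; the 0.5 ratio is an assumption (it matches the "32K model auto-compacts at 16K" example above, but the real value comes from config).

```ruby
# Constants copied from the Constant Summary on this page.
MINIMUM_CONTEXT_LENGTH = 64_000
MAX_WINDOW_FRACTION = 0.85

# Standalone version of the documented formula:
# threshold = min(max(window * ratio, FLOOR), window * 0.85)
def compaction_threshold(window, ratio)
  floored = [(window * ratio).to_i, MINIMUM_CONTEXT_LENGTH].max
  ceiling = (window * MAX_WINDOW_FRACTION).to_i
  [floored, ceiling].min
end

# Assumed ratio of 0.5, for illustration only.
[8_000, 32_000, 64_000, 128_000, 200_000].each do |window|
  puts "#{window}: #{compaction_threshold(window, 0.5)}"
end
# 8000: 6800      (cap governs)
# 32000: 27200    (cap governs)
# 64000: 54400    (cap governs)
# 128000: 64000   (floor and ratio coincide)
# 200000: 100000  (ratio governs)
```

With this ratio the cap decides the threshold for every window up to 64k, so the trigger always sits below the end of the window.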
#estimate_tokens(messages) ⇒ Object
Estimates token count for a set of messages. Routes through TokenEstimate so a Content::Raw system block (#311) is sized correctly instead of crashing on a missing #length.
    # File 'lib/rubino/context/token_budget.rb', line 38
    def estimate_tokens(messages)
      total_chars = messages.sum { |m| TokenEstimate.content_char_length(m[:content]) }
      (total_chars.to_f / CHARS_PER_TOKEN).ceil
    end
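The character-count heuristic is easy to check by hand. In this sketch `content_char_length` is a hypothetical stand-in for `TokenEstimate.content_char_length`, whose real behavior (including its Content::Raw handling) is not shown on this page; here it simply measures the string form of the content.

```ruby
CHARS_PER_TOKEN = 4 # copied from the Constant Summary on this page

# Hypothetical stand-in for TokenEstimate.content_char_length:
# assumes the content can be measured via its string form.
def content_char_length(content)
  content.to_s.length
end

# Sum the characters across messages, then divide and round up.
def estimate_tokens(messages)
  total_chars = messages.sum { |m| content_char_length(m[:content]) }
  (total_chars.to_f / CHARS_PER_TOKEN).ceil
end

messages = [
  { role: "user", content: "Hello there" }, # 11 characters
  { role: "assistant", content: "Hi!" }     # 3 characters
]

puts estimate_tokens(messages) # 14 chars / 4 = 3.5, rounded up to 4
puts estimate_tokens([])       # 0
```

Rounding up with `ceil` means any non-empty conversation counts as at least one token, so the estimate errs on the high side.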
#needs_compaction?(messages) ⇒ Boolean
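Returns true if the estimated token count of the messages exceeds `compaction_threshold`. That threshold is floored at MINIMUM_CONTEXT_LENGTH (#410) so the percentage never drives premature compaction, and capped at MAX_WINDOW_FRACTION of the window so it stays reachable on small windows. Always returns false when compression is disabled in config.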
    # File 'lib/rubino/context/token_budget.rb', line 46
    def needs_compaction?(messages)
      return false unless @config.compression_enabled?
      estimated = estimate_tokens(messages)
      estimated > compaction_threshold
    end
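Two details of the check are worth seeing in action: the config guard short-circuits everything, and the comparison is strict. The sketch below takes the estimate and threshold as plain integers; `SketchConfig` is a hypothetical stub that mirrors only the `compression_enabled?` call shown above.

```ruby
# Hypothetical config stub: only compression_enabled? is modelled.
SketchConfig = Struct.new(:enabled) do
  def compression_enabled?
    enabled
  end
end

# Same decision as the documented method, with the estimate and
# threshold passed in directly instead of computed from messages.
def needs_compaction?(estimated_tokens, threshold, config)
  return false unless config.compression_enabled?

  estimated_tokens > threshold
end

on  = SketchConfig.new(true)
off = SketchConfig.new(false)

puts needs_compaction?(64_000, 64_000, on)   # false: exactly at the threshold
puts needs_compaction?(64_001, 64_000, on)   # true: one token over
puts needs_compaction?(500_000, 64_000, off) # false: compression disabled
```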