Class: Phronomy::Context::TokenBudget

Inherits:
Object
Defined in:
lib/phronomy/context/token_budget.rb

Overview

Calculates the effective token budget available for conversation history and injected knowledge within a single LLM request.

The window is divided as follows:

context_window (total)
├─ max_output_tokens     (reserved for model output)
├─ overhead              (reserved for system prompt + tool definitions)
└─ effective_input_limit (available for memory + knowledge)
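As a worked example of this division, the arithmetic can be sketched in plain Ruby. The figures below are hypothetical and chosen only for illustration; the subtraction mirrors the one shown later in #effective_input_limit.

```ruby
# Hypothetical figures, chosen only to illustrate the arithmetic.
context_window    = 32_768 # total tokens the model accepts
max_output_tokens = 4_096  # reserved for the model's reply
overhead          = 800    # reserved for system prompt + tool definitions

# Whatever remains is the budget for memory + knowledge (never negative).
effective_input_limit = [context_window - max_output_tokens - overhead, 0].max

puts effective_input_limit # prints 27872
```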

Examples:

Auto-derive from RubyLLM model registry

budget = Phronomy::Context::TokenBudget.new(model: "claude-3-5-sonnet-20241022")

Explicit values (useful for local / unknown models)

budget = Phronomy::Context::TokenBudget.new(
  context_window:    32_768,
  max_output_tokens: 4_096
)

With overhead for instructions + tool definitions

budget = Phronomy::Context::TokenBudget.new(
  model:    "gpt-4o",
  overhead: 800
)

Constructor Details

#initialize(model: nil, context_window: nil, max_output_tokens: nil, overhead: 0) ⇒ TokenBudget

This method is part of a private API. Avoid using it if possible, because it may be removed or changed in the future.

Returns a new instance of TokenBudget.

Parameters:

  • model (String, nil) (defaults to: nil)

    model identifier looked up in the RubyLLM model registry; ignored when context_window is given

  • context_window (Integer, nil) (defaults to: nil)

    explicit total token limit; when given, no registry lookup is performed

  • max_output_tokens (Integer, nil) (defaults to: nil)

    explicit output reservation; when nil, falls back to the registry's max_output_tokens for the given model, or to 0 when context_window is given explicitly

  • overhead (Integer) (defaults to: 0)

    tokens reserved for instructions/tools



# File 'lib/phronomy/context/token_budget.rb', line 49

def initialize(model: nil, context_window: nil, max_output_tokens: nil, overhead: 0)
  @overhead = overhead.to_i

  if context_window
    # Explicit values — no registry lookup needed.
    @context_window = context_window.to_i
    @max_output_tokens = (max_output_tokens || 0).to_i
  elsif model
    ruby_llm_model = lookup_model!(model)
    @context_window = ruby_llm_model.context_window.to_i
    @max_output_tokens = (max_output_tokens || ruby_llm_model.max_output_tokens).to_i
  else
    raise ArgumentError, "Provide either model: or context_window:"
  end
end
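The branching above implies a precedence order that is easy to miss: an explicit context_window wins over model, and an omitted max_output_tokens means either 0 or the registry value depending on the branch. The following is a minimal sketch of that behaviour, not the library's code. SketchBudget, FakeModel and FAKE_REGISTRY are hypothetical stand-ins; in particular, the Hash replaces the RubyLLM lookup done by lookup_model!, whose implementation is not shown on this page.

```ruby
# Hypothetical stand-in for a registry entry; the real lookup goes through
# lookup_model!, which is not shown in this document.
FakeModel = Struct.new(:context_window, :max_output_tokens)
FAKE_REGISTRY = { "demo-model" => FakeModel.new(8_192, 1_024) }.freeze

# Illustrative copy of the constructor's branching, using the stub registry.
class SketchBudget
  attr_reader :context_window, :max_output_tokens, :overhead

  def initialize(model: nil, context_window: nil, max_output_tokens: nil, overhead: 0)
    @overhead = overhead.to_i

    if context_window
      # Explicit window: the model argument is never consulted.
      @context_window = context_window.to_i
      @max_output_tokens = (max_output_tokens || 0).to_i
    elsif model
      entry = FAKE_REGISTRY.fetch(model)
      @context_window = entry.context_window.to_i
      @max_output_tokens = (max_output_tokens || entry.max_output_tokens).to_i
    else
      raise ArgumentError, "Provide either model: or context_window:"
    end
  end
end

explicit      = SketchBudget.new(context_window: 32_768)
from_registry = SketchBudget.new(model: "demo-model")
overridden    = SketchBudget.new(model: "demo-model", max_output_tokens: 256)
both          = SketchBudget.new(model: "demo-model", context_window: 16_000)

puts explicit.max_output_tokens      # 0: nothing reserved unless stated
puts from_registry.max_output_tokens # 1024: taken from the stub registry
puts overridden.max_output_tokens    # 256: explicit value beats the registry
puts both.context_window             # 16000: explicit window beats the model
```

One consequence worth noting: when context_window is passed without max_output_tokens, no output reservation is made, so callers targeting local or unknown models likely want to supply both.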

Instance Attribute Details

#context_window ⇒ Integer (readonly)

Returns total token limit of the model.

Returns:

  • (Integer)

    total token limit of the model



# File 'lib/phronomy/context/token_budget.rb', line 35

def context_window
  @context_window
end

#max_output_tokens ⇒ Integer (readonly)

Returns tokens reserved for model output.

Returns:

  • (Integer)

    tokens reserved for model output



# File 'lib/phronomy/context/token_budget.rb', line 38

def max_output_tokens
  @max_output_tokens
end

#overhead ⇒ Integer (readonly)

Returns tokens reserved for instructions and tool definitions.

Returns:

  • (Integer)

    tokens reserved for instructions and tool definitions



# File 'lib/phronomy/context/token_budget.rb', line 41

def overhead
  @overhead
end

Instance Method Details

#available(used: 0) ⇒ Integer

This method is part of a private API. Avoid using it if possible, because it may be removed or changed in the future.

Tokens still available once the given used tokens have been subtracted from effective_input_limit.

Parameters:

  • used (Integer) (defaults to: 0)

    tokens already committed (e.g. from knowledge injection)

Returns:

  • (Integer)

    remaining tokens (always >= 0)



# File 'lib/phronomy/context/token_budget.rb', line 79

def available(used: 0)
  [effective_input_limit - used.to_i, 0].max
end
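To illustrate the clamping, the same expression can be exercised outside the class. This is a sketch only: the limit of 27_872 is a hypothetical value, and the lambda stands in for the method.

```ruby
# Hypothetical effective input limit, used only for illustration.
effective_input_limit = 27_872

# Same expression as the method body: subtract, then clamp at zero.
available = ->(used) { [effective_input_limit - used.to_i, 0].max }

puts available.call(0)      # prints 27872 (nothing committed yet)
puts available.call(5_000)  # prints 22872
puts available.call(40_000) # prints 0 (over-commitment never goes negative)
```

Because the result is clamped, a return value of 0 does not distinguish "exactly full" from "over budget"; callers that need to detect overflow would have to compare used against effective_input_limit themselves.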

#effective_input_limit ⇒ Integer

This method is part of a private API. Avoid using it if possible, because it may be removed or changed in the future.

Tokens available for conversation history and knowledge after reservations. Always >= 0.

Returns:

  • (Integer)


# File 'lib/phronomy/context/token_budget.rb', line 70

def effective_input_limit
  [@context_window - @max_output_tokens - @overhead, 0].max
end
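The "always >= 0" guarantee matters mostly when the reservations exceed the window, for example with a small local model. A minimal sketch of that edge case follows; the standalone method and all figures are hypothetical and only reuse the expression shown above.

```ruby
# Standalone copy of the expression above, for illustration only.
def effective_input_limit(context_window:, max_output_tokens:, overhead:)
  [context_window - max_output_tokens - overhead, 0].max
end

# Reservations fit inside the window.
roomy = effective_input_limit(context_window: 8_192, max_output_tokens: 1_024, overhead: 500)

# Reservations exceed the window: the result is clamped instead of going negative.
cramped = effective_input_limit(context_window: 4_096, max_output_tokens: 4_096, overhead: 500)

puts roomy   # prints 6668
puts cramped # prints 0
```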