Class: RubynCode::Context::Manager

Inherits:
Object
Defined in:
lib/rubyn_code/context/manager.rb

Overview

Orchestrates context management for a session. Tracks cumulative token usage from LLM responses and triggers compaction strategies when the estimated context size exceeds the configured threshold.

Constant Summary collapse

CHARS_PER_TOKEN =

Rough number of characters per token, used to estimate context size from JSON-serialized message length.

4
MICRO_COMPACT_RATIO_CACHED =

Fraction of the compaction threshold at which micro-compaction kicks in. Running it too early busts the prompt cache prefix (mutated messages change the hash, invalidating server-side cached tokens). Anthropic supports prompt caching, so compaction is delayed (0.7); OpenAI has no cache prefix to protect, so it compacts earlier (0.5).

0.7
MICRO_COMPACT_RATIO_UNCACHED =
0.5
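To make the ratios concrete, here is a minimal sketch of the trigger points they produce, assuming a hypothetical threshold of 100,000 tokens (the real default lives in Config::Defaults::CONTEXT_THRESHOLD_TOKENS and may differ):

```ruby
# Hypothetical threshold for illustration only; the actual default is
# Config::Defaults::CONTEXT_THRESHOLD_TOKENS.
threshold = 100_000

MICRO_COMPACT_RATIO_CACHED = 0.7
MICRO_COMPACT_RATIO_UNCACHED = 0.5

# Micro-compaction fires once the estimated token count crosses
# threshold * ratio: later for cache-prefix providers, earlier otherwise.
cached_trigger   = (threshold * MICRO_COMPACT_RATIO_CACHED).to_i   # => 70000
uncached_trigger = (threshold * MICRO_COMPACT_RATIO_UNCACHED).to_i # => 50000
```

So with these assumed numbers, an Anthropic-backed session leaves messages untouched for an extra 20,000 estimated tokens to preserve the cached prefix.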

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(threshold: Config::Defaults::CONTEXT_THRESHOLD_TOKENS, llm_client: nil) ⇒ Manager

Returns a new instance of Manager.

Parameters:

  • threshold (Integer) (defaults to: Config::Defaults::CONTEXT_THRESHOLD_TOKENS)

    estimated token count that triggers auto-compaction

  • llm_client (LLM::Client, nil) (defaults to: nil)

    needed for LLM-driven compaction



# File 'lib/rubyn_code/context/manager.rb', line 17

def initialize(threshold: Config::Defaults::CONTEXT_THRESHOLD_TOKENS, llm_client: nil)
  @threshold = threshold
  @llm_client = llm_client
  @total_input_tokens = 0
  @total_output_tokens = 0
  @last_compaction_turn = -1
  @current_turn = 0
end

Instance Attribute Details

#current_turn ⇒ Object (readonly)

Returns the value of attribute current_turn.



# File 'lib/rubyn_code/context/manager.rb', line 13

def current_turn
  @current_turn
end

#llm_client=(value) ⇒ Object (writeonly)

Sets the LLM client used for LLM-driven auto-compaction.

Parameters:

  • value

    the value to set the attribute llm_client to.



# File 'lib/rubyn_code/context/manager.rb', line 26

def llm_client=(value)
  @llm_client = value
end

#total_input_tokens ⇒ Object (readonly)

Returns the value of attribute total_input_tokens.



# File 'lib/rubyn_code/context/manager.rb', line 13

def total_input_tokens
  @total_input_tokens
end

#total_output_tokens ⇒ Object (readonly)

Returns the value of attribute total_output_tokens.



# File 'lib/rubyn_code/context/manager.rb', line 13

def total_output_tokens
  @total_output_tokens
end

Instance Method Details

#advance_turn! ⇒ Object

Advances the turn counter. Call once per iteration so that duplicate compaction calls within the same turn are skipped.



# File 'lib/rubyn_code/context/manager.rb', line 30

def advance_turn!
  @current_turn += 1
end

#check_compaction!(conversation) ⇒ void

Runs micro-compaction once the context approaches the compaction threshold, then escalates to context collapse and, as a last resort, LLM-driven auto-compaction when the estimated size exceeds the threshold. Expects a conversation object that responds to #messages and #messages= (or #replace_messages). Runs at most once per turn.

Returns:

  • (void)

# File 'lib/rubyn_code/context/manager.rb', line 76

def check_compaction!(conversation)
  # Guard: skip if compaction already ran this turn
  return if @last_compaction_turn == @current_turn

  @last_compaction_turn = @current_turn

  messages = conversation.messages

  # Step 1: Zero-cost micro-compact — but only when we're approaching
  # the compaction threshold. Running it every turn mutates old messages,
  # which invalidates the prompt cache prefix and wastes tokens.
  est = estimated_tokens(messages)
  MicroCompact.call(messages) if est > (@threshold * micro_compact_ratio)

  return unless needs_compaction?(messages)

  # Step 2: Try context collapse (snip old messages, no LLM call)
  collapsed = ContextCollapse.call(messages, threshold: @threshold)
  if collapsed
    apply_compacted_messages(conversation, collapsed)
    return
  end

  # Step 3: Full LLM-driven auto-compact (expensive, last resort)
  return unless @llm_client

  compactor = Compactor.new(llm_client: @llm_client, threshold: @threshold)
  new_messages = compactor.auto_compact!(messages)
  apply_compacted_messages(conversation, new_messages)
end
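The once-per-turn guard at the top of this method can be shown in isolation. The following is a standalone sketch of that skip-duplicate pattern, not the Manager class itself:

```ruby
# Hypothetical stand-in illustrating the per-turn guard: a block runs at
# most once per turn, mirroring @last_compaction_turn / @current_turn above.
class TurnGuard
  def initialize
    @current_turn = 0
    @last_run_turn = -1
  end

  def advance_turn!
    @current_turn += 1
  end

  # Yields only if nothing has run during the current turn yet.
  def once_per_turn
    return false if @last_run_turn == @current_turn

    @last_run_turn = @current_turn
    yield
    true
  end
end

guard = TurnGuard.new
runs = 0
guard.once_per_turn { runs += 1 } # => true; block runs
guard.once_per_turn { runs += 1 } # => false; same turn, skipped
guard.advance_turn!
guard.once_per_turn { runs += 1 } # => true; new turn
runs # => 2
```

Initializing the last-run marker to -1 means the guard is open on turn zero, matching the constructor above.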

#estimated_tokens(messages) ⇒ Integer

Rough estimate of token count for a set of messages based on their JSON-serialized character length (~4 chars per token).

Parameters:

  • messages (Array<Hash>)

    conversation messages

Returns:

  • (Integer)

    estimated token count



# File 'lib/rubyn_code/context/manager.rb', line 47

def estimated_tokens(messages)
  json = JSON.generate(messages)
  (json.length.to_f / CHARS_PER_TOKEN).ceil
rescue JSON::GeneratorError
  0
end
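For example, applying the same estimation logic to a small sample message array (illustrative values; JSON overhead such as keys and quotes counts toward the estimate):

```ruby
require 'json'

CHARS_PER_TOKEN = 4

# Same estimation as #estimated_tokens: JSON-serialized length / 4, rounded up.
def estimated_tokens(messages)
  json = JSON.generate(messages)
  (json.length.to_f / CHARS_PER_TOKEN).ceil
rescue JSON::GeneratorError
  0
end

messages = [
  { role: 'user', content: 'Hello' },
  { role: 'assistant', content: 'Hi there!' }
]

# 78 serialized characters -> 19.5 -> rounded up
estimated_tokens(messages) # => 20
```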

#needs_compaction?(messages) ⇒ Boolean

Returns true if the estimated token count exceeds the threshold.

Parameters:

  • messages (Array<Hash>)

    conversation messages

Returns:

  • (Boolean)


# File 'lib/rubyn_code/context/manager.rb', line 58

def needs_compaction?(messages)
  estimated_tokens(messages) > @threshold
end

#reset! ⇒ void

This method returns an undefined value.

Resets cumulative token counters to zero.



# File 'lib/rubyn_code/context/manager.rb', line 110

def reset!
  @total_input_tokens = 0
  @total_output_tokens = 0
  @last_compaction_turn = -1
  @current_turn = 0
end

#track_usage(usage) ⇒ Object

Accumulates token counts from an LLM response usage object.

Parameters:

  • usage (LLM::Usage, #input_tokens)

    usage data from an LLM response



# File 'lib/rubyn_code/context/manager.rb', line 37

def track_usage(usage)
  @total_input_tokens += usage.input_tokens.to_i
  @total_output_tokens += usage.output_tokens.to_i
end
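Any object responding to #input_tokens and #output_tokens works as the usage argument. A Struct stand-in (not the real LLM::Usage class) is enough to show the accumulation, including how #to_i tolerates nil counts:

```ruby
# Hypothetical stand-in for LLM::Usage; only the two reader methods matter.
Usage = Struct.new(:input_tokens, :output_tokens)

total_input = 0
total_output = 0

# Same accumulation as #track_usage; nil.to_i == 0, so missing counts are safe.
[Usage.new(1_200, 350), Usage.new(nil, 90)].each do |usage|
  total_input  += usage.input_tokens.to_i
  total_output += usage.output_tokens.to_i
end

total_input  # => 1200
total_output # => 440
```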