Class: RubyPi::Context::Compaction
- Inherits: Object
- Defined in: lib/ruby_pi/context/compaction.rb
Overview
Manages context window size by summarizing older messages when the estimated token count exceeds a configurable threshold. The most recent N messages are always preserved to maintain conversational coherence.
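The strategy described above can be sketched as a standalone simplification. Everything here is illustrative, not the gem's API: `compact_history`, the stub summary string, and the inline 4-chars-per-token estimate stand in for the real class and its LLM-backed summarizer.

```ruby
# Simplified sketch of the compaction strategy: estimate tokens by
# character count, and when over budget replace all but the last N
# messages with a single summary message.
CHARS_PER_TOKEN = 4

def rough_tokens(text)
  (text.to_s.length.to_f / CHARS_PER_TOKEN).ceil
end

# Returns a compacted copy of `messages`, or nil when under budget.
def compact_history(messages, max_tokens:, preserve_last_n:)
  total = messages.sum { |m| rough_tokens(m[:content]) }
  return nil if total <= max_tokens

  keep = [preserve_last_n, messages.size].min
  droppable = messages[0...(messages.size - keep)]
  preserved = messages[(messages.size - keep)..]
  return nil if droppable.empty?

  # Stand-in for the LLM call that would summarize the dropped turns.
  summary = "Summarized #{droppable.size} earlier messages."

  [{ role: :user, content: "[Conversation Summary]\n#{summary}" }] + preserved
end
```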
Constant Summary collapse
- CHARS_PER_TOKEN = 4
  Average characters per token — a rough heuristic that avoids the need for provider-specific tokenizers. Errs on the conservative side.
Instance Attribute Summary collapse
- #emitter ⇒ #emit?
  Optional event emitter for :compaction events.
- #max_tokens ⇒ Integer (readonly)
  The token threshold above which compaction triggers.
- #preserve_last_n ⇒ Integer (readonly)
  Number of recent messages always preserved.
- #summary_model ⇒ RubyPi::LLM::BaseProvider (readonly)
  The model used to generate summaries.
Instance Method Summary collapse
- #compact(messages, system_prompt) ⇒ Array<Hash>?
  Compacts the message history if the estimated token count exceeds the threshold.
- #estimate_tokens(system_prompt, messages) ⇒ Integer
  Estimates the total token count for a system prompt and message array using the character-based heuristic.
- #initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4) ⇒ Compaction (constructor)
  Creates a new Compaction instance.
Constructor Details
#initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4) ⇒ Compaction
Creates a new Compaction instance.
# File 'lib/ruby_pi/context/compaction.rb', line 51

def initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4)
  @max_tokens = max_tokens
  @summary_model = summary_model
  @preserve_last_n = preserve_last_n
  @emitter = nil
end
Instance Attribute Details
#emitter ⇒ #emit?
Returns optional event emitter for :compaction events.
# File 'lib/ruby_pi/context/compaction.rb', line 42

def emitter
  @emitter
end
#max_tokens ⇒ Integer (readonly)
Returns the token threshold above which compaction triggers.
# File 'lib/ruby_pi/context/compaction.rb', line 33

def max_tokens
  @max_tokens
end
#preserve_last_n ⇒ Integer (readonly)
Returns number of recent messages always preserved.
# File 'lib/ruby_pi/context/compaction.rb', line 39

def preserve_last_n
  @preserve_last_n
end
#summary_model ⇒ RubyPi::LLM::BaseProvider (readonly)
Returns the model used to generate summaries.
# File 'lib/ruby_pi/context/compaction.rb', line 36

def summary_model
  @summary_model
end
Instance Method Details
#compact(messages, system_prompt) ⇒ Array<Hash>?
Compacts the message history if the estimated token count exceeds the threshold. Returns the compacted messages array, or nil if no compaction was needed.
The compaction process:
- Estimate total tokens for system_prompt + all messages.
- If under threshold, return nil (no compaction needed).
- Split messages into “droppable” (older) and “preserved” (recent).
- Summarize the droppable messages via the summary model.
- Return a new array: [summary_message] + preserved_messages.
# File 'lib/ruby_pi/context/compaction.rb', line 72

def compact(messages, system_prompt)
  total_tokens = estimate_tokens(system_prompt, messages)
  return nil if total_tokens <= @max_tokens

  # Split into messages to summarize and messages to keep
  preserved_count = [@preserve_last_n, messages.size].min
  droppable = messages[0...(messages.size - preserved_count)].dup
  preserved = messages[(messages.size - preserved_count)..].dup

  # If there's nothing to drop, we can't compact further
  return nil if droppable.empty?

  # Anthropic and OpenAI both require every tool_result / tool message
  # to reference a tool_use / tool_call from a preceding assistant
  # message. If we summarize the assistant turn that originated a tool
  # call but keep the matching tool_result, the API rejects the
  # request with "tool_result without preceding tool_use".
  #
  # The boundary between droppable and preserved can split a tool
  # exchange in two ways:
  #   (a) preserved starts with one or more :tool messages whose
  #       matching assistant turn is in droppable. Strip those
  #       orphan tool messages from the head of preserved (move
  #       them into droppable so they are summarized, not sent).
  #   (b) the last droppable message is an :assistant with tool_calls,
  #       but its matching :tool result(s) are in preserved. Pull
  #       that assistant message back into preserved so the pair
  #       stays intact.
  #
  # We apply (a) first: it's the common case (preserve_last_n=4 cuts
  # mid-pair, leaving a stranded tool message). Then (b) catches the
  # mirror case.
  while preserved.first && preserved.first[:role] == :tool
    droppable << preserved.shift
  end

  if droppable.last && droppable.last[:role] == :assistant &&
     droppable.last[:tool_calls].is_a?(Array) &&
     !droppable.last[:tool_calls].empty? &&
     preserved.first && preserved.first[:role] == :tool
    preserved.unshift(droppable.pop)
  end

  # After the boundary fix-ups, droppable may have become empty.
  return nil if droppable.empty?

  # Generate a summary of the dropped messages
  summary = summarize(droppable)

  # Emit compaction event if an emitter is available
  @emitter&.emit(:compaction, dropped_count: droppable.size, summary: summary)

  # Build the compacted history: summary message + preserved.
  #
  # The summary role MUST NOT be :system (that would overwrite the real
  # system prompt on Anthropic, which extracts the last :system message
  # as the top-level `system:` parameter).
  #
  # The summary role must also NOT match the role of the first preserved
  # message — consecutive same-role messages are rejected by Anthropic.
  # We pick :user when the next preserved message is :assistant, and
  # :assistant otherwise (covers :user, :tool, and an empty preserved).
  # On Anthropic, :tool messages become role :user with tool_result
  # blocks, so :assistant is the safe choice when the next message is
  # :tool too.
  first_preserved_role = preserved.first&.dig(:role)
  summary_role = first_preserved_role == :assistant ? :user : :assistant

  summary_message = {
    role: summary_role,
    content: "[Conversation Summary]\n#{summary}"
  }

  [summary_message] + preserved
end
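Case (a) of the boundary fix-up can be watched on a toy history. The message contents are hypothetical; the `while` loop is the same one used in `#compact`:

```ruby
# A preserve boundary that cuts mid tool exchange: the :tool result at
# the head of `preserved` references a tool call whose :assistant turn
# sits in `droppable`.
droppable = [
  { role: :user, content: "What's the weather in Lisbon?" },
  { role: :assistant, content: nil, tool_calls: [{ name: "weather" }] }
]
preserved = [
  { role: :tool, content: "22C and sunny" },
  { role: :assistant, content: "It's 22C and sunny in Lisbon." },
  { role: :user, content: "Thanks!" }
]

# Fix-up (a): move orphan :tool messages into the summarized span so
# the provider never sees a tool result without its tool call.
droppable << preserved.shift while preserved.first && preserved.first[:role] == :tool
```

After the loop, the stranded tool result travels with its originating assistant turn into the summarized span, and `preserved` begins on a plain :assistant message that no provider will reject.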
#estimate_tokens(system_prompt, messages) ⇒ Integer
Estimates the total token count for a system prompt and message array using the character-based heuristic.
# File 'lib/ruby_pi/context/compaction.rb', line 155

def estimate_tokens(system_prompt, messages)
  total_chars = system_prompt.to_s.length

  messages.each do |msg|
    total_chars += msg[:content].to_s.length
    # Account for role and structural overhead (~10 tokens per message)
    total_chars += 40
  end

  (total_chars.to_f / CHARS_PER_TOKEN).ceil
end
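The heuristic is easy to check by hand. Below is a standalone re-implementation mirroring the method above (outside the gem, so it can run on its own):

```ruby
CHARS_PER_TOKEN = 4 # mirrors RubyPi::Context::Compaction::CHARS_PER_TOKEN

def estimate_tokens(system_prompt, messages)
  total_chars = system_prompt.to_s.length
  messages.each do |msg|
    total_chars += msg[:content].to_s.length
    total_chars += 40 # per-message role/structure overhead (~10 tokens)
  end
  (total_chars.to_f / CHARS_PER_TOKEN).ceil
end

# "You are helpful." is 16 chars, "Hello" is 5, plus 40 overhead = 61
# chars; 61 / 4 rounded up gives 16 tokens.
estimate_tokens("You are helpful.", [{ role: :user, content: "Hello" }])
# => 16
```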