Class: RubyPi::Context::Compaction
- Inherits: Object
  - Object
    - RubyPi::Context::Compaction
- Defined in: lib/ruby_pi/context/compaction.rb
Overview
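Manages context window size by summarizing older messages when the estimated token count exceeds a configurable threshold (max_tokens). The most recent messages (preserve_last_n, 4 by default) are kept verbatim to maintain conversational coherence. The only exception is any :tool results at the head of that tail, which are summarized together with the earlier assistant turn that requested them.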
Constant Summary
- CHARS_PER_TOKEN = 4
  Average characters per token: a rough heuristic that avoids the need for provider-specific tokenizers. Errs on the conservative side.
Instance Attribute Summary
- #emitter ⇒ #emit?
  Optional event emitter for :compaction events.
- #max_tokens ⇒ Integer (readonly)
  The token threshold above which compaction triggers.
- #preserve_last_n ⇒ Integer (readonly)
  Number of recent messages always preserved.
- #summary_model ⇒ RubyPi::LLM::BaseProvider (readonly)
  The model used to generate summaries.
Instance Method Summary
- #build_compacted_history(summary, preserved) ⇒ Array<Hash>
  Builds the compacted history: a summary message followed by the preserved tail.
- #compact(messages, system_prompt) ⇒ Array<Hash>?
  Compacts the message history if the estimated token count exceeds the threshold.
- #estimate_tokens(system_prompt, messages) ⇒ Integer
  Estimates the total token count for a system prompt and message array using the character-based heuristic.
- #initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4) ⇒ Compaction (constructor)
  Creates a new Compaction instance.
Constructor Details
#initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4) ⇒ Compaction
Creates a new Compaction instance.
# File 'lib/ruby_pi/context/compaction.rb', line 51

def initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4)
  @max_tokens = max_tokens
  @summary_model = summary_model
  @preserve_last_n = preserve_last_n
  @emitter = nil
end
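As a quick illustration of the constructor's keyword arguments, the sketch below uses a hypothetical stand-in class, `CompactionSketch`, that copies the documented `initialize` body so it runs without the gem. Only `summary_model:` is required; `max_tokens` and `preserve_last_n` fall back to 8000 and 4, and `emitter` starts as nil. The `attr_reader`/`attr_accessor` declarations are assumptions inferred from the attribute summary (where only `emitter` is not marked readonly), and `:fake_model` is a placeholder rather than a real provider.

```ruby
# Hypothetical stand-in that mirrors the documented constructor of
# RubyPi::Context::Compaction; it is NOT the library class.
class CompactionSketch
  attr_reader :max_tokens, :summary_model, :preserve_last_n
  # Assumption: emitter is writable (the summary does not mark it readonly).
  attr_accessor :emitter

  def initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4)
    @max_tokens = max_tokens
    @summary_model = summary_model
    @preserve_last_n = preserve_last_n
    @emitter = nil
  end
end

# :fake_model is a placeholder for a RubyPi::LLM::BaseProvider instance.
defaults = CompactionSketch.new(summary_model: :fake_model)
puts defaults.max_tokens       # prints 8000
puts defaults.preserve_last_n  # prints 4

tuned = CompactionSketch.new(max_tokens: 2000, summary_model: :fake_model, preserve_last_n: 6)
puts tuned.max_tokens          # prints 2000

begin
  CompactionSketch.new # summary_model: is a required keyword, so this raises
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```

Because `summary_model:` has no default, Ruby raises ArgumentError when it is omitted, so a misconfigured instance fails at construction time rather than on the first oversized request.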
Instance Attribute Details
#emitter ⇒ #emit?
Returns the optional event emitter for :compaction events.
# File 'lib/ruby_pi/context/compaction.rb', line 42

def emitter
  @emitter
end
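The source does not define an emitter type; it only calls `@emitter&.emit(:compaction, dropped_count: ..., summary: ...)` inside #compact, so any object with a compatible `emit` method should work. The sketch below is a hypothetical `RecordingEmitter` that collects events, plus a hypothetical helper, `notify_compaction`, that reproduces that call so the nil case is visible too. How an emitter is attached to a real Compaction (for example through an `emitter=` writer) is an assumption, since only the reader is shown here.

```ruby
# Hypothetical emitter: any object that responds to #emit(event, **payload).
class RecordingEmitter
  attr_reader :events

  def initialize
    @events = []
  end

  # Store each event name together with its keyword payload.
  def emit(event, **payload)
    @events << [event, payload]
  end
end

# Mirrors the call made inside #compact: safe navigation makes a nil emitter a no-op.
def notify_compaction(emitter, dropped_count, summary)
  emitter&.emit(:compaction, dropped_count: dropped_count, summary: summary)
end

notify_compaction(nil, 3, "ignored") # nothing happens, returns nil

recorder = RecordingEmitter.new
notify_compaction(recorder, 3, "User asked about X; assistant answered Y.")
event, payload = recorder.events.first
puts event                  # prints compaction
puts payload[:dropped_count] # prints 3
```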
#max_tokens ⇒ Integer (readonly)
Returns the token threshold above which compaction triggers.
# File 'lib/ruby_pi/context/compaction.rb', line 33

def max_tokens
  @max_tokens
end
#preserve_last_n ⇒ Integer (readonly)
Returns the number of recent messages that are always preserved.
# File 'lib/ruby_pi/context/compaction.rb', line 39

def preserve_last_n
  @preserve_last_n
end
#summary_model ⇒ RubyPi::LLM::BaseProvider (readonly)
Returns the model used to generate summaries.
# File 'lib/ruby_pi/context/compaction.rb', line 36

def summary_model
  @summary_model
end
Instance Method Details
#build_compacted_history(summary, preserved) ⇒ Array<Hash>
Builds the compacted history: a summary message followed by the preserved tail.
The summary becomes the FIRST message of the compacted history, so it must satisfy the strictest provider constraints (Anthropic):
1. The summary role MUST NOT be :system. Anthropic promotes the last :system message to the top-level `system:` parameter, so a :system summary would overwrite the real system prompt.
2. The first message MUST use role :user.
3. Consecutive same-role messages are rejected.
A :user summary satisfies (1) and (2). For (3): the orphan-strip in #compact guarantees that the first preserved message is :assistant, :user, or absent (never :tool). When it is :assistant or absent, a standalone :user summary alternates correctly. When it is :user, a separate :user summary would create two consecutive user messages, so the summary text is instead merged into that existing user message, keeping the first message a single :user message with no role collision.
# File 'lib/ruby_pi/context/compaction.rb', line 138

def build_compacted_history(summary, preserved)
  summary_text = "[Conversation Summary]\n#{summary}"
  first_preserved = preserved.first

  if first_preserved && first_preserved[:role] == :user
    merged = first_preserved.dup
    merged[:content] = "#{summary_text}\n\n#{first_preserved[:content]}"
    [merged] + preserved.drop(1)
  else
    [{ role: :user, content: summary_text }] + preserved
  end
end
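To see both branches described above, the sketch below copies the method body into a top-level method (an assumption made only so it runs without the gem) and feeds it two preserved tails: one starting with :assistant and one starting with :user. The sample messages are made up and use only the :role and :content keys that the method reads.

```ruby
# Standalone copy of the merge-or-prepend logic from #build_compacted_history.
def build_compacted_history(summary, preserved)
  summary_text = "[Conversation Summary]\n#{summary}"
  first_preserved = preserved.first

  if first_preserved && first_preserved[:role] == :user
    # Merge into the existing :user message to avoid two consecutive :user turns.
    merged = first_preserved.dup
    merged[:content] = "#{summary_text}\n\n#{first_preserved[:content]}"
    [merged] + preserved.drop(1)
  else
    # Tail starts with :assistant or is empty: a standalone :user summary alternates.
    [{ role: :user, content: summary_text }] + preserved
  end
end

# Case 1: the tail starts with :assistant, so the summary is prepended.
tail_a = [
  { role: :assistant, content: "Here is the fix." },
  { role: :user, content: "Thanks!" }
]
history_a = build_compacted_history("User reported a bug.", tail_a)
p history_a.map { |m| m[:role] } # prints [:user, :assistant, :user]

# Case 2: the tail starts with :user, so the summary is merged into it.
tail_b = [
  { role: :user, content: "Can you also add tests?" },
  { role: :assistant, content: "Sure." }
]
history_b = build_compacted_history("User reported a bug.", tail_b)
p history_b.map { |m| m[:role] } # prints [:user, :assistant]
puts history_b.first[:content]
```

The merge works on a `dup` (a shallow copy) and replaces the `:content` value rather than mutating it, so the caller's original preserved message is left unchanged.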
#compact(messages, system_prompt) ⇒ Array<Hash>?
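Compacts the message history if the estimated token count exceeds the threshold. Returns the compacted messages array, or nil if no compaction was needed (or none is possible because there are no older messages to drop).
The compaction process:
1. Estimate total tokens for system_prompt + all messages.
2. If the estimate is at or below max_tokens, return nil (no compaction needed).
3. Split messages into “droppable” (older) and “preserved” (the last preserve_last_n). If nothing is droppable, return nil.
4. Move any leading :tool messages from preserved into droppable, so that no tool result is sent without the assistant turn that requested it.
5. Summarize the droppable messages via the summary model, and emit a :compaction event if an emitter is set.
6. Return a new array built by #build_compacted_history: a :user summary message followed by the preserved messages (or, when the first preserved message is already :user, the summary merged into that message).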
# File 'lib/ruby_pi/context/compaction.rb', line 72

def compact(messages, system_prompt)
  total_tokens = estimate_tokens(system_prompt, messages)
  return nil if total_tokens <= @max_tokens

  # Split into messages to summarize and messages to keep
  preserved_count = [@preserve_last_n, messages.size].min
  droppable = messages[0...(messages.size - preserved_count)].dup
  preserved = messages[(messages.size - preserved_count)..].dup

  # If there's nothing to drop, we can't compact further
  return nil if droppable.empty?

  # Anthropic and OpenAI both require every tool_result / tool message
  # to reference a tool_use / tool_call from a preceding assistant
  # message. If we summarize the assistant turn that originated a tool
  # call but keep the matching tool_result, the API rejects the
  # request with "tool_result without preceding tool_use".
  #
  # When the boundary between droppable and preserved cuts mid-exchange,
  # preserved can start with one or more orphan :tool messages whose
  # matching assistant turn is in droppable. Strip those off the head of
  # preserved and move them into droppable so they are summarized away
  # rather than sent. Because the originating assistant message is older,
  # it is already in droppable, so the pair stays together there; there
  # is no mirror case to handle (once a tool result is moved across, its
  # assistant is never left stranded on the preserved side).
  while preserved.first && preserved.first[:role] == :tool
    droppable << preserved.shift
  end

  # The orphan-strip only moves messages INTO droppable, so droppable
  # cannot have shrunk; it is still non-empty here. preserved, however,
  # may now be empty (the whole window was tool results); the summary
  # construction below handles that case.

  # Generate a summary of the dropped messages
  summary = summarize(droppable)

  # Emit compaction event if an emitter is available
  @emitter&.emit(:compaction, dropped_count: droppable.size, summary: summary)

  build_compacted_history(summary, preserved)
end
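The split and orphan-strip steps are the subtle part of #compact, so the hypothetical helper `split_for_compaction` below extracts just those lines, skipping the token check and the summary call. The message hashes, including the `tool_calls` and `tool_call_id` keys, are illustrative placeholders; the strip only looks at `:role`. With preserve_last_n = 3, the naive boundary would keep the :tool result while dropping the assistant turn that requested it, and the strip moves the result back across.

```ruby
# Hypothetical helper isolating the split and orphan-strip steps of #compact.
# Returns [droppable, preserved], or nil when there is nothing to drop.
def split_for_compaction(messages, preserve_last_n)
  preserved_count = [preserve_last_n, messages.size].min
  droppable = messages[0...(messages.size - preserved_count)].dup
  preserved = messages[(messages.size - preserved_count)..].dup
  return nil if droppable.empty?

  # Move orphan :tool results at the head of the tail into droppable so they
  # are summarized together with the assistant turn that requested them.
  while preserved.first && preserved.first[:role] == :tool
    droppable << preserved.shift
  end

  [droppable, preserved]
end

messages = [
  { role: :user, content: "What's the weather in Oslo?" },
  { role: :assistant, content: "", tool_calls: [{ id: "call_1", name: "weather" }] },
  { role: :tool, content: "4 degrees, cloudy", tool_call_id: "call_1" },
  { role: :assistant, content: "It is 4 degrees and cloudy." },
  { role: :user, content: "And tomorrow?" }
]

droppable, preserved = split_for_compaction(messages, 3)
p droppable.map { |m| m[:role] } # prints [:user, :assistant, :tool]
p preserved.map { |m| m[:role] } # prints [:assistant, :user]

# Edge case from the source comments: the whole preserved window is a tool result.
edge_droppable, edge_preserved = split_for_compaction(messages.first(3), 1)
p edge_preserved # prints []
```

In the edge case, `preserved` ends up empty, which is why #build_compacted_history must handle an absent first message by emitting a standalone :user summary.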
#estimate_tokens(system_prompt, messages) ⇒ Integer
Estimates the total token count for a system prompt and message array using the character-based heuristic.
# File 'lib/ruby_pi/context/compaction.rb', line 157

def estimate_tokens(system_prompt, messages)
  total_chars = system_prompt.to_s.length

  messages.each do |msg|
    total_chars += msg[:content].to_s.length
    # Account for role and structural overhead (~10 tokens per message)
    total_chars += 40
  end

  (total_chars.to_f / CHARS_PER_TOKEN).ceil
end
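A worked example of the heuristic follows. It copies the method body into a standalone method (so it runs without the gem) and redefines CHARS_PER_TOKEN with the documented value of 4. The sample prompt and messages are made up; the comments show the arithmetic.

```ruby
CHARS_PER_TOKEN = 4 # same value as the documented constant

# Standalone copy of the heuristic: characters of the system prompt plus each
# message's content, plus 40 characters (about 10 tokens) of overhead per message.
def estimate_tokens(system_prompt, messages)
  total_chars = system_prompt.to_s.length
  messages.each do |msg|
    total_chars += msg[:content].to_s.length
    total_chars += 40
  end
  (total_chars.to_f / CHARS_PER_TOKEN).ceil
end

system_prompt = "You are a helpful assistant." # 28 characters
messages = [
  { role: :user, content: "Hi" },               # 2 + 40 characters
  { role: :assistant, content: "Hello there!" } # 12 + 40 characters
]

# (28 + 42 + 52) / 4 = 122 / 4 = 30.5, rounded up to 31
puts estimate_tokens(system_prompt, messages) # prints 31
```

Because the result is rounded up, the estimate stays at or below 8000 exactly when the total character count (content plus 40 per message) is at most 32,000. With the default max_tokens, compaction therefore triggers once that count exceeds 32,000. The `to_s` calls also mean a nil prompt or nil content counts as zero characters.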