Class: RubyPi::Context::Compaction

Inherits:
Object
Defined in:
lib/ruby_pi/context/compaction.rb

Overview

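Manages context window size by summarizing older messages when the estimated token count exceeds a configurable threshold. The most recent N messages are preserved to maintain conversational coherence; the only exception is that leading tool-result messages in that tail are folded into the summary when their originating assistant turn is being summarized.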

Examples:

Configuring compaction

compaction = RubyPi::Context::Compaction.new(
  max_tokens: 8000,
  summary_model: model,
  preserve_last_n: 4
)
compacted = compaction.compact(messages, system_prompt)

Constant Summary

CHARS_PER_TOKEN = 4

Average characters per token: a rough heuristic that avoids the need for provider-specific tokenizers. Errs on the conservative side.

Constructor Details

#initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4) ⇒ Compaction

Creates a new Compaction instance.

Parameters:

  • max_tokens (Integer) (defaults to: 8000)

    compaction triggers when the token estimate exceeds this value

  • summary_model (RubyPi::LLM::BaseProvider)

    provider used to generate the summary

  • preserve_last_n (Integer) (defaults to: 4)

    number of most recent messages to keep



51
52
53
54
55
56
# File 'lib/ruby_pi/context/compaction.rb', line 51

def initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4)
  @max_tokens = max_tokens
  @summary_model = summary_model
  @preserve_last_n = preserve_last_n
  @emitter = nil
end

Instance Attribute Details

#emitter ⇒ #emit?

Returns the optional event emitter for :compaction events.

Returns:

  • (#emit, nil)

    optional event emitter for :compaction events



# File 'lib/ruby_pi/context/compaction.rb', line 42

def emitter
  @emitter
end
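
Any object that responds to emit can act as the emitter. The sketch below is a hypothetical recorder (RecordingEmitter is an illustrative name, not part of RubyPi) that accepts the same call shape #compact uses: an event name followed by the dropped_count: and summary: keywords. This page does not show how an emitter is attached to a Compaction instance, so the example calls emit directly.

```ruby
# Hypothetical emitter for illustration only; not part of RubyPi.
class RecordingEmitter
  attr_reader :events

  def initialize
    @events = []
  end

  # Accepts the call shape used in #compact:
  #   emit(:compaction, dropped_count: ..., summary: ...)
  def emit(event, **payload)
    @events << [event, payload]
  end
end

emitter = RecordingEmitter.new
emitter.emit(:compaction, dropped_count: 6, summary: "User asked about billing.")

event, payload = emitter.events.first
puts "#{event}: dropped #{payload[:dropped_count]} messages"
```

Collecting events in an array keeps the sketch easy to inspect in tests; a real emitter would more likely forward them to a logger or subscriber.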

#max_tokens ⇒ Integer (readonly)

Returns the token threshold above which compaction triggers.

Returns:

  • (Integer)

    the token threshold above which compaction triggers



# File 'lib/ruby_pi/context/compaction.rb', line 33

def max_tokens
  @max_tokens
end

#preserve_last_n ⇒ Integer (readonly)

Returns the number of recent messages always preserved.

Returns:

  • (Integer)

    number of recent messages always preserved



# File 'lib/ruby_pi/context/compaction.rb', line 39

def preserve_last_n
  @preserve_last_n
end

#summary_model ⇒ RubyPi::LLM::BaseProvider (readonly)

Returns the model used to generate summaries.

Returns:

  • (RubyPi::LLM::BaseProvider)

    the model used to generate summaries


# File 'lib/ruby_pi/context/compaction.rb', line 36

def summary_model
  @summary_model
end

Instance Method Details

#build_compacted_history(summary, preserved) ⇒ Array<Hash>

Builds the compacted history: a summary message followed by the preserved tail.

The summary becomes the FIRST message of the compacted history, so it must satisfy the strictest provider constraints (Anthropic):

1. The summary role MUST NOT be :system — that would overwrite the
   real system prompt on Anthropic, which promotes the last :system
   message to the top-level `system:` parameter.
2. The first message MUST use role :user.
3. Consecutive same-role messages are rejected.

A :user summary satisfies (1) and (2). For (3), the orphan-strip in #compact guarantees that the first preserved message is :assistant, :user, or absent (never :tool). When it is :assistant or absent, a standalone :user summary alternates correctly. When it is :user, a separate :user summary would create two consecutive user messages, so the summary text is merged into that existing user message instead. This keeps the first message a single :user message with no role collision.

Parameters:

  • summary (String)

    the generated summary text

  • preserved (Array<Hash>)

    the preserved tail of messages

Returns:

  • (Array<Hash>)

    the compacted history



# File 'lib/ruby_pi/context/compaction.rb', line 138

def build_compacted_history(summary, preserved)
  summary_text = "[Conversation Summary]\n#{summary}"
  first_preserved = preserved.first

  if first_preserved && first_preserved[:role] == :user
    merged = first_preserved.dup
    merged[:content] = "#{summary_text}\n\n#{first_preserved[:content]}"
    [merged] + preserved.drop(1)
  else
    [{ role: :user, content: summary_text }] + preserved
  end
end
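
The two branches are easiest to see side by side. The sketch below copies the method body from the listing above into a standalone top-level method so that it runs without RubyPi. The sample messages are invented for illustration.

```ruby
# Standalone copy of the method shown above, for experimentation.
def build_compacted_history(summary, preserved)
  summary_text = "[Conversation Summary]\n#{summary}"
  first_preserved = preserved.first

  if first_preserved && first_preserved[:role] == :user
    # Merge into the existing :user message to avoid two :user turns in a row.
    merged = first_preserved.dup
    merged[:content] = "#{summary_text}\n\n#{first_preserved[:content]}"
    [merged] + preserved.drop(1)
  else
    [{ role: :user, content: summary_text }] + preserved
  end
end

# Case 1: the tail starts with :assistant, so a standalone :user summary is prepended.
tail_a = [{ role: :assistant, content: "Done." }, { role: :user, content: "Thanks" }]
history_a = build_compacted_history("Earlier chat.", tail_a)

# Case 2: the tail starts with :user, so the summary is merged into that message.
tail_b = [{ role: :user, content: "Next question?" }]
history_b = build_compacted_history("Earlier chat.", tail_b)

p history_a.map { |m| m[:role] }  # [:user, :assistant, :user]
p history_b.size                  # 1
```

Because the merge works on a dup of the first preserved hash, the caller's original message keeps its content unchanged.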

#compact(messages, system_prompt) ⇒ Array<Hash>?

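Compacts the message history if the estimated token count exceeds the threshold. Returns the compacted messages array, or nil if no compaction was needed or there was nothing to drop.

The compaction process:

  1. Estimate total tokens for system_prompt + all messages.

  2. If the estimate is at or below max_tokens, return nil (no compaction needed).

  3. Split messages into “droppable” (older) and “preserved” (the last preserve_last_n). If nothing is droppable, return nil.

  4. Move any leading :tool messages from preserved into droppable, so that no tool result is sent without its originating assistant turn.

  5. Summarize the droppable messages via the summary model and, if an emitter is set, emit a :compaction event.

  6. Return the history built by #build_compacted_history: the summary as a :user message followed by the preserved messages, or the summary merged into the first preserved message when that message is already :user.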

Parameters:

  • messages (Array<Hash>)

    the current conversation history

  • system_prompt (String)

    the system prompt (included in estimate)

Returns:

  • (Array<Hash>, nil)

    compacted messages, or nil if not needed



# File 'lib/ruby_pi/context/compaction.rb', line 72

def compact(messages, system_prompt)
  total_tokens = estimate_tokens(system_prompt, messages)
  return nil if total_tokens <= @max_tokens

  # Split into messages to summarize and messages to keep
  preserved_count = [@preserve_last_n, messages.size].min
  droppable = messages[0...(messages.size - preserved_count)].dup
  preserved = messages[(messages.size - preserved_count)..].dup

  # If there's nothing to drop, we can't compact further
  return nil if droppable.empty?

  # Anthropic and OpenAI both require every tool_result / tool message
  # to reference a tool_use / tool_call from a preceding assistant
  # message. If we summarize the assistant turn that originated a tool
  # call but keep the matching tool_result, the API rejects the
  # request with "tool_result without preceding tool_use".
  #
  # When the boundary between droppable and preserved cuts mid-exchange,
  # preserved can start with one or more orphan :tool messages whose
  # matching assistant turn is in droppable. Strip those off the head of
  # preserved and move them into droppable so they are summarized away
  # rather than sent. Because the originating assistant message is older,
  # it is already in droppable, so the pair stays together there — there
  # is no mirror case to handle (once a tool result is moved across, its
  # assistant is never left stranded on the preserved side).
  while preserved.first && preserved.first[:role] == :tool
    droppable << preserved.shift
  end

  # The orphan-strip only moves messages INTO droppable, so droppable
  # cannot have shrunk; it is still non-empty here. preserved, however,
  # may now be empty (the whole window was tool results) — the summary
  # construction below handles that case.

  # Generate a summary of the dropped messages
  summary = summarize(droppable)

  # Emit compaction event if an emitter is available
  @emitter&.emit(:compaction, dropped_count: droppable.size, summary: summary)

  build_compacted_history(summary, preserved)
end
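
The split and orphan-strip steps can be tried in isolation. The sketch below pulls them into a hypothetical helper, split_for_compaction, which is not part of the class. It omits the early nil returns and the summarization call, and the sample conversation is invented.

```ruby
# Hypothetical helper mirroring the split and orphan-strip steps of #compact.
def split_for_compaction(messages, preserve_last_n)
  preserved_count = [preserve_last_n, messages.size].min
  cut = messages.size - preserved_count
  droppable = messages[0...cut].dup
  preserved = messages[cut..].dup

  # Move leading :tool messages out of the tail so that no tool result is
  # sent without its originating assistant turn.
  droppable << preserved.shift while preserved.first && preserved.first[:role] == :tool

  [droppable, preserved]
end

messages = [
  { role: :user,      content: "What is the weather?" },
  { role: :assistant, content: "Calling the weather tool." },
  { role: :tool,      content: "72F, sunny" },
  { role: :assistant, content: "It is 72F and sunny." },
  { role: :user,      content: "Thanks!" }
]

# With preserve_last_n = 3 the cut falls between the assistant's tool call
# and its result, so the :tool message is moved to the droppable side.
droppable, preserved = split_for_compaction(messages, 3)
p droppable.map { |m| m[:role] }  # [:user, :assistant, :tool]
p preserved.map { |m| m[:role] }  # [:assistant, :user]
```

Note that the preserved tail ends up with two messages rather than three: preserve_last_n is an upper bound once orphaned tool results are stripped.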

#estimate_tokens(system_prompt, messages) ⇒ Integer

Estimates the total token count for a system prompt and message array using the character-based heuristic.

Parameters:

  • system_prompt (String)

    the system prompt text

  • messages (Array<Hash>)

    conversation messages

Returns:

  • (Integer)

    estimated token count



# File 'lib/ruby_pi/context/compaction.rb', line 157

def estimate_tokens(system_prompt, messages)
  total_chars = system_prompt.to_s.length

  messages.each do |msg|
    total_chars += msg[:content].to_s.length
    # Account for role and structural overhead (~10 tokens per message)
    total_chars += 40
  end

  (total_chars.to_f / CHARS_PER_TOKEN).ceil
end
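
A worked example makes the arithmetic concrete. The sketch below restates the heuristic as a standalone top-level method with the constant defined locally so that it runs without RubyPi. The message contents are filler strings chosen only for their lengths.

```ruby
CHARS_PER_TOKEN = 4

# Standalone copy of the heuristic shown above.
def estimate_tokens(system_prompt, messages)
  total_chars = system_prompt.to_s.length

  messages.each do |msg|
    # Content length plus 40 characters (about 10 tokens) of per-message overhead.
    total_chars += msg[:content].to_s.length + 40
  end

  (total_chars.to_f / CHARS_PER_TOKEN).ceil
end

system_prompt = "a" * 100
messages = [
  { role: :user,      content: "b" * 60 },
  { role: :assistant, content: "c" * 21 }
]

# 100 + (60 + 40) + (21 + 40) = 261 characters; 261 / 4 = 65.25, rounded up to 66.
puts estimate_tokens(system_prompt, messages)  # prints 66
```

Because of the to_s calls, a nil system prompt or nil message content counts as zero characters, so a message with no content still contributes the 40-character overhead.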