Class: RubyPi::Context::Compaction

Inherits: Object
Defined in:
lib/ruby_pi/context/compaction.rb

Overview

Manages context window size by summarizing older messages when the estimated token count exceeds a configurable threshold. The most recent N messages are always preserved to maintain conversational coherence.

Examples:

Configuring compaction

compaction = RubyPi::Context::Compaction.new(
  max_tokens: 8000,
  summary_model: model,
  preserve_last_n: 4
)
compacted = compaction.compact(messages, system_prompt)

Constant Summary

CHARS_PER_TOKEN = 4

Average characters per token, a rough heuristic that avoids the need for provider-specific tokenizers. Errs on the conservative side.
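At 4 characters per token, the default 8,000-token threshold corresponds to roughly 32,000 characters of system prompt plus history. A quick sanity check of that arithmetic:

max_chars = 8_000 * RubyPi::Context::Compaction::CHARS_PER_TOKEN
# => 32000 characters of context before compaction triggers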


Constructor Details

#initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4) ⇒ Compaction

Creates a new Compaction instance.

Parameters:

  • max_tokens (Integer) (defaults to: 8000)

    trigger compaction above this token estimate

  • summary_model (RubyPi::LLM::BaseProvider)

    model for summarization

  • preserve_last_n (Integer) (defaults to: 4)

    always keep the last N messages



# File 'lib/ruby_pi/context/compaction.rb', line 51

def initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4)
  @max_tokens = max_tokens
  @summary_model = summary_model
  @preserve_last_n = preserve_last_n
  @emitter = nil
end
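Only summary_model is required; max_tokens and preserve_last_n fall back to their defaults. A minimal construction, assuming `model` is any object implementing the provider interface:

compaction = RubyPi::Context::Compaction.new(summary_model: model)
# equivalent to max_tokens: 8000, preserve_last_n: 4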

Instance Attribute Details

#emitter ⇒ #emit?

Returns optional event emitter for :compaction events.

Returns:

  • (#emit, nil)

    optional event emitter for :compaction events



# File 'lib/ruby_pi/context/compaction.rb', line 42

def emitter
  @emitter
end
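Any object that responds to #emit can observe compaction. A minimal sketch, assuming the attribute is writable (it is not marked read-only); the StdoutEmitter class here is illustrative, not part of RubyPi:

# Hypothetical observer: logs each :compaction event to stdout.
class StdoutEmitter
  def emit(event, **payload)
    puts "[#{event}] dropped #{payload[:dropped_count]} message(s)"
  end
end

compaction.emitter = StdoutEmitter.new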

#max_tokens ⇒ Integer (readonly)

Returns the token threshold above which compaction triggers.

Returns:

  • (Integer)

    the token threshold above which compaction triggers



# File 'lib/ruby_pi/context/compaction.rb', line 33

def max_tokens
  @max_tokens
end

#preserve_last_n ⇒ Integer (readonly)

Returns number of recent messages always preserved.

Returns:

  • (Integer)

    number of recent messages always preserved



# File 'lib/ruby_pi/context/compaction.rb', line 39

def preserve_last_n
  @preserve_last_n
end

#summary_model ⇒ RubyPi::LLM::BaseProvider (readonly)

Returns the model used to generate summaries.

Returns:

  • (RubyPi::LLM::BaseProvider)

    the model used to generate summaries

# File 'lib/ruby_pi/context/compaction.rb', line 36

def summary_model
  @summary_model
end

Instance Method Details

#compact(messages, system_prompt) ⇒ Array<Hash>?

Compacts the message history if the estimated token count exceeds the threshold. Returns the compacted messages array, or nil if no compaction was needed.

The compaction process:

  1. Estimate total tokens for system_prompt + all messages.

  2. If under threshold, return nil (no compaction needed).

  3. Split messages into “droppable” (older) and “preserved” (recent).

  4. Summarize the droppable messages via the summary model.

  5. Return a new array: [summary_message] + preserved_messages.
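A typical call site, reassigning only when compaction actually happened:

if (compacted = compaction.compact(messages, system_prompt))
  messages = compacted
end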

Parameters:

  • messages (Array<Hash>)

    the current conversation history

  • system_prompt (String)

    the system prompt (included in estimate)

Returns:

  • (Array<Hash>, nil)

    compacted messages, or nil if not needed



# File 'lib/ruby_pi/context/compaction.rb', line 72

def compact(messages, system_prompt)
  total_tokens = estimate_tokens(system_prompt, messages)
  return nil if total_tokens <= @max_tokens

  # Split into messages to summarize and messages to keep
  preserved_count = [@preserve_last_n, messages.size].min
  droppable = messages[0...(messages.size - preserved_count)].dup
  preserved = messages[(messages.size - preserved_count)..].dup

  # If there's nothing to drop, we can't compact further
  return nil if droppable.empty?

  # Anthropic and OpenAI both require every tool_result / tool message
  # to reference a tool_use / tool_call from a preceding assistant
  # message. If we summarize the assistant turn that originated a tool
  # call but keep the matching tool_result, the API rejects the
  # request with "tool_result without preceding tool_use".
  #
  # The boundary between droppable and preserved can split a tool
  # exchange in two ways:
  #   (a) preserved starts with one or more :tool messages whose
  #       matching assistant turn is in droppable. Strip those
  #       orphan tool messages from the head of preserved (move
  #       them into droppable so they are summarized, not sent).
  #   (b) the last droppable message is an :assistant with tool_calls,
  #       but its matching :tool result(s) are in preserved. Pull
  #       that assistant message back into preserved so the pair
  #       stays intact.
  #
  # We apply (a) first: it's the common case (preserve_last_n=4 cuts
  # mid-pair, leaving a stranded tool message). Then (b) catches the
  # mirror case.
  while preserved.first && preserved.first[:role] == :tool
    droppable << preserved.shift
  end

  if droppable.last &&
     droppable.last[:role] == :assistant &&
     droppable.last[:tool_calls].is_a?(Array) &&
     !droppable.last[:tool_calls].empty? &&
     preserved.first && preserved.first[:role] == :tool
    preserved.unshift(droppable.pop)
  end

  # After the boundary fix-ups, droppable may have become empty.
  return nil if droppable.empty?

  # Generate a summary of the dropped messages
  summary = summarize(droppable)

  # Emit compaction event if an emitter is available
  @emitter&.emit(:compaction, dropped_count: droppable.size, summary: summary)

  # Build the compacted history: summary message + preserved.
  #
  # The summary role MUST NOT be :system (that would overwrite the real
  # system prompt on Anthropic, which extracts the last :system message
  # as the top-level `system:` parameter).
  #
  # The summary role must also NOT match the role of the first preserved
  # message — consecutive same-role messages are rejected by Anthropic.
  # We pick :user when the next preserved message is :assistant, and
  # :assistant otherwise (covers :user, :tool, and an empty preserved).
  # On Anthropic, :tool messages become role :user with tool_result
  # blocks, so :assistant is the safe choice when the next message is
  # :tool too.
  first_preserved_role = preserved.first&.dig(:role)
  summary_role = first_preserved_role == :assistant ? :user : :assistant

  summary_message = {
    role: summary_role,
    content: "[Conversation Summary]\n#{summary}"
  }

  [summary_message] + preserved
end
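To make the boundary fix-up concrete, here is a sketch of rule (a) with preserve_last_n: 2. The message contents and tool-call shape are illustrative assumptions:

messages = [
  { role: :user,      content: "find recent papers" },
  { role: :assistant, content: nil, tool_calls: [{ id: "t1", name: "search" }] },
  { role: :tool,      content: "results for t1" },
  { role: :user,      content: "summarize the first one" }
]

# The naive split keeps the last 2 messages, stranding the :tool result
# from its :assistant tool call. Rule (a) moves the orphan :tool message
# into the droppable set, so the whole exchange is summarized together:
#
#   droppable: [:user, :assistant (tool_calls), :tool]
#   preserved: [:user]
#
# The first preserved role is :user, so the summary takes the :assistant
# role, and the compacted history is [summary(:assistant), :user].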

#estimate_tokens(system_prompt, messages) ⇒ Integer

Estimates the total token count for a system prompt and message array using the character-based heuristic.

Parameters:

  • system_prompt (String)

    the system prompt text

  • messages (Array<Hash>)

    conversation messages

Returns:

  • (Integer)

    estimated token count



# File 'lib/ruby_pi/context/compaction.rb', line 155

def estimate_tokens(system_prompt, messages)
  total_chars = system_prompt.to_s.length

  messages.each do |msg|
    total_chars += msg[:content].to_s.length
    # Account for role and structural overhead (~10 tokens per message)
    total_chars += 40
  end

  (total_chars.to_f / CHARS_PER_TOKEN).ceil
end
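A worked call under the heuristic (the content length is chosen for round numbers and is purely illustrative):

compaction.estimate_tokens("You are a helpful assistant.", [
  { role: :user, content: "x" * 360 }
])
# 28 chars (system) + 360 chars (content) + 40 chars (overhead) = 428
# (428 / 4.0).ceil => 107 estimated tokens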