Class: RubyPi::Context::Compaction

Inherits: Object

Defined in: lib/ruby_pi/context/compaction.rb

Overview

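Manages context window size by summarizing older messages when the estimated token count exceeds a configurable threshold. The most recent N messages are preserved to maintain conversational coherence, with one exception: leading :tool messages whose originating assistant turn is being summarized are summarized along with it.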

Examples:

Configuring compaction

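# `model` is any RubyPi::LLM::BaseProvider instance.
compaction = RubyPi::Context::Compaction.new(
  max_tokens: 8000,
  summary_model: model,
  preserve_last_n: 4
)
# Returns the compacted Array<Hash>, or nil if no compaction was needed.
compacted = compaction.compact(messages, system_prompt)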
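
Because #compact returns nil when nothing was compacted, a caller usually needs a fallback to the existing history. The following is a minimal sketch of that pattern only. `StubCompaction` and `next_history` are hypothetical names introduced for illustration; the stub mimics the documented return contract (Array or nil) and does not reproduce the real summarization.

```ruby
# Hypothetical stand-in that mimics only the documented return contract
# of #compact: an Array when compaction happened, nil otherwise.
class StubCompaction
  def initialize(max_messages)
    @max_messages = max_messages
  end

  def compact(messages, _system_prompt)
    return nil if messages.size <= @max_messages

    [{ role: :user, content: "[Conversation Summary]\n(stub)" }] + messages.last(2)
  end
end

# Hypothetical helper: nil means "nothing to compact", so keep the history.
def next_history(compaction, messages, system_prompt)
  compaction.compact(messages, system_prompt) || messages
end

short = [{ role: :user, content: "Hi" }]
long = [
  { role: :user,      content: "Hi" },
  { role: :assistant, content: "Hello" },
  { role: :user,      content: "How are you?" },
  { role: :assistant, content: "Fine, thanks." },
  { role: :user,      content: "Bye" }
]

compaction = StubCompaction.new(2)
kept   = next_history(compaction, short, "prompt") # same array as `short`
shrunk = next_history(compaction, long, "prompt")  # summary + last two messages
```

With the real class, the same `|| messages` fallback should apply, since the return type is documented as Array<Hash> or nil.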

Constant Summary

CHARS_PER_TOKEN: average characters per token, a rough heuristic that avoids the need for provider-specific tokenizers. Errs on the conservative side.

Instance Attribute Summary

#emitter ⇒ #emit?

Optional event emitter for :compaction events.

#max_tokens ⇒ Integer readonly

The token threshold above which compaction triggers.

#preserve_last_n ⇒ Integer readonly

Number of recent messages always preserved.

#summary_model ⇒ RubyPi::LLM::BaseProvider readonly

The model used to generate summaries.

Instance Method Summary

#build_compacted_history(summary, preserved) ⇒ Array<Hash>

Builds the compacted history: a summary message followed by the preserved tail.

#compact(messages, system_prompt) ⇒ Array<Hash>?

Compacts the message history if the estimated token count exceeds the threshold.

#estimate_tokens(system_prompt, messages) ⇒ Integer

Estimates the total token count for a system prompt and message array using the character-based heuristic.

#initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4) ⇒ Compaction constructor

Creates a new Compaction instance.

Constructor Details

#initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4) ⇒ `Compaction`

Creates a new Compaction instance.

Parameters:

max_tokens (Integer) (defaults to: 8000) —

trigger compaction above this token estimate (default: 8000)
summary_model (RubyPi::LLM::BaseProvider) —

model for summarization
preserve_last_n (Integer) (defaults to: 4) —

always keep the last N messages (default: 4)

# File 'lib/ruby_pi/context/compaction.rb', line 51

def initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4)
  @max_tokens = max_tokens
  @summary_model = summary_model
  @preserve_last_n = preserve_last_n
  @emitter = nil
end

Instance Attribute Details

#emitter ⇒ `#emit`?

Returns optional event emitter for :compaction events.

Returns:

(#emit, nil) —

optional event emitter for :compaction events



# File 'lib/ruby_pi/context/compaction.rb', line 42

def emitter
  @emitter
end
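
The return type `#emit` is a duck type: any object that responds to `emit` can serve as the emitter. As a rough sketch, the class below records events in memory. `RecordingEmitter` is a hypothetical name, and the `emit(event, **payload)` signature is an assumption inferred from the call made in #compact; how the emitter gets assigned to a Compaction instance is not shown in this section.

```ruby
# Hypothetical emitter: any object responding to #emit fits the duck type.
# The (event, **payload) signature is an assumption based on the call site
# `@emitter&.emit(:compaction, dropped_count: ..., summary: ...)`.
class RecordingEmitter
  attr_reader :events

  def initialize
    @events = []
  end

  def emit(event, **payload)
    @events << [event, payload]
  end
end

emitter = RecordingEmitter.new
emitter&.emit(:compaction, dropped_count: 3, summary: "Earlier turns covered setup.")

# With no emitter configured, the safe-navigation call is a no-op.
nil_emitter = nil
result = nil_emitter&.emit(:compaction, dropped_count: 0, summary: "")
```

The safe-navigation operator is what makes the emitter optional: when it is nil, the call is skipped and evaluates to nil.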

#max_tokens ⇒ `Integer` (readonly)

Returns the token threshold above which compaction triggers.

Returns:

(Integer) —

the token threshold above which compaction triggers



# File 'lib/ruby_pi/context/compaction.rb', line 33

def max_tokens
  @max_tokens
end

#preserve_last_n ⇒ `Integer` (readonly)

Returns number of recent messages always preserved.

Returns:

(Integer) —

number of recent messages always preserved



# File 'lib/ruby_pi/context/compaction.rb', line 39

def preserve_last_n
  @preserve_last_n
end

#summary_model ⇒ `RubyPi::LLM::BaseProvider` (readonly)

Returns the model used to generate summaries.

Returns:

(RubyPi::LLM::BaseProvider) —

the model used to generate summaries



# File 'lib/ruby_pi/context/compaction.rb', line 36

def summary_model
  @summary_model
end

Instance Method Details

#build_compacted_history(summary, preserved) ⇒ `Array<Hash>`

Builds the compacted history: a summary message followed by the preserved tail.

The summary becomes the FIRST message of the compacted history, so it must satisfy the strictest provider constraints (Anthropic):

1. The summary role MUST NOT be :system — that would overwrite the
   real system prompt on Anthropic, which promotes the last :system
   message to the top-level `system:` parameter.
2. The first message MUST use role :user.
3. Consecutive same-role messages are rejected.

A :user summary satisfies (1) and (2). For (3), the orphan-strip in #compact guarantees that the first preserved message is :assistant, :user, or absent (never :tool). When it is :assistant or absent, a standalone :user summary alternates correctly. When it is :user, a separate :user summary would create two consecutive user messages, so the summary text is merged into that existing user message instead. This keeps the first message a single :user message with no role collision.

Parameters:

summary (String) —

the generated summary text
preserved (Array<Hash>) —

the preserved tail of messages

Returns:

(Array<Hash>) —

the compacted history

# File 'lib/ruby_pi/context/compaction.rb', line 138

def build_compacted_history(summary, preserved)
  summary_text = "[Conversation Summary]\n#{summary}"
  first_preserved = preserved.first

  if first_preserved && first_preserved[:role] == :user
    merged = first_preserved.dup
    merged[:content] = "#{summary_text}\n\n#{first_preserved[:content]}"
    [merged] + preserved.drop(1)
  else
    [{ role: :user, content: summary_text }] + preserved
  end
end
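
To make the branches concrete, here is a small sketch that runs a standalone copy of the method above against three preserved tails. The sample messages are hypothetical, and only the :role and :content keys are assumed.

```ruby
# Standalone copy of the method above, for illustration only.
def build_compacted_history(summary, preserved)
  summary_text = "[Conversation Summary]\n#{summary}"
  first_preserved = preserved.first

  if first_preserved && first_preserved[:role] == :user
    merged = first_preserved.dup
    merged[:content] = "#{summary_text}\n\n#{first_preserved[:content]}"
    [merged] + preserved.drop(1)
  else
    [{ role: :user, content: summary_text }] + preserved
  end
end

summary = "User asked about billing; assistant explained the invoice."

# Case 1: tail starts with :assistant, so the summary is a standalone :user message.
assistant_first = [
  { role: :assistant, content: "Anything else?" },
  { role: :user,      content: "Yes, one more question." }
]
case1 = build_compacted_history(summary, assistant_first)

# Case 2: tail starts with :user, so the summary is merged into that message.
user_first = [
  { role: :user,      content: "What about refunds?" },
  { role: :assistant, content: "Refunds take five days." }
]
case2 = build_compacted_history(summary, user_first)

# Case 3: empty tail, so the result is the summary message alone.
case3 = build_compacted_history(summary, [])

p case1.map { |m| m[:role] } # [:user, :assistant, :user]
p case2.map { |m| m[:role] } # [:user, :assistant]
p case3.size                 # 1
```

In every case the first message has role :user and no two adjacent messages share a role. The original first preserved hash is not mutated, because the merge works on a `dup` and assigns a new string.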

#compact(messages, system_prompt) ⇒ `Array<Hash>`?

Compacts the message history if the estimated token count exceeds the threshold. Returns the compacted messages array, or nil if no compaction was needed.

The compaction process:

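1. Estimate total tokens for system_prompt + all messages.
2. If the estimate is at or under the threshold, return nil (no compaction needed).
3. Split messages into "droppable" (older) and "preserved" (recent). If nothing is droppable, return nil.
4. Move any leading :tool messages from preserved into droppable, so that no tool result is sent without its originating assistant turn.
5. Summarize the droppable messages via the summary model.
6. Return a new array: the summary as a :user message followed by the preserved messages, or merged into the first preserved message when that message is already :user.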

Parameters:

messages (Array<Hash>) —

the current conversation history
system_prompt (String) —

the system prompt (included in estimate)

Returns:

(Array<Hash>, nil) —

compacted messages, or nil if not needed

# File 'lib/ruby_pi/context/compaction.rb', line 72

def compact(messages, system_prompt)
  total_tokens = estimate_tokens(system_prompt, messages)
  return nil if total_tokens <= @max_tokens

  # Split into messages to summarize and messages to keep
  preserved_count = [@preserve_last_n, messages.size].min
  droppable = messages[0...(messages.size - preserved_count)].dup
  preserved = messages[(messages.size - preserved_count)..].dup

  # If there's nothing to drop, we can't compact further
  return nil if droppable.empty?

  # Anthropic and OpenAI both require every tool_result / tool message
  # to reference a tool_use / tool_call from a preceding assistant
  # message. If we summarize the assistant turn that originated a tool
  # call but keep the matching tool_result, the API rejects the
  # request with "tool_result without preceding tool_use".
  #
  # When the boundary between droppable and preserved cuts mid-exchange,
  # preserved can start with one or more orphan :tool messages whose
  # matching assistant turn is in droppable. Strip those off the head of
  # preserved and move them into droppable so they are summarized away
  # rather than sent. Because the originating assistant message is older,
  # it is already in droppable, so the pair stays together there — there
  # is no mirror case to handle (once a tool result is moved across, its
  # assistant is never left stranded on the preserved side).
  while preserved.first && preserved.first[:role] == :tool
    droppable << preserved.shift
  end

  # The orphan-strip only moves messages INTO droppable, so droppable
  # cannot have shrunk; it is still non-empty here. preserved, however,
  # may now be empty (the whole window was tool results) — the summary
  # construction below handles that case.

  # Generate a summary of the dropped messages
  summary = summarize(droppable)

  # Emit compaction event if an emitter is available
  @emitter&.emit(:compaction, dropped_count: droppable.size, summary: summary)

  build_compacted_history(summary, preserved)
end
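
The split and orphan-strip steps can be traced on a small history. This sketch reuses the same slicing and loop as the method above, outside the class; the sample messages and the local `preserve_last_n` value are hypothetical.

```ruby
# Hypothetical history; only the :role and :content keys are assumed.
messages = [
  { role: :user,      content: "Find the weather in Oslo." },
  { role: :assistant, content: "Calling the weather tool." },
  { role: :tool,      content: "{\"temp_c\": 4}" },
  { role: :tool,      content: "{\"wind_kph\": 12}" },
  { role: :assistant, content: "It is 4 degrees and breezy." },
  { role: :user,      content: "Thanks!" }
]
preserve_last_n = 4

# Same split as in #compact: the last N messages are preserved.
preserved_count = [preserve_last_n, messages.size].min
droppable = messages[0...(messages.size - preserved_count)].dup
preserved = messages[(messages.size - preserved_count)..].dup

# The boundary cut mid-exchange: preserved starts with two :tool results
# whose assistant turn is in droppable. Move them across.
while preserved.first && preserved.first[:role] == :tool
  droppable << preserved.shift
end

p droppable.map { |m| m[:role] } # [:user, :assistant, :tool, :tool]
p preserved.map { |m| m[:role] } # [:assistant, :user]
```

Note that only two of the requested four messages remain in `preserved` here: the tool results travel with the assistant turn that produced them.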

#estimate_tokens(system_prompt, messages) ⇒ `Integer`

Estimates the total token count for a system prompt and message array using the character-based heuristic.

Parameters:

system_prompt (String) —

the system prompt text
messages (Array<Hash>) —

conversation messages

Returns:

(Integer) —

estimated token count

# File 'lib/ruby_pi/context/compaction.rb', line 157

def estimate_tokens(system_prompt, messages)
  total_chars = system_prompt.to_s.length

  messages.each do |msg|
    total_chars += msg[:content].to_s.length
    # Account for role and structural overhead (~10 tokens per message)
    total_chars += 40
  end

  (total_chars.to_f / CHARS_PER_TOKEN).ceil
end
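
A worked example of the heuristic follows. The value of CHARS_PER_TOKEN is not shown in this section, so the sketch assumes 4 characters per token (consistent with the "40 characters, roughly 10 tokens" comment above) under the hypothetical name `ASSUMED_CHARS_PER_TOKEN`; the real constant may differ.

```ruby
# Assumption: 4 characters per token. The actual CHARS_PER_TOKEN value
# is not shown in this documentation.
ASSUMED_CHARS_PER_TOKEN = 4

# Standalone copy of the heuristic, using the assumed constant.
def estimate_tokens_sketch(system_prompt, messages)
  total_chars = system_prompt.to_s.length

  messages.each do |msg|
    # Content length plus 40 characters of per-message overhead.
    total_chars += msg[:content].to_s.length + 40
  end

  (total_chars.to_f / ASSUMED_CHARS_PER_TOKEN).ceil
end

system_prompt = "s" * 100
messages = [
  { role: :user,      content: "m" * 61 },
  { role: :assistant, content: nil } # nil content counts as 0 characters
]

# 100 (prompt) + 61 + 40 + 0 + 40 = 241 characters
# 241 / 4 = 60.25, rounded up to 61
estimate = estimate_tokens_sketch(system_prompt, messages)
```

Rounding up with `ceil` and charging a fixed overhead per message both push the estimate upward, which makes compaction trigger slightly early rather than late.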
