Class: RubyPi::Context::Compaction

Inherits: Object
Defined in:
lib/ruby_pi/context/compaction.rb

Overview

Manages context window size by summarizing older messages when the estimated token count exceeds a configurable threshold. The most recent N messages are always preserved to maintain conversational coherence.

Examples:

Configuring compaction

compaction = RubyPi::Context::Compaction.new(
  max_tokens: 8000,
  summary_model: model,
  preserve_last_n: 4
)
compacted = compaction.compact(messages, system_prompt)

Constant Summary

CHARS_PER_TOKEN = 4

Average characters per token, a rough heuristic that avoids the need for provider-specific tokenizers. Errs on the conservative side.
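At 4 characters per token, the default 8,000-token threshold corresponds to roughly 32,000 characters of system prompt plus history. A quick sanity check of that arithmetic:

max_chars = 8_000 * RubyPi::Context::Compaction::CHARS_PER_TOKEN
# => 32000 characters of context before compaction triggers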


Constructor Details

#initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4) ⇒ Compaction

Creates a new Compaction instance.

Parameters:

  • max_tokens (Integer) (defaults to: 8000)

    trigger compaction above this token estimate

  • summary_model (RubyPi::LLM::BaseProvider)

    model for summarization

  • preserve_last_n (Integer) (defaults to: 4)

    always keep the last N messages



# File 'lib/ruby_pi/context/compaction.rb', line 51

def initialize(max_tokens: 8000, summary_model:, preserve_last_n: 4)
  @max_tokens = max_tokens
  @summary_model = summary_model
  @preserve_last_n = preserve_last_n
  @emitter = nil
end
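Only summary_model is required; max_tokens and preserve_last_n fall back to their defaults. A minimal construction, assuming `model` is any object implementing the provider interface:

compaction = RubyPi::Context::Compaction.new(summary_model: model)
# equivalent to max_tokens: 8000, preserve_last_n: 4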

Instance Attribute Details

#emitter ⇒ #emit?

Returns optional event emitter for :compaction events.

Returns:

  • (#emit, nil)

    optional event emitter for :compaction events



# File 'lib/ruby_pi/context/compaction.rb', line 42

def emitter
  @emitter
end
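Any object that responds to #emit can observe compaction. A minimal sketch, assuming the attribute is writable (it is not marked read-only); the StdoutEmitter class here is illustrative, not part of RubyPi:

# Hypothetical observer: logs each :compaction event to stdout.
class StdoutEmitter
  def emit(event, **payload)
    puts "[#{event}] dropped #{payload[:dropped_count]} message(s)"
  end
end

compaction.emitter = StdoutEmitter.new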

#max_tokens ⇒ Integer (readonly)

Returns the token threshold above which compaction triggers.

Returns:

  • (Integer)

    the token threshold above which compaction triggers



# File 'lib/ruby_pi/context/compaction.rb', line 33

def max_tokens
  @max_tokens
end

#preserve_last_n ⇒ Integer (readonly)

Returns number of recent messages always preserved.

Returns:

  • (Integer)

    number of recent messages always preserved



# File 'lib/ruby_pi/context/compaction.rb', line 39

def preserve_last_n
  @preserve_last_n
end

#summary_model ⇒ RubyPi::LLM::BaseProvider (readonly)

Returns the model used to generate summaries.

Returns:

  • (RubyPi::LLM::BaseProvider)

    the model used to generate summaries

# File 'lib/ruby_pi/context/compaction.rb', line 36

def summary_model
  @summary_model
end

Instance Method Details

#compact(messages, system_prompt) ⇒ Array<Hash>?

Compacts the message history if the estimated token count exceeds the threshold. Returns the compacted messages array, or nil if no compaction was needed.

The compaction process:

  1. Estimate total tokens for system_prompt + all messages.

  2. If under threshold, return nil (no compaction needed).

  3. Split messages into “droppable” (older) and “preserved” (recent).

  4. Summarize the droppable messages via the summary model.

  5. Return a new array: [summary_message] + preserved_messages.
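A typical call site, reassigning only when compaction actually happened:

if (compacted = compaction.compact(messages, system_prompt))
  messages = compacted
end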

Parameters:

  • messages (Array<Hash>)

    the current conversation history

  • system_prompt (String)

    the system prompt (included in estimate)

Returns:

  • (Array<Hash>, nil)

    compacted messages, or nil if not needed



# File 'lib/ruby_pi/context/compaction.rb', line 72

def compact(messages, system_prompt)
  total_tokens = estimate_tokens(system_prompt, messages)
  return nil if total_tokens <= @max_tokens

  # Split into messages to summarize and messages to keep
  preserved_count = [@preserve_last_n, messages.size].min
  droppable = messages[0...(messages.size - preserved_count)].dup
  preserved = messages[(messages.size - preserved_count)..].dup

  # If there's nothing to drop, we can't compact further
  return nil if droppable.empty?

  # Anthropic and OpenAI both require every tool_result / tool message
  # to reference a tool_use / tool_call from a preceding assistant
  # message. If we summarize the assistant turn that originated a tool
  # call but keep the matching tool_result, the API rejects the
  # request with "tool_result without preceding tool_use".
  #
  # The boundary between droppable and preserved can split a tool
  # exchange in two ways:
  #   (a) preserved starts with one or more :tool messages whose
  #       matching assistant turn is in droppable. Strip those
  #       orphan tool messages from the head of preserved (move
  #       them into droppable so they are summarized, not sent).
  #   (b) the last droppable message is an :assistant with tool_calls,
  #       but its matching :tool result(s) are in preserved. Pull
  #       that assistant message back into preserved so the pair
  #       stays intact.
  #
  # We apply (a) first: it's the common case (preserve_last_n=4 cuts
  # mid-pair, leaving a stranded tool message). Then (b) catches the
  # mirror case.
  while preserved.first && preserved.first[:role] == :tool
    droppable << preserved.shift
  end

  if droppable.last &&
     droppable.last[:role] == :assistant &&
     droppable.last[:tool_calls].is_a?(Array) &&
     !droppable.last[:tool_calls].empty? &&
     preserved.first && preserved.first[:role] == :tool
    preserved.unshift(droppable.pop)
  end

  # After the boundary fix-ups, droppable may have become empty.
  return nil if droppable.empty?

  # Generate a summary of the dropped messages
  summary = summarize(droppable)

  # Emit compaction event if an emitter is available
  @emitter&.emit(:compaction, dropped_count: droppable.size, summary: summary)

  # Build the compacted history: summary message + preserved.
  #
  # The summary role MUST NOT be :system (that would overwrite the real
  # system prompt on Anthropic, which extracts the last :system message
  # as the top-level `system:` parameter).
  #
  # The summary role must also NOT match the role of the first preserved
  # message — consecutive same-role messages are rejected by Anthropic.
  # We pick :user when the next preserved message is :assistant, and
  # :assistant otherwise (covers :user, :tool, and an empty preserved).
  # On Anthropic, :tool messages become role :user with tool_result
  # blocks, so :assistant is the safe choice when the next message is
  # :tool too.
  first_preserved_role = preserved.first&.dig(:role)
  summary_role = first_preserved_role == :assistant ? :user : :assistant

  summary_message = {
    role: summary_role,
    content: "[Conversation Summary]\n#{summary}"
  }

  [summary_message] + preserved
end
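To make the boundary fix-up concrete, here is a sketch of rule (a) with preserve_last_n: 2. The message contents and tool-call shape are illustrative assumptions:

messages = [
  { role: :user,      content: "find recent papers" },
  { role: :assistant, content: nil, tool_calls: [{ id: "t1", name: "search" }] },
  { role: :tool,      content: "results for t1" },
  { role: :user,      content: "summarize the first one" }
]

# The naive split keeps the last 2 messages, stranding the :tool result
# from its :assistant tool call. Rule (a) moves the orphan :tool message
# into the droppable set, so the whole exchange is summarized together:
#
#   droppable: [:user, :assistant (tool_calls), :tool]
#   preserved: [:user]
#
# The first preserved role is :user, so the summary takes the :assistant
# role, and the compacted history is [summary(:assistant), :user].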

#estimate_tokens(system_prompt, messages) ⇒ Integer

Estimates the total token count for a system prompt and message array using the character-based heuristic.

Parameters:

  • system_prompt (String)

    the system prompt text

  • messages (Array<Hash>)

    conversation messages

Returns:

  • (Integer)

    estimated token count



# File 'lib/ruby_pi/context/compaction.rb', line 155

def estimate_tokens(system_prompt, messages)
  total_chars = system_prompt.to_s.length

  messages.each do |msg|
    total_chars += msg[:content].to_s.length
    # Account for role and structural overhead (~10 tokens per message)
    total_chars += 40
  end

  (total_chars.to_f / CHARS_PER_TOKEN).ceil
end
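A worked call under the heuristic (the content length is chosen for round numbers and is purely illustrative):

compaction.estimate_tokens("You are a helpful assistant.", [
  { role: :user, content: "x" * 360 }
])
# 28 chars (system) + 360 chars (content) + 40 chars (overhead) = 428
# (428 / 4.0).ceil => 107 estimated tokens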