Class: Clacky::MessageCompressor

Inherits: Object


Defined in: lib/clacky/agent/message_compressor.rb

Overview

Message compressor using the Insert-then-Compress strategy.

New Strategy: instead of making a separate API call for compression, we insert a compression instruction into the current conversation flow. This lets the compression call reuse the existing prompt cache (system prompt + tools), so only the newly added compression instruction has to be processed from scratch.

Flow:

1. The agent detects that the compression threshold has been reached.
2. The compressor builds a compression instruction message.
3. The agent inserts this message and calls the LLM, reusing the existing cache.
4. The LLM returns a compressed summary.
5. The compressor rebuilds the message list: system + summary + recent messages.
6. The agent continues with the new message list (the cache is rebuilt from this point).
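The six steps can be sketched end to end as follows. This is a simplified illustration, not the Clacky implementation: `FakeClient`, its `chat` method, `insert_then_compress`, and the `keep_recent:` window are all hypothetical. The sketch also omits the tool definitions the real call sends, as well as the tag stripping that `parse_compressed_result` performs. What it shows is the key property of the strategy: the compression request is the unchanged history plus one appended instruction, so its prefix matches earlier requests.

```ruby
# Hypothetical stand-in for the LLM client; the real client interface is not
# documented on this page. It records the request and returns a canned reply.
class FakeClient
  attr_reader :last_request

  def chat(messages)
    @last_request = messages
    "<topics>Rails setup, Tailwind CSS, deploy</topics>\n" \
      "<continues_previous>false</continues_previous>\n" \
      "<summary>User set up Rails and added Tailwind.</summary>"
  end
end

# Steps 2-5 of the flow, heavily simplified. `keep_recent` and the instruction
# text are assumptions; the real compressor uses COMPRESSION_PROMPT.
def insert_then_compress(client, messages, keep_recent: 2)
  system_msgs = messages.select { |m| m[:role] == "system" }
  recent = messages.reject { |m| m[:role] == "system" }.last(keep_recent)

  # Steps 2-3: append the instruction to the unchanged history, so the request
  # shares its prefix with earlier calls and can hit the prompt cache.
  instruction = { role: "user", content: "Summarize the conversation above.", system_injected: true }
  summary_text = client.chat(messages + [instruction])

  # Steps 4-5: wrap the reply and rebuild as system + summary + recent messages.
  summary = { role: "user", content: summary_text, compressed_summary: true, system_injected: true }
  system_msgs + [summary] + recent
end

history = [
  { role: "system", content: "You are a coding agent." },
  { role: "user", content: "Set up Rails" },
  { role: "assistant", content: "Done." },
  { role: "user", content: "Now add Tailwind" },
  { role: "assistant", content: "Added." }
]
client = FakeClient.new
new_history = insert_then_compress(client, history)
# Step 6: the agent would continue the conversation from new_history.
```

After the rebuild, the prefix changes (the summary replaces the older turns), which is why the cache has to be rebuilt once from step 6 onward.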

Benefits:

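- The compression call reuses the existing cache, so the cached prefix does not have to be reprocessed at full cost (large input-token savings).
- Only one cache rebuild is needed after compression (versus two with the old separate-call approach).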

Constant Summary

COMPRESSION_PROMPT = <<~PROMPT.freeze
  ═══════════════════════════════════════════════════════════════
  CRITICAL: TASK CHANGE - MEMORY COMPRESSION MODE
  ═══════════════════════════════════════════════════════════════
  The conversation above has ENDED. You are now in MEMORY COMPRESSION MODE.

  CRITICAL INSTRUCTIONS - READ CAREFULLY:

  1. This is NOT a continuation of the conversation
  2. DO NOT respond to any requests in the conversation above
  3. DO NOT call ANY tools or functions
  4. DO NOT use tool_calls in your response
  5. Your response MUST be PURE TEXT ONLY

  YOUR ONLY TASK: Create a comprehensive summary of the conversation above.

  REQUIRED RESPONSE FORMAT:
  First output a <topics> line listing 3-6 key topic phrases (comma-separated, concise).
  Then output a <continues_previous> line: "true" if this conversation is a direct
  continuation of the SAME task/topic as the PREVIOUS chunk shown below, "false" if it
  has moved on to a different task or topic. If there is no previous chunk, output "false".
  Then output the full summary wrapped in <summary> tags.

  Example format:
  <topics>Rails setup, database config, deploy pipeline, Tailwind CSS</topics>
  <continues_previous>false</continues_previous>
  <summary>
  ...full summary text...
  </summary>

  Focus on:
  - User's explicit requests and intents
  - Key technical concepts and code changes
  - Files examined and modified
  - Errors encountered and fixes applied
  - Current work status and pending tasks

  Begin your response NOW. Remember: PURE TEXT only, starting with <topics> then
  <continues_previous> then <summary>.
PROMPT
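To make the REQUIRED RESPONSE FORMAT concrete, here is a hypothetical validator (not part of Clacky) that accepts a reply only if it contains `<topics>`, then `<continues_previous>`, then `<summary>`, in that order, with 3-6 comma-separated topics and a non-empty summary. It is deliberately stricter than the class's own `parse_topics` / `parse_continues_previous`, which tolerate surrounding text and mixed case; the name `well_formed_compression_reply?` is an assumption for illustration.

```ruby
# Hypothetical validator, not part of Clacky: checks that a reply follows the
# REQUIRED RESPONSE FORMAT (<topics>, then <continues_previous>, then <summary>).
FORMAT_RE = %r{
  \A\s*
  <topics>(?<topics>[^<]*)</topics>\s*
  <continues_previous>(?<cont>true|false)</continues_previous>\s*
  <summary>(?<summary>.*)</summary>\s*
  \z
}mx

def well_formed_compression_reply?(text)
  m = FORMAT_RE.match(text.to_s)
  return false unless m

  topics = m[:topics].split(",").map(&:strip).reject(&:empty?)
  # The prompt asks for 3-6 topic phrases and a non-empty summary.
  topics.size.between?(3, 6) && !m[:summary].strip.empty?
end

good = <<~REPLY
  <topics>Rails setup, database config, deploy pipeline</topics>
  <continues_previous>false</continues_previous>
  <summary>
  The user set up a Rails app and configured PostgreSQL.
  </summary>
REPLY

well_formed_compression_reply?(good)
well_formed_compression_reply?("<summary>only a summary</summary>")
```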

Instance Method Summary

#build_compression_message(messages, recent_messages: [], previous_topics: nil) ⇒ Hash
Generate a compression instruction message to be inserted into the conversation. This enables cache reuse by using the same API call with tools.
#initialize(client, model: nil) ⇒ MessageCompressor constructor
A new instance of MessageCompressor.
#parse_compressed_result(result, chunk_path: nil, topics: nil, previous_chunks: []) ⇒ Object
#parse_continues_previous(content) ⇒ Object
Parse the <continues_previous> tag.
#parse_topics(content) ⇒ Object
Parse topics tag from compressed content.
#rebuild_with_compression(compressed_content, original_messages:, recent_messages:, chunk_path: nil, topics: nil, previous_chunks: [], pulled_back_messages: []) ⇒ Array<Hash>
Parse LLM response and rebuild message list with compression.

Constructor Details

#initialize(client, model: nil) ⇒ `MessageCompressor`

Returns a new instance of MessageCompressor.

# File 'lib/clacky/agent/message_compressor.rb', line 65

def initialize(client, model: nil)
  @client = client
  @model = model
end

Instance Method Details

#build_compression_message(messages, recent_messages: [], previous_topics: nil) ⇒ `Hash`

Generate a compression instruction message to be inserted into the conversation. This enables cache reuse by using the same API call with tools.

SIMPLIFIED APPROACH:

- Don't duplicate the conversation history in the compression message.
- The LLM can already see all messages; just ask it to compress them.
- Keep the instruction small for better cache efficiency.

Parameters:

messages (Array<Hash>) —
Original conversation messages
recent_messages (Array<Hash>) (defaults to: []) —
Recent messages to keep uncompressed (optional)
previous_topics (String, nil) (defaults to: nil) —
Topics of the most recent chunk on disk, shown to the LLM so it can decide whether the current conversation is a continuation (drives the <continues_previous> output for chunk merging).

Returns:

(Hash, nil) —
Compression instruction message to insert, or nil if there is nothing to compress

# File 'lib/clacky/agent/message_compressor.rb', line 84

def build_compression_message(messages, recent_messages: [], previous_topics: nil)
  # Get messages to compress (exclude system message and recent messages)
  messages_to_compress = messages.reject { |m| m[:role] == "system" || recent_messages.include?(m) }

  # If nothing to compress, return nil
  return nil if messages_to_compress.empty?

  content = COMPRESSION_PROMPT
  if previous_topics && !previous_topics.strip.empty?
    content = "#{COMPRESSION_PROMPT}\n\nPREVIOUS CHUNK TOPICS (for <continues_previous> judgement): #{previous_topics}"
  end

  {
    role: "user",
    content: content,
    system_injected: true
  }
end
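The method's behavior can be exercised with the short script below. It copies the method body above as a free function; the one-line `COMPRESSION_PROMPT` is a placeholder for the heredoc defined earlier, and the sample messages are made up. It shows the `nil` return when only system and recent messages exist, the topics suffix, and a side effect of `recent_messages.include?(m)` as written: `Array#include?` compares hashes with `==`, so an older message whose role and content equal a recent one is also treated as recent.

```ruby
# Placeholder: the real constant is the COMPRESSION_PROMPT heredoc shown above.
COMPRESSION_PROMPT = "MEMORY COMPRESSION MODE (placeholder)".freeze

# Same logic as MessageCompressor#build_compression_message, as a free function.
def build_compression_message(messages, recent_messages: [], previous_topics: nil)
  messages_to_compress = messages.reject { |m| m[:role] == "system" || recent_messages.include?(m) }
  return nil if messages_to_compress.empty?

  content = COMPRESSION_PROMPT
  if previous_topics && !previous_topics.strip.empty?
    content = "#{COMPRESSION_PROMPT}\n\nPREVIOUS CHUNK TOPICS (for <continues_previous> judgement): #{previous_topics}"
  end
  { role: "user", content: content, system_injected: true }
end

sys_msg = { role: "system", content: "You are a coding agent." }
old     = { role: "user", content: "Set up Rails" }
recent  = { role: "assistant", content: "Done." }

# Nothing older than the recent window: no instruction is produced.
only_recent = build_compression_message([sys_msg, recent], recent_messages: [recent])

# Older turns exist and a previous chunk is known: its topics are appended.
msg = build_compression_message([sys_msg, old, recent], recent_messages: [recent],
                                previous_topics: "Rails setup, database config")

# Hash equality: an older copy of a recent message is excluded as well.
same_text = { role: "assistant", content: "Done." }
dup_only = build_compression_message([sys_msg, same_text, recent], recent_messages: [recent])
```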

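#parse_compressed_result(result, chunk_path: nil, topics: nil, previous_chunks: []) ⇒ `Object`

Wrap the LLM's compressed output in a single user message for the rebuilt history. Returns an empty array when the result is blank; otherwise a one-element array whose message carries `compressed_summary: true` and `system_injected: true`, plus links to the current and previous chunk archives.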

# File 'lib/clacky/agent/message_compressor.rb', line 166

def parse_compressed_result(result, chunk_path: nil, topics: nil, previous_chunks: [])
  # Return the compressed result as a single user message (role: "user").
  #
  # Why role:"user" instead of "assistant":
  #   When all original user messages get archived into the chunk during compression
  #   (e.g. a long single-turn `/slash` task), the rebuilt history can end up as
  #   `system → assistant(summary) → assistant(tool_calls) → tool → …` with NO user
  #   message anywhere. Strict providers (notably DeepSeek V4 thinking mode) reject
  #   this as a malformed turn structure with a misleading
  #   "reasoning_content must be passed back" 400 error.
  #
  # Marking it as a user message gives the conversation a valid turn boundary.
  # `system_injected: true` ensures the UI's replay_history still hides it from
  # the chat panel (the real-user filter excludes system_injected messages), while
  # INTERNAL_FIELDS in MessageHistory strips the marker before the API payload is
  # built — so DeepSeek/OpenAI/Anthropic only see a plain `{role:"user", content:…}`.
  #
  # The `compressed_summary: true` flag is preserved so that replay_history still
  # routes this message through the chunk-expansion path (which keys off that flag,
  # not the role).
  #
  # @param topics [String, nil] Short topic description extracted from <topics> tag
  # @param previous_chunks [Array<Hash>] Info about older chunk files
  #   Each hash: { basename:, path:, topics: }
  content = result.to_s.strip

  if content.empty?
    []
  else
    # Strip out the <topics> and <continues_previous> blocks — they're
    # metadata for chunk handling, not for AI context.
    content_without_topics = content.gsub(/<topics>.*?<\/topics>\n*/m, "")
                                    .gsub(/<continues_previous>.*?<\/continues_previous>\n*/m, "")
                                    .strip

    # Build previous chunks index section — links to older chunk files so the AI
    # can find earlier conversations without keeping all prior compressed_summary
    # messages in the active history. Shows newest chunks first (reverse order),
    # capped at 10 to keep the message size bounded.
    previous_chunks_section = ""
    if previous_chunks.any?
      max_visible = 10
      visible = previous_chunks.last(max_visible).reverse
      older_count = previous_chunks.size - visible.size

      previous_chunks_section = "\n\n---\n📁 **Previous chunks (newest first):**\n"
      visible.each do |pc|
        topic_str = pc[:topics] ? " — #{pc[:topics]}" : ""
        previous_chunks_section += "- `#{pc[:basename]}`#{topic_str}\n"
      end

      if older_count > 0
        oldest = previous_chunks.first
        previous_chunks_section += "- ... and #{older_count} older chunks back to `#{oldest[:basename]}`\n"
      end

      previous_chunks_section += "_Use `file_reader` to recall details from these chunks._"
    end

    # Inject chunk anchor so AI knows where to find original conversation for THIS chunk
    anchor = ""
    if chunk_path
      anchor = "\n\n---\n📁 **Current chunk archived at:** `#{chunk_path}`\n" \
               "_Use `file_reader` tool to recall details from this chunk._"
    end

    # Prefix lets the model recognise this is injected context, not a user utterance.
    # Order: summary → previous chunks → current anchor (chronological)
    framed_content = "[Compressed conversation summary — previous turns archived]\n\n" \
                     "#{content_without_topics}" \
                     "#{previous_chunks_section}" \
                     "#{anchor}"

    [{
      role: "user",
      content: framed_content,
      compressed_summary: true,
      chunk_path: chunk_path,
      topics: topics,
      system_injected: true
    }]
  end
end
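The "previous chunks" index is the least obvious part of this method, so the sketch below isolates it. `previous_chunks_index` is a hypothetical name for a standalone copy of that section. It builds the same text by joining lines rather than appending with `+=`, and writes the em dash as `\u2014`. The chunk basenames and topics in the usage lines are made up. With 12 chunks, only the newest 10 are listed (newest first), followed by a pointer to the oldest one.

```ruby
# Standalone copy of the "previous chunks" index logic from parse_compressed_result.
# The function name and the chunk basenames used below are hypothetical.
def previous_chunks_index(previous_chunks, max_visible: 10)
  return "" if previous_chunks.empty?

  visible = previous_chunks.last(max_visible).reverse # newest first
  older_count = previous_chunks.size - visible.size

  lines = ["\n\n---\n📁 **Previous chunks (newest first):**"]
  visible.each do |pc|
    topic_str = pc[:topics] ? " \u2014 #{pc[:topics]}" : ""
    lines << "- `#{pc[:basename]}`#{topic_str}"
  end
  if older_count > 0
    lines << "- ... and #{older_count} older chunks back to `#{previous_chunks.first[:basename]}`"
  end
  lines << "_Use `file_reader` to recall details from these chunks._"
  lines.join("\n")
end

# Twelve chunks, oldest first; only even-numbered ones have topics.
chunks = (1..12).map { |i| { basename: "chunk_#{i}.md", topics: (i.even? ? "topic #{i}" : nil) } }
index = previous_chunks_index(chunks)
```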

#parse_continues_previous(content) ⇒ `Object`

Parse the <continues_previous> tag. Returns true only when the LLM explicitly says "true"; missing tag or any other value → false. This conservative default ensures we never merge unless the model is sure.

# File 'lib/clacky/agent/message_compressor.rb', line 160

def parse_continues_previous(content)
  return false if content.nil? || content.to_s.empty?
  m = content.to_s.match(/<continues_previous>(.*?)<\/continues_previous>/m)
  m ? m[1].strip.downcase == "true" : false
end
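The conservative default is easiest to see on a few inputs. The snippet copies the method above as a free function; the sample strings are illustrative. Whitespace and case around `true` are tolerated, but any other value (including "yes") and a missing tag both yield `false`, so chunks are never merged on an ambiguous answer.

```ruby
# Copy of MessageCompressor#parse_continues_previous, extracted for illustration.
def parse_continues_previous(content)
  return false if content.nil? || content.to_s.empty?
  m = content.to_s.match(/<continues_previous>(.*?)<\/continues_previous>/m)
  m ? m[1].strip.downcase == "true" : false
end

samples = {
  "<continues_previous>true</continues_previous>"    => true,
  "<continues_previous> TRUE\n</continues_previous>" => true,  # trimmed, case-insensitive
  "<continues_previous>yes</continues_previous>"     => false, # only the literal "true" counts
  "<summary>no tag at all</summary>"                 => false,
  nil                                                => false
}
```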

#parse_topics(content) ⇒ `Object`

Parse the topics tag from compressed content. Returns the topics string if found, nil otherwise. e.g. "<topics>Rails setup, database config</topics>" → "Rails setup, database config"

# File 'lib/clacky/agent/message_compressor.rb', line 151

def parse_topics(content)
  return nil if content.nil? || content.to_s.empty?
  m = content.to_s.match(/<topics>(.*?)<\/topics>/m)
  m ? m[1].strip : nil
end
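A small usage sketch, again copying the method as a free function. Surrounding whitespace inside the tag is stripped, and a reply without the tag yields `nil`. The method returns a single string; splitting it into a list is a hypothetical follow-up step shown only for illustration, not something the class does.

```ruby
# Copy of MessageCompressor#parse_topics, extracted for illustration.
def parse_topics(content)
  return nil if content.nil? || content.to_s.empty?
  m = content.to_s.match(/<topics>(.*?)<\/topics>/m)
  m ? m[1].strip : nil
end

reply = "<topics> Rails setup, database config </topics>\n<summary>...</summary>"
topics = parse_topics(reply)

# Hypothetical follow-up: turn the comma-separated string into an array.
topic_list = topics.split(",").map(&:strip)
```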

#rebuild_with_compression(compressed_content, original_messages:, recent_messages:, chunk_path: nil, topics: nil, previous_chunks: [], pulled_back_messages: []) ⇒ `Array<Hash>`

Parse the LLM response and rebuild the message list with compression.