Class: Clacky::MessageCompressor

Inherits:
Object
Defined in:
lib/clacky/agent/message_compressor.rb

Overview

Message compressor that uses an insert-then-compress strategy.

New Strategy: Instead of creating a separate API call for compression, we insert a compression instruction into the current conversation flow. This allows us to reuse the existing cache (system prompt + tools) and only pay for processing the new compression instruction.

Flow:

  1. Agent detects compression threshold is reached
  2. Compressor builds a compression instruction message
  3. Agent inserts this message and calls LLM (with cache reuse!)
  4. LLM returns compressed summary
  5. Compressor rebuilds message list: system + summary + recent messages
  6. Agent continues with new message list (cache will rebuild from here)

Benefits:

  • The compression call reuses the existing cache, so the cached prefix (system prompt + tools) is not paid for again
  • Only one cache rebuild happens after compression (vs. two with the old approach)
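
The flow above can be sketched with plain hashes. This is a minimal illustration, not the agent's actual code: `FAKE_LLM`, `insert_then_compress`, `keep_recent`, and the instruction text are hypothetical stand-ins, and no Clacky classes are involved.

```ruby
# Hypothetical stand-in for the LLM client: returns a canned compression reply.
FAKE_LLM = ->(_messages) do
  "<topics>Rails setup</topics>\n" \
  "<continues_previous>false</continues_previous>\n" \
  "<summary>User set up a Rails app.</summary>"
end

# Walks steps 2-5 of the flow (assumed shape, for illustration only).
def insert_then_compress(history, keep_recent: 2, llm: FAKE_LLM)
  system_msg = history.find { |m| m[:role] == "system" }
  recent     = history.reject { |m| m[:role] == "system" }.last(keep_recent)

  # Steps 2-3: append the instruction to the existing conversation so the
  # earlier messages are sent unchanged and can hit the provider's cache.
  instruction = { role: "user", content: "Summarize the conversation above." }
  reply = llm.call(history + [instruction])

  # Step 5: system + summary + recent messages.
  summary = { role: "user", content: reply, compressed_summary: true }
  [system_msg, summary, *recent].compact
end

history = [
  { role: "system", content: "You are a coding agent." },
  { role: "user", content: "Create a Rails app" },
  { role: "assistant", content: "Done." },
  { role: "user", content: "Add Tailwind" },
  { role: "assistant", content: "Added." }
]

rebuilt = insert_then_compress(history)
puts rebuilt.map { |m| m[:role] }.inspect
# prints ["system", "user", "user", "assistant"]
```

The second entry is the summary; everything older than the two most recent messages has been replaced by it.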

Constant Summary

COMPRESSION_PROMPT =
<<~PROMPT.freeze
  ═══════════════════════════════════════════════════════════════
  CRITICAL: TASK CHANGE - MEMORY COMPRESSION MODE
  ═══════════════════════════════════════════════════════════════
  The conversation above has ENDED. You are now in MEMORY COMPRESSION MODE.

  CRITICAL INSTRUCTIONS - READ CAREFULLY:

  1. This is NOT a continuation of the conversation
  2. DO NOT respond to any requests in the conversation above
  3. DO NOT call ANY tools or functions
  4. DO NOT use tool_calls in your response
  5. Your response MUST be PURE TEXT ONLY

  YOUR ONLY TASK: Create a comprehensive summary of the conversation above.

  REQUIRED RESPONSE FORMAT:
  First output a <topics> line listing 3-6 key topic phrases (comma-separated, concise).
  Then output a <continues_previous> line: "true" if this conversation is a direct
  continuation of the SAME task/topic as the PREVIOUS chunk shown below, "false" if it
  has moved on to a different task or topic. If there is no previous chunk, output "false".
  Then output the full summary wrapped in <summary> tags.

  Example format:
  <topics>Rails setup, database config, deploy pipeline, Tailwind CSS</topics>
  <continues_previous>false</continues_previous>
  <summary>
  ...full summary text...
  </summary>

  Focus on:
  - User's explicit requests and intents
  - Key technical concepts and code changes
  - Files examined and modified
  - Errors encountered and fixes applied
  - Current work status and pending tasks

  Begin your response NOW. Remember: PURE TEXT only, starting with <topics> then
  <continues_previous> then <summary>.
PROMPT
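
A reply that follows the required format can be split into its three parts with one small helper. The sketch below is illustrative only: `SAMPLE_REPLY` is an invented reply and `extract_tag` is a hypothetical helper, not part of this class.

```ruby
# A made-up model reply that follows the required response format.
SAMPLE_REPLY = <<~REPLY
  <topics>Rails setup, database config, deploy pipeline</topics>
  <continues_previous>false</continues_previous>
  <summary>
  The user created a Rails app and configured PostgreSQL.
  </summary>
REPLY

# Returns the stripped text inside <tag>...</tag>, or nil when the tag is absent.
# The /m flag lets the match span multiple lines, as the <summary> block does.
def extract_tag(text, tag)
  m = text.match(/<#{tag}>(.*?)<\/#{tag}>/m)
  m && m[1].strip
end

puts extract_tag(SAMPLE_REPLY, "topics")
puts extract_tag(SAMPLE_REPLY, "continues_previous")
puts extract_tag(SAMPLE_REPLY, "summary")
```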

Instance Method Summary

Constructor Details

#initialize(client, model: nil) ⇒ MessageCompressor

Returns a new instance of MessageCompressor.



# File 'lib/clacky/agent/message_compressor.rb', line 65

def initialize(client, model: nil)
  @client = client
  @model = model
end

Instance Method Details

#build_compression_message(messages, recent_messages: [], previous_topics: nil) ⇒ Hash, nil

Generates the compression instruction message to be inserted into the conversation. This enables cache reuse by sending the same API call, tools included.

SIMPLIFIED APPROACH:

  • Don't duplicate conversation history in the compression message
  • LLM can already see all messages, just ask it to compress
  • Keep the instruction small for better cache efficiency

Parameters:

  • messages (Array<Hash>)

    Original conversation messages

  • recent_messages (Array<Hash>) (defaults to: [])

    Recent messages to keep uncompressed (optional)

  • previous_topics (String, nil) (defaults to: nil)

    Topics of the most recent chunk on disk, shown to the LLM so it can decide whether the current conversation is a continuation (drives the <continues_previous> output for chunk merging).

Returns:

  • (Hash, nil)

    Compression instruction message to insert, or nil if nothing to compress



# File 'lib/clacky/agent/message_compressor.rb', line 84

def build_compression_message(messages, recent_messages: [], previous_topics: nil)
  # Get messages to compress (exclude system message and recent messages)
  messages_to_compress = messages.reject { |m| m[:role] == "system" || recent_messages.include?(m) }

  # If nothing to compress, return nil
  return nil if messages_to_compress.empty?

  content = COMPRESSION_PROMPT
  if previous_topics && !previous_topics.strip.empty?
    content = "#{COMPRESSION_PROMPT}\n\nPREVIOUS CHUNK TOPICS (for <continues_previous> judgement): #{previous_topics}"
  end

  {
    role: "user",
    content: content,
    system_injected: true
  }
end
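
The selection rule on the first line of the method decides whether an instruction is built at all. A minimal sketch of that rule in isolation follows; `compressible` is a hypothetical helper name and the messages are invented.

```ruby
# Same selection rule as the method above, isolated as a standalone helper.
def compressible(messages, recent_messages: [])
  messages.reject { |m| m[:role] == "system" || recent_messages.include?(m) }
end

messages = [
  { role: "system", content: "You are a coding agent." },
  { role: "user", content: "Fix the failing spec" },
  { role: "assistant", content: "Patched user_spec.rb" },
  { role: "user", content: "Now run rubocop" }
]

# Everything except the system prompt and the kept tail is compressible.
puts compressible(messages, recent_messages: messages.last(1)).size  # prints 2

# With only a system prompt there is nothing to compress; this is the case
# in which build_compression_message returns nil.
puts compressible(messages.first(1)).empty?  # prints true

# Array#include? compares hashes by value, so an equal copy also matches.
copy = [{ role: "user", content: "Now run rubocop" }]
puts compressible(messages, recent_messages: copy).size  # prints 2
```

Because the comparison is by value, an older message whose content is identical to a recent one is also treated as recent and left out of the compressible set.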

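#parse_compressed_result(result, chunk_path: nil, topics: nil, previous_chunks: []) ⇒ Array<Hash>

Wraps the compressed summary in a single message with role "user", flagged compressed_summary: true and system_injected: true. Returns an empty array when the result is blank.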



# File 'lib/clacky/agent/message_compressor.rb', line 166

def parse_compressed_result(result, chunk_path: nil, topics: nil, previous_chunks: [])
  # Return the compressed result as a single user message (role: "user").
  #
  # Why role:"user" instead of "assistant":
  #   When all original user messages get archived into the chunk during compression
  #   (e.g. a long single-turn `/slash` task), the rebuilt history can end up as
  #   `system → assistant(summary) → assistant(tool_calls) → tool → …` with NO user
  #   message anywhere. Strict providers (notably DeepSeek V4 thinking mode) reject
  #   this as a malformed turn structure with a misleading
  #   "reasoning_content must be passed back" 400 error.
  #
  # Marking it as a user message gives the conversation a valid turn boundary.
  # `system_injected: true` ensures the UI's replay_history still hides it from
  # the chat panel (the real-user filter excludes system_injected messages), while
  # INTERNAL_FIELDS in MessageHistory strips the marker before the API payload is
  # built — so DeepSeek/OpenAI/Anthropic only see a plain `{role:"user", content:…}`.
  #
  # The `compressed_summary: true` flag is preserved so that replay_history still
  # routes this message through the chunk-expansion path (which keys off that flag,
  # not the role).
  #
  # @param topics [String, nil] Short topic description extracted from <topics> tag
  # @param previous_chunks [Array<Hash>] Info about older chunk files
  #   Each hash: { basename:, path:, topics: }
  content = result.to_s.strip

  if content.empty?
    []
  else
    # Strip out the <topics> and <continues_previous> blocks — they're
    # metadata for chunk handling, not for AI context.
    content_without_topics = content.gsub(/<topics>.*?<\/topics>\n*/m, "")
                                    .gsub(/<continues_previous>.*?<\/continues_previous>\n*/m, "")
                                    .strip

    # Build previous chunks index section — links to older chunk files so the AI
    # can find earlier conversations without keeping all prior compressed_summary
    # messages in the active history. Shows newest chunks first (reverse order),
    # capped at 10 to keep the message size bounded.
    previous_chunks_section = ""
    if previous_chunks.any?
      max_visible = 10
      visible = previous_chunks.last(max_visible).reverse
      older_count = previous_chunks.size - visible.size

      previous_chunks_section = "\n\n---\n📁 **Previous chunks (newest first):**\n"
      visible.each do |pc|
        # Separate the topics from the basename so the two do not run together.
        topic_str = pc[:topics] ? ": #{pc[:topics]}" : ""
        previous_chunks_section += "- `#{pc[:basename]}`#{topic_str}\n"
      end

      if older_count > 0
        oldest = previous_chunks.first
        previous_chunks_section += "- ... and #{older_count} older chunks back to `#{oldest[:basename]}`\n"
      end

      previous_chunks_section += "_Use `file_reader` to recall details from these chunks._"
    end

    # Inject chunk anchor so AI knows where to find original conversation for THIS chunk
    anchor = ""
    if chunk_path
      anchor = "\n\n---\n📁 **Current chunk archived at:** `#{chunk_path}`\n" \
               "_Use `file_reader` tool to recall details from this chunk._"
    end

    # Prefix lets the model recognise this is injected context, not a user utterance.
    # Order: summary → previous chunks → current anchor (chronological)
    framed_content = "[Compressed conversation summary — previous turns archived]\n\n" \
                     "#{content_without_topics}" \
                     "#{previous_chunks_section}" \
                     "#{anchor}"

    [{
      role: "user",
      content: framed_content,
      compressed_summary: true,
      chunk_path: chunk_path,
      topics: topics,
      system_injected: true
    }]
  end
end
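
The previous-chunks index shows the newest chunks first and caps the list at 10. The windowing can be checked in isolation with the sketch below; `chunk_window` and the chunk basenames are hypothetical, chosen only for illustration.

```ruby
# Mirrors the windowing in parse_compressed_result: newest chunks first,
# at most max_visible listed, the remainder reported as a count.
def chunk_window(previous_chunks, max_visible: 10)
  visible = previous_chunks.last(max_visible).reverse
  {
    visible: visible.map { |c| c[:basename] },
    older_count: previous_chunks.size - visible.size
  }
end

# Invented chunk names, oldest first.
chunks = (1..12).map { |i| { basename: format("chunk-%02d.md", i) } }
window = chunk_window(chunks)

puts window[:visible].first  # prints chunk-12.md
puts window[:visible].last   # prints chunk-03.md
puts window[:older_count]    # prints 2
```

With 12 chunks, the two oldest are collapsed into the "... and 2 older chunks" line, which keeps the summary message bounded as the archive grows.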

#parse_continues_previous(content) ⇒ Boolean

Parse the <continues_previous> tag. Returns true only when the LLM explicitly says "true"; missing tag or any other value → false. This conservative default ensures we never merge unless the model is sure.



# File 'lib/clacky/agent/message_compressor.rb', line 160

def parse_continues_previous(content)
  return false if content.nil? || content.to_s.empty?
  m = content.to_s.match(/<continues_previous>(.*?)<\/continues_previous>/m)
  m ? m[1].strip.downcase == "true" : false
end
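
The conservative default is easiest to see on edge cases. The sketch below uses a standalone copy of the rule so it can run outside the gem; `continues_previous?` is a hypothetical name and the inputs are invented.

```ruby
# Standalone copy of the rule described above, for experimentation.
def continues_previous?(content)
  m = content.to_s.match(/<continues_previous>(.*?)<\/continues_previous>/m)
  m ? m[1].strip.downcase == "true" : false
end

[
  "<continues_previous>true</continues_previous>",     # explicit true
  "<continues_previous> TRUE\n</continues_previous>",  # case and whitespace are ignored
  "<continues_previous>yes</continues_previous>",      # anything else is false
  "<summary>no tag here</summary>",                    # missing tag
  nil                                                  # no content at all
].each { |reply| puts continues_previous?(reply) }
# prints true, true, false, false, false (one per line)
```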

#parse_topics(content) ⇒ String, nil

Parses the <topics> tag from the compressed content. Returns the topics string if found, nil otherwise. E.g. "<topics>Rails setup, database config</topics>" → "Rails setup, database config"



# File 'lib/clacky/agent/message_compressor.rb', line 151

def parse_topics(content)
  return nil if content.nil? || content.to_s.empty?
  m = content.to_s.match(/<topics>(.*?)<\/topics>/m)
  m ? m[1].strip : nil
end

#rebuild_with_compression(compressed_content, original_messages:, recent_messages:, chunk_path: nil, topics: nil, previous_chunks: [], pulled_back_messages: []) ⇒ Array<Hash>

Parses the LLM response and rebuilds the message list around the compressed summary. Raises an error if the response yields no summary message.

Parameters:

  • compressed_content (String)

    The compressed summary from LLM

  • original_messages (Array<Hash>)

    Original messages before compression

  • recent_messages (Array<Hash>)

    Recent messages to preserve

  • chunk_path (String, nil) (defaults to: nil)

    Path to the archived chunk MD file (if saved)

  • topics (String, nil) (defaults to: nil)

    Short topic description extracted from the <topics> tag

  • previous_chunks (Array<Hash>) (defaults to: [])

    Info about older chunk files. Each hash: { basename:, path:, topics: }

  • pulled_back_messages (Array<Hash>) (defaults to: [])

    Messages temporarily popped from the tail of @history before the compression LLM call (to free up token budget so the compression call itself doesn't overflow context). These are NOT discarded — they are reattached to the tail of the rebuilt history so recent task progress is preserved. Default: [] (normal compression path doesn't need this).

Returns:

  • (Array<Hash>)

    Rebuilt message list: system + compressed + recent + pulled_back



# File 'lib/clacky/agent/message_compressor.rb', line 114

def rebuild_with_compression(compressed_content, original_messages:, recent_messages:, chunk_path: nil, topics: nil, previous_chunks: [], pulled_back_messages: [])
  # Find and preserve system message
  system_msg = original_messages.find { |m| m[:role] == "system" }

  # Parse the compressed result, embedding previous chunk references so the
  # new summary carries a complete index of all older archives. This avoids
  # keeping all prior compressed_summary messages in active history while
  # still giving the AI a path to find old conversations via file_reader.
  parsed_messages = parse_compressed_result(compressed_content,
                                            chunk_path: chunk_path,
                                            topics: topics,
                                            previous_chunks: previous_chunks)

  # If parsing fails or returns empty, raise error
  if parsed_messages.nil? || parsed_messages.empty?
    raise "LLM compression failed: unable to parse compressed messages"
  end

  # Return system message + compressed messages + recent messages + pulled_back messages.
  # Strip any system messages from recent_messages as a safety net —
  # get_recent_messages_with_tool_pairs already excludes them, but this
  # guard ensures we never end up with duplicate system prompts even if
  # the caller passes an unfiltered list.
  #
  # pulled_back_messages: messages that were temporarily popped from the tail
  # of @history before the compression LLM call (to free up token budget so
  # the compression call itself doesn't overflow context). They are reattached
  # here to preserve recent task progress.
  safe_recent = recent_messages.reject { |m| m[:role] == "system" }
  safe_pulled_back = pulled_back_messages.reject { |m| m[:role] == "system" }
  [system_msg, *parsed_messages, *safe_recent, *safe_pulled_back].compact
end
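
The final assembly order and the system-message safety net can be exercised on their own. In the sketch below, `assemble` is a hypothetical helper that repeats the last three lines of the method, and all messages are invented.

```ruby
# Same assembly rule as the last three lines of the method above.
def assemble(system_msg, summary_msgs, recent, pulled_back)
  safe_recent      = recent.reject { |m| m[:role] == "system" }
  safe_pulled_back = pulled_back.reject { |m| m[:role] == "system" }
  [system_msg, *summary_msgs, *safe_recent, *safe_pulled_back].compact
end

sys     = { role: "system", content: "You are a coding agent." }
summary = [{ role: "user", content: "[Compressed conversation summary]" }]
recent  = [sys, { role: "assistant", content: "Running tests" }] # unfiltered on purpose
pulled  = [{ role: "tool", content: "3 examples, 0 failures" }]

# The duplicate system message inside `recent` is dropped.
puts assemble(sys, summary, recent, pulled).map { |m| m[:role] }.inspect
# prints ["system", "user", "assistant", "tool"]

# A history with no system message simply starts at the summary,
# because compact removes the nil placeholder.
puts assemble(nil, summary, [], []).map { |m| m[:role] }.inspect
# prints ["user"]
```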