Class: Clacky::MessageCompressor

Inherits: Object
Defined in: lib/clacky/agent/message_compressor.rb
Overview
Message compressor using Insert-then-Compress strategy
New Strategy: Instead of creating a separate API call for compression, we insert a compression instruction into the current conversation flow. This allows us to reuse the existing cache (system prompt + tools) and only pay for processing the new compression instruction.
Flow:

1. Agent detects compression threshold is reached
2. Compressor builds a compression instruction message
3. Agent inserts this message and calls LLM (with cache reuse!)
4. LLM returns compressed summary
5. Compressor rebuilds message list: system + summary + recent messages
6. Agent continues with new message list (cache will rebuild from here)
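The six steps above can be sketched in plain Ruby. This is an illustrative stub, not the real Agent API: the LLM call is faked and the message hashes are invented; only the message-list mechanics mirror the documented flow.

```ruby
# Minimal sketch of the insert-then-compress flow (agent and LLM are stubbed).
history = [
  { role: "system",    content: "You are a coding agent." },
  { role: "user",      content: "Refactor the billing module." },
  { role: "assistant", content: "Done, see diff." }
]

# Steps 2-3: build the instruction and append it to the UNCHANGED history,
# so the provider's prompt cache (system prompt + tools) is reused.
compression_instruction = { role: "user", content: "Summarize the conversation above." }
request_messages = history + [compression_instruction]

# Step 4: stand-in for the LLM's compressed summary.
summary = { role: "user", content: "[Compressed conversation summary]", compressed_summary: true }

# Steps 5-6: rebuild as system prompt + summary + recent messages.
recent  = history.last(1)
rebuilt = [history.first, summary, *recent]
```

The key point is that `request_messages` shares its entire prefix with the previous API call, which is what makes the compression request itself cache-friendly.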
Benefits:

- Compression call reuses existing cache (huge token savings)
- Only one cache rebuild after compression (vs two with old approach)
Constant Summary

COMPRESSION_PROMPT =

```ruby
<<~PROMPT.freeze
  ═══════════════════════════════════════════════════════════════
  CRITICAL: TASK CHANGE - MEMORY COMPRESSION MODE
  ═══════════════════════════════════════════════════════════════

  The conversation above has ENDED. You are now in MEMORY COMPRESSION MODE.

  CRITICAL INSTRUCTIONS - READ CAREFULLY:
  1. This is NOT a continuation of the conversation
  2. DO NOT respond to any requests in the conversation above
  3. DO NOT call ANY tools or functions
  4. DO NOT use tool_calls in your response
  5. Your response MUST be PURE TEXT ONLY

  YOUR ONLY TASK: Create a comprehensive summary of the conversation above.

  REQUIRED RESPONSE FORMAT:
  First output a <topics> line listing 3-6 key topic phrases (comma-separated, concise).
  Then output the full summary wrapped in <summary> tags.

  Example format:
  <topics>Rails setup, database config, deploy pipeline, Tailwind CSS</topics>
  <summary>
  ...full summary text...
  </summary>

  Focus on:
  - User's explicit requests and intents
  - Key technical concepts and code changes
  - Files examined and modified
  - Errors encountered and fixes applied
  - Current work status and pending tasks

  Begin your response NOW. Remember: PURE TEXT only, starting with <topics> then <summary>.
PROMPT
```
Instance Method Summary

- #build_compression_message(messages, recent_messages: []) ⇒ Hash
  Generate the compression instruction message to be inserted into the conversation; this enables cache reuse by using the same API call with tools.
- #initialize(client, model: nil) ⇒ MessageCompressor (constructor)
  A new instance of MessageCompressor.
- #parse_compressed_result(result, chunk_path: nil) ⇒ Object
  Parse the compressed result into a single injected user message.
- #parse_topics(content) ⇒ Object
  Parse the <topics> tag from compressed content.
- #rebuild_with_compression(compressed_content, original_messages:, recent_messages:, chunk_path: nil) ⇒ Array<Hash>
  Parse the LLM response and rebuild the message list with compression.
Constructor Details
#initialize(client, model: nil) ⇒ MessageCompressor
Returns a new instance of MessageCompressor.
```ruby
# File 'lib/clacky/agent/message_compressor.rb', line 60

def initialize(client, model: nil)
  @client = client
  @model = model
end
```
Instance Method Details
#build_compression_message(messages, recent_messages: []) ⇒ Hash
Generate the compression instruction message to be inserted into the conversation. This enables cache reuse by using the same API call with tools.

SIMPLIFIED APPROACH:

- Don't duplicate conversation history in the compression message
- The LLM can already see all messages; just ask it to compress
- Keep the instruction small for better cache efficiency
```ruby
# File 'lib/clacky/agent/message_compressor.rb', line 76

def build_compression_message(messages, recent_messages: [])
  # Get messages to compress (exclude system message and recent messages)
  to_compress = messages.reject { |m| m[:role] == "system" || recent_messages.include?(m) }

  # If nothing to compress, return nil
  return nil if to_compress.empty?

  # Simple compression instruction - LLM can see the history already
  {
    role: "user",
    content: COMPRESSION_PROMPT,
    system_injected: true
  }
end
```
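A quick walk-through of the exclusion step, using invented message hashes. Only the rejection rule is taken from the method above; the data is illustrative.

```ruby
# Keep neither the system prompt nor the messages that survive compression verbatim.
messages = [
  { role: "system",    content: "sys prompt" },
  { role: "user",      content: "old question" },
  { role: "assistant", content: "old answer" },
  { role: "user",      content: "recent question" }
]
recent_messages = [messages.last]

to_compress = messages.reject { |m| m[:role] == "system" || recent_messages.include?(m) }
# to_compress now holds only the older, compressible turns
```

If `to_compress` came back empty here, `build_compression_message` would return nil and the agent would skip compression entirely.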
#parse_compressed_result(result, chunk_path: nil) ⇒ Object
```ruby
# File 'lib/clacky/agent/message_compressor.rb', line 127

def parse_compressed_result(result, chunk_path: nil)
  # Return the compressed result as a single user message (role: "user").
  #
  # Why role:"user" instead of "assistant":
  # When all original user messages get archived into the chunk during compression
  # (e.g. a long single-turn `/slash` task), the rebuilt history can end up as
  # `system → assistant(summary) → assistant(tool_calls) → tool → …` with NO user
  # message anywhere. Strict providers (notably DeepSeek V4 thinking mode) reject
  # this as a malformed turn structure with a misleading
  # "reasoning_content must be passed back" 400 error.
  #
  # Marking it as a user message gives the conversation a valid turn boundary.
  # `system_injected: true` ensures the UI's replay_history still hides it from
  # the chat panel (the real-user filter excludes system_injected messages), while
  # INTERNAL_FIELDS in MessageHistory strips the marker before the API payload is
  # built — so DeepSeek/OpenAI/Anthropic only see a plain `{role:"user", content:…}`.
  #
  # The `compressed_summary: true` flag is preserved so that replay_history still
  # routes this message through the chunk-expansion path (which keys off that flag,
  # not the role).
  content = result.to_s.strip
  if content.empty?
    []
  else
    # Strip out the <topics> block — it's metadata for the chunk file, not for AI context
    content_without_topics = content.gsub(/<topics>.*?<\/topics>\n*/m, "").strip

    # Inject chunk anchor so AI knows where to find original conversation
    if chunk_path
      anchor = "\n\n---\n📁 **Original conversation archived at:** `#{chunk_path}`\n" \
               "_Use `file_reader` tool to recall details from this chunk._"
      content_without_topics = content_without_topics + anchor
    end

    # Prefix lets the model recognise this is injected context, not a user utterance.
    framed_content = "[Compressed conversation summary — previous turns archived]\n\n" \
                     "#{content_without_topics}"

    [{
      role: "user",
      content: framed_content,
      compressed_summary: true,
      chunk_path: chunk_path,
      system_injected: true
    }]
  end
end
```
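To make the string transformations concrete, here is a hypothetical LLM output run through the same topic-stripping and anchor-injection steps. The raw response and the chunk path are both invented for illustration.

```ruby
raw = "<topics>Rails setup, deploys</topics>\n<summary>Configured CI and fixed deploys.</summary>"
chunk_path = ".clacky/chunks/compressed_0001.md"  # illustrative path

# Strip the <topics> metadata block, as parse_compressed_result does.
body = raw.gsub(/<topics>.*?<\/topics>\n*/m, "").strip

# Append the chunk anchor and the recognisable summary prefix.
anchor = "\n\n---\n📁 **Original conversation archived at:** `#{chunk_path}`\n" \
         "_Use `file_reader` tool to recall details from this chunk._"
framed = "[Compressed conversation summary — previous turns archived]\n\n#{body}#{anchor}"
```

The result keeps the `<summary>` body and the chunk pointer, but no `<topics>` metadata ever reaches the model's context.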
#parse_topics(content) ⇒ Object
Parse the <topics> tag from compressed content. Returns the topics string if found, nil otherwise. E.g. "<topics>Rails setup, database config</topics>" → "Rails setup, database config".
```ruby
# File 'lib/clacky/agent/message_compressor.rb', line 122

def parse_topics(content)
  m = content.match(/<topics>(.*?)<\/topics>/m)
  m ? m[1].strip : nil
end
```
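A standalone copy of the extraction logic, with sample inputs, shows the two possible outcomes:

```ruby
# Same regex as parse_topics, reproduced here for a self-contained example.
def parse_topics(content)
  m = content.match(/<topics>(.*?)<\/topics>/m)
  m ? m[1].strip : nil
end

parse_topics("<topics>Rails setup, database config</topics>\n<summary>…</summary>")
# => "Rails setup, database config"
parse_topics("no tags here")
# => nil
```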
#rebuild_with_compression(compressed_content, original_messages:, recent_messages:, chunk_path: nil) ⇒ Array<Hash>
Parse the LLM response and rebuild the message list with compression.
```ruby
# File 'lib/clacky/agent/message_compressor.rb', line 97

def rebuild_with_compression(compressed_content, original_messages:, recent_messages:, chunk_path: nil)
  # Find and preserve system message
  system_msg = original_messages.find { |m| m[:role] == "system" }

  # Parse the compressed result
  compressed = parse_compressed_result(compressed_content, chunk_path: chunk_path)

  # If parsing fails or returns empty, raise error
  if compressed.nil? || compressed.empty?
    raise "LLM compression failed: unable to parse compressed messages"
  end

  # Return system message + compressed messages + recent messages.
  # Strip any system messages from recent_messages as a safety net —
  # get_recent_messages_with_tool_pairs already excludes them, but this
  # guard ensures we never end up with duplicate system prompts even if
  # the caller passes an unfiltered list.
  safe_recent = recent_messages.reject { |m| m[:role] == "system" }

  [system_msg, *compressed, *safe_recent].compact
end
```
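A worked example of the rebuild, including the duplicate-system-prompt safety net. All message contents are invented; a stray system message is planted in `recent_messages` to show the guard in action.

```ruby
original_messages = [
  { role: "system", content: "sys prompt" },
  { role: "user",   content: "old turn" }
]
recent_messages = [
  { role: "user",   content: "new turn" },
  { role: "system", content: "stray duplicate" }  # should be filtered out
]
compressed = [{ role: "user", content: "summary", compressed_summary: true }]

# Same three steps as rebuild_with_compression's final assembly.
system_msg  = original_messages.find { |m| m[:role] == "system" }
safe_recent = recent_messages.reject { |m| m[:role] == "system" }
rebuilt     = [system_msg, *compressed, *safe_recent].compact
```

Note that exactly one system message survives, and it is always the original one from `original_messages`.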