Class: RobotLab::HistoryCompressor
- Inherits: Object
- Defined in: lib/robot_lab/history_compressor.rb
Overview
Compresses a robot’s conversation history using TF-IDF relevance scoring.
Old conversation turns are tiered against the most recent context:
- High relevance (score >= keep_threshold) → kept verbatim
- Medium relevance (drop_threshold..keep_threshold) → summarized or dropped
- Low relevance (score < drop_threshold) → dropped
System messages and tool call/result messages are always preserved. The most recent recent_turns user+assistant pairs are likewise always kept.
Requires the optional 'classifier' gem (~> 2.3).
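The three-tier rule above can be sketched as a small standalone function (a minimal sketch with hypothetical names; not part of RobotLab's API). Scores are relevance values in 0.0..1.0:

```ruby
# Sketch of the tiering described above: keep, summarize, or drop a message
# based on its relevance score against the two thresholds.
def tier_for(score, keep_threshold:, drop_threshold:)
  if score >= keep_threshold
    :keep        # high relevance: kept verbatim
  elsif score < drop_threshold
    :drop        # low relevance: dropped
  else
    :summarize   # medium relevance: summarized (or dropped)
  end
end

tier_for(0.9, keep_threshold: 0.6, drop_threshold: 0.2)  # => :keep
tier_for(0.4, keep_threshold: 0.6, drop_threshold: 0.2)  # => :summarize
tier_for(0.1, keep_threshold: 0.6, drop_threshold: 0.2)  # => :drop
```

Note that a score exactly at keep_threshold is kept, matching the `score >= keep_threshold` condition in the overview.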
Defined Under Namespace
Classes: SUMMARY_STRUCT
Constant Summary
- MIN_SCORE_LENGTH = 20
  Minimum text length (characters) to score; shorter messages are kept as-is.
Instance Method Summary
- #call ⇒ Array
  Execute compression and return the new message array.
- #initialize(messages:, recent_turns:, keep_threshold:, drop_threshold:, summarizer:) ⇒ HistoryCompressor (constructor)
  A new instance of HistoryCompressor.
Constructor Details
#initialize(messages:, recent_turns:, keep_threshold:, drop_threshold:, summarizer:) ⇒ HistoryCompressor
Returns a new instance of HistoryCompressor.
# File 'lib/robot_lab/history_compressor.rb', line 46

def initialize(messages:, recent_turns:, keep_threshold:, drop_threshold:, summarizer:)
  if keep_threshold <= drop_threshold
    raise ArgumentError,
          "keep_threshold (#{keep_threshold}) must be greater than drop_threshold (#{drop_threshold})"
  end

  @messages       = messages
  @recent_turns   = recent_turns
  @keep_threshold = keep_threshold
  @drop_threshold = drop_threshold
  @summarizer     = summarizer
end
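The constructor rejects inverted or equal thresholds up front. A minimal standalone sketch of the same guard (hypothetical helper name, not RobotLab's code):

```ruby
# Validates that the keep threshold sits strictly above the drop threshold,
# mirroring the ArgumentError raised by HistoryCompressor#initialize.
def check_thresholds!(keep_threshold, drop_threshold)
  if keep_threshold <= drop_threshold
    raise ArgumentError,
          "keep_threshold (#{keep_threshold}) must be greater than " \
          "drop_threshold (#{drop_threshold})"
  end
  [keep_threshold, drop_threshold]
end

check_thresholds!(0.6, 0.2)    # => [0.6, 0.2]
# check_thresholds!(0.2, 0.6)  # raises ArgumentError
```

Equal thresholds are also rejected, since `keep_threshold <= drop_threshold` covers both cases; this leaves the medium tier a non-empty range.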
Instance Method Details
#call ⇒ Array
Execute compression and return the new message array.
# File 'lib/robot_lab/history_compressor.rb', line 62

def call
  return @messages if @messages.empty?

  # Classify each message index as pinned (always keep) or scorable
  pinned_indices = []
  scorable_indices = []
  @messages.each_with_index do |msg, idx|
    if pinned?(msg)
      pinned_indices << idx
    else
      scorable_indices << idx
    end
  end

  # Nothing scorable, or everything fits inside the recent window: return as-is
  return @messages if scorable_indices.empty?
  return @messages if scorable_indices.size <= @recent_turns * 2

  recent_count = @recent_turns * 2
  compressible = scorable_indices[0..-(recent_count + 1)]
  recent = scorable_indices[-recent_count..]
  return @messages if compressible.nil? || compressible.empty?

  # Build reference vector from the recent window using stemmed term frequencies.
  # Term frequencies (no IDF) are used because IDF on a topic-focused corpus
  # would suppress the very terms that indicate relevance to that topic.
  recent_texts = recent.filter_map { |i| extract_text(@messages[i]) }
                       .reject { |t| t.strip.length < MIN_SCORE_LENGTH }

  # No meaningful recent text → cannot score; return unchanged
  return @messages if recent_texts.empty?

  TextAnalysis.require_classifier!
  recent_vectors = recent_texts.map { |t| TextAnalysis.l2_normalize(t.word_hash) }
  reference = mean_vector(recent_vectors)

  # Decide action for each compressible message
  actions = {}
  compressible.each do |idx|
    actions[idx] = score_action(reference, @messages[idx])
  end

  # Reconstruct the message array in original order
  build_result(actions)
end