Class: ClaudeMemory::Sweep::Maintenance
- Inherits: Object
  - Object
    - ClaudeMemory::Sweep::Maintenance
- Defined in: lib/claude_memory/sweep/maintenance.rb
Overview
Clean separation of individual maintenance operations from Sweeper’s budget-management orchestration. Each method performs a single operation and returns the count of affected records.
Source: QMD v2.0.1 Maintenance class pattern
Constant Summary
- RESTORE_STOPWORDS =
Short / noise tokens dropped before Jaccard comparison. Intentionally minimal — we want conservative token extraction that still treats “Rails 8.0” and “Rails 8.1” as overlapping.
%w[for the and with via of in on to by is are].to_set.freeze
- RESTORE_JACCARD_THRESHOLD =
0.5
- DEFAULT_CONFIG =
{ proposed_fact_ttl_days: 14, disputed_fact_ttl_days: 30, content_retention_days: 30, mcp_tool_call_retention_days: 90 }.freeze
Instance Attribute Summary
-
#store ⇒ Object
readonly
Returns the value of attribute store.
Instance Method Summary
-
#backfill_vec_index(limit: 100) ⇒ Object
Backfill vector index for unindexed facts.
-
#checkpoint_wal ⇒ Object
Checkpoint the SQLite WAL file for compaction.
-
#cleanup_vec_expired(limit: 100) ⇒ Object
Remove vector embeddings for superseded/expired facts.
-
#dedupe_multi_value_facts ⇒ Object
Collapse duplicate multi-value facts.
-
#dedupe_open_conflicts(dry_run: false) ⇒ Hash
Deduplicate open conflicts that describe the same contradiction.
-
#expire_disputed_facts ⇒ Object
Expire disputed facts older than TTL.
-
#expire_proposed_facts ⇒ Object
Expire proposed facts older than TTL.
-
#fix_scope_leakage ⇒ Object
Fix scope leakage: facts whose `scope` column disagrees with the store they live in.
-
#initialize(store, config: {}) ⇒ Maintenance
constructor
A new instance of Maintenance.
-
#prune_old_content ⇒ Object
Delete old content items not referenced by any provenance.
-
#prune_old_mcp_tool_calls ⇒ Object
Delete MCP tool-call telemetry rows older than retention window.
-
#prune_orphaned_provenance ⇒ Object
Delete provenance records referencing non-existent facts.
-
#reclassify_references(dry_run: false) ⇒ Hash
Reclassify active facts currently labeled `convention` whose object text matches the ReferenceMaterialDetector heuristics.
-
#restore_multi_value_supersessions(predicate:, dry_run: false) ⇒ Hash
Restore superseded facts in a (subject, predicate) slot that were only superseded because of an obsolete single-value classification.
-
#vacuum ⇒ Object
Run SQLite VACUUM to reclaim space.
Constructor Details
#initialize(store, config: {}) ⇒ Maintenance
Returns a new instance of Maintenance.
# File 'lib/claude_memory/sweep/maintenance.rb', line 25
def initialize(store, config: {})
  @store = store
  @config = DEFAULT_CONFIG.merge(config)
end
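The constructor merges caller overrides over DEFAULT_CONFIG. The following is a minimal standalone sketch of that merge; the method name `effective_config` and the override value are hypothetical and exist only for illustration.

```ruby
# Standalone illustration of the merge performed in #initialize.
# `effective_config` is a hypothetical helper, not part of the library.
def effective_config(overrides = {})
  # Same keys and values as the documented DEFAULT_CONFIG constant.
  defaults = {
    proposed_fact_ttl_days: 14,
    disputed_fact_ttl_days: 30,
    content_retention_days: 30,
    mcp_tool_call_retention_days: 90
  }.freeze

  # Hash#merge returns a new hash; keys in `overrides` win,
  # and the frozen defaults are left untouched.
  defaults.merge(overrides)
end

config = effective_config(proposed_fact_ttl_days: 7)
puts config.inspect
```

Keys that the caller does not pass keep their default values, so a partial override is enough.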
Instance Attribute Details
#store ⇒ Object (readonly)
Returns the value of attribute store.
# File 'lib/claude_memory/sweep/maintenance.rb', line 23
def store
  @store
end
Instance Method Details
#backfill_vec_index(limit: 100) ⇒ Object
Backfill vector index for unindexed facts. Returns: Integer count of backfilled embeddings (0 if unavailable)
# File 'lib/claude_memory/sweep/maintenance.rb', line 149
def backfill_vec_index(limit: 100)
  with_vec_index do |vec_index|
    return vec_index.backfill_batch!(limit: limit)
  end
  0
end
#checkpoint_wal ⇒ Object
Checkpoint the SQLite WAL file for compaction. Returns: true
# File 'lib/claude_memory/sweep/maintenance.rb', line 254
def checkpoint_wal
  @store.checkpoint_wal
  true
end
#cleanup_vec_expired(limit: 100) ⇒ Object
Remove vector embeddings for superseded/expired facts. Returns: Integer count of cleaned embeddings (0 if unavailable)
# File 'lib/claude_memory/sweep/maintenance.rb', line 158
def cleanup_vec_expired(limit: 100)
  with_vec_index do |vec_index|
    stale_ids = @store.facts
      .where(status: %w[superseded expired])
      .where(Sequel.~(vec_indexed_at: nil))
      .select(:id)
      .limit(limit)
      .map { |r| r[:id] }

    stale_ids.each { |fact_id| vec_index.(fact_id) }
    return stale_ids.size
  end
  0
end
#dedupe_multi_value_facts ⇒ Object
Collapse duplicate multi-value facts. Before the resolver-level dedup fix (2026-04-17), multi-value predicates like uses_language and uses_framework accumulated identical rows every ingest cycle. For each (subject_entity_id, predicate, object_literal, scope) group with more than one active fact, keep the oldest row, copy the duplicates’ provenance onto the keeper (so we retain source signal), and mark the duplicates superseded. Returns the count of fact rows merged into their keeper.
# File 'lib/claude_memory/sweep/maintenance.rb', line 58
def dedupe_multi_value_facts
  merged = 0
  @store.db.transaction do
    # Pull every active fact with a literal object and group in Ruby.
    # Facts tables stay small (< 10k typical); Sequel's HAVING COUNT(*)
    # path hits adapter quoting bugs on some Extralite versions.
    active = @store.facts
      .where(status: "active")
      .exclude(subject_entity_id: nil)
      .exclude(object_literal: nil)
      .order(:id)
      .all

    groups = active.group_by { |f|
      [f[:subject_entity_id], f[:predicate], f[:object_literal]&.downcase, f[:scope]]
    }

    groups.each_value do |rows|
      next if rows.size < 2

      keeper = rows.first
      rows[1..].each do |loser|
        @store.provenance.where(fact_id: loser[:id]).update(fact_id: keeper[:id])
        @store.facts.where(id: loser[:id]).update(
          status: "superseded",
          valid_to: Time.now.utc.iso8601
        )
        @store.insert_fact_link(from_fact_id: keeper[:id], to_fact_id: loser[:id], link_type: "supersedes")
        merged += 1
      end
    end
  end
  merged
end
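The grouping step can be tried without a database. The sketch below applies the same group key to plain hashes and reports which row would be kept and which would be superseded; `plan_merges` and the sample rows are hypothetical, and no provenance or fact links are written.

```ruby
# In-memory illustration of the grouping used by dedupe_multi_value_facts.
# `plan_merges` is a hypothetical helper; it only plans, it writes nothing.
def plan_merges(rows)
  # Sort by id so the first row of each group is the oldest (the keeper).
  groups = rows.sort_by { |f| f[:id] }.group_by do |f|
    [f[:subject_entity_id], f[:predicate], f[:object_literal]&.downcase, f[:scope]]
  end

  # Only groups with more than one row contain duplicates.
  groups.values.select { |group| group.size > 1 }.map do |group|
    {keeper_id: group.first[:id], loser_ids: group[1..].map { |f| f[:id] }}
  end
end

rows = [
  {id: 1, subject_entity_id: 10, predicate: "uses_language", object_literal: "Ruby", scope: "project"},
  {id: 2, subject_entity_id: 10, predicate: "uses_language", object_literal: "ruby", scope: "project"},
  {id: 3, subject_entity_id: 10, predicate: "uses_language", object_literal: "Go", scope: "project"}
]
puts plan_merges(rows).inspect
```

Because the object literal is downcased in the key, "Ruby" and "ruby" fall into one group, while a different scope or subject would keep rows apart.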
#dedupe_open_conflicts(dry_run: false) ⇒ Hash
Deduplicate open conflicts that describe the same contradiction. Before the Resolver#apply_conflict dedupe fix (2026-04-24), each re-extraction of the losing value in a single-value slot produced a new disputed fact + conflict row — production DBs accumulated 11 open conflicts for “sqlite vs postgresql” referencing 11 different disputed facts. This pass keeps the earliest conflict per logical pair and marks the rest resolved, reinforcing the keeper’s provenance chain with the duplicates’ provenance.
Pair key: (subject_entity_id, predicate, normalized(object_a), normalized(object_b)) with object order sorted so A-vs-B == B-vs-A.
# File 'lib/claude_memory/sweep/maintenance.rb', line 273
def dedupe_open_conflicts(dry_run: false)
  result = {inspected: 0, resolved: 0, decisions: []}

  open_rows = @store.conflicts
    .where(status: "open")
    .order(:id)
    .all
  return result if open_rows.empty?

  fact_ids = open_rows.flat_map { |r| [r[:fact_a_id], r[:fact_b_id]] }.uniq
  facts = @store.facts
    .where(id: fact_ids)
    .select(:id, :subject_entity_id, :predicate, :object_literal, :status)
    .all
    .to_h { |f| [f[:id], f] }

  @store.db.transaction do
    groups = open_rows.group_by { |row| pair_key(row, facts) }.reject { |key, _| key.nil? }

    groups.each_value do |rows_in_group|
      result[:inspected] += rows_in_group.size
      next if rows_in_group.size < 2

      keeper = rows_in_group.first
      duplicates = rows_in_group[1..]

      duplicates.each do |dup|
        result[:decisions] << {
          conflict_id: dup[:id],
          action: :resolve_duplicate,
          keeper_id: keeper[:id],
          duplicate_fact_id: dup[:fact_b_id]
        }
        # Counted whether or not we actually write, so dry-run output
        # matches real-run output and callers can compare plans.
        result[:resolved] += 1
        next if dry_run

        # Resolve the duplicate conflict. Also reject its disputed
        # side (fact_b_id is always the newer inserted-as-disputed
        # fact per Resolver convention), and shift its provenance
        # onto the keeper's fact_b so the evidence isn't lost.
        keeper_fact_b_id = keeper[:fact_b_id]
        if dup[:fact_b_id] != keeper_fact_b_id
          @store.provenance.where(fact_id: dup[:fact_b_id]).update(fact_id: keeper_fact_b_id)
          @store.facts.where(id: dup[:fact_b_id]).update(
            status: "rejected",
            valid_to: Time.now.utc.iso8601
          )
        end

        @store.conflicts.where(id: dup[:id]).update(
          status: "resolved",
          notes: "Deduplicated into conflict ##{keeper[:id]}"
        )
      end
    end
  end

  result
end
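The private `pair_key` helper is not shown on this page. The sketch below is one possible reading of the pair key described above, assuming that normalization means stripping whitespace and downcasing, and that a conflict with a missing fact yields nil. `sketch_pair_key` is a hypothetical name.

```ruby
# One possible order-insensitive pair key (an assumption, not the
# library's actual pair_key implementation).
def sketch_pair_key(conflict, facts_by_id)
  a = facts_by_id[conflict[:fact_a_id]]
  b = facts_by_id[conflict[:fact_b_id]]
  # A conflict that references a missing fact cannot be keyed.
  return nil unless a && b

  # Assumed normalization: trim and downcase, then sort so that
  # A-vs-B and B-vs-A produce the same key.
  objects = [a, b].map { |f| f[:object_literal].to_s.strip.downcase }.sort
  [a[:subject_entity_id], a[:predicate], *objects]
end

facts = {
  1 => {subject_entity_id: 5, predicate: "uses_database", object_literal: "SQLite"},
  2 => {subject_entity_id: 5, predicate: "uses_database", object_literal: "PostgreSQL"}
}
puts sketch_pair_key({fact_a_id: 1, fact_b_id: 2}, facts).inspect
puts sketch_pair_key({fact_a_id: 2, fact_b_id: 1}, facts).inspect
```

Both calls print the same key, which is what lets `group_by` collapse repeated conflicts over the same contradiction.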
#expire_disputed_facts ⇒ Object
Expire disputed facts older than TTL. Returns: Integer count of expired facts
# File 'lib/claude_memory/sweep/maintenance.rb', line 42
def expire_disputed_facts
  cutoff = cutoff_time(@config[:disputed_fact_ttl_days])
  @store.facts
    .where(status: "disputed")
    .where { created_at < cutoff }
    .update(status: "expired")
end
#expire_proposed_facts ⇒ Object
Expire proposed facts older than TTL. Returns: Integer count of expired facts
# File 'lib/claude_memory/sweep/maintenance.rb', line 32
def expire_proposed_facts
  cutoff = cutoff_time(@config[:proposed_fact_ttl_days])
  @store.facts
    .where(status: "proposed")
    .where { created_at < cutoff }
    .update(status: "expired")
end
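Both expiry methods rely on a private `cutoff_time` helper that this page does not show. As a rough sketch, assuming the helper subtracts a number of days from the current UTC time and returns an ISO 8601 string (the same format written to `valid_to` elsewhere in this class), it might look like the following. The name `sketch_cutoff_time` and the `now:` parameter are hypothetical.

```ruby
require "time"

# Assumed behavior of the private cutoff_time helper: "now minus N days"
# as an ISO 8601 UTC string. The real helper may differ.
def sketch_cutoff_time(days, now: Time.now.utc)
  (now - days * 86_400).iso8601
end

# With the default proposed_fact_ttl_days of 14:
puts sketch_cutoff_time(14, now: Time.utc(2026, 5, 1))
```

Under that assumption, a proposed fact created before the printed timestamp would be marked expired by the next sweep.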
#fix_scope_leakage ⇒ Object
Fix scope leakage: facts whose `scope` column disagrees with the store they live in. Pre-2026-04-20, the resolver treated scope_hint from the distiller as a scope override, so when the NullDistiller detected global-scope language (“always”, “my preference”), it stamped scope: “global” on facts that still ended up written to the project DB. The result was invisible orphaned rows: not in the global DB, so global recall never saw them, but labeled global inside the project DB.
This pass detects those rows by comparing `scope` to the expected value derived from which DB this Maintenance instance is running against, and rewrites scope + project_path to match. Does not move facts between DBs; users can run `claude-memory promote <id>` to do a proper cross-store copy. Returns: Integer count of facts whose scope was corrected.
# File 'lib/claude_memory/sweep/maintenance.rb', line 108
def fix_scope_leakage
  expected = expected_scope_for_store
  return 0 unless expected

  project_path_for_scope = (expected == "global") ? nil : detect_project_path
  @store.facts
    .exclude(scope: expected)
    .update(scope: expected, project_path: project_path_for_scope)
end
#prune_old_content ⇒ Object
Delete old content items not referenced by any provenance. Also removes their FTS index entries. Returns: Integer count of deleted content items
# File 'lib/claude_memory/sweep/maintenance.rb', line 130
def prune_old_content
  cutoff = cutoff_time(@config[:content_retention_days])
  referenced_ids = @store.provenance.exclude(content_item_id: nil).select(:content_item_id)
  prunable = @store.content_items
    .where { ingested_at < cutoff }
    .exclude(id: referenced_ids)

  fts = ClaudeMemory::Index::LexicalFTS.new(@store)
  prunable.select(:id, :raw_text).each do |row|
    fts.remove_content_item(row[:id], row[:raw_text])
  rescue
    # FTS entry may not exist; skip
  end

  prunable.delete
end
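The selection rule (older than the cutoff and not referenced by any provenance row) can be illustrated on plain hashes. In the sketch below, `prunable_content_ids` is a hypothetical helper, and timestamps are assumed to be ISO 8601 UTC strings, which compare correctly as strings.

```ruby
# In-memory illustration of which content items prune_old_content targets.
# Hypothetical helper; timestamps are assumed to be ISO 8601 UTC strings.
def prunable_content_ids(content_items, provenance, cutoff)
  # Content ids still cited by provenance must survive pruning.
  referenced = provenance.map { |p| p[:content_item_id] }.compact

  content_items
    .select { |c| c[:ingested_at] < cutoff && !referenced.include?(c[:id]) }
    .map { |c| c[:id] }
end

content = [
  {id: 1, ingested_at: "2026-01-01T00:00:00Z"},
  {id: 2, ingested_at: "2026-01-02T00:00:00Z"},
  {id: 3, ingested_at: "2026-04-30T00:00:00Z"}
]
provenance = [{content_item_id: 2}, {content_item_id: nil}]
puts prunable_content_ids(content, provenance, "2026-04-01T00:00:00Z").inspect
```

Item 2 is old but still referenced, and item 3 is newer than the cutoff, so only item 1 qualifies.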
#prune_old_mcp_tool_calls ⇒ Object
Delete MCP tool-call telemetry rows older than retention window. Returns: Integer count of deleted rows (0 if table missing).
# File 'lib/claude_memory/sweep/maintenance.rb', line 245
def prune_old_mcp_tool_calls
  return 0 unless @store.db.table_exists?(:mcp_tool_calls)

  cutoff = cutoff_time(@config[:mcp_tool_call_retention_days])
  @store.mcp_tool_calls.where { called_at < cutoff }.delete
end
#prune_orphaned_provenance ⇒ Object
Delete provenance records referencing non-existent facts. Returns: Integer count of deleted provenance rows
# File 'lib/claude_memory/sweep/maintenance.rb', line 120
def prune_orphaned_provenance
  fact_ids = @store.facts.select(:id)
  @store.provenance
    .exclude(fact_id: fact_ids)
    .delete
end
#reclassify_references(dry_run: false) ⇒ Hash
Reclassify active facts currently labeled `convention` whose object text matches the ReferenceMaterialDetector heuristics. Fixes the historical data tail from before the detector was wired into `store_extraction` on 2026-04-24. Current writes can’t create this pattern; this pass only cleans up what already exists.
# File 'lib/claude_memory/sweep/maintenance.rb', line 340
def reclassify_references(dry_run: false)
  detector = ClaudeMemory::Distill::ReferenceMaterialDetector.new
  result = {inspected: 0, reclassified: 0, decisions: []}

  candidates = @store.facts
    .where(status: "active", predicate: "convention")
    .select(:id, :object_literal)
    .all

  @store.db.transaction do
    candidates.each do |row|
      result[:inspected] += 1
      fact = {predicate: "convention", object: row[:object_literal]}
      next unless detector.reference_material?(fact)

      result[:decisions] << {fact_id: row[:id], object: row[:object_literal]}
      result[:reclassified] += 1

      unless dry_run
        @store.facts.where(id: row[:id]).update(predicate: "reference")
      end
    end
  end

  result
end
#restore_multi_value_supersessions(predicate:, dry_run: false) ⇒ Hash
Restore superseded facts in a (subject, predicate) slot that were only superseded because of an obsolete single-value classification. Uses Jaccard-based token overlap to distinguish bug-superseded facts (token-disjoint siblings) from legitimate corrections (overlapping siblings).
Refuses to run on predicates still classified as single-value — they should stay superseded by design.
Never touches status: “rejected” facts (explicit user decisions).
# File 'lib/claude_memory/sweep/maintenance.rb', line 185
def restore_multi_value_supersessions(predicate:, dry_run: false)
  if ClaudeMemory::Resolve::PredicatePolicy.single?(predicate)
    raise ArgumentError, "Predicate '#{predicate}' is still classified single-value; refusing to restore"
  end

  result = {inspected: 0, restored: 0, skipped_ambiguous: 0, skipped_rejected: 0, decisions: []}

  rows_by_subject = @store.facts
    .where(predicate: predicate)
    .exclude(status: "rejected")
    .select(:id, :subject_entity_id, :object_literal, :status)
    .all
    .group_by { |r| r[:subject_entity_id] }

  rejected_by_subject = @store.facts
    .where(predicate: predicate, status: "rejected")
    .select(:id, :subject_entity_id, :object_literal)
    .all
    .group_by { |r| r[:subject_entity_id] }

  @store.db.transaction do
    rows_by_subject.each do |subject_id, rows|
      rejected_rows = rejected_by_subject[subject_id] || []
      siblings = rows + rejected_rows

      rows.each do |candidate|
        next unless candidate[:status] == "superseded"
        result[:inspected] += 1

        candidate_tokens = restore_tokenize(candidate[:object_literal])
        ambiguous_against = find_overlapping_siblings(candidate, siblings, candidate_tokens)

        if ambiguous_against.empty?
          result[:restored] += 1
          result[:decisions] << {
            subject_entity_id: subject_id,
            fact_id: candidate[:id],
            object: candidate[:object_literal],
            action: :restore
          }
          restore_fact!(candidate[:id]) unless dry_run
        else
          result[:skipped_ambiguous] += 1
          result[:decisions] << {
            subject_entity_id: subject_id,
            fact_id: candidate[:id],
            object: candidate[:object_literal],
            action: :skip_ambiguous,
            overlaps_with: ambiguous_against.map { |s| s[:object_literal] }
          }
        end
      end
    end
  end

  result
end
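The private helpers `restore_tokenize` and `find_overlapping_siblings` are not shown on this page. The sketch below illustrates Jaccard token overlap in general, under two assumptions that may not match the real helpers: tokens are lowercase alphanumeric runs with the RESTORE_STOPWORDS entries removed, and two objects count as overlapping when their similarity is at or above RESTORE_JACCARD_THRESHOLD. All method names here are hypothetical.

```ruby
require "set"

# Same words as the documented RESTORE_STOPWORDS constant.
def sketch_stopwords
  %w[for the and with via of in on to by is are].to_set
end

# Assumed tokenizer: lowercase alphanumeric runs minus stopwords.
def sketch_tokenize(text, stopwords = sketch_stopwords)
  text.to_s.downcase.scan(/[a-z0-9]+/).reject { |t| stopwords.include?(t) }.to_set
end

# Jaccard similarity: |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets.
def jaccard(a, b)
  union = a | b
  return 0.0 if union.empty?
  (a & b).size.to_f / union.size
end

# Assumed comparison against the documented threshold of 0.5.
def overlapping?(left, right, threshold: 0.5)
  jaccard(sketch_tokenize(left), sketch_tokenize(right)) >= threshold
end

puts overlapping?("Rails 8.0", "Rails 8.1")
puts overlapping?("PostgreSQL", "SQLite")
```

Under these assumptions, "Rails 8.0" and "Rails 8.1" share two of four distinct tokens and are treated as overlapping, so a superseded sibling would be skipped as ambiguous, while token-disjoint values such as "PostgreSQL" and "SQLite" would be candidates for restoration.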
#vacuum ⇒ Object
Run SQLite VACUUM to reclaim space. Returns: true
# File 'lib/claude_memory/sweep/maintenance.rb', line 369
def vacuum
  @store.db.run("VACUUM")
  true
end