Class: ClaudeMemory::Sweep::Maintenance

Inherits:
Object
Defined in:
lib/claude_memory/sweep/maintenance.rb

Overview

Separates individual maintenance operations from Sweeper’s budget-management orchestration. Each method performs a single operation. Most return the count of affected records; checkpoint_wal and vacuum return true, and the passes that accept dry_run return a result Hash.

Source: QMD v2.0.1 Maintenance class pattern

Constant Summary

RESTORE_STOPWORDS =

Short / noise tokens dropped before Jaccard comparison. Intentionally minimal — we want conservative token extraction that still treats “Rails 8.0” and “Rails 8.1” as overlapping.

%w[for the and with via of in on to by is are].to_set.freeze
RESTORE_JACCARD_THRESHOLD =
0.5
DEFAULT_CONFIG =
{
  proposed_fact_ttl_days: 14,
  disputed_fact_ttl_days: 30,
  content_retention_days: 30,
  mcp_tool_call_retention_days: 90,
  otel_metric_retention_days: 30,
  otel_event_retention_days: 14,
  otel_trace_retention_days: 7
}.freeze
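
To make the stopword and threshold constants concrete, here is a minimal sketch of a Jaccard comparison over filtered tokens. The tokenizer (`sketch_tokenize`), the `jaccard` and `overlapping?` helpers, and the `>=` comparison against the threshold are assumptions for illustration; the class's private tokenizer and comparison are not shown on this page and may differ.

```ruby
require "set"

# Values copied from the constants above.
RESTORE_STOPWORDS = %w[for the and with via of in on to by is are].to_set.freeze
RESTORE_JACCARD_THRESHOLD = 0.5

# Hypothetical tokenizer: lowercase, split on non-alphanumerics, drop stopwords.
def sketch_tokenize(text)
  text.to_s.downcase.scan(/[a-z0-9]+/).reject { |t| RESTORE_STOPWORDS.include?(t) }.to_set
end

# Jaccard similarity: |A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty.
def jaccard(a, b)
  union = a | b
  return 0.0 if union.empty?

  (a & b).size.to_f / union.size
end

# Assumption: two literals "overlap" when similarity reaches the threshold.
def overlapping?(left, right)
  jaccard(sketch_tokenize(left), sketch_tokenize(right)) >= RESTORE_JACCARD_THRESHOLD
end

puts overlapping?("Rails 8.0", "Rails 8.1")
puts overlapping?("Ruby", "TypeScript")
```

Under these assumptions, "Rails 8.0" and "Rails 8.1" share two of four distinct tokens (rails, 8), so they score 0.5 and count as overlapping, while "Ruby" and "TypeScript" share none.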

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(store, config: {}) ⇒ Maintenance

Returns a new instance of Maintenance.



# File 'lib/claude_memory/sweep/maintenance.rb', line 28

def initialize(store, config: {})
  @store = store
  @config = DEFAULT_CONFIG.merge(config)
end
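
The constructor's merge means callers only need to pass the keys they want to change. The sketch below reproduces that merge on a subset of the defaults; `effective_config` is a hypothetical helper used only for illustration.

```ruby
# Subset of DEFAULT_CONFIG copied from the constant above.
DEFAULT_CONFIG = {
  proposed_fact_ttl_days: 14,
  disputed_fact_ttl_days: 30,
  content_retention_days: 30
}.freeze

# Same merge as the constructor: caller-supplied keys win, the rest fall back
# to the defaults. Hash#merge returns a new Hash, so the frozen default is untouched.
def effective_config(overrides = {})
  DEFAULT_CONFIG.merge(overrides)
end

config = effective_config(proposed_fact_ttl_days: 7)
puts config[:proposed_fact_ttl_days]  # overridden value
puts config[:disputed_fact_ttl_days]  # default value
```

With the real class, the equivalent call would be `Maintenance.new(store, config: {proposed_fact_ttl_days: 7})`, where `store` is whatever store object the caller already holds.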

Instance Attribute Details

#store ⇒ Object (readonly)

Returns the value of attribute store.



# File 'lib/claude_memory/sweep/maintenance.rb', line 26

def store
  @store
end

Instance Method Details

#backfill_vec_index(limit: 100) ⇒ Object

Backfill vector index for unindexed facts. Returns: Integer count of backfilled embeddings (0 if unavailable)
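
The "0 if unavailable" behavior relies on a Ruby detail: `return` inside a block exits the enclosing method. The sketch below illustrates that control flow with a hypothetical `SketchMaintenance` class. Its `with_vec_index` is an assumed stand-in that yields only when an index is available; the real private helper is not shown on this page.

```ruby
# Hypothetical stand-in, not the real class.
class SketchMaintenance
  def initialize(vec_index)
    @vec_index = vec_index
  end

  def backfill(limit: 100)
    with_vec_index do |vec_index|
      # `return` inside a block exits the enclosing method, not just the block.
      return vec_index.call(limit)
    end
    0 # reached only when the block never ran
  end

  private

  # Assumption: yield only when a vector index is available.
  def with_vec_index
    yield @vec_index if @vec_index
  end
end

fake_index = ->(limit) { limit / 2 } # pretends half the batch needed backfilling
puts SketchMaintenance.new(fake_index).backfill(limit: 10)
puts SketchMaintenance.new(nil).backfill
```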



# File 'lib/claude_memory/sweep/maintenance.rb', line 152

def backfill_vec_index(limit: 100)
  with_vec_index do |vec_index|
    return vec_index.backfill_batch!(limit: limit)
  end
  0
end

#checkpoint_wal ⇒ Object

Checkpoint the SQLite WAL file for compaction. Returns: true



# File 'lib/claude_memory/sweep/maintenance.rb', line 284

def checkpoint_wal
  @store.checkpoint_wal
  true
end

#cleanup_vec_expired(limit: 100) ⇒ Object

Remove vector embeddings for superseded/expired facts. Returns: Integer count of cleaned embeddings (0 if unavailable)



# File 'lib/claude_memory/sweep/maintenance.rb', line 161

def cleanup_vec_expired(limit: 100)
  with_vec_index do |vec_index|
    stale_ids = @store.facts
      .where(status: %w[superseded expired])
      .where(Sequel.~(vec_indexed_at: nil))
      .select(:id)
      .limit(limit)
      .map { |r| r[:id] }

    stale_ids.each { |fact_id| vec_index.remove_embedding(fact_id) }
    return stale_ids.size
  end
  0
end

#dedupe_multi_value_facts ⇒ Object

Collapse duplicate multi-value facts. Before the resolver-level dedup fix (2026-04-17), multi-value predicates like uses_language and uses_framework accumulated identical rows every ingest cycle. For each (subject_entity_id, predicate, object_literal, scope) group with more than one active fact, where object_literal is compared case-insensitively, keep the oldest row (lowest id), move the duplicates’ provenance onto the keeper (so we retain source signal), mark the duplicates superseded, and record a supersedes link from the keeper to each duplicate. Returns the count of fact rows merged into their keeper.
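
The grouping step can be sketched on plain hashes, without a database. `plan_merges` and the sample rows are hypothetical; the sketch only plans which row would be merged into which keeper and performs no writes.

```ruby
# In-memory stand-ins for active fact rows, already ordered by id.
facts = [
  {id: 1, subject_entity_id: 10, predicate: "uses_language", object_literal: "Ruby", scope: "project"},
  {id: 2, subject_entity_id: 10, predicate: "uses_language", object_literal: "ruby", scope: "project"},
  {id: 3, subject_entity_id: 10, predicate: "uses_language", object_literal: "Go", scope: "project"},
  {id: 4, subject_entity_id: 10, predicate: "uses_language", object_literal: "Ruby", scope: "global"}
]

# Group by the same key the method uses; the first row of each group is the
# keeper and every later row is a duplicate to merge into it.
def plan_merges(rows)
  groups = rows.group_by do |f|
    [f[:subject_entity_id], f[:predicate], f[:object_literal]&.downcase, f[:scope]]
  end

  groups.each_value.flat_map do |group|
    keeper, *losers = group
    losers.map { |loser| {keeper_id: keeper[:id], loser_id: loser[:id]} }
  end
end

merges = plan_merges(facts)
puts merges.inspect
```

Row 2 merges into row 1 because the literals differ only by case. Row 4 stays separate because its scope differs.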



# File 'lib/claude_memory/sweep/maintenance.rb', line 61

def dedupe_multi_value_facts
  merged = 0
  @store.db.transaction do
    # Pull every active fact with a literal object and group in Ruby.
    # Facts tables stay small (< 10k typical); Sequel's HAVING COUNT(*)
    # path hits adapter quoting bugs on some Extralite versions.
    active = @store.facts
      .where(status: "active")
      .exclude(subject_entity_id: nil)
      .exclude(object_literal: nil)
      .order(:id)
      .all

    groups = active.group_by { |f|
      [f[:subject_entity_id], f[:predicate], f[:object_literal]&.downcase, f[:scope]]
    }

    groups.each_value do |rows|
      next if rows.size < 2

      keeper = rows.first
      rows[1..].each do |loser|
        @store.provenance.where(fact_id: loser[:id]).update(fact_id: keeper[:id])
        @store.facts.where(id: loser[:id]).update(
          status: "superseded",
          valid_to: Time.now.utc.iso8601
        )
        @store.insert_fact_link(from_fact_id: keeper[:id], to_fact_id: loser[:id], link_type: "supersedes")
        merged += 1
      end
    end
  end
  merged
end

#dedupe_open_conflicts(dry_run: false) ⇒ Hash

Deduplicate open conflicts that describe the same contradiction. Before the Resolver#apply_conflict dedupe fix (2026-04-24), each re-extraction of the losing value in a single-value slot produced a new disputed fact + conflict row — production DBs accumulated 11 open conflicts for “sqlite vs postgresql” referencing 11 different disputed facts. This pass keeps the earliest conflict per logical pair and marks the rest resolved, reinforcing the keeper’s provenance chain with the duplicates’ provenance.

Pair key: (subject_entity_id, predicate, normalized(object_a), normalized(object_b)) with object order sorted so A-vs-B == B-vs-A.
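
A minimal sketch of an order-insensitive pair key follows. `sketch_normalize` and `sketch_pair_key` are hypothetical; the private `pair_key` helper used by the method is not shown on this page, and its normalization rules and nil conditions are assumed here.

```ruby
# Assumption: normalization means trimming whitespace and lowercasing.
def sketch_normalize(value)
  value.to_s.strip.downcase
end

# Builds a key that is identical for A-vs-B and B-vs-A by sorting the two
# normalized object values before placing them in the key.
def sketch_pair_key(conflict, facts_by_id)
  a = facts_by_id[conflict[:fact_a_id]]
  b = facts_by_id[conflict[:fact_b_id]]
  return nil if a.nil? || b.nil? # assumed: ungroupable when a fact is missing

  objects = [a[:object_literal], b[:object_literal]].map { |o| sketch_normalize(o) }.sort
  [a[:subject_entity_id], a[:predicate], *objects]
end

facts_by_id = {
  1 => {subject_entity_id: 7, predicate: "uses_database", object_literal: "sqlite"},
  2 => {subject_entity_id: 7, predicate: "uses_database", object_literal: "PostgreSQL"},
  3 => {subject_entity_id: 7, predicate: "uses_database", object_literal: "postgresql "}
}

first = sketch_pair_key({id: 100, fact_a_id: 1, fact_b_id: 2}, facts_by_id)
second = sketch_pair_key({id: 101, fact_a_id: 3, fact_b_id: 1}, facts_by_id)
puts first == second
```

Both conflicts map to the same key even though the facts appear in opposite order and with different casing, so they would land in the same group.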

Parameters:

  • dry_run (Boolean) (defaults to: false)

    when true, decide but don’t write

Returns:

  • (Hash)

    inspected:, resolved:, decisions: [{conflict_id:, action:, keeper_id:, duplicate_fact_id:}]



# File 'lib/claude_memory/sweep/maintenance.rb', line 303

def dedupe_open_conflicts(dry_run: false)
  result = {inspected: 0, resolved: 0, decisions: []}

  open_rows = @store.conflicts
    .where(status: "open")
    .order(:id)
    .all
  return result if open_rows.empty?

  fact_ids = open_rows.flat_map { |r| [r[:fact_a_id], r[:fact_b_id]] }.uniq
  facts = @store.facts
    .where(id: fact_ids)
    .select(:id, :subject_entity_id, :predicate, :object_literal, :status)
    .all
    .to_h { |f| [f[:id], f] }

  @store.db.transaction do
    groups = open_rows.group_by { |row| pair_key(row, facts) }.reject { |key, _| key.nil? }
    groups.each_value do |rows_in_group|
      result[:inspected] += rows_in_group.size
      next if rows_in_group.size < 2

      keeper = rows_in_group.first
      duplicates = rows_in_group[1..]
      duplicates.each do |dup|
        result[:decisions] << {
          conflict_id: dup[:id],
          action: :resolve_duplicate,
          keeper_id: keeper[:id],
          duplicate_fact_id: dup[:fact_b_id]
        }
        # Counted whether or not we actually write, so dry-run output
        # matches real-run output and callers can compare plans.
        result[:resolved] += 1
        next if dry_run

        # Resolve the duplicate conflict. Also reject its disputed
        # side (fact_b_id is always the newer inserted-as-disputed
        # fact per Resolver convention), and shift its provenance
        # onto the keeper's fact_b so the evidence isn't lost.
        keeper_fact_b_id = keeper[:fact_b_id]
        if dup[:fact_b_id] != keeper_fact_b_id
          @store.provenance.where(fact_id: dup[:fact_b_id]).update(fact_id: keeper_fact_b_id)
          @store.facts.where(id: dup[:fact_b_id]).update(
            status: "rejected",
            valid_to: Time.now.utc.iso8601
          )
        end
        @store.conflicts.where(id: dup[:id]).update(
          status: "resolved",
          notes: "Deduplicated into conflict ##{keeper[:id]}"
        )
      end
    end
  end

  result
end

#expire_disputed_facts ⇒ Object

Expire disputed facts older than TTL. Returns: Integer count of expired facts
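
The TTL and retention passes all compare a timestamp column against a cutoff. The sketch below shows one way such a cutoff could be computed and applied. `sketch_cutoff_time` is hypothetical, and it assumes timestamps are stored as UTC ISO 8601 strings so that string comparison matches chronological order; the private `cutoff_time` helper is not shown on this page.

```ruby
require "time"

SECONDS_PER_DAY = 86_400

# Hypothetical cutoff helper: `days` before `now`, as a UTC ISO 8601 string.
def sketch_cutoff_time(days, now: Time.now.utc)
  (now - days * SECONDS_PER_DAY).iso8601
end

now = Time.utc(2026, 5, 1, 12, 0, 0)
cutoff = sketch_cutoff_time(30, now: now) # 30 matches disputed_fact_ttl_days

rows = [
  {id: 1, status: "disputed", created_at: "2026-03-15T00:00:00Z"},
  {id: 2, status: "disputed", created_at: "2026-04-20T00:00:00Z"},
  {id: 3, status: "active", created_at: "2026-01-01T00:00:00Z"}
]

# Same filter shape as the method: matching status and older than the cutoff.
expired_ids = rows
  .select { |r| r[:status] == "disputed" && r[:created_at] < cutoff }
  .map { |r| r[:id] }

puts cutoff
puts expired_ids.inspect
```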



# File 'lib/claude_memory/sweep/maintenance.rb', line 45

def expire_disputed_facts
  cutoff = cutoff_time(@config[:disputed_fact_ttl_days])
  @store.facts
    .where(status: "disputed")
    .where { created_at < cutoff }
    .update(status: "expired")
end

#expire_proposed_facts ⇒ Object

Expire proposed facts older than TTL. Returns: Integer count of expired facts



# File 'lib/claude_memory/sweep/maintenance.rb', line 35

def expire_proposed_facts
  cutoff = cutoff_time(@config[:proposed_fact_ttl_days])
  @store.facts
    .where(status: "proposed")
    .where { created_at < cutoff }
    .update(status: "expired")
end

#fix_scope_leakage ⇒ Object

Fix scope leakage: facts whose `scope` column disagrees with the store they live in. Pre-2026-04-20, the resolver treated scope_hint from the distiller as a scope override, so when the NullDistiller detected global-scope language (“always”, “my preference”), it stamped scope: “global” on facts that still ended up written to the project DB. The result was invisible orphaned rows: not in the global DB so global recall never saw them, but labeled global inside the project DB.

This pass detects those rows by comparing `scope` to the expected value derived from which DB this Maintenance instance is running against, and rewrites scope + project_path to match. It does not move facts between DBs; users can run `claude-memory promote <id>` to do a proper cross-store copy. Returns: Integer count of facts whose scope was corrected.



# File 'lib/claude_memory/sweep/maintenance.rb', line 111

def fix_scope_leakage
  expected = expected_scope_for_store
  return 0 unless expected

  project_path_for_scope = (expected == "global") ? nil : detect_project_path
  @store.facts
    .exclude(scope: expected)
    .update(scope: expected, project_path: project_path_for_scope)
end

#prune_old_content ⇒ Object

Delete old content items not referenced by any provenance. Also removes their FTS index entries. Returns: Integer count of deleted content items



# File 'lib/claude_memory/sweep/maintenance.rb', line 133

def prune_old_content
  cutoff = cutoff_time(@config[:content_retention_days])
  referenced_ids = @store.provenance.exclude(content_item_id: nil).select(:content_item_id)
  prunable = @store.content_items
    .where { ingested_at < cutoff }
    .exclude(id: referenced_ids)

  fts = ClaudeMemory::Index::LexicalFTS.new(@store)
  prunable.select(:id, :raw_text).each do |row|
    fts.remove_content_item(row[:id], row[:raw_text])
  rescue
    # FTS entry may not exist; skip
  end

  prunable.delete
end

#prune_old_mcp_tool_calls ⇒ Object

Delete MCP tool-call telemetry rows older than retention window. Returns: Integer count of deleted rows (0 if table missing).



# File 'lib/claude_memory/sweep/maintenance.rb', line 248

def prune_old_mcp_tool_calls
  return 0 unless @store.db.table_exists?(:mcp_tool_calls)

  cutoff = cutoff_time(@config[:mcp_tool_call_retention_days])
  @store.mcp_tool_calls.where { called_at < cutoff }.delete
end

#prune_old_otel_events ⇒ Object

Delete OTel log-style events older than retention window. Returns: Integer count of deleted rows (0 if table missing).



# File 'lib/claude_memory/sweep/maintenance.rb', line 266

def prune_old_otel_events
  return 0 unless @store.db.table_exists?(:otel_events)

  cutoff = cutoff_time(@config[:otel_event_retention_days])
  @store.otel_events.where { occurred_at < cutoff }.delete
end

#prune_old_otel_metrics ⇒ Object

Delete OTel metric data points older than retention window. Returns: Integer count of deleted rows (0 if table missing).



# File 'lib/claude_memory/sweep/maintenance.rb', line 257

def prune_old_otel_metrics
  return 0 unless @store.db.table_exists?(:otel_metrics)

  cutoff = cutoff_time(@config[:otel_metric_retention_days])
  @store.otel_metrics.where { recorded_at < cutoff }.delete
end

#prune_old_otel_traces ⇒ Object

Delete OTel trace spans older than retention window. Returns: Integer count of deleted rows (0 if table missing).



# File 'lib/claude_memory/sweep/maintenance.rb', line 275

def prune_old_otel_traces
  return 0 unless @store.db.table_exists?(:otel_traces)

  cutoff = cutoff_time(@config[:otel_trace_retention_days])
  @store.otel_traces.where { recorded_at < cutoff }.delete
end

#prune_orphaned_provenance ⇒ Object

Delete provenance records referencing non-existent facts. Returns: Integer count of deleted provenance rows



# File 'lib/claude_memory/sweep/maintenance.rb', line 123

def prune_orphaned_provenance
  fact_ids = @store.facts.select(:id)
  @store.provenance
    .exclude(fact_id: fact_ids)
    .delete
end

#reclassify_references(dry_run: false) ⇒ Hash

Reclassify active facts currently labeled `convention` whose object text matches the ReferenceMaterialDetector heuristics. Fixes the historical data tail from before the detector was wired into `store_extraction` on 2026-04-24. Current writes can’t create this pattern; this pass only cleans up what already exists.

Parameters:

  • dry_run (Boolean) (defaults to: false)

    when true, decide but don’t write

Returns:

  • (Hash)

    inspected:, reclassified:, decisions: [{fact_id:, object:}]
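
The dry-run contract ("decide but don't write") can be sketched on in-memory rows. `sketch_reclassify` and the URL-matching detector are hypothetical; the real ReferenceMaterialDetector heuristics are not shown on this page. The point of the sketch is that counters and decisions are recorded before the write, so a dry run and a real run report the same plan.

```ruby
# Hypothetical detector: treats bare URLs as reference material.
looks_like_reference = ->(text) { text.to_s.match?(%r{\Ahttps?://}) }

def sketch_reclassify(rows, detector, dry_run: false)
  result = {inspected: 0, reclassified: 0, decisions: []}
  rows.each do |row|
    result[:inspected] += 1
    next unless detector.call(row[:object_literal])

    # Record the decision first so dry-run output matches real-run output.
    result[:decisions] << {fact_id: row[:id], object: row[:object_literal]}
    result[:reclassified] += 1
    row[:predicate] = "reference" unless dry_run
  end
  result
end

rows = [
  {id: 1, predicate: "convention", object_literal: "https://example.com/style-guide"},
  {id: 2, predicate: "convention", object_literal: "use two-space indentation"}
]

dry_rows = rows.map(&:dup)
real_rows = rows.map(&:dup)
plan = sketch_reclassify(dry_rows, looks_like_reference, dry_run: true)
applied = sketch_reclassify(real_rows, looks_like_reference)

puts plan == applied
puts dry_rows.map { |r| r[:predicate] }.inspect
puts real_rows.map { |r| r[:predicate] }.inspect
```

Callers can run with `dry_run: true`, review `decisions`, and then run again without it, expecting the same counts as long as the data has not changed in between.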



# File 'lib/claude_memory/sweep/maintenance.rb', line 370

def reclassify_references(dry_run: false)
  detector = ClaudeMemory::Distill::ReferenceMaterialDetector.new
  result = {inspected: 0, reclassified: 0, decisions: []}

  candidates = @store.facts
    .where(status: "active", predicate: "convention")
    .select(:id, :object_literal)
    .all

  @store.db.transaction do
    candidates.each do |row|
      result[:inspected] += 1
      fact = {predicate: "convention", object: row[:object_literal]}
      next unless detector.reference_material?(fact)

      result[:decisions] << {fact_id: row[:id], object: row[:object_literal]}
      result[:reclassified] += 1

      unless dry_run
        @store.facts.where(id: row[:id]).update(predicate: "reference")
      end
    end
  end

  result
end

#restore_multi_value_supersessions(predicate:, dry_run: false) ⇒ Hash

Restore superseded facts in a (subject, predicate) slot that were only superseded because of an obsolete single-value classification. Uses Jaccard-based token overlap to distinguish bug-superseded facts (token-disjoint siblings) from legitimate corrections (overlapping siblings).

Refuses to run on predicates still classified as single-value — they should stay superseded by design.

Never touches status: “rejected” facts (explicit user decisions).

Returns:

  • (Hash)

    inspected:, restored:, skipped_ambiguous:, skipped_rejected:, decisions:



# File 'lib/claude_memory/sweep/maintenance.rb', line 188

def restore_multi_value_supersessions(predicate:, dry_run: false)
  if ClaudeMemory::Resolve::PredicatePolicy.single?(predicate)
    raise ArgumentError, "Predicate '#{predicate}' is still classified single-value; refusing to restore"
  end

  result = {inspected: 0, restored: 0, skipped_ambiguous: 0, skipped_rejected: 0, decisions: []}

  rows_by_subject = @store.facts
    .where(predicate: predicate)
    .exclude(status: "rejected")
    .select(:id, :subject_entity_id, :object_literal, :status)
    .all
    .group_by { |r| r[:subject_entity_id] }

  rejected_by_subject = @store.facts
    .where(predicate: predicate, status: "rejected")
    .select(:id, :subject_entity_id, :object_literal)
    .all
    .group_by { |r| r[:subject_entity_id] }

  @store.db.transaction do
    rows_by_subject.each do |subject_id, rows|
      rejected_rows = rejected_by_subject[subject_id] || []
      siblings = rows + rejected_rows

      rows.each do |candidate|
        next unless candidate[:status] == "superseded"
        result[:inspected] += 1

        candidate_tokens = restore_tokenize(candidate[:object_literal])
        ambiguous_against = find_overlapping_siblings(candidate, siblings, candidate_tokens)

        if ambiguous_against.empty?
          result[:restored] += 1
          result[:decisions] << {
            subject_entity_id: subject_id,
            fact_id: candidate[:id],
            object: candidate[:object_literal],
            action: :restore
          }
          restore_fact!(candidate[:id]) unless dry_run
        else
          result[:skipped_ambiguous] += 1
          result[:decisions] << {
            subject_entity_id: subject_id,
            fact_id: candidate[:id],
            object: candidate[:object_literal],
            action: :skip_ambiguous,
            overlaps_with: ambiguous_against.map { |s| s[:object_literal] }
          }
        end
      end
    end
  end

  result
end

#vacuum ⇒ Object

Run SQLite VACUUM to reclaim space. Returns: true



# File 'lib/claude_memory/sweep/maintenance.rb', line 399

def vacuum
  @store.db.run("VACUUM")
  true
end