Class: ClaudeMemory::Sweep::Maintenance

Inherits:
Object
  • Object
Defined in:
lib/claude_memory/sweep/maintenance.rb

Overview

Clean separation of individual maintenance operations from Sweeper’s budget-management orchestration. Each method performs a single operation and returns the count of affected records.

Source: QMD v2.0.1 Maintenance class pattern

Constant Summary

RESTORE_STOPWORDS =

Short / noise tokens dropped before Jaccard comparison. Intentionally minimal — we want conservative token extraction that still treats “Rails 8.0” and “Rails 8.1” as overlapping.

%w[for the and with via of in on to by is are].to_set.freeze
RESTORE_JACCARD_THRESHOLD =
0.5
DEFAULT_CONFIG =
{
  proposed_fact_ttl_days: 14,
  disputed_fact_ttl_days: 30,
  content_retention_days: 30,
  mcp_tool_call_retention_days: 90,
  otel_metric_retention_days: 30,
  otel_event_retention_days: 14,
  otel_trace_retention_days: 7,
  observation_info_ttl_days: 30
}.freeze

Constructor Details

#initialize(store, config: {}) ⇒ Maintenance

Returns a new instance of Maintenance.



# File 'lib/claude_memory/sweep/maintenance.rb', line 29

def initialize(store, config: {})
  @store = store
  @config = DEFAULT_CONFIG.merge(config)
end
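
The constructor merges caller overrides over DEFAULT_CONFIG, so callers pass only the keys they want to change. A minimal sketch of that merge, using a trimmed copy of the defaults (the `defaults` and `overrides` names and the override value are illustrative only):

```ruby
# Trimmed copy of DEFAULT_CONFIG from above, for illustration only.
defaults = {
  proposed_fact_ttl_days: 14,
  disputed_fact_ttl_days: 30,
  content_retention_days: 30
}.freeze

# Hypothetical caller overrides: expire proposed facts after one week.
overrides = {proposed_fact_ttl_days: 7}

# Hash#merge returns a new hash; keys from the argument win.
effective = defaults.merge(overrides)

puts effective[:proposed_fact_ttl_days]  # overridden value
puts effective[:content_retention_days]  # falls back to the default
```

Hash#merge does not validate keys, so a misspelled key is carried along silently and the corresponding default stays in effect.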

Instance Attribute Details

#store ⇒ Object (readonly)

Returns the value of attribute store.



# File 'lib/claude_memory/sweep/maintenance.rb', line 27

def store
  @store
end

Instance Method Details

#backfill_vec_index(limit: 100) ⇒ Object

Backfill vector index for unindexed facts. Returns: Integer count of backfilled embeddings (0 if unavailable)



# File 'lib/claude_memory/sweep/maintenance.rb', line 153

def backfill_vec_index(limit: 100)
  with_vec_index do |vec_index|
    return vec_index.backfill_batch!(limit: limit)
  end
  0
end

#checkpoint_wal ⇒ Object

Checkpoint the SQLite WAL file for compaction. Returns: true



# File 'lib/claude_memory/sweep/maintenance.rb', line 285

def checkpoint_wal
  @store.checkpoint_wal
  true
end

#cleanup_vec_expired(limit: 100) ⇒ Object

Remove vector embeddings for superseded/expired facts. Returns: Integer count of cleaned embeddings (0 if unavailable)



# File 'lib/claude_memory/sweep/maintenance.rb', line 162

def cleanup_vec_expired(limit: 100)
  with_vec_index do |vec_index|
    stale_ids = @store.facts
      .where(status: %w[superseded expired])
      .where(Sequel.~(vec_indexed_at: nil))
      .select(:id)
      .limit(limit)
      .map { |r| r[:id] }

    stale_ids.each { |fact_id| vec_index.remove_embedding(fact_id) }
    return stale_ids.size
  end
  0
end

#dedupe_multi_value_facts ⇒ Object

Collapse duplicate multi-value facts. Before the resolver-level dedup fix (2026-04-17), multi-value predicates like uses_language and uses_framework accumulated identical rows every ingest cycle. For each (subject_entity_id, predicate, object_literal, scope) group with more than one active fact, where object_literal is compared case-insensitively, keep the oldest row (lowest id), reassign the duplicates’ provenance to the keeper (so we retain source signal), mark the duplicates superseded, and record a "supersedes" link from the keeper to each duplicate. Returns the count of fact rows merged into their keeper.



# File 'lib/claude_memory/sweep/maintenance.rb', line 62

def dedupe_multi_value_facts
  merged = 0
  @store.db.transaction do
    # Pull every active fact with a literal object and group in Ruby.
    # Facts tables stay small (< 10k typical); Sequel's HAVING COUNT(*)
    # path hits adapter quoting bugs on some Extralite versions.
    active = @store.facts
      .where(status: "active")
      .exclude(subject_entity_id: nil)
      .exclude(object_literal: nil)
      .order(:id)
      .all

    groups = active.group_by { |f|
      [f[:subject_entity_id], f[:predicate], f[:object_literal]&.downcase, f[:scope]]
    }

    groups.each_value do |rows|
      next if rows.size < 2

      keeper = rows.first
      rows[1..].each do |loser|
        @store.provenance.where(fact_id: loser[:id]).update(fact_id: keeper[:id])
        @store.facts.where(id: loser[:id]).update(
          status: "superseded",
          valid_to: Time.now.utc.iso8601
        )
        @store.insert_fact_link(from_fact_id: keeper[:id], to_fact_id: loser[:id], link_type: "supersedes")
        merged += 1
      end
    end
  end
  merged
end
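
The grouping step in dedupe_multi_value_facts can be tried without a database. The sketch below applies the same group key to in-memory rows and only builds a plan; the sample rows and the `plan` structure are illustrative assumptions, not part of the class:

```ruby
# In-memory stand-ins for rows from the facts table (illustrative data).
rows = [
  {id: 1, subject_entity_id: 10, predicate: "uses_language", object_literal: "Ruby", scope: "project"},
  {id: 2, subject_entity_id: 10, predicate: "uses_language", object_literal: "ruby", scope: "project"},
  {id: 3, subject_entity_id: 10, predicate: "uses_language", object_literal: "Go", scope: "project"},
  {id: 4, subject_entity_id: 10, predicate: "uses_language", object_literal: "Ruby", scope: "global"}
]

# Same key shape as the method: case-insensitive on the object literal.
groups = rows.sort_by { |f| f[:id] }.group_by { |f|
  [f[:subject_entity_id], f[:predicate], f[:object_literal]&.downcase, f[:scope]]
}

# Plan only: the lowest id is the keeper, every later row is a duplicate.
plan = groups.each_value.filter_map do |group|
  next if group.size < 2
  {keeper_id: group.first[:id], duplicate_ids: group[1..].map { |f| f[:id] }}
end

p plan
```

Rows 1 and 2 collapse because their literals differ only in case; row 4 stays separate because its scope differs.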

#dedupe_open_conflicts(dry_run: false) ⇒ Hash

Deduplicate open conflicts that describe the same contradiction. Before the Resolver#apply_conflict dedupe fix (2026-04-24), each re-extraction of the losing value in a single-value slot produced a new disputed fact + conflict row — production DBs accumulated 11 open conflicts for “sqlite vs postgresql” referencing 11 different disputed facts. This pass keeps the earliest conflict per logical pair and marks the rest resolved, reinforcing the keeper’s provenance chain with the duplicates’ provenance.

Pair key: (subject_entity_id, predicate, normalized(object_a), normalized(object_b)) with object order sorted so A-vs-B == B-vs-A.
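
An order-independent pair key of this shape could be built as in the sketch below. The `normalize_object` and `conflict_pair_key` helpers are hypothetical; the private pair_key helper and its normalization rules are not shown on this page, and strip plus downcase is assumed here purely for illustration:

```ruby
# Hypothetical normalizer; the real normalization rules are not shown here.
def normalize_object(value)
  value.to_s.strip.downcase
end

# Build an order-independent key for a conflict between two facts.
# Sorting the normalized objects makes A-vs-B equal to B-vs-A.
def conflict_pair_key(fact_a, fact_b)
  objects = [fact_a, fact_b].map { |f| normalize_object(f[:object_literal]) }.sort
  [fact_a[:subject_entity_id], fact_a[:predicate], *objects]
end

# Illustrative facts; the predicate name is made up for the example.
sqlite = {subject_entity_id: 1, predicate: "uses_database", object_literal: "SQLite"}
postgres = {subject_entity_id: 1, predicate: "uses_database", object_literal: "postgresql "}

puts conflict_pair_key(sqlite, postgres) == conflict_pair_key(postgres, sqlite)
```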

Parameters:

  • dry_run (Boolean) (defaults to: false)

    when true, decide but don’t write

Returns:

  • (Hash)

    inspected:, resolved:, decisions: [{conflict_id:, action:, keeper_id:, duplicate_fact_id:}]



# File 'lib/claude_memory/sweep/maintenance.rb', line 304

def dedupe_open_conflicts(dry_run: false)
  result = {inspected: 0, resolved: 0, decisions: []}

  open_rows = @store.conflicts
    .where(status: "open")
    .order(:id)
    .all
  return result if open_rows.empty?

  fact_ids = open_rows.flat_map { |r| [r[:fact_a_id], r[:fact_b_id]] }.uniq
  facts = @store.facts
    .where(id: fact_ids)
    .select(:id, :subject_entity_id, :predicate, :object_literal, :status)
    .all
    .to_h { |f| [f[:id], f] }

  @store.db.transaction do
    groups = open_rows.group_by { |row| pair_key(row, facts) }.reject { |key, _| key.nil? }
    groups.each_value do |rows_in_group|
      result[:inspected] += rows_in_group.size
      next if rows_in_group.size < 2

      keeper = rows_in_group.first
      duplicates = rows_in_group[1..]
      duplicates.each do |dup|
        result[:decisions] << {
          conflict_id: dup[:id],
          action: :resolve_duplicate,
          keeper_id: keeper[:id],
          duplicate_fact_id: dup[:fact_b_id]
        }
        # Counted whether or not we actually write, so dry-run output
        # matches real-run output and callers can compare plans.
        result[:resolved] += 1
        next if dry_run

        # Resolve the duplicate conflict. Also reject its disputed
        # side (fact_b_id is always the newer inserted-as-disputed
        # fact per Resolver convention), and shift its provenance
        # onto the keeper's fact_b so the evidence isn't lost.
        keeper_fact_b_id = keeper[:fact_b_id]
        if dup[:fact_b_id] != keeper_fact_b_id
          @store.provenance.where(fact_id: dup[:fact_b_id]).update(fact_id: keeper_fact_b_id)
          @store.facts.where(id: dup[:fact_b_id]).update(
            status: "rejected",
            valid_to: Time.now.utc.iso8601
          )
        end
        @store.conflicts.where(id: dup[:id]).update(
          status: "resolved",
          notes: "Deduplicated into conflict ##{keeper[:id]}"
        )
      end
    end
  end

  result
end

#expire_disputed_facts ⇒ Object

Expire disputed facts older than TTL. Returns: Integer count of expired facts



# File 'lib/claude_memory/sweep/maintenance.rb', line 46

def expire_disputed_facts
  cutoff = cutoff_time(@config[:disputed_fact_ttl_days])
  @store.facts
    .where(status: "disputed")
    .where { created_at < cutoff }
    .update(status: "expired")
end

#expire_proposed_facts ⇒ Object

Expire proposed facts older than TTL. Returns: Integer count of expired facts



# File 'lib/claude_memory/sweep/maintenance.rb', line 36

def expire_proposed_facts
  cutoff = cutoff_time(@config[:proposed_fact_ttl_days])
  @store.facts
    .where(status: "proposed")
    .where { created_at < cutoff }
    .update(status: "expired")
end
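
Both TTL expirations depend on a private cutoff_time helper that is not shown on this page. The sketch below is one plausible shape for it, assuming timestamps are stored as UTC ISO 8601 strings (as the valid_to writes elsewhere in this class suggest); `cutoff_time_sketch` and its `now:` parameter are hypothetical names:

```ruby
require "time"

# Hypothetical stand-in for the private cutoff_time helper.
# Assumes timestamps are stored as UTC ISO 8601 strings.
def cutoff_time_sketch(days, now: Time.now.utc)
  (now - days * 86_400).iso8601
end

now = Time.utc(2026, 5, 1)
puts cutoff_time_sketch(14, now: now)  # cutoff for the proposed-fact TTL
puts cutoff_time_sketch(30, now: now)  # cutoff for the disputed-fact TTL
```

If timestamps are stored this way, two values in the same zone and format sort correctly as plain strings, which is what lets a `created_at < cutoff` filter work on a text column.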

#fix_scope_leakage ⇒ Object

Fix scope leakage: facts whose `scope` column disagrees with the store they live in. Pre-2026-04-20, the resolver treated scope_hint from the distiller as a scope override, so when the NullDistiller detected global-scope language (“always”, “my preference”), it stamped scope: “global” on facts that were still written to the project DB. The result was invisible orphaned rows: absent from the global DB, so global recall never saw them, but labeled global inside the project DB.

This pass detects those rows by comparing `scope` to the expected value derived from which DB this Maintenance instance is running against, and rewrites scope + project_path to match. It does not move facts between DBs; users can run `claude-memory promote <id>` for a proper cross-store copy. Returns: Integer count of facts whose scope was corrected.



# File 'lib/claude_memory/sweep/maintenance.rb', line 112

def fix_scope_leakage
  expected = expected_scope_for_store
  return 0 unless expected

  project_path_for_scope = (expected == "global") ? nil : detect_project_path
  @store.facts
    .exclude(scope: expected)
    .update(scope: expected, project_path: project_path_for_scope)
end

#prune_old_content ⇒ Object

Delete old content items not referenced by any provenance. Also removes their FTS index entries. Returns: Integer count of deleted content items



# File 'lib/claude_memory/sweep/maintenance.rb', line 134

def prune_old_content
  cutoff = cutoff_time(@config[:content_retention_days])
  referenced_ids = @store.provenance.exclude(content_item_id: nil).select(:content_item_id)
  prunable = @store.content_items
    .where { ingested_at < cutoff }
    .exclude(id: referenced_ids)

  fts = ClaudeMemory::Index::LexicalFTS.new(@store)
  prunable.select(:id, :raw_text).each do |row|
    fts.remove_content_item(row[:id], row[:raw_text])
  rescue
    # FTS entry may not exist; skip
  end

  prunable.delete
end

#prune_old_mcp_tool_calls ⇒ Object

Delete MCP tool-call telemetry rows older than retention window. Returns: Integer count of deleted rows (0 if table missing).



# File 'lib/claude_memory/sweep/maintenance.rb', line 249

def prune_old_mcp_tool_calls
  return 0 unless @store.db.table_exists?(:mcp_tool_calls)

  cutoff = cutoff_time(@config[:mcp_tool_call_retention_days])
  @store.mcp_tool_calls.where { called_at < cutoff }.delete
end

#prune_old_otel_events ⇒ Object

Delete OTel log-style events older than retention window. Returns: Integer count of deleted rows (0 if table missing).



# File 'lib/claude_memory/sweep/maintenance.rb', line 267

def prune_old_otel_events
  return 0 unless @store.db.table_exists?(:otel_events)

  cutoff = cutoff_time(@config[:otel_event_retention_days])
  @store.otel_events.where { occurred_at < cutoff }.delete
end

#prune_old_otel_metrics ⇒ Object

Delete OTel metric data points older than retention window. Returns: Integer count of deleted rows (0 if table missing).



# File 'lib/claude_memory/sweep/maintenance.rb', line 258

def prune_old_otel_metrics
  return 0 unless @store.db.table_exists?(:otel_metrics)

  cutoff = cutoff_time(@config[:otel_metric_retention_days])
  @store.otel_metrics.where { recorded_at < cutoff }.delete
end

#prune_old_otel_traces ⇒ Object

Delete OTel trace spans older than retention window. Returns: Integer count of deleted rows (0 if table missing).



# File 'lib/claude_memory/sweep/maintenance.rb', line 276

def prune_old_otel_traces
  return 0 unless @store.db.table_exists?(:otel_traces)

  cutoff = cutoff_time(@config[:otel_trace_retention_days])
  @store.otel_traces.where { recorded_at < cutoff }.delete
end

#prune_orphaned_provenance ⇒ Object

Delete provenance records referencing non-existent facts. Returns: Integer count of deleted provenance rows



# File 'lib/claude_memory/sweep/maintenance.rb', line 124

def prune_orphaned_provenance
  fact_ids = @store.facts.select(:id)
  @store.provenance
    .exclude(fact_id: fact_ids)
    .delete
end

#reclassify_references(dry_run: false) ⇒ Hash

Reclassify active facts currently labeled `convention` whose object text matches the ReferenceMaterialDetector heuristics. Fixes the historical data tail from before the detector was wired into `store_extraction` on 2026-04-24. Current writes can’t create this pattern; this pass only cleans up what already exists.

Parameters:

  • dry_run (Boolean) (defaults to: false)

    when true, decide but don’t write

Returns:

  • (Hash)

    inspected:, reclassified:, decisions: [{fact_id:, object:}]



# File 'lib/claude_memory/sweep/maintenance.rb', line 371

def reclassify_references(dry_run: false)
  detector = ClaudeMemory::Distill::ReferenceMaterialDetector.new
  result = {inspected: 0, reclassified: 0, decisions: []}

  candidates = @store.facts
    .where(status: "active", predicate: "convention")
    .select(:id, :object_literal)
    .all

  @store.db.transaction do
    candidates.each do |row|
      result[:inspected] += 1
      fact = {predicate: "convention", object: row[:object_literal]}
      next unless detector.reference_material?(fact)

      result[:decisions] << {fact_id: row[:id], object: row[:object_literal]}
      result[:reclassified] += 1

      unless dry_run
        @store.facts.where(id: row[:id]).update(predicate: "reference")
      end
    end
  end

  result
end

#reflect_observations ⇒ Object

Run the deterministic observation Reflector (dedupe near-identical observations + expire stale info-level ones). Free (no LLM calls) and provenance-preserving (rows are tombstoned, never deleted). Returns: Hash {deduped:, expired:}; both counts are 0 if the observations table is missing.



# File 'lib/claude_memory/sweep/maintenance.rb', line 402

def reflect_observations
  return {deduped: 0, expired: 0} unless @store.db.table_exists?(:observations)

  result = ClaudeMemory::Observe::Reflector.new(
    @store, info_ttl_days: @config[:observation_info_ttl_days]
  ).reflect!
  {deduped: result.deduped, expired: result.expired}
end

#restore_multi_value_supersessions(predicate:, dry_run: false) ⇒ Hash

Restore superseded facts in a (subject, predicate) slot that were only superseded because of an obsolete single-value classification. Uses Jaccard-based token overlap to distinguish bug-superseded facts (token-disjoint siblings) from legitimate corrections (overlapping siblings).

Refuses to run on predicates still classified as single-value — they should stay superseded by design.

Never touches status: “rejected” facts (explicit user decisions).

Returns:

  • (Hash)

    inspected, restored, skipped_ambiguous, skipped_rejected, decisions



# File 'lib/claude_memory/sweep/maintenance.rb', line 189

def restore_multi_value_supersessions(predicate:, dry_run: false)
  if ClaudeMemory::Resolve::PredicatePolicy.single?(predicate)
    raise ArgumentError, "Predicate '#{predicate}' is still classified single-value; refusing to restore"
  end

  result = {inspected: 0, restored: 0, skipped_ambiguous: 0, skipped_rejected: 0, decisions: []}

  rows_by_subject = @store.facts
    .where(predicate: predicate)
    .exclude(status: "rejected")
    .select(:id, :subject_entity_id, :object_literal, :status)
    .all
    .group_by { |r| r[:subject_entity_id] }

  rejected_by_subject = @store.facts
    .where(predicate: predicate, status: "rejected")
    .select(:id, :subject_entity_id, :object_literal)
    .all
    .group_by { |r| r[:subject_entity_id] }

  @store.db.transaction do
    rows_by_subject.each do |subject_id, rows|
      rejected_rows = rejected_by_subject[subject_id] || []
      siblings = rows + rejected_rows

      rows.each do |candidate|
        next unless candidate[:status] == "superseded"
        result[:inspected] += 1

        candidate_tokens = restore_tokenize(candidate[:object_literal])
        ambiguous_against = find_overlapping_siblings(candidate, siblings, candidate_tokens)

        if ambiguous_against.empty?
          result[:restored] += 1
          result[:decisions] << {
            subject_entity_id: subject_id,
            fact_id: candidate[:id],
            object: candidate[:object_literal],
            action: :restore
          }
          restore_fact!(candidate[:id]) unless dry_run
        else
          result[:skipped_ambiguous] += 1
          result[:decisions] << {
            subject_entity_id: subject_id,
            fact_id: candidate[:id],
            object: candidate[:object_literal],
            action: :skip_ambiguous,
            overlaps_with: ambiguous_against.map { |s| s[:object_literal] }
          }
        end
      end
    end
  end

  result
end
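
The restore decision rests on token overlap between a superseded fact and its siblings. The private restore_tokenize and find_overlapping_siblings helpers are not shown on this page, so the sketch below is an assumption-laden illustration: it assumes tokens are lowercase alphanumeric runs, and that a Jaccard score at or above RESTORE_JACCARD_THRESHOLD counts as overlapping. The `tokenize`, `jaccard`, and `overlapping?` names are hypothetical.

```ruby
require "set"

# Values copied from the constants documented above.
STOPWORDS = %w[for the and with via of in on to by is are].to_set.freeze
THRESHOLD = 0.5

# Hypothetical tokenizer: lowercase, split on non-alphanumerics, drop stopwords.
def tokenize(text)
  text.to_s.downcase.scan(/[a-z0-9]+/).reject { |t| STOPWORDS.include?(t) }.to_set
end

# Jaccard similarity: |A ∩ B| / |A ∪ B|, taken as 0.0 when both sets are empty.
def jaccard(a, b)
  union = a | b
  return 0.0 if union.empty?
  (a & b).size.to_f / union.size
end

# Assumption: a score at or above the threshold counts as overlapping.
def overlapping?(left, right)
  jaccard(tokenize(left), tokenize(right)) >= THRESHOLD
end

puts overlapping?("Rails 8.0", "Rails 8.1")  # shares "rails" and "8"
puts overlapping?("PostgreSQL", "Redis")     # no shared tokens
```

Under these assumptions, “Rails 8.0” and “Rails 8.1” share two of four distinct tokens, which lands exactly on the 0.5 threshold, so the pair is treated as a correction rather than as two independent values.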

#vacuum ⇒ Object

Run SQLite VACUUM to reclaim space. Returns: true



# File 'lib/claude_memory/sweep/maintenance.rb', line 413

def vacuum
  @store.db.run("VACUUM")
  true
end