Class: ClaudeMemory::Sweep::Maintenance

Inherits: Object
Defined in:
lib/claude_memory/sweep/maintenance.rb

Overview

Clean separation of individual maintenance operations from Sweeper’s budget-management orchestration. Each method performs a single operation and returns the count of affected records.

Source: QMD v2.0.1 Maintenance class pattern

Constant Summary

RESTORE_STOPWORDS =

Short / noise tokens dropped before Jaccard comparison. Intentionally minimal — we want conservative token extraction that still treats “Rails 8.0” and “Rails 8.1” as overlapping.

%w[for the and with via of in on to by is are].to_set.freeze
RESTORE_JACCARD_THRESHOLD =
0.5
DEFAULT_CONFIG =
{
  proposed_fact_ttl_days: 14,
  disputed_fact_ttl_days: 30,
  content_retention_days: 30,
  mcp_tool_call_retention_days: 90
}.freeze
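A self-contained sketch of how RESTORE_STOPWORDS and RESTORE_JACCARD_THRESHOLD interact. The private restore_tokenize helper is not shown on this page, so the tokenization below is an illustrative guess:

```ruby
require "set"

# Stand-ins mirroring the constants above.
STOPWORDS = %w[for the and with via of in on to by is are].to_set.freeze
THRESHOLD = 0.5

# Guessed tokenizer: lowercase, alphanumeric runs, stopwords dropped.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |t| STOPWORDS.include?(t) }.to_set
end

def jaccard(a, b)
  return 0.0 if (a | b).empty?
  (a & b).size.to_f / (a | b).size
end

# "Rails 8.0" -> {rails, 8, 0}; "Rails 8.1" -> {rails, 8, 1}.
# Intersection {rails, 8} over a union of 4 tokens = 0.5, right at the
# threshold, so the two versions count as overlapping.
jaccard(tokenize("Rails 8.0"), tokenize("Rails 8.1"))  # => 0.5
```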

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(store, config: {}) ⇒ Maintenance

Returns a new instance of Maintenance.



# File 'lib/claude_memory/sweep/maintenance.rb', line 25

def initialize(store, config: {})
  @store = store
  @config = DEFAULT_CONFIG.merge(config)
end
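The config: keyword is a shallow merge: any key you omit falls back to DEFAULT_CONFIG. A sketch with hypothetical values:

```ruby
# Mirrors the class's DEFAULT_CONFIG; Hash#merge overrides only the
# keys you pass, leaving the remaining defaults intact.
DEFAULT_CONFIG = {
  proposed_fact_ttl_days: 14,
  disputed_fact_ttl_days: 30,
  content_retention_days: 30,
  mcp_tool_call_retention_days: 90
}.freeze

config = DEFAULT_CONFIG.merge(proposed_fact_ttl_days: 7)
config[:proposed_fact_ttl_days]  # => 7
config[:disputed_fact_ttl_days]  # => 30
```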

Instance Attribute Details

#store ⇒ Object (readonly)

Returns the value of attribute store.



# File 'lib/claude_memory/sweep/maintenance.rb', line 23

def store
  @store
end

Instance Method Details

#backfill_vec_index(limit: 100) ⇒ Object

Backfill vector index for unindexed facts. Returns: Integer count of backfilled embeddings (0 if unavailable)



# File 'lib/claude_memory/sweep/maintenance.rb', line 149

def backfill_vec_index(limit: 100)
  with_vec_index do |vec_index|
    return vec_index.backfill_batch!(limit: limit)
  end
  0
end

#checkpoint_wal ⇒ Object

Checkpoint the SQLite WAL file for compaction. Returns: true



# File 'lib/claude_memory/sweep/maintenance.rb', line 254

def checkpoint_wal
  @store.checkpoint_wal
  true
end

#cleanup_vec_expired(limit: 100) ⇒ Object

Remove vector embeddings for superseded/expired facts. Returns: Integer count of cleaned embeddings (0 if unavailable)



# File 'lib/claude_memory/sweep/maintenance.rb', line 158

def cleanup_vec_expired(limit: 100)
  with_vec_index do |vec_index|
    stale_ids = @store.facts
      .where(status: %w[superseded expired])
      .where(Sequel.~(vec_indexed_at: nil))
      .select(:id)
      .limit(limit)
      .map { |r| r[:id] }

    stale_ids.each { |fact_id| vec_index.remove_embedding(fact_id) }
    return stale_ids.size
  end
  0
end

#dedupe_multi_value_facts ⇒ Object

Collapse duplicate multi-value facts. Before the resolver-level dedup fix (2026-04-17), multi-value predicates like uses_language and uses_framework accumulated identical rows every ingest cycle. For each (subject_entity_id, predicate, object_literal, scope) group with more than one active fact, keep the oldest row, copy the duplicates’ provenance onto the keeper (so we retain source signal), and mark the duplicates superseded. Returns the count of fact rows merged into their keeper.



# File 'lib/claude_memory/sweep/maintenance.rb', line 58

def dedupe_multi_value_facts
  merged = 0
  @store.db.transaction do
    # Pull every active fact with a literal object and group in Ruby.
    # Facts tables stay small (< 10k typical); Sequel's HAVING COUNT(*)
    # path hits adapter quoting bugs on some Extralite versions.
    active = @store.facts
      .where(status: "active")
      .exclude(subject_entity_id: nil)
      .exclude(object_literal: nil)
      .order(:id)
      .all

    groups = active.group_by { |f|
      [f[:subject_entity_id], f[:predicate], f[:object_literal]&.downcase, f[:scope]]
    }

    groups.each_value do |rows|
      next if rows.size < 2

      keeper = rows.first
      rows[1..].each do |loser|
        @store.provenance.where(fact_id: loser[:id]).update(fact_id: keeper[:id])
        @store.facts.where(id: loser[:id]).update(
          status: "superseded",
          valid_to: Time.now.utc.iso8601
        )
        @store.insert_fact_link(from_fact_id: keeper[:id], to_fact_id: loser[:id], link_type: "supersedes")
        merged += 1
      end
    end
  end
  merged
end
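The grouping step can be exercised in isolation (hypothetical rows; note the case-insensitive object_literal component of the group key):

```ruby
# Rows as plain hashes standing in for fact records, already ordered
# by id so the first row in each group is the oldest (the keeper).
facts = [
  {id: 1, subject_entity_id: 10, predicate: "uses_language", object_literal: "Ruby", scope: "project"},
  {id: 2, subject_entity_id: 10, predicate: "uses_language", object_literal: "ruby", scope: "project"},
  {id: 3, subject_entity_id: 10, predicate: "uses_language", object_literal: "Rust", scope: "project"}
]

groups = facts.group_by do |f|
  [f[:subject_entity_id], f[:predicate], f[:object_literal]&.downcase, f[:scope]]
end

dupes = groups.each_value.select { |rows| rows.size > 1 }
dupes.map { |rows| rows.map { |r| r[:id] } }  # => [[1, 2]] -- 1 kept, 2 superseded
```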

#dedupe_open_conflicts(dry_run: false) ⇒ Hash

Deduplicate open conflicts that describe the same contradiction. Before the Resolver#apply_conflict dedupe fix (2026-04-24), each re-extraction of the losing value in a single-value slot produced a new disputed fact + conflict row — production DBs accumulated 11 open conflicts for “sqlite vs postgresql” referencing 11 different disputed facts. This pass keeps the earliest conflict per logical pair and marks the rest resolved, reinforcing the keeper’s provenance chain with the duplicates’ provenance.

Pair key: (subject_entity_id, predicate, normalized(object_a), normalized(object_b)) with object order sorted so A-vs-B == B-vs-A.
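A standalone sketch of that key (the real private pair_key helper takes a conflict row plus a facts lookup; this hypothetical version just shows the symmetric normalization):

```ruby
# Normalize both object literals, then sort them so the key is the
# same regardless of which fact is fact_a and which is fact_b.
def pair_key_for(subject_id, predicate, object_a, object_b)
  a, b = [object_a, object_b].map { |o| o.to_s.strip.downcase }.sort
  [subject_id, predicate, a, b]
end

pair_key_for(10, "uses_database", "SQLite", "PostgreSQL") ==
  pair_key_for(10, "uses_database", "postgresql", "sqlite")  # => true
```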

Parameters:

  • dry_run (Boolean) (defaults to: false)

    when true, decide but don’t write

Returns:

  • (Hash)

    {inspected:, resolved:, decisions: [{conflict_id:, action:, keeper_id:, duplicate_fact_id:}]}



# File 'lib/claude_memory/sweep/maintenance.rb', line 273

def dedupe_open_conflicts(dry_run: false)
  result = {inspected: 0, resolved: 0, decisions: []}

  open_rows = @store.conflicts
    .where(status: "open")
    .order(:id)
    .all
  return result if open_rows.empty?

  fact_ids = open_rows.flat_map { |r| [r[:fact_a_id], r[:fact_b_id]] }.uniq
  facts = @store.facts
    .where(id: fact_ids)
    .select(:id, :subject_entity_id, :predicate, :object_literal, :status)
    .all
    .to_h { |f| [f[:id], f] }

  @store.db.transaction do
    groups = open_rows.group_by { |row| pair_key(row, facts) }.reject { |key, _| key.nil? }
    groups.each_value do |rows_in_group|
      result[:inspected] += rows_in_group.size
      next if rows_in_group.size < 2

      keeper = rows_in_group.first
      duplicates = rows_in_group[1..]
      duplicates.each do |dup|
        result[:decisions] << {
          conflict_id: dup[:id],
          action: :resolve_duplicate,
          keeper_id: keeper[:id],
          duplicate_fact_id: dup[:fact_b_id]
        }
        # Counted whether or not we actually write, so dry-run output
        # matches real-run output and callers can compare plans.
        result[:resolved] += 1
        next if dry_run

        # Resolve the duplicate conflict. Also reject its disputed
        # side (fact_b_id is always the newer inserted-as-disputed
        # fact per Resolver convention), and shift its provenance
        # onto the keeper's fact_b so the evidence isn't lost.
        keeper_fact_b_id = keeper[:fact_b_id]
        if dup[:fact_b_id] != keeper_fact_b_id
          @store.provenance.where(fact_id: dup[:fact_b_id]).update(fact_id: keeper_fact_b_id)
          @store.facts.where(id: dup[:fact_b_id]).update(
            status: "rejected",
            valid_to: Time.now.utc.iso8601
          )
        end
        @store.conflicts.where(id: dup[:id]).update(
          status: "resolved",
          notes: "Deduplicated into conflict ##{keeper[:id]}"
        )
      end
    end
  end

  result
end

#expire_disputed_facts ⇒ Object

Expire disputed facts older than TTL. Returns: Integer count of expired facts



# File 'lib/claude_memory/sweep/maintenance.rb', line 42

def expire_disputed_facts
  cutoff = cutoff_time(@config[:disputed_fact_ttl_days])
  @store.facts
    .where(status: "disputed")
    .where { created_at < cutoff }
    .update(status: "expired")
end

#expire_proposed_facts ⇒ Object

Expire proposed facts older than TTL. Returns: Integer count of expired facts



# File 'lib/claude_memory/sweep/maintenance.rb', line 32

def expire_proposed_facts
  cutoff = cutoff_time(@config[:proposed_fact_ttl_days])
  @store.facts
    .where(status: "proposed")
    .where { created_at < cutoff }
    .update(status: "expired")
end
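cutoff_time is a private helper not shown on this page; a plausible shape, assuming created_at is stored as UTC ISO-8601 (which compares correctly as a plain string):

```ruby
require "time"

# Assumed implementation: the instant `days` days before now, in UTC
# ISO-8601 -- matching the valid_to timestamps written by this class.
def cutoff_time(days)
  (Time.now.utc - days * 86_400).iso8601
end

cutoff_time(14)  # ISO-8601 string fourteen days in the past
```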

#fix_scope_leakage ⇒ Object

Fix scope leakage: facts whose `scope` column disagrees with the store they live in. Pre-2026-04-20, the resolver treated scope_hint from the distiller as a scope override — so when the NullDistiller detected global-scope language (“always”, “my preference”), it stamped scope: "global" on facts that still ended up written to the project DB. The result was invisible orphaned rows: not in the global DB, so global recall never saw them, but labeled global inside the project DB.

This pass detects those rows by comparing `scope` to the expected value derived from which DB this Maintenance instance is running against, and rewrites scope + project_path to match. Does not move facts between DBs — users can `claude-memory promote <id>` to do a proper cross-store copy. Returns: Integer count of facts whose scope was corrected.



# File 'lib/claude_memory/sweep/maintenance.rb', line 108

def fix_scope_leakage
  expected = expected_scope_for_store
  return 0 unless expected

  project_path_for_scope = (expected == "global") ? nil : detect_project_path
  @store.facts
    .exclude(scope: expected)
    .update(scope: expected, project_path: project_path_for_scope)
end

#prune_old_content ⇒ Object

Delete old content items not referenced by any provenance. Also removes their FTS index entries. Returns: Integer count of deleted content items



# File 'lib/claude_memory/sweep/maintenance.rb', line 130

def prune_old_content
  cutoff = cutoff_time(@config[:content_retention_days])
  referenced_ids = @store.provenance.exclude(content_item_id: nil).select(:content_item_id)
  prunable = @store.content_items
    .where { ingested_at < cutoff }
    .exclude(id: referenced_ids)

  fts = ClaudeMemory::Index::LexicalFTS.new(@store)
  prunable.select(:id, :raw_text).each do |row|
    fts.remove_content_item(row[:id], row[:raw_text])
  rescue
    # FTS entry may not exist; skip
  end

  prunable.delete
end

#prune_old_mcp_tool_calls ⇒ Object

Delete MCP tool-call telemetry rows older than retention window. Returns: Integer count of deleted rows (0 if table missing).



# File 'lib/claude_memory/sweep/maintenance.rb', line 245

def prune_old_mcp_tool_calls
  return 0 unless @store.db.table_exists?(:mcp_tool_calls)

  cutoff = cutoff_time(@config[:mcp_tool_call_retention_days])
  @store.mcp_tool_calls.where { called_at < cutoff }.delete
end

#prune_orphaned_provenance ⇒ Object

Delete provenance records referencing non-existent facts. Returns: Integer count of deleted provenance rows



# File 'lib/claude_memory/sweep/maintenance.rb', line 120

def prune_orphaned_provenance
  fact_ids = @store.facts.select(:id)
  @store.provenance
    .exclude(fact_id: fact_ids)
    .delete
end

#reclassify_references(dry_run: false) ⇒ Hash

Reclassify active facts currently labeled `convention` whose object text matches the ReferenceMaterialDetector heuristics. Fixes the historical data tail from before the detector was wired into `store_extraction` on 2026-04-24. Current writes can’t create this pattern — this pass only cleans up what already exists.

Parameters:

  • dry_run (Boolean) (defaults to: false)

    when true, decide but don’t write

Returns:

  • (Hash)

    {inspected:, reclassified:, decisions: [{fact_id:, object:}]}



# File 'lib/claude_memory/sweep/maintenance.rb', line 340

def reclassify_references(dry_run: false)
  detector = ClaudeMemory::Distill::ReferenceMaterialDetector.new
  result = {inspected: 0, reclassified: 0, decisions: []}

  candidates = @store.facts
    .where(status: "active", predicate: "convention")
    .select(:id, :object_literal)
    .all

  @store.db.transaction do
    candidates.each do |row|
      result[:inspected] += 1
      fact = {predicate: "convention", object: row[:object_literal]}
      next unless detector.reference_material?(fact)

      result[:decisions] << {fact_id: row[:id], object: row[:object_literal]}
      result[:reclassified] += 1

      unless dry_run
        @store.facts.where(id: row[:id]).update(predicate: "reference")
      end
    end
  end

  result
end

#restore_multi_value_supersessions(predicate:, dry_run: false) ⇒ Hash

Restore superseded facts in a (subject, predicate) slot that were only superseded because of an obsolete single-value classification. Uses Jaccard-based token overlap to distinguish bug-superseded facts (token-disjoint siblings) from legitimate corrections (overlapping siblings).

Refuses to run on predicates still classified as single-value — they should stay superseded by design.

Never touches status: “rejected” facts (explicit user decisions).

Returns:

  • (Hash)

    {inspected:, restored:, skipped_ambiguous:, skipped_rejected:, decisions:}



# File 'lib/claude_memory/sweep/maintenance.rb', line 185

def restore_multi_value_supersessions(predicate:, dry_run: false)
  if ClaudeMemory::Resolve::PredicatePolicy.single?(predicate)
    raise ArgumentError, "Predicate '#{predicate}' is still classified single-value; refusing to restore"
  end

  result = {inspected: 0, restored: 0, skipped_ambiguous: 0, skipped_rejected: 0, decisions: []}

  rows_by_subject = @store.facts
    .where(predicate: predicate)
    .exclude(status: "rejected")
    .select(:id, :subject_entity_id, :object_literal, :status)
    .all
    .group_by { |r| r[:subject_entity_id] }

  rejected_by_subject = @store.facts
    .where(predicate: predicate, status: "rejected")
    .select(:id, :subject_entity_id, :object_literal)
    .all
    .group_by { |r| r[:subject_entity_id] }

  @store.db.transaction do
    rows_by_subject.each do |subject_id, rows|
      rejected_rows = rejected_by_subject[subject_id] || []
      siblings = rows + rejected_rows

      rows.each do |candidate|
        next unless candidate[:status] == "superseded"
        result[:inspected] += 1

        candidate_tokens = restore_tokenize(candidate[:object_literal])
        ambiguous_against = find_overlapping_siblings(candidate, siblings, candidate_tokens)

        if ambiguous_against.empty?
          result[:restored] += 1
          result[:decisions] << {
            subject_entity_id: subject_id,
            fact_id: candidate[:id],
            object: candidate[:object_literal],
            action: :restore
          }
          restore_fact!(candidate[:id]) unless dry_run
        else
          result[:skipped_ambiguous] += 1
          result[:decisions] << {
            subject_entity_id: subject_id,
            fact_id: candidate[:id],
            object: candidate[:object_literal],
            action: :skip_ambiguous,
            overlaps_with: ambiguous_against.map { |s| s[:object_literal] }
          }
        end
      end
    end
  end

  result
end
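The disjoint-vs-overlapping decision can be illustrated on its own (hypothetical values; the real comparison goes through the private restore_tokenize and find_overlapping_siblings helpers):

```ruby
require "set"

# Minimal tokenizer for illustration only.
def tokens(text)
  text.downcase.scan(/[a-z0-9]+/).to_set
end

sibling = tokens("Ruby")

# Token-disjoint from every sibling: looks bug-superseded, restorable.
tokens("Go").disjoint?(sibling)        # => true
# Overlaps a sibling: could be a legitimate correction, skipped.
tokens("ruby 3.3").disjoint?(sibling)  # => false
```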

#vacuum ⇒ Object

Run SQLite VACUUM to reclaim space. Returns: true



# File 'lib/claude_memory/sweep/maintenance.rb', line 369

def vacuum
  @store.db.run("VACUUM")
  true
end