Class: ClaudeMemory::Sweep::Maintenance

Inherits:

Object

Object
ClaudeMemory::Sweep::Maintenance

show all

Defined in:: lib/claude_memory/sweep/maintenance.rb

Overview

Clean separation of individual maintenance operations from Sweeper’s budget-management orchestration. Each method performs a single operation and returns the count of affected records.

Source: QMD v2.0.1 Maintenance class pattern

Constant Summary collapse

RESTORE_STOPWORDS = Short / noise tokens dropped before Jaccard comparison. Intentionally minimal — we want conservative token extraction that still treats “Rails 8.0” and “Rails 8.1” as overlapping.

%w[for the and with via of in on to by is are].to_set.freeze

RESTORE_JACCARD_THRESHOLD =

0.5

DEFAULT_CONFIG =

{
  proposed_fact_ttl_days: 14,
  disputed_fact_ttl_days: 30,
  content_retention_days: 30,
  mcp_tool_call_retention_days: 90,
  otel_metric_retention_days: 30,
  otel_event_retention_days: 14,
  otel_trace_retention_days: 7,
  observation_info_ttl_days: 30
}.freeze

Instance Attribute Summary collapse

#store ⇒ Object readonly

Returns the value of attribute store.

Instance Method Summary collapse

#backfill_vec_index(limit: 100) ⇒ Object

Backfill vector index for unindexed facts.
#checkpoint_wal ⇒ Object

Checkpoint the SQLite WAL file for compaction.
#cleanup_vec_expired(limit: 100) ⇒ Object

Remove vector embeddings for superseded/expired facts.
#dedupe_multi_value_facts ⇒ Object

Collapse duplicate multi-value facts.
#dedupe_open_conflicts(dry_run: false) ⇒ Hash

Deduplicate open conflicts that describe the same contradiction.
#expire_disputed_facts ⇒ Object

Expire disputed facts older than TTL.
#expire_proposed_facts ⇒ Object

Expire proposed facts older than TTL.
#fix_scope_leakage ⇒ Object

Fix scope leakage: facts whose ‘scope` column disagrees with the store they live in.
#initialize(store, config: {}) ⇒ Maintenance constructor

A new instance of Maintenance.
#prune_old_content ⇒ Object

Delete old content items not referenced by any provenance.
#prune_old_mcp_tool_calls ⇒ Object

Delete MCP tool-call telemetry rows older than retention window.
#prune_old_otel_events ⇒ Object

Delete OTel log-style events older than retention window.
#prune_old_otel_metrics ⇒ Object

Delete OTel metric data points older than retention window.
#prune_old_otel_traces ⇒ Object

Delete OTel trace spans older than retention window.
#prune_orphaned_provenance ⇒ Object

Delete provenance records referencing non-existent facts.
#reclassify_references(dry_run: false) ⇒ Hash

Reclassify active facts currently labeled ‘convention` whose object text matches the ReferenceMaterialDetector heuristics.
#reflect_observations ⇒ Object

Run the deterministic observation Reflector (dedupe near-identical observations + expire stale info-level ones).
#restore_multi_value_supersessions(predicate:, dry_run: false) ⇒ Hash

Restore superseded facts in a (subject, predicate) slot that were only superseded because of an obsolete single-value classification.
#vacuum ⇒ Object

Run SQLite VACUUM to reclaim space.

Constructor Details

#initialize(store, config: {}) ⇒ `Maintenance`

Returns a new instance of Maintenance.

# File 'lib/claude_memory/sweep/maintenance.rb', line 29

def initialize(store, config: {})
  @store = store
  @config = DEFAULT_CONFIG.merge(config)
end

Instance Attribute Details

#store ⇒ `Object` (readonly)

Returns the value of attribute store.



27
28
29

# File 'lib/claude_memory/sweep/maintenance.rb', line 27

def store
  @store
end

Instance Method Details

#backfill_vec_index(limit: 100) ⇒ `Object`

Backfill vector index for unindexed facts. Returns: Integer count of backfilled embeddings (0 if unavailable)

# File 'lib/claude_memory/sweep/maintenance.rb', line 153

def backfill_vec_index(limit: 100)
  with_vec_index do |vec_index|
    return vec_index.backfill_batch!(limit: limit)
  end
  0
end

#checkpoint_wal ⇒ `Object`

Checkpoint the SQLite WAL file for compaction. Returns: true

# File 'lib/claude_memory/sweep/maintenance.rb', line 285

def checkpoint_wal
  @store.checkpoint_wal
  true
end

#cleanup_vec_expired(limit: 100) ⇒ `Object`

Remove vector embeddings for superseded/expired facts. Returns: Integer count of cleaned embeddings (0 if unavailable)

# File 'lib/claude_memory/sweep/maintenance.rb', line 162

def cleanup_vec_expired(limit: 100)
  with_vec_index do |vec_index|
    stale_ids = @store.facts
      .where(status: %w[superseded expired])
      .where(Sequel.~(vec_indexed_at: nil))
      .select(:id)
      .limit(limit)
      .map { |r| r[:id] }

    stale_ids.each { |fact_id| vec_index.remove_embedding(fact_id) }
    return stale_ids.size
  end
  0
end

#dedupe_multi_value_facts ⇒ `Object`

Collapse duplicate multi-value facts. Before the resolver-level dedup fix (2026-04-17), multi-value predicates like uses_language and uses_framework accumulated identical rows every ingest cycle. For each (subject_entity_id, predicate, object_literal, scope) group with more than one active fact, keep the oldest row, copy the duplicates’ provenance onto the keeper (so we retain source signal), and mark the duplicates superseded. Returns the count of fact rows merged into their keeper.

# File 'lib/claude_memory/sweep/maintenance.rb', line 62

def dedupe_multi_value_facts
  merged = 0
  @store.db.transaction do
    # Pull every active fact with a literal object and group in Ruby.
    # Facts tables stay small (< 10k typical); Sequel's HAVING COUNT(*)
    # path hits adapter quoting bugs on some Extralite versions.
    active = @store.facts
      .where(status: "active")
      .exclude(subject_entity_id: nil)
      .exclude(object_literal: nil)
      .order(:id)
      .all

    groups = active.group_by { |f|
      [f[:subject_entity_id], f[:predicate], f[:object_literal]&.downcase, f[:scope]]
    }

    groups.each_value do |rows|
      next if rows.size < 2

      keeper = rows.first
      rows[1..].each do |loser|
        @store.provenance.where(fact_id: loser[:id]).update(fact_id: keeper[:id])
        @store.facts.where(id: loser[:id]).update(
          status: "superseded",
          valid_to: Time.now.utc.iso8601
        )
        @store.insert_fact_link(from_fact_id: keeper[:id], to_fact_id: loser[:id], link_type: "supersedes")
        merged += 1
      end
    end
  end
  merged
end

#dedupe_open_conflicts(dry_run: false) ⇒ `Hash`

Deduplicate open conflicts that describe the same contradiction. Before the Resolver#apply_conflict dedupe fix (2026-04-24), each re-extraction of the losing value in a single-value slot produced a new disputed fact + conflict row — production DBs accumulated 11 open conflicts for “sqlite vs postgresql” referencing 11 different disputed facts. This pass keeps the earliest conflict per logical pair and marks the rest resolved, reinforcing the keeper’s provenance chain with the duplicates’ provenance.

Pair key: (subject_entity_id, predicate, normalized(object_a), normalized(object_b)) with object order sorted so A-vs-B == B-vs-A.

Parameters:

dry_run (Boolean) (defaults to: false) —

when true, decide but don’t write

Returns:

(Hash) —

resolved:, decisions: [{conflict_id:, action:, keeper_id:]}

# File 'lib/claude_memory/sweep/maintenance.rb', line 304

def dedupe_open_conflicts(dry_run: false)
  result = {inspected: 0, resolved: 0, decisions: []}

  open_rows = @store.conflicts
    .where(status: "open")
    .order(:id)
    .all
  return result if open_rows.empty?

  fact_ids = open_rows.flat_map { |r| [r[:fact_a_id], r[:fact_b_id]] }.uniq
  facts = @store.facts
    .where(id: fact_ids)
    .select(:id, :subject_entity_id, :predicate, :object_literal, :status)
    .all
    .to_h { |f| [f[:id], f] }

  @store.db.transaction do
    groups = open_rows.group_by { |row| pair_key(row, facts) }.reject { |key, _| key.nil? }
    groups.each_value do |rows_in_group|
      result[:inspected] += rows_in_group.size
      next if rows_in_group.size < 2

      keeper = rows_in_group.first
      duplicates = rows_in_group[1..]
      duplicates.each do |dup|
        result[:decisions] << {
          conflict_id: dup[:id],
          action: :resolve_duplicate,
          keeper_id: keeper[:id],
          duplicate_fact_id: dup[:fact_b_id]
        }
        # Counted whether or not we actually write, so dry-run output
        # matches real-run output and callers can compare plans.
        result[:resolved] += 1
        next if dry_run

        # Resolve the duplicate conflict. Also reject its disputed
        # side (fact_b_id is always the newer inserted-as-disputed
        # fact per Resolver convention), and shift its provenance
        # onto the keeper's fact_b so the evidence isn't lost.
        keeper_fact_b_id = keeper[:fact_b_id]
        if dup[:fact_b_id] != keeper_fact_b_id
          @store.provenance.where(fact_id: dup[:fact_b_id]).update(fact_id: keeper_fact_b_id)
          @store.facts.where(id: dup[:fact_b_id]).update(
            status: "rejected",
            valid_to: Time.now.utc.iso8601
          )
        end
        @store.conflicts.where(id: dup[:id]).update(
          status: "resolved",
          notes: "Deduplicated into conflict ##{keeper[:id]}"
        )
      end
    end
  end

  result
end

#expire_disputed_facts ⇒ `Object`

Expire disputed facts older than TTL. Returns: Integer count of expired facts

# File 'lib/claude_memory/sweep/maintenance.rb', line 46

def expire_disputed_facts
  cutoff = cutoff_time(@config[:disputed_fact_ttl_days])
  @store.facts
    .where(status: "disputed")
    .where { created_at < cutoff }
    .update(status: "expired")
end

#expire_proposed_facts ⇒ `Object`

Expire proposed facts older than TTL. Returns: Integer count of expired facts

# File 'lib/claude_memory/sweep/maintenance.rb', line 36

def expire_proposed_facts
  cutoff = cutoff_time(@config[:proposed_fact_ttl_days])
  @store.facts
    .where(status: "proposed")
    .where { created_at < cutoff }
    .update(status: "expired")
end

#fix_scope_leakage ⇒ `Object`

Fix scope leakage: facts whose ‘scope` column disagrees with the store they live in. Pre-2026-04-20, the resolver treated scope_hint from the distiller as a scope override — so when the NullDistiller detected global-scope language (“always”, “my preference”), it stamped scope: “global” on facts that still ended up written to the project DB. The result was invisible orphaned rows: not in the global DB so global recall never saw them, but labeled global inside the project DB.

This pass detects those rows by comparing ‘scope` to the expected value derived from which DB this Maintenance instance is running against, and rewrites scope + project_path to match. Does not move facts between DBs — users can `claude-memory promote <id>` to do a proper cross-store copy. Returns: Integer count of facts whose scope was corrected.

# File 'lib/claude_memory/sweep/maintenance.rb', line 112

def fix_scope_leakage
  expected = expected_scope_for_store
  return 0 unless expected

  project_path_for_scope = (expected == "global") ? nil : detect_project_path
  @store.facts
    .exclude(scope: expected)
    .update(scope: expected, project_path: project_path_for_scope)
end

#prune_old_content ⇒ `Object`

Delete old content items not referenced by any provenance. Also removes their FTS index entries. Returns: Integer count of deleted content items

# File 'lib/claude_memory/sweep/maintenance.rb', line 134

def prune_old_content
  cutoff = cutoff_time(@config[:content_retention_days])
  referenced_ids = @store.provenance.exclude(content_item_id: nil).select(:content_item_id)
  prunable = @store.content_items
    .where { ingested_at < cutoff }
    .exclude(id: referenced_ids)

  fts = ClaudeMemory::Index::LexicalFTS.new(@store)
  prunable.select(:id, :raw_text).each do |row|
    fts.remove_content_item(row[:id], row[:raw_text])
  rescue
    # FTS entry may not exist; skip
  end

  prunable.delete
end

#prune_old_mcp_tool_calls ⇒ `Object`

Delete MCP tool-call telemetry rows older than retention window. Returns: Integer count of deleted rows (0 if table missing).

# File 'lib/claude_memory/sweep/maintenance.rb', line 249

def prune_old_mcp_tool_calls
  return 0 unless @store.db.table_exists?(:mcp_tool_calls)

  cutoff = cutoff_time(@config[:mcp_tool_call_retention_days])
  @store.mcp_tool_calls.where { called_at < cutoff }.delete
end

#prune_old_otel_events ⇒ `Object`

Delete OTel log-style events older than retention window. Returns: Integer count of deleted rows (0 if table missing).

# File 'lib/claude_memory/sweep/maintenance.rb', line 267

def prune_old_otel_events
  return 0 unless @store.db.table_exists?(:otel_events)

  cutoff = cutoff_time(@config[:otel_event_retention_days])
  @store.otel_events.where { occurred_at < cutoff }.delete
end

#prune_old_otel_metrics ⇒ `Object`

Delete OTel metric data points older than retention window. Returns: Integer count of deleted rows (0 if table missing).

# File 'lib/claude_memory/sweep/maintenance.rb', line 258

def prune_old_otel_metrics
  return 0 unless @store.db.table_exists?(:otel_metrics)

  cutoff = cutoff_time(@config[:otel_metric_retention_days])
  @store.otel_metrics.where { recorded_at < cutoff }.delete
end

#prune_old_otel_traces ⇒ `Object`

Delete OTel trace spans older than retention window. Returns: Integer count of deleted rows (0 if table missing).

# File 'lib/claude_memory/sweep/maintenance.rb', line 276

def prune_old_otel_traces
  return 0 unless @store.db.table_exists?(:otel_traces)

  cutoff = cutoff_time(@config[:otel_trace_retention_days])
  @store.otel_traces.where { recorded_at < cutoff }.delete
end

#prune_orphaned_provenance ⇒ `Object`

Delete provenance records referencing non-existent facts. Returns: Integer count of deleted provenance rows

# File 'lib/claude_memory/sweep/maintenance.rb', line 124

def prune_orphaned_provenance
  fact_ids = @store.facts.select(:id)
  @store.provenance
    .exclude(fact_id: fact_ids)
    .delete
end

#reclassify_references(dry_run: false) ⇒ `Hash`

Reclassify active facts currently labeled ‘convention` whose object text matches the ReferenceMaterialDetector heuristics. Fixes the historical data tail from before the detector was wired into `store_extraction` on 2026-04-24. Current writes can’t create this pattern — this pass only cleans up what already exists.

Parameters:

dry_run (Boolean) (defaults to: false) —

when true, decide but don’t write

Returns:

(Hash) —

reclassified:, decisions: [{fact_id:, object:]}

# File 'lib/claude_memory/sweep/maintenance.rb', line 371

def reclassify_references(dry_run: false)
  detector = ClaudeMemory::Distill::ReferenceMaterialDetector.new
  result = {inspected: 0, reclassified: 0, decisions: []}

  candidates = @store.facts
    .where(status: "active", predicate: "convention")
    .select(:id, :object_literal)
    .all

  @store.db.transaction do
    candidates.each do |row|
      result[:inspected] += 1
      fact = {predicate: "convention", object: row[:object_literal]}
      next unless detector.reference_material?(fact)

      result[:decisions] << {fact_id: row[:id], object: row[:object_literal]}
      result[:reclassified] += 1

      unless dry_run
        @store.facts.where(id: row[:id]).update(predicate: "reference")
      end
    end
  end

  result
end

#reflect_observations ⇒ `Object`

Run the deterministic observation Reflector (dedupe near-identical observations + expire stale info-level ones). Free, no LLM —provenance-preserving (tombstone, never delete). Returns: Hash expired:

# File 'lib/claude_memory/sweep/maintenance.rb', line 402

def reflect_observations
  return {deduped: 0, expired: 0} unless @store.db.table_exists?(:observations)

  result = ClaudeMemory::Observe::Reflector.new(
    @store, info_ttl_days: @config[:observation_info_ttl_days]
  ).reflect!
  {deduped: result.deduped, expired: result.expired}
end

#restore_multi_value_supersessions(predicate:, dry_run: false) ⇒ `Hash`

Restore superseded facts in a (subject, predicate) slot that were only superseded because of an obsolete single-value classification. Uses Jaccard-based token overlap to distinguish bug-superseded facts (token-disjoint siblings) from legitimate corrections (overlapping siblings).

Refuses to run on predicates still classified as single-value — they should stay superseded by design.

Never touches status: “rejected” facts (explicit user decisions).

Returns:

(Hash) —

restored, skipped_ambiguous, skipped_rejected, decisions

# File 'lib/claude_memory/sweep/maintenance.rb', line 189

def restore_multi_value_supersessions(predicate:, dry_run: false)
  if ClaudeMemory::Resolve::PredicatePolicy.single?(predicate)
    raise ArgumentError, "Predicate '#{predicate}' is still classified single-value; refusing to restore"
  end

  result = {inspected: 0, restored: 0, skipped_ambiguous: 0, skipped_rejected: 0, decisions: []}

  rows_by_subject = @store.facts
    .where(predicate: predicate)
    .exclude(status: "rejected")
    .select(:id, :subject_entity_id, :object_literal, :status)
    .all
    .group_by { |r| r[:subject_entity_id] }

  rejected_by_subject = @store.facts
    .where(predicate: predicate, status: "rejected")
    .select(:id, :subject_entity_id, :object_literal)
    .all
    .group_by { |r| r[:subject_entity_id] }

  @store.db.transaction do
    rows_by_subject.each do |subject_id, rows|
      rejected_rows = rejected_by_subject[subject_id] || []
      siblings = rows + rejected_rows

      rows.each do |candidate|
        next unless candidate[:status] == "superseded"
        result[:inspected] += 1

        candidate_tokens = restore_tokenize(candidate[:object_literal])
        ambiguous_against = find_overlapping_siblings(candidate, siblings, candidate_tokens)

        if ambiguous_against.empty?
          result[:restored] += 1
          result[:decisions] << {
            subject_entity_id: subject_id,
            fact_id: candidate[:id],
            object: candidate[:object_literal],
            action: :restore
          }
          restore_fact!(candidate[:id]) unless dry_run
        else
          result[:skipped_ambiguous] += 1
          result[:decisions] << {
            subject_entity_id: subject_id,
            fact_id: candidate[:id],
            object: candidate[:object_literal],
            action: :skip_ambiguous,
            overlaps_with: ambiguous_against.map { |s| s[:object_literal] }
          }
        end
      end
    end
  end

  result
end

#vacuum ⇒ `Object`

Run SQLite VACUUM to reclaim space. Returns: true

# File 'lib/claude_memory/sweep/maintenance.rb', line 413

def vacuum
  @store.db.run("VACUUM")
  true
end

Class: ClaudeMemory::Sweep::Maintenance

Overview

Constant Summary collapse

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(store, config: {}) ⇒ Maintenance

Instance Attribute Details

#store ⇒ Object (readonly)

Instance Method Details

#backfill_vec_index(limit: 100) ⇒ Object

#checkpoint_wal ⇒ Object

#cleanup_vec_expired(limit: 100) ⇒ Object

#dedupe_multi_value_facts ⇒ Object

#dedupe_open_conflicts(dry_run: false) ⇒ Hash

#expire_disputed_facts ⇒ Object

#expire_proposed_facts ⇒ Object

#fix_scope_leakage ⇒ Object

#prune_old_content ⇒ Object

#prune_old_mcp_tool_calls ⇒ Object

#prune_old_otel_events ⇒ Object

#prune_old_otel_metrics ⇒ Object

#prune_old_otel_traces ⇒ Object

#prune_orphaned_provenance ⇒ Object

#reclassify_references(dry_run: false) ⇒ Hash

#reflect_observations ⇒ Object

#restore_multi_value_supersessions(predicate:, dry_run: false) ⇒ Hash

#vacuum ⇒ Object

#initialize(store, config: {}) ⇒ `Maintenance`

#store ⇒ `Object` (readonly)

#backfill_vec_index(limit: 100) ⇒ `Object`

#checkpoint_wal ⇒ `Object`

#cleanup_vec_expired(limit: 100) ⇒ `Object`

#dedupe_multi_value_facts ⇒ `Object`

#dedupe_open_conflicts(dry_run: false) ⇒ `Hash`

#expire_disputed_facts ⇒ `Object`

#expire_proposed_facts ⇒ `Object`

#fix_scope_leakage ⇒ `Object`

#prune_old_content ⇒ `Object`

#prune_old_mcp_tool_calls ⇒ `Object`

#prune_old_otel_events ⇒ `Object`

#prune_old_otel_metrics ⇒ `Object`

#prune_old_otel_traces ⇒ `Object`

#prune_orphaned_provenance ⇒ `Object`

#reclassify_references(dry_run: false) ⇒ `Hash`

#reflect_observations ⇒ `Object`

#restore_multi_value_supersessions(predicate:, dry_run: false) ⇒ `Hash`

#vacuum ⇒ `Object`