Class: ClaudeMemory::Distill::ReferenceMaterialDetector

Inherits:
Object
Defined in:
lib/claude_memory/distill/reference_material_detector.rb

Overview

Guards against the LLM distiller mislabeling reference material as `convention`. An audit of production data on 2026-04-24 found project facts labeled `predicate=convention` with objects like “Cloud-backed Claude Code plugin (~1,195 LOC JavaScript) using Supermemory API…” and “Claude Code plugin with marketplace.json, 5,700+ stars, by Tobi Lütke.” These are descriptions of external projects, not conventions the user applies. Leaving them under `convention` pollutes the Knowledge-base sidebar and the `memory.conventions` MCP tool.

Heuristic: only conventions are re-examined (decisions and architecture notes about external projects are legitimately those predicates). A convention is retagged to `reference` when its object text matches any of the strong patterns below. The guard is kept deliberately conservative: false-positive retagging is worse than occasionally missing a case, so the patterns target telltale numeric/attribution phrases that rarely appear in real conventions.

Constant Summary

STRONG_PATTERNS =

Strong signals — any one of these on its own justifies reclassification. Kept tight to avoid false positives on real conventions that happen to quote external project names.

[
  # Line-of-code counts: "~1,195 LOC", "1200 lines of code"
  /~?\d+[,.]?\d*\s*(?:LOC|lines of code)/i,
  # Star counts: "5,700+ stars", "3.2k stars"
  /\d[\d,.]*\+?\s*(?:k\s+)?stars?\b/i,
  # "X is a (plugin|library|tool|gem|service|framework|extension) …"
  /\b(?:is\s+an?|are)\s+(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)\b/i,
  # Leading descriptor: "Plugin that…", "Library for…"
  /\A(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)(?:\s+(?:with|using|for|that))/i
].freeze
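
To see which strong signal fires for a given object text, the patterns can be exercised directly. This is a minimal sketch: the `STRONG` constant and the `strong_hits` helper are hypothetical names for illustration, the regexes are copied from the listing above, and the last three sample strings are invented.

```ruby
# Regexes copied from STRONG_PATTERNS; STRONG and strong_hits are illustrative names.
STRONG = [
  /~?\d+[,.]?\d*\s*(?:LOC|lines of code)/i,
  /\d[\d,.]*\+?\s*(?:k\s+)?stars?\b/i,
  /\b(?:is\s+an?|are)\s+(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)\b/i,
  /\A(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)(?:\s+(?:with|using|for|that))/i
].freeze

# Returns the indexes of every strong pattern that matches the text.
def strong_hits(text)
  STRONG.each_index.select { |i| text.match?(STRONG[i]) }
end

[
  "Cloud-backed Claude Code plugin (~1,195 LOC JavaScript) using Supermemory API…",
  "Claude Code plugin with marketplace.json, 5,700+ stars, by Tobi Lütke.",
  "Supermemory is a cloud-backed plugin",           # invented sample
  "Library for parsing YAML",                       # invented sample
  "Use frozen string literals in every Ruby file"   # invented sample of a real convention
].each do |text|
  puts "#{strong_hits(text).inspect}  #{text}"
end
```

The two production strings quoted in the Overview each trip one pattern (the LOC count and the star count respectively), while the invented convention trips none.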
WEAK_PATTERNS =

Weak signals — only fire in combination with a strong signal. Author attribution (“by Jane Doe”) was originally a standalone trigger, but production text like “MCP launched by Claude Code run from PATH” contains the same surface pattern inside a legitimate convention. Requiring a co-occurring strong signal keeps the guard conservative.

[
  /\bby\s+[[:upper:]][[:alpha:]'-]+\s+[[:upper:]][[:alpha:]'-]+/
].freeze
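
The ambiguity described above is easy to reproduce. In this sketch, `AUTHOR_BY` and `STAR_COUNT` are hypothetical names for regexes copied from `WEAK_PATTERNS` and `STRONG_PATTERNS`; the attribution pattern matches both the reference text and the legitimate convention, and only the strong pattern tells them apart.

```ruby
# Regexes copied from the listings above; the constant names are illustrative.
AUTHOR_BY  = /\bby\s+[[:upper:]][[:alpha:]'-]+\s+[[:upper:]][[:alpha:]'-]+/
STAR_COUNT = /\d[\d,.]*\+?\s*(?:k\s+)?stars?\b/i

samples = [
  "Claude Code plugin with marketplace.json, 5,700+ stars, by Tobi Lütke.",
  "MCP launched by Claude Code run from PATH"
]

# For each sample, record whether the weak and the strong signal fire.
report = samples.map do |text|
  { text: text, weak: text.match?(AUTHOR_BY), strong: text.match?(STAR_COUNT) }
end

report.each { |row| puts row.inspect }
```

Both rows report `weak: true`, but only the first reports `strong: true`, which is why attribution alone is not treated as sufficient.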
GUARDED_PREDICATES =

Predicates we inspect. Decisions stay decisions even when they cite external projects (“From QMD restudy: adopt X”); the guard targets only `convention`, where misclassification is most common.

%w[convention].freeze

Instance Method Summary

Instance Method Details

#reclassify(extraction) ⇒ Object



# File 'lib/claude_memory/distill/reference_material_detector.rb', line 51

def reclassify(extraction)
  return extraction if extraction.facts.nil? || extraction.facts.empty?

  new_facts = extraction.facts.map do |fact|
    if reference_material?(fact)
      fact.merge(predicate: "reference")
    else
      fact
    end
  end

  Distill::Extraction.new(
    entities: extraction.entities,
    facts: new_facts,
    decisions: extraction.decisions,
    signals: extraction.signals
  )
end
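
A usage sketch of the same map-and-merge shape, runnable outside the gem. Everything here is an assumption for illustration: `DemoExtraction` is a hypothetical stand-in for `Distill::Extraction` (whose definition is not shown on this page), `STARS_RE` replaces the full `#reference_material?` check with a single strong pattern, and `demo_reclassify` is a simplified copy of the method above.

```ruby
# Hypothetical stand-in for Distill::Extraction, assuming keyword construction.
DemoExtraction = Struct.new(:entities, :facts, :decisions, :signals, keyword_init: true)

# One strong pattern stands in for the full #reference_material? check.
STARS_RE = /\d[\d,.]*\+?\s*(?:k\s+)?stars?\b/i

def demo_reclassify(extraction)
  return extraction if extraction.facts.nil? || extraction.facts.empty?

  new_facts = extraction.facts.map do |fact|
    retag = fact[:predicate].to_s == "convention" && fact[:object].to_s.match?(STARS_RE)
    # Hash#merge returns a new hash, so the input fact is left untouched.
    retag ? fact.merge(predicate: "reference") : fact
  end

  DemoExtraction.new(entities: extraction.entities, facts: new_facts,
                     decisions: extraction.decisions, signals: extraction.signals)
end

original = DemoExtraction.new(
  entities: [],
  facts: [
    { predicate: "convention", object: "Claude Code plugin with marketplace.json, 5,700+ stars" },
    { predicate: "convention", object: "Use frozen string literals in every Ruby file" }
  ],
  decisions: [],
  signals: []
)

result = demo_reclassify(original)
puts result.facts.map { |f| f[:predicate] }.inspect
puts original.facts.map { |f| f[:predicate] }.inspect
```

Because the facts are rebuilt with `map` and `merge`, the result carries the retagged fact while the original extraction keeps both facts as `convention`.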

#reference_material?(fact) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/claude_memory/distill/reference_material_detector.rb', line 70

def reference_material?(fact)
  return false unless GUARDED_PREDICATES.include?(fact[:predicate].to_s)
  object = fact[:object].to_s
  return false if object.empty?
  STRONG_PATTERNS.any? { |re| object.match?(re) }
end
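
The guard clauses can be checked in isolation. This is a sketch under stated assumptions: `GUARDED` and `LOC_RE` are hypothetical names mirroring `GUARDED_PREDICATES` and one entry of `STRONG_PATTERNS`, `guarded_reference?` is a cut-down copy of the method above, and the fact hashes are invented.

```ruby
GUARDED = %w[convention].freeze
LOC_RE  = /~?\d+[,.]?\d*\s*(?:LOC|lines of code)/i # one entry of STRONG_PATTERNS

# Cut-down copy of #reference_material? using a single strong pattern.
def guarded_reference?(fact)
  return false unless GUARDED.include?(fact[:predicate].to_s)
  object = fact[:object].to_s
  return false if object.empty?
  object.match?(LOC_RE)
end

checks = {
  string_predicate: guarded_reference?({ predicate: "convention", object: "Plugin, ~1,195 LOC" }),
  symbol_predicate: guarded_reference?({ predicate: :convention, object: "Plugin, ~1,195 LOC" }),
  other_predicate:  guarded_reference?({ predicate: "decision", object: "Plugin, ~1,195 LOC" }),
  nil_object:       guarded_reference?({ predicate: "convention", object: nil }),
  plain_convention: guarded_reference?({ predicate: "convention", object: "Use frozen string literals" })
}

checks.each { |name, value| puts "#{name}: #{value}" }
```

The `to_s` calls mean a symbol predicate is treated like its string form and a `nil` object is rejected rather than raising.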