Class: ClaudeMemory::Distill::ReferenceMaterialDetector
- Inherits:
- Object
- Defined in:
- lib/claude_memory/distill/reference_material_detector.rb
Overview
Guards against the LLM distiller mislabeling reference material as `convention`. Audited in production data on 2026-04-24: project facts labeled `predicate=convention` with objects like “Cloud-backed Claude Code plugin (~1,195 LOC JavaScript) using Supermemory API…” and “Claude Code plugin with marketplace.json, 5,700+ stars, by Tobi Lütke.” These are descriptions of external projects, not conventions the user applies. Leaving them under `convention` pollutes the Knowledge-base sidebar and the `memory.conventions` MCP tool.
Heuristic: only conventions are re-examined (decisions and architecture notes about external projects are legitimately those predicates). A convention is retagged to `reference` when its object text matches any of the descriptive patterns below. The guard is kept deliberately conservative: false-positive retagging is worse than occasionally missing a case, so the patterns target telltale numeric/attribution phrases that rarely appear in real conventions.
Constant Summary
- STRONG_PATTERNS =
Strong signals — any one of these on its own justifies reclassification. Kept tight to avoid false positives on real conventions that happen to quote external project names.
[
  # Line-of-code counts: "~1,195 LOC", "1200 lines of code"
  /~?\d+[,.]?\d*\s*(?:LOC|lines of code)/i,
  # Star counts: "5,700+ stars", "3.2k stars"
  /\d[\d,.]*\+?\s*(?:k\s+)?stars?\b/i,
  # "X is a (plugin|library|tool|gem|service|framework|extension) …"
  /\b(?:is\s+an?|are)\s+(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)\b/i,
  # Leading descriptor: "Plugin that…", "Library for…"
  /\A(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)(?:\s+(?:with|using|for|that))/i
].freeze
- WEAK_PATTERNS =
Weak signals — only fire in combination with a strong signal. Author attribution (“by Jane Doe”) was originally a standalone trigger, but production text like “MCP launched by Claude Code run from PATH” contains the same surface pattern inside a legitimate convention. Requiring a co-occurring strong signal keeps the guard conservative.
[ /\bby\s+[[:upper:]][[:alpha:]'-]+\s+[[:upper:]][[:alpha:]'-]+/ ].freeze
- GUARDED_PREDICATES =
Predicates we inspect. Decisions stay decisions even when they cite external projects (“From QMD restudy: adopt X”); the guard targets only `convention`, where misclassification is most common.
%w[convention].freeze
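As a rough illustration of how the two pattern lists behave, the sketch below copies the regexes from the listings above into local constants and runs them against strings quoted in this page. The helpers `strong_match?` and `weak_match?` and the "by Jane Doe" sample string are hypothetical and exist only for this example; they are not part of the library.

```ruby
# Local copies of the documented patterns, so the snippet runs without the gem.
STRONG_PATTERNS = [
  /~?\d+[,.]?\d*\s*(?:LOC|lines of code)/i,
  /\d[\d,.]*\+?\s*(?:k\s+)?stars?\b/i,
  /\b(?:is\s+an?|are)\s+(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)\b/i,
  /\A(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)(?:\s+(?:with|using|for|that))/i
].freeze

WEAK_PATTERNS = [
  /\bby\s+[[:upper:]][[:alpha:]'-]+\s+[[:upper:]][[:alpha:]'-]+/
].freeze

# Hypothetical helper: true when any strong pattern matches the text.
def strong_match?(text)
  STRONG_PATTERNS.any? { |re| text.match?(re) }
end

# Hypothetical helper: true when any weak pattern matches the text.
def weak_match?(text)
  WEAK_PATTERNS.any? { |re| text.match?(re) }
end

# Objects quoted in the overview (audited production data).
loc_object   = "Cloud-backed Claude Code plugin (~1,195 LOC JavaScript) using Supermemory API"
stars_object = "Claude Code plugin with marketplace.json, 5,700+ stars, by Tobi Lütke."

# The legitimate convention that made author attribution unsafe as a standalone trigger.
legit_convention = "MCP launched by Claude Code run from PATH"

# Invented sample echoing the "by Jane Doe" attribution shape.
attribution_only = "Memory plugin by Jane Doe"

strong_hits = [loc_object, stars_object].map { |text| strong_match?(text) }
legit_flags = [strong_match?(legit_convention), weak_match?(legit_convention)]
attribution_flags = [strong_match?(attribution_only), weak_match?(attribution_only)]
```

Under these assumptions, both audited objects hit a strong pattern, while the legitimate convention matches only the weak attribution pattern ("by Claude Code"). That is the case the weak/strong split is meant to protect.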
Instance Method Summary
- #reclassify(extraction) ⇒ Object
- #reference_material?(fact) ⇒ Boolean
Instance Method Details
#reclassify(extraction) ⇒ Object
# File 'lib/claude_memory/distill/reference_material_detector.rb', line 51
def reclassify(extraction)
  return extraction if extraction.facts.nil? || extraction.facts.empty?

  new_facts = extraction.facts.map do |fact|
    if reference_material?(fact)
      fact.merge(predicate: "reference")
    else
      fact
    end
  end

  Distill::Extraction.new(
    entities: extraction.entities,
    facts: new_facts,
    decisions: extraction.decisions,
    signals: extraction.signals
  )
end
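The following is a minimal sketch of how `#reclassify` might be exercised, not the library's own test or usage. It assumes facts are hashes with `:predicate` and `:object` keys (as the `fact.merge` call suggests). `Extraction` is a `Struct` stand-in for `Distill::Extraction`, and `SketchDetector` is a hypothetical, trimmed-down detector that checks only the star-count pattern.

```ruby
# Stand-in for Distill::Extraction; the real class is assumed to accept
# these four keywords, as the listing above suggests.
Extraction = Struct.new(:entities, :facts, :decisions, :signals, keyword_init: true)

# Hypothetical detector for illustration: one strong pattern only.
class SketchDetector
  STAR_COUNT = /\d[\d,.]*\+?\s*(?:k\s+)?stars?\b/i

  # Returns a new Extraction whose matching convention facts are retagged.
  def reclassify(extraction)
    return extraction if extraction.facts.nil? || extraction.facts.empty?

    new_facts = extraction.facts.map do |fact|
      reference_material?(fact) ? fact.merge(predicate: "reference") : fact
    end

    Extraction.new(
      entities: extraction.entities,
      facts: new_facts,
      decisions: extraction.decisions,
      signals: extraction.signals
    )
  end

  # Simplified guard: conventions whose object mentions a star count.
  def reference_material?(fact)
    fact[:predicate].to_s == "convention" && fact[:object].to_s.match?(STAR_COUNT)
  end
end

original = Extraction.new(
  entities: [],
  facts: [
    { predicate: "convention", object: "Claude Code plugin with marketplace.json, 5,700+ stars" },
    { predicate: "convention", object: "Run the linter before every commit" }, # invented sample
    { predicate: "decision", object: "Adopt the approach of a 3.2k stars project" } # invented sample
  ],
  decisions: [],
  signals: []
)

result = SketchDetector.new.reclassify(original)
predicates = result.facts.map { |fact| fact[:predicate] }
```

Because `Hash#merge` returns a new hash, the input extraction is left untouched. Only the first fact is retagged; the decision keeps its predicate even though its text mentions a star count.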
#reference_material?(fact) ⇒ Boolean
# File 'lib/claude_memory/distill/reference_material_detector.rb', line 70
def reference_material?(fact)
  return false unless GUARDED_PREDICATES.include?(fact[:predicate].to_s)
  object = fact[:object].to_s
  return false if object.empty?
  STRONG_PATTERNS.any? { |re| object.match?(re) }
end
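The order of the guards can be seen in a small standalone sketch. It copies the method body from the listing above into a top-level method and, to stay short, uses a hypothetical `SAMPLE_STRONG_PATTERNS` constant holding only the line-of-code pattern. The sample facts are invented.

```ruby
GUARDED_PREDICATES = %w[convention].freeze

# Assumption for brevity: only the LOC pattern from STRONG_PATTERNS is copied.
SAMPLE_STRONG_PATTERNS = [/~?\d+[,.]?\d*\s*(?:LOC|lines of code)/i].freeze

def reference_material?(fact)
  # 1. Only guarded predicates are inspected; to_s lets symbols through.
  return false unless GUARDED_PREDICATES.include?(fact[:predicate].to_s)

  # 2. A missing or empty object can never match.
  object = fact[:object].to_s
  return false if object.empty?

  # 3. Any single strong pattern is enough.
  SAMPLE_STRONG_PATTERNS.any? { |re| object.match?(re) }
end

symbol_predicate = reference_material?({ predicate: :convention, object: "Plugin, ~1,195 LOC JavaScript" })
other_predicate  = reference_material?({ predicate: "decision", object: "Plugin, ~1,195 LOC JavaScript" })
missing_object   = reference_material?({ predicate: "convention", object: nil })
plain_convention = reference_material?({ predicate: "convention", object: "Run the linter before every commit" })
```

In this sketch only `symbol_predicate` is true: the predicate check short-circuits for the decision, `nil.to_s` yields an empty string, and the plain convention contains no line-of-code count. Note that the method as listed consults only the strong patterns, so a weak match never changes the result by itself.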