Class: ClaudeMemory::Distill::ReferenceMaterialDetector
- Inherits: Object
  - Object
    - ClaudeMemory::Distill::ReferenceMaterialDetector
- Defined in: lib/claude_memory/distill/reference_material_detector.rb
Overview
Guards against the LLM distiller mislabeling reference material as `convention`. An audit of production data on 2026-04-24 found project facts labeled `predicate=convention` with objects like "Cloud-backed Claude Code plugin (~1,195 LOC JavaScript) using Supermemory API…" and "Claude Code plugin with marketplace.json, 5,700+ stars, by Tobi Lütke." These are descriptions of external projects, not conventions the user applies. Leaving them under `convention` pollutes the Knowledge-base sidebar and the `memory.conventions` MCP tool.
Heuristic: only conventions are re-examined (decisions and architecture notes about external projects legitimately carry those predicates). A convention is retagged to `reference` when its object text matches one of the strong descriptive patterns below; weak patterns never trigger a retag on their own. The patterns are deliberately conservative, because a false-positive retag is worse than occasionally missing a case, so they target telltale numeric and attribution phrases that rarely appear in real conventions.
Constant Summary
- STRONG_PATTERNS =
Strong signals — any one of these on its own justifies reclassification. Kept tight to avoid false positives on real conventions that happen to quote external project names.
[
  # Line-of-code counts: "~1,195 LOC", "1200 lines of code"
  /~?\d+[,.]?\d*\s*(?:LOC|lines of code)/i,
  # Star counts: "5,700+ stars", "3.2k stars"
  /\d[\d,.]*\+?\s*(?:k\s+)?stars?\b/i,
  # "X is a (plugin|library|tool|gem|service|framework|extension) …"
  /\b(?:is\s+an?|are)\s+(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)\b/i,
  # Leading descriptor: "Plugin that…", "Library for…"
  /\A(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)(?:\s+(?:with|using|for|that))/i
].freeze
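A quick way to sanity-check the strong patterns is to run them against the two audited object texts from the overview and against ordinary conventions. The constant is copied verbatim from above; the helper `strong_signal?` and the "frozen_string_literal" sample are hypothetical, added only for illustration.

```ruby
# STRONG_PATTERNS copied verbatim from the constant summary.
STRONG_PATTERNS = [
  /~?\d+[,.]?\d*\s*(?:LOC|lines of code)/i,
  /\d[\d,.]*\+?\s*(?:k\s+)?stars?\b/i,
  /\b(?:is\s+an?|are)\s+(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)\b/i,
  /\A(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)(?:\s+(?:with|using|for|that))/i
].freeze

# Hypothetical helper: true when any single strong pattern matches.
def strong_signal?(text)
  STRONG_PATTERNS.any? { |pattern| pattern.match?(text) }
end

# Sample object texts mapped to the expected strong-signal result.
SAMPLES = {
  "Cloud-backed Claude Code plugin (~1,195 LOC JavaScript) using Supermemory API" => true,
  "Claude Code plugin with marketplace.json, 5,700+ stars, by Tobi Lütke" => true,
  "Use frozen_string_literal in all Ruby files" => false, # hypothetical convention
  "MCP launched by Claude Code run from PATH" => false    # real convention from WEAK_PATTERNS note
}.freeze

SAMPLES.each_key { |text| puts "#{strong_signal?(text)}  #{text}" }
```

The two audited strings print `true` (via the LOC and star-count patterns); the two conventions print `false`, since neither contains a count or an "is a plugin"-style descriptor.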
- WEAK_PATTERNS =
Weak signals — only fire in combination with a strong signal. Author attribution (“by Jane Doe”) was originally a standalone trigger, but production text like “MCP launched by Claude Code run from PATH” contains the same surface pattern inside a legitimate convention. Requiring a co-occurring strong signal keeps the guard conservative.
[ /\bby\s+[[:upper:]][[:alpha:]'-]+\s+[[:upper:]][[:alpha:]'-]+/ ].freeze
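Running the attribution pattern on its own shows why it was demoted to a weak signal: it matches the audited attribution and the legitimate convention equally well. The pattern is copied verbatim; the `TEXTS` list and the print loop are illustration only.

```ruby
# WEAK_PATTERNS copied verbatim from the constant summary.
WEAK_PATTERNS = [
  /\bby\s+[[:upper:]][[:alpha:]'-]+\s+[[:upper:]][[:alpha:]'-]+/
].freeze

TEXTS = [
  "Claude Code plugin with marketplace.json, 5,700+ stars, by Tobi Lütke", # reference material
  "MCP launched by Claude Code run from PATH"                              # legitimate convention
].freeze

TEXTS.each do |text|
  hit = WEAK_PATTERNS.any? { |pattern| pattern.match?(text) }
  puts "#{hit ? 'weak hit' : 'no hit'}: #{text}"
end
```

Both lines report a weak hit ("by Tobi Lütke" and "by Claude Code"), so the pattern alone cannot separate the two; only the first text also carries a strong signal (the star count).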
- GUARDED_PREDICATES =
Predicates inspected for object-text reference signals. Decisions stay decisions even when they cite external projects ("From QMD restudy: adopt X"); the object-text guard targets only `convention`, where misclassification is most common.
%w[convention].freeze
- QUOTE_GUARDED_PREDICATES =
Stack-shaping single-value predicates that historically attract hallucinations from CLAUDE.md-style example text (“e.g., this app uses PostgreSQL”). For these predicates we additionally inspect the source quote for example markers — if the LLM extracted a stack fact from documentation example text, it’s not a real project commitment. Added 2026-05-21 after the audit found 10 open conflicts driven by recurring example-text extraction.
%w[uses_database uses_framework uses_language deployment_platform auth_method].freeze
- EXAMPLE_QUOTE_PATTERNS =
Example markers that signal the source text is documentation exemplifying a scope/predicate concept, not a real stack claim.
[
  /\b(?:e\.?g\.?|i\.?e\.?|for example|for instance|such as)[,:]?\s/i,
  /\(\s*(?:e\.?g\.?|i\.?e\.?)[,.]/i
].freeze
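The quote check for stack predicates might look roughly like the sketch below. It is a hypothetical reconstruction, not the class's actual private helper: the method name `example_quote?`, the `:quote` key on the fact hash, and the sample quotes are assumptions. Only the two constants are copied from above.

```ruby
QUOTE_GUARDED_PREDICATES = %w[uses_database uses_framework uses_language deployment_platform auth_method].freeze

EXAMPLE_QUOTE_PATTERNS = [
  /\b(?:e\.?g\.?|i\.?e\.?|for example|for instance|such as)[,:]?\s/i,
  /\(\s*(?:e\.?g\.?|i\.?e\.?)[,.]/i
].freeze

# Hypothetical helper: flags a stack fact whose source quote reads like
# documentation example text. ASSUMES the quote is stored under fact[:quote].
def example_quote?(fact)
  return false unless QUOTE_GUARDED_PREDICATES.include?(fact[:predicate].to_s)

  quote = fact[:quote].to_s
  EXAMPLE_QUOTE_PATTERNS.any? { |pattern| pattern.match?(quote) }
end

doc_example = { predicate: "uses_database", object: "PostgreSQL",
                quote: "Project scope covers stack facts (e.g., this app uses PostgreSQL)" }
real_claim  = { predicate: "uses_database", object: "PostgreSQL",
                quote: "We migrated the app to PostgreSQL last sprint" }
convention  = { predicate: "convention", object: "Prefer keyword arguments",
                quote: "Prefer keyword arguments, e.g. in service objects" }

puts example_quote?(doc_example) # → true  (example marker on a guarded predicate)
puts example_quote?(real_claim)  # → false (no example marker)
puts example_quote?(convention)  # → false (predicate is not quote-guarded)
```

Checking the predicate first keeps the quote guard narrow: an "e.g." inside a convention's quote is ordinary prose and is left to the object-text guard.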
Instance Method Summary
- #reclassify(extraction) ⇒ Object
- #reference_material?(fact) ⇒ Boolean
Instance Method Details
#reclassify(extraction) ⇒ Object
# File 'lib/claude_memory/distill/reference_material_detector.rb', line 68

def reclassify(extraction)
  return extraction if extraction.facts.nil? || extraction.facts.empty?

  new_facts = extraction.facts.map do |fact|
    if reference_material?(fact)
      fact.merge(predicate: "reference")
    else
      fact
    end
  end

  Distill::Extraction.new(
    entities: extraction.entities,
    facts: new_facts,
    decisions: extraction.decisions,
    signals: extraction.signals
  )
end
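To see the shape of `#reclassify` in isolation, the sketch below swaps in a `Struct` for `Distill::Extraction` and a one-line stub for `#reference_material?`. Both stand-ins (`Extraction`, `StubDetector`) are hypothetical; they only mirror the keyword arguments and the map-then-rebuild flow of the method above.

```ruby
# Hypothetical stand-in for Distill::Extraction with the same four fields.
Extraction = Struct.new(:entities, :facts, :decisions, :signals, keyword_init: true)

class StubDetector
  # Stub only: treats any convention mentioning "stars" as reference material.
  def reference_material?(fact)
    fact[:predicate].to_s == "convention" && fact[:object].to_s.include?("stars")
  end

  # Same control flow as #reclassify, but building the stand-in Struct.
  def reclassify(extraction)
    return extraction if extraction.facts.nil? || extraction.facts.empty?

    new_facts = extraction.facts.map do |fact|
      reference_material?(fact) ? fact.merge(predicate: "reference") : fact
    end

    Extraction.new(
      entities: extraction.entities,
      facts: new_facts,
      decisions: extraction.decisions,
      signals: extraction.signals
    )
  end
end

input = Extraction.new(
  entities: [], decisions: [], signals: [],
  facts: [
    { predicate: "convention", object: "Claude Code plugin with 5,700+ stars" },
    { predicate: "convention", object: "Run rubocop before every commit" }
  ]
)

output = StubDetector.new.reclassify(input)
output.facts.each { |fact| puts "#{fact[:predicate]}: #{fact[:object]}" }
puts input.facts.first[:predicate] # input is untouched: prints "convention"
```

Because each fact is rebuilt with `Hash#merge` and a fresh extraction is returned, the caller's original extraction keeps its predicates; an extraction with no facts is returned as-is.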
#reference_material?(fact) ⇒ Boolean
# File 'lib/claude_memory/distill/reference_material_detector.rb', line 87

def reference_material?(fact)
  predicate = fact[:predicate].to_s
  return true if convention_with_reference_object?(fact, predicate)
  return true if stack_predicate_from_example_text?(fact, predicate)
  false
end