Class: ClaudeMemory::Distill::ReferenceMaterialDetector

Inherits:
Object
Defined in:
lib/claude_memory/distill/reference_material_detector.rb

Overview

Guards against the LLM distiller mislabeling reference material as `convention`. An audit of production data on 2026-04-24 found project facts labeled `predicate=convention` with objects like “Cloud-backed Claude Code plugin (~1,195 LOC JavaScript) using Supermemory API…” and “Claude Code plugin with marketplace.json, 5,700+ stars, by Tobi Lütke.” These describe external projects; they are not conventions the user applies. Leaving them under `convention` pollutes the Knowledge-base sidebar and the `memory.conventions` MCP tool.

Heuristic: for object-text signals, only conventions are re-examined (decisions and architecture notes about external projects are legitimately those predicates). A convention is retagged to `reference` when its object text matches any of the strong descriptive patterns below. The patterns are kept deliberately conservative: false-positive retagging is worse than occasionally missing a case, so they target telltale numeric/attribution phrases that rarely appear in real conventions.

Constant Summary

STRONG_PATTERNS =

Strong signals — any one of these on its own justifies reclassification. Kept tight to avoid false positives on real conventions that happen to quote external project names.

[
  # Line-of-code counts: "~1,195 LOC", "1200 lines of code"
  /~?\d+[,.]?\d*\s*(?:LOC|lines of code)/i,
  # Star counts: "5,700+ stars", "3.2k stars"
  /\d[\d,.]*\+?\s*(?:k\s+)?stars?\b/i,
  # "X is a (plugin|library|tool|gem|service|framework|extension) …"
  /\b(?:is\s+an?|are)\s+(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)\b/i,
  # Leading descriptor: "Plugin that…", "Library for…"
  /\A(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)(?:\s+(?:with|using|for|that))/i
].freeze
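
As a rough illustration of how the strong patterns behave, the sketch below copies the four regular expressions from the listing and runs them against the audited objects quoted in the overview. The helper name `strong_signal?` is hypothetical and is not part of the documented class.

```ruby
# Copied verbatim from the STRONG_PATTERNS listing.
STRONG_PATTERNS = [
  /~?\d+[,.]?\d*\s*(?:LOC|lines of code)/i,
  /\d[\d,.]*\+?\s*(?:k\s+)?stars?\b/i,
  /\b(?:is\s+an?|are)\s+(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)\b/i,
  /\A(?:cloud-backed\s+)?(?:plugin|library|tool|gem|service|framework|extension|cli|mcp\s+server)(?:\s+(?:with|using|for|that))/i
].freeze

# Hypothetical helper (not in the source): true when any strong pattern matches.
def strong_signal?(text)
  STRONG_PATTERNS.any? { |pattern| pattern.match?(text) }
end

# Line-of-code count fires.
strong_signal?("Cloud-backed Claude Code plugin (~1,195 LOC JavaScript) using Supermemory API") # => true
# Star count fires.
strong_signal?("Claude Code plugin with marketplace.json, 5,700+ stars, by Tobi Lütke.") # => true
# Leading descriptor fires.
strong_signal?("Library for parsing YAML front matter") # => true
# An ordinary convention carries no strong signal.
strong_signal?("Prefer frozen string literals in all Ruby files") # => false
```
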
WEAK_PATTERNS =

Weak signals — only fire in combination with a strong signal. Author attribution (“by Jane Doe”) was originally a standalone trigger, but production text like “MCP launched by Claude Code run from PATH” contains the same surface pattern inside a legitimate convention. Requiring a co-occurring strong signal keeps the guard conservative.

[
  /\bby\s+[[:upper:]][[:alpha:]'-]+\s+[[:upper:]][[:alpha:]'-]+/
].freeze
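
The following sketch shows why author attribution alone is not treated as decisive. It checks the production sentence quoted above and one of the audited reference objects against the weak pattern and a subset of the strong patterns. `signals_for` and `STRONG_SUBSET` are hypothetical names introduced for this illustration; how the class actually combines the two lists is not shown in this page.

```ruby
# Copied verbatim from the WEAK_PATTERNS listing.
WEAK_PATTERNS = [
  /\bby\s+[[:upper:]][[:alpha:]'-]+\s+[[:upper:]][[:alpha:]'-]+/
].freeze

# Two of the four strong patterns, enough for this illustration.
STRONG_SUBSET = [
  /~?\d+[,.]?\d*\s*(?:LOC|lines of code)/i,
  /\d[\d,.]*\+?\s*(?:k\s+)?stars?\b/i
].freeze

# Hypothetical helper: reports which kinds of signal a text carries.
def signals_for(text)
  {
    strong: STRONG_SUBSET.any? { |pattern| pattern.match?(text) },
    weak: WEAK_PATTERNS.any? { |pattern| pattern.match?(text) }
  }
end

# Legitimate convention: "by Claude Code" trips the weak pattern only.
convention = signals_for("MCP launched by Claude Code run from PATH")
# convention[:weak] is true, convention[:strong] is false

# Reference material: attribution co-occurs with a star count.
reference = signals_for("Claude Code plugin with marketplace.json, 5,700+ stars, by Tobi Lütke.")
# reference[:weak] is true, reference[:strong] is true
```
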
GUARDED_PREDICATES =

Predicates inspected for object-text reference signals. Decisions stay decisions even when they cite external projects (“From QMD restudy: adopt X”); the object-text guard targets only `convention`, where misclassification is most common.

%w[convention].freeze
QUOTE_GUARDED_PREDICATES =

Stack-shaping single-value predicates that historically attract hallucinations from CLAUDE.md-style example text (“e.g., this app uses PostgreSQL”). For these predicates we additionally inspect the source quote for example markers — if the LLM extracted a stack fact from documentation example text, it’s not a real project commitment. Added 2026-05-21 after the audit found 10 open conflicts driven by recurring example-text extraction.

%w[uses_database uses_framework uses_language deployment_platform auth_method].freeze
EXAMPLE_QUOTE_PATTERNS =

Example markers that signal the source text is documentation exemplifying a scope/predicate concept, not a real stack claim.

[
  /\b(?:e\.?g\.?|i\.?e\.?|for example|for instance|such as)[,:]?\s/i,
  /\(\s*(?:e\.?g\.?|i\.?e\.?)[,.]/i
].freeze
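
To make the example-marker check concrete, the sketch below applies the two patterns to a documentation-style quote and to a plain statement. The helper name `example_quote?` is hypothetical; only the regular expressions come from the listing.

```ruby
# Copied verbatim from the EXAMPLE_QUOTE_PATTERNS listing.
EXAMPLE_QUOTE_PATTERNS = [
  /\b(?:e\.?g\.?|i\.?e\.?|for example|for instance|such as)[,:]?\s/i,
  /\(\s*(?:e\.?g\.?|i\.?e\.?)[,.]/i
].freeze

# Hypothetical helper: true when the source quote looks like example text.
def example_quote?(quote)
  EXAMPLE_QUOTE_PATTERNS.any? { |pattern| pattern.match?(quote) }
end

example_quote?("e.g., this app uses PostgreSQL")           # => true
example_quote?("Databases such as PostgreSQL are allowed") # => true
example_quote?("We store all data in PostgreSQL 16")       # => false
```
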

Instance Method Summary

Instance Method Details

#reclassify(extraction) ⇒ Object



# File 'lib/claude_memory/distill/reference_material_detector.rb', line 68

def reclassify(extraction)
  return extraction if extraction.facts.nil? || extraction.facts.empty?

  new_facts = extraction.facts.map do |fact|
    if reference_material?(fact)
      fact.merge(predicate: "reference")
    else
      fact
    end
  end

  Distill::Extraction.new(
    entities: extraction.entities,
    facts: new_facts,
    decisions: extraction.decisions,
    signals: extraction.signals
  )
end
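
A usage sketch, under stated assumptions, of what `reclassify` does to an extraction. The `Extraction` struct stands in for `Distill::Extraction`, `SketchDetector` is a cut-down stand-in for the real class with a single strong pattern, and the `:object` key on a fact is an assumed name; none of these are confirmed by the source. The point illustrated is that matching facts get a new predicate while the input extraction is left untouched, because `Hash#merge` returns a new hash.

```ruby
# Hypothetical stand-in for Distill::Extraction (keyword constructor as in the listing).
Extraction = Struct.new(:entities, :facts, :decisions, :signals, keyword_init: true)

class SketchDetector
  LOC_PATTERN = /~?\d+[,.]?\d*\s*(?:LOC|lines of code)/i

  def reclassify(extraction)
    return extraction if extraction.facts.nil? || extraction.facts.empty?

    new_facts = extraction.facts.map do |fact|
      reference_material?(fact) ? fact.merge(predicate: "reference") : fact
    end

    Extraction.new(entities: extraction.entities, facts: new_facts,
                   decisions: extraction.decisions, signals: extraction.signals)
  end

  # Simplified check: one strong pattern; :object is an assumed key name.
  def reference_material?(fact)
    fact[:predicate].to_s == "convention" && LOC_PATTERN.match?(fact[:object].to_s)
  end
end

original = Extraction.new(
  entities: [], decisions: [], signals: [],
  facts: [
    {predicate: "convention", object: "Cloud-backed plugin (~1,195 LOC JavaScript)"},
    {predicate: "convention", object: "Use two-space indentation"},
    {predicate: "decision", object: "Adopt the 1,200 LOC parser from project X"}
  ]
)

result = SketchDetector.new.reclassify(original)
predicates = result.facts.map { |fact| fact[:predicate] }
# predicates == ["reference", "convention", "decision"]
# original.facts.first[:predicate] is still "convention"
```
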

#reference_material?(fact) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/claude_memory/distill/reference_material_detector.rb', line 87

def reference_material?(fact)
  predicate = fact[:predicate].to_s
  return true if convention_with_reference_object?(fact, predicate)
  return true if stack_predicate_from_example_text?(fact, predicate)
  false
end
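
The two helpers called above are not shown in this page. The sketch below is one plausible shape for them, based only on the constant descriptions: the bodies, the single-pattern shortcuts, and the fact keys `:object` and `:quote` are all assumptions, not the actual implementation.

```ruby
GUARDED_PREDICATES = %w[convention].freeze
QUOTE_GUARDED_PREDICATES = %w[uses_database uses_framework uses_language deployment_platform auth_method].freeze

# One strong pattern and one example marker, copied from the constant listings.
STAR_PATTERN = /\d[\d,.]*\+?\s*(?:k\s+)?stars?\b/i
EXAMPLE_PATTERN = /\b(?:e\.?g\.?|i\.?e\.?|for example|for instance|such as)[,:]?\s/i

# Assumed body: a guarded predicate whose object text carries a strong signal.
def convention_with_reference_object?(fact, predicate)
  GUARDED_PREDICATES.include?(predicate) && STAR_PATTERN.match?(fact[:object].to_s)
end

# Assumed body: a stack predicate whose source quote looks like example text.
def stack_predicate_from_example_text?(fact, predicate)
  QUOTE_GUARDED_PREDICATES.include?(predicate) && EXAMPLE_PATTERN.match?(fact[:quote].to_s)
end

# Same dispatch as the documented method, written as a single expression.
def reference_material?(fact)
  predicate = fact[:predicate].to_s
  convention_with_reference_object?(fact, predicate) ||
    stack_predicate_from_example_text?(fact, predicate)
end

reference_material?({predicate: :convention, object: "Plugin with 5,700+ stars"})                  # => true
reference_material?({predicate: "uses_database", quote: "e.g., this app uses PostgreSQL"})         # => true
reference_material?({predicate: "uses_database", quote: "We migrated to PostgreSQL last year"})    # => false
reference_material?({predicate: "decision", object: "Adopt the plugin with 5,700+ stars"})         # => false
```
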