Class: ClaudeMemory::Observe::TokenOverlapMatcher
- Inherits:
-
Object
- Object
- ClaudeMemory::Observe::TokenOverlapMatcher
- Defined in:
- lib/claude_memory/observe/token_overlap_matcher.rb
Overview
Default observation-similarity matcher: lexical token-overlap (Jaccard).
Deterministic, free, no embedding dependency — so it runs shell-side in the Reflector’s sweep pass at no extra cost. Two bodies are “the same sighting” when their significant-word sets overlap past the threshold. This folds the common case (one event re-observed with slightly different wording — “PreCompact hook set.” / “PreCompact hook set — the design analog”, Jaccard 0.6) while keeping unrelated observations apart (distinct developer statements share ~no content words → Jaccard ~0).
It does NOT capture pure synonym paraphrases (“use SQLite” vs “chose SQLite”) — no free lexical method can on short text (measured 2026-06-23: tfidf cosine 0.32 for that pair, indistinguishable from unrelated pairs at 0.13). For paraphrase folding, inject a semantic matcher backed by real embeddings: the Reflector accepts any object responding to ‘similar?(body_a, body_b)`.
Constant Summary collapse
- DEFAULT_THRESHOLD =
0.5- STOPWORDS =
Function words carry no episodic signal; dropping them focuses the overlap on subject/verb content.
%w[ a an the to of in on at for and or but we i it is are was were be been this that these those with as by from into our your their its do does ].to_set.freeze
Instance Method Summary collapse
-
#initialize(threshold: DEFAULT_THRESHOLD) ⇒ TokenOverlapMatcher
constructor
A new instance of TokenOverlapMatcher.
-
#similar?(body_a, body_b) ⇒ Boolean
True when the two bodies are near-duplicate sightings.
Constructor Details
#initialize(threshold: DEFAULT_THRESHOLD) ⇒ TokenOverlapMatcher
Returns a new instance of TokenOverlapMatcher.
31 32 33 |
# File 'lib/claude_memory/observe/token_overlap_matcher.rb', line 31 def initialize(threshold: DEFAULT_THRESHOLD) @threshold = threshold end |
Instance Method Details
#similar?(body_a, body_b) ⇒ Boolean
Returns true when the two bodies are near-duplicate sightings.
36 37 38 39 40 41 42 43 44 |
# File 'lib/claude_memory/observe/token_overlap_matcher.rb', line 36 def similar?(body_a, body_b) a = significant_tokens(body_a) b = significant_tokens(body_b) return false if a.empty? || b.empty? intersection = (a & b).size.to_f union = (a | b).size (intersection / union) >= @threshold end |