Class: ClaudeMemory::Observe::TokenOverlapMatcher

Inherits:
Object
  • Object
show all
Defined in:
lib/claude_memory/observe/token_overlap_matcher.rb

Overview

Default observation-similarity matcher: lexical token-overlap (Jaccard).

Deterministic, free, no embedding dependency — so it runs shell-side in the Reflector’s sweep pass at no extra cost. Two bodies are “the same sighting” when their significant-word sets overlap past the threshold. This folds the common case (one event re-observed with slightly different wording — “PreCompact hook set.” / “PreCompact hook set — the design analog”, Jaccard 0.6) while keeping unrelated observations apart (distinct developer statements share ~no content words → Jaccard ~0).

It does NOT capture pure synonym paraphrases (“use SQLite” vs “chose SQLite”) — no free lexical method can on short text (measured 2026-06-23: tfidf cosine 0.32 for that pair, indistinguishable from unrelated pairs at 0.13). For paraphrase folding, inject a semantic matcher backed by real embeddings: the Reflector accepts any object responding to ‘similar?(body_a, body_b)`.

Constant Summary collapse

DEFAULT_THRESHOLD =
0.5
STOPWORDS =

Function words carry no episodic signal; dropping them focuses the overlap on subject/verb content.

%w[
  a an the to of in on at for and or but we i it is are was were be been
  this that these those with as by from into our your their its do does
].to_set.freeze

Instance Method Summary collapse

Constructor Details

#initialize(threshold: DEFAULT_THRESHOLD) ⇒ TokenOverlapMatcher

Returns a new instance of TokenOverlapMatcher.



31
32
33
# File 'lib/claude_memory/observe/token_overlap_matcher.rb', line 31

def initialize(threshold: DEFAULT_THRESHOLD)
  @threshold = threshold
end

Instance Method Details

#similar?(body_a, body_b) ⇒ Boolean

Returns true when the two bodies are near-duplicate sightings.

Returns:

  • (Boolean)

    true when the two bodies are near-duplicate sightings



36
37
38
39
40
41
42
43
44
# File 'lib/claude_memory/observe/token_overlap_matcher.rb', line 36

def similar?(body_a, body_b)
  a = significant_tokens(body_a)
  b = significant_tokens(body_b)
  return false if a.empty? || b.empty?

  intersection = (a & b).size.to_f
  union = (a | b).size
  (intersection / union) >= @threshold
end