Module: RobotLab::Convergence
- Defined in: lib/robot_lab/convergence.rb
Overview
Term-frequency cosine similarity utilities (TF only, no IDF weighting) for detecting semantic convergence between two texts.
Common use cases:
- Checking whether two independent verifiers have reached the same conclusion
- Skipping a reconciler LLM call when verifiers already agree (fast-path)
- Detecting when a multi-robot debate has converged on a consensus
Requires the optional ‘classifier’ gem (~> 2.3).
Constant Summary
- DEFAULT_THRESHOLD = 0.85
  Default cosine similarity threshold at or above which texts are considered convergent.
- MIN_TEXT_LENGTH = 30
  Minimum text length (characters) for meaningful term-frequency scoring. If either text is shorter than this, similarity is always 0.0.
Class Method Summary
- .detected?(text_a, text_b, threshold: DEFAULT_THRESHOLD) ⇒ Boolean
  Determine whether two texts are semantically convergent.
- .similarity(text_a, text_b) ⇒ Float
  Compute cosine similarity between two texts using stemmed term frequencies.
Class Method Details
.detected?(text_a, text_b, threshold: DEFAULT_THRESHOLD) ⇒ Boolean
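Determine whether two texts are semantically convergent.
Returns true when similarity(text_a, text_b) is greater than or equal to threshold. Raises ArgumentError when threshold is outside [0.0, 1.0].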
    # File 'lib/robot_lab/convergence.rb', line 38
    def self.detected?(text_a, text_b, threshold: DEFAULT_THRESHOLD)
      unless (0.0..1.0).cover?(threshold)
        raise ArgumentError, "threshold must be in [0.0, 1.0], got #{threshold}"
      end
      similarity(text_a, text_b) >= threshold
    end
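The fast-path use case from the overview could be wired up roughly as follows. This is a minimal sketch, not part of RobotLab: `resolve` and `ExactMatchDetector` are hypothetical names, and the stub detector only stands in for `RobotLab::Convergence` so the example runs without the classifier gem.

```ruby
# Hypothetical fast-path helper (not part of RobotLab).
# `detector` is any object responding to
# detected?(text_a, text_b, threshold:), e.g. RobotLab::Convergence.
def resolve(verdict_a, verdict_b, detector:, threshold: 0.85)
  # Verifiers already agree: skip the expensive reconciler.
  return verdict_a if detector.detected?(verdict_a, verdict_b, threshold: threshold)

  # Otherwise hand both verdicts to the caller-supplied reconciler block.
  yield(verdict_a, verdict_b)
end

# Stub detector (assumption, for illustration only). It scores 1.0 when the
# texts are equal ignoring case and surrounding whitespace, else 0.0, and
# applies the same threshold range check as the documented method.
module ExactMatchDetector
  def self.detected?(text_a, text_b, threshold: 0.85)
    unless (0.0..1.0).cover?(threshold)
      raise ArgumentError, "threshold must be in [0.0, 1.0], got #{threshold}"
    end

    score = text_a.to_s.strip.casecmp?(text_b.to_s.strip) ? 1.0 : 0.0
    score >= threshold
  end
end
```

In a real pipeline one would presumably pass `detector: RobotLab::Convergence` and put the reconciler LLM call inside the block, so it only runs when the verifiers disagree.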
.similarity(text_a, text_b) ⇒ Float
Compute cosine similarity between two texts using stemmed term frequencies.
Uses String#word_hash (provided by the classifier gem) to build stemmed, stopword-filtered term-frequency vectors, then computes L2-normalized cosine similarity. Term frequencies (no IDF) are used because IDF on a 2-document corpus collapses shared terms to zero, which would incorrectly penalize texts that agree on the same topic.
Returns 0.0 when either text is blank or shorter than MIN_TEXT_LENGTH.
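The remark about IDF on a two-document corpus can be checked numerically. The sketch below assumes the plain, unsmoothed definition idf = log(N / df); smoothed variants behave differently, and the `idf` helper and sample documents are illustrative only.

```ruby
# Unsmoothed inverse document frequency: log(N / df), where N is the number
# of documents and df is the number of documents containing the term.
# Assumes the term occurs in at least one document.
def idf(term, docs)
  df = docs.count { |doc| doc.include?(term) }
  Math.log(docs.size.to_f / df)
end

# Two tokenized "documents" that largely agree (illustrative data).
docs = [
  %w[cache invalidated after write],
  %w[cache invalidated before read]
]

shared = idf("cache", docs) # term present in both documents: log(2 / 2)
unique = idf("write", docs) # term present in one document:   log(2 / 1)

puts "shared term weight: #{shared}"
puts "unique term weight: #{unique}"
```

Under this definition every term the two texts have in common gets weight zero, so only the terms on which they differ would contribute to the vectors. That is the opposite of what a convergence check needs.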
    # File 'lib/robot_lab/convergence.rb', line 60
    def self.similarity(text_a, text_b)
      a = text_a.to_s.strip
      b = text_b.to_s.strip
      return 0.0 if a.length < MIN_TEXT_LENGTH || b.length < MIN_TEXT_LENGTH
      TextAnalysis.tf_cosine_similarity(a, b)
    end
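The body of `TextAnalysis.tf_cosine_similarity` is not shown here, so the following is only a sketch of the general technique the description names: cosine similarity over term-frequency vectors. `TfCosineSketch` is a hypothetical module, and its tokenizer is a plain lowercase word split rather than `String#word_hash`, so it does no stemming or stopword filtering and its scores will differ from the real method.

```ruby
# Illustrative stand-in, not the RobotLab implementation.
module TfCosineSketch
  MIN_TEXT_LENGTH = 30

  # Build a term => count hash from lowercase alphanumeric tokens.
  # (Assumption: the real code uses String#word_hash from the classifier gem.)
  def self.term_frequencies(text)
    text.downcase.scan(/[a-z0-9]+/).tally
  end

  # Cosine similarity: dot product divided by the product of L2 norms.
  def self.cosine(tf_a, tf_b)
    return 0.0 if tf_a.empty? || tf_b.empty?

    dot = tf_a.sum { |term, count| count * tf_b.fetch(term, 0) }
    norm_a = Math.sqrt(tf_a.values.sum { |c| c * c })
    norm_b = Math.sqrt(tf_b.values.sum { |c| c * c })
    dot / (norm_a * norm_b)
  end

  # Same guard as the documented method: short or blank input scores 0.0.
  def self.similarity(text_a, text_b)
    a = text_a.to_s.strip
    b = text_b.to_s.strip
    return 0.0 if a.length < MIN_TEXT_LENGTH || b.length < MIN_TEXT_LENGTH

    cosine(term_frequencies(a), term_frequencies(b))
  end
end
```

Because term counts are never negative, the result lies between 0.0 (no shared terms) and 1.0 (identical term distributions), which is why the threshold for `detected?` is restricted to the same range.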