Class: ClaudeMemory::Core::RRFusion

Inherits:
Object
Defined in:
lib/claude_memory/core/rr_fusion.rb

Overview

Reciprocal Rank Fusion (RRF) for merging ranked result lists. Follows the Functional Core pattern: no I/O, just transformations.

RRF combines multiple ranked lists using position-based scoring:

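score(d) = Σ_r weight_r / (k + rank_r(d))

Here rank_r(d) is the 1-based position of result d in list r, and the sum runs over the lists that contain d. The implementation listed under .fuse also adds TOP_BONUS to the contribution of ranks 1 to 3 in each list.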

This is more effective than naive deduplication because it considers rank positions from both sources, giving higher scores to results that appear near the top in multiple lists.
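As a rough illustration of the formula above, the following sketch computes the plain RRF score for two cases. The helper name rrf_score is hypothetical and not part of this class, and the sketch deliberately ignores the TOP_BONUS that the listed .fuse method adds on top.

```ruby
# Plain RRF, as in the formula above: sum of weight / (k + rank) over the
# lists that contain the document. Ranks are 1-based; nil means "absent".
# rrf_score is a hypothetical helper for illustration only.
def rrf_score(ranks, weights, k: 60)
  ranks.zip(weights).sum do |rank, weight|
    rank ? weight / (k + rank).to_f : 0.0
  end
end

in_both  = rrf_score([3, 4], [1.0, 1.0])   # mid-ranked in both lists
top_once = rrf_score([1, nil], [1.0, 1.0]) # first in one list only

puts format("in both lists: %.4f", in_both)
puts format("top of one:    %.4f", top_once)
```

Under the plain formula, a result at ranks 3 and 4 (1/63 + 1/64) scores higher than a result that is first in only one list (1/61), which is the behavior the paragraph above describes.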

Constant Summary

K =

Standard RRF constant - controls rank pressure

60

TOP_BONUS =

Additive bonus applied to a result's contribution at ranks 1 to 3 in each list (keyed by 1-based rank)

{1 => 0.05, 2 => 0.02, 3 => 0.02}.freeze
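To see how the two constants interact, here is a small sketch that prints the per-list contribution for the first few ranks. The module name RankContribution is an assumption made for this example; the expression itself mirrors the one used in the .fuse source.

```ruby
# Hypothetical wrapper module so the constants do not clash with anything
# else; the values are copied from the constants documented above.
module RankContribution
  K = 60
  TOP_BONUS = {1 => 0.05, 2 => 0.02, 3 => 0.02}.freeze

  # Per-list contribution for a 1-based rank: weight / (K + rank),
  # plus the bonus for ranks 1 to 3 (0.0 for every other rank).
  def self.contribution(rank, weight: 1.0)
    (weight / (K + rank)) + TOP_BONUS.fetch(rank, 0.0)
  end
end

(1..5).each do |rank|
  base = 1.0 / (RankContribution::K + rank)
  puts format("rank %d: base %.4f, with bonus %.4f", rank, base, RankContribution.contribution(rank))
end
```

With default weights, the rank-1 bonus (0.05) is roughly three times the base term 1/61, so a single first-place hit contributes more than a result sitting at rank 4 in both lists (2/64).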

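Class Method Summary

  • .fuse(vector_results, text_results, limit, vector_weight: 1.0, text_weight: 1.0, explain: false) ⇒ Array<Hash>

    Fuse ranked lists from vector and text search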

Class Method Details

.fuse(vector_results, text_results, limit, vector_weight: 1.0, text_weight: 1.0, explain: false) ⇒ Array<Hash>

Fuse ranked lists from vector and text search

Parameters:

  • vector_results (Array<Hash>)

    Results from vector search (ordered by similarity)

  • text_results (Array<Hash>)

    Results from text search (ordered by FTS rank)

  • limit (Integer)

    Maximum results to return

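  • vector_weight (Float) (defaults to: 1.0)

    Weight multiplier for vector rankings

  • text_weight (Float) (defaults to: 1.0)

    Weight multiplier for text rankings

  • explain (Boolean) (defaults to: false)

    When true, each result also carries a :score_trace hash with the per-list ranks, original scores, per-list RRF contributions, and the final RRF score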

Returns:

  • (Array<Hash>)

    At most limit fused results, sorted by descending RRF score, with :similarity overwritten by the RRF score



# File 'lib/claude_memory/core/rr_fusion.rb', line 25

def self.fuse(vector_results, text_results, limit, vector_weight: 1.0, text_weight: 1.0, explain: false)
  scores = {}
  traces = {} if explain
  fact_data = {}

  # Score vector results by rank position
  vector_results.each_with_index do |result, idx|
    fact_id = result[:fact][:id]
    rank = idx + 1 # 1-based rank
    contribution = (vector_weight / (K + rank)) + TOP_BONUS.fetch(rank, 0.0)
    scores[fact_id] = (scores[fact_id] || 0.0) + contribution
    if explain
      traces[fact_id] ||= {vec_rank: nil, vec_score: nil, fts_rank: nil, fts_score: nil, vec_rrf: nil, fts_rrf: nil}
      traces[fact_id][:vec_rank] = rank
      traces[fact_id][:vec_score] = result[:similarity]
      traces[fact_id][:vec_rrf] = contribution.round(6)
    end
    # Prefer vector result data (has real similarity score)
    fact_data[fact_id] = result
  end

  # Score text results by rank position
  text_results.each_with_index do |result, idx|
    fact_id = result[:fact][:id]
    rank = idx + 1
    contribution = (text_weight / (K + rank)) + TOP_BONUS.fetch(rank, 0.0)
    scores[fact_id] = (scores[fact_id] || 0.0) + contribution
    if explain
      traces[fact_id] ||= {vec_rank: nil, vec_score: nil, fts_rank: nil, fts_score: nil, vec_rrf: nil, fts_rrf: nil}
      traces[fact_id][:fts_rank] = rank
      traces[fact_id][:fts_score] = result[:similarity]
      traces[fact_id][:fts_rrf] = contribution.round(6)
    end
    # Only use text data if not already present from vector
    fact_data[fact_id] ||= result
  end

  # Sort by RRF score descending and return top results
  scores
    .sort_by { |_id, score| -score }
    .take(limit)
    .map do |fact_id, score|
      merged = fact_data[fact_id].merge(similarity: score)
      merged[:score_trace] = traces[fact_id].merge(rrf_final: score.round(6)) if explain
      merged
    end
end
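
A usage sketch follows. So that it runs without the gem, it uses RRFSketch, a hypothetical stand-in condensed from the listing above: it omits the explain tracing, and on duplicate ids within one list it keeps the first occurrence's data. The fact ids and similarity values are invented sample data.

```ruby
# Hypothetical stand-in for ClaudeMemory::Core::RRFusion, condensed from the
# source listing above (no explain support).
module RRFSketch
  K = 60
  TOP_BONUS = {1 => 0.05, 2 => 0.02, 3 => 0.02}.freeze

  def self.fuse(vector_results, text_results, limit, vector_weight: 1.0, text_weight: 1.0)
    scores = Hash.new(0.0)
    fact_data = {}
    [[vector_results, vector_weight], [text_results, text_weight]].each do |results, weight|
      results.each_with_index do |result, idx|
        rank = idx + 1 # 1-based rank
        fact_id = result[:fact][:id]
        scores[fact_id] += (weight / (K + rank)) + TOP_BONUS.fetch(rank, 0.0)
        fact_data[fact_id] ||= result # vector list runs first, so its data wins
      end
    end
    scores.sort_by { |_id, score| -score }.take(limit).map do |fact_id, score|
      fact_data[fact_id].merge(similarity: score)
    end
  end
end

# Invented sample data: each entry needs result[:fact][:id]; lists are
# already ordered best-first.
vector_results = [
  {fact: {id: 1}, similarity: 0.91},
  {fact: {id: 2}, similarity: 0.80},
  {fact: {id: 3}, similarity: 0.75}
]
text_results = [
  {fact: {id: 3}, similarity: 0.60},
  {fact: {id: 4}, similarity: 0.40}
]

fused = RRFSketch.fuse(vector_results, text_results, 2)
p fused.map { |r| r[:fact][:id] } # => [3, 1]
```

Fact 3 is only third in the vector list but first in the text list, so its two contributions add up and it overtakes fact 1, which appears in the vector list alone. The returned :similarity values are RRF scores, not the original similarity values.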