Module: Woods::Evaluation::Metrics

Defined in:
lib/woods/evaluation/metrics.rb

Overview

Retrieval quality metrics.

All methods are stateless pure functions. Most take arrays of identifiers; token_efficiency takes integer token counts instead. Each returns a Float score between 0.0 and 1.0.

Class Method Details

.context_completeness(retrieved, required) ⇒ Float

Fraction of required units present in retrieved results.

Parameters:

  • retrieved (Array<String>)

    Retrieved identifiers

  • required (Array<String>)

    Required identifiers (subset of relevant)

Returns:

  • (Float)

    0.0 to 1.0



# File 'lib/woods/evaluation/metrics.rb', line 62

def context_completeness(retrieved, required)
  return 1.0 if required.empty?

  retrieved_set = retrieved.to_set
  found = required.count { |id| retrieved_set.include?(id) }
  found.to_f / required.size
end
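
The following is a minimal usage sketch, not part of the library. It copies the method body above into a hypothetical stand-in module named MetricsSketch so it can run without the gem; the identifiers in the arrays are made up for illustration.

```ruby
require 'set'

# Hypothetical stand-in module; the body mirrors the listing above.
module MetricsSketch
  module_function

  def context_completeness(retrieved, required)
    return 1.0 if required.empty?

    retrieved_set = retrieved.to_set
    found = required.count { |id| retrieved_set.include?(id) }
    found.to_f / required.size
  end
end

retrieved = %w[User Order Invoice]

# One of the two required units ("User") was retrieved.
p MetricsSketch.context_completeness(retrieved, %w[User Payment]) # => 0.5

# With nothing required, the context counts as complete.
p MetricsSketch.context_completeness(retrieved, [])               # => 1.0
```

The empty-required case returns 1.0 rather than 0.0, so queries without required units do not drag down an averaged completeness score.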

.mrr(retrieved, relevant) ⇒ Float

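Reciprocal rank of the first relevant result for a single query: 1.0 if it is ranked first, 0.5 if second, and so on, or 0.0 if no retrieved identifier is relevant. Averaging this value over a set of queries gives the Mean Reciprocal Rank.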

Parameters:

  • retrieved (Array<String>)

    Retrieved identifiers (ordered)

  • relevant (Array<String>)

    Ground-truth relevant identifiers

Returns:

  • (Float)

    0.0 to 1.0



# File 'lib/woods/evaluation/metrics.rb', line 49

def mrr(retrieved, relevant)
  relevant_set = relevant.to_set
  retrieved.each_with_index do |id, idx|
    return 1.0 / (idx + 1) if relevant_set.include?(id)
  end
  0.0
end
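
As a sketch of how the per-query value might be averaged across an evaluation set, the example below copies the method body above into a hypothetical MetricsSketch module and adds a helper, mean_mrr, that is an assumption for illustration and does not appear in the source.

```ruby
require 'set'

# Hypothetical stand-in module; mrr mirrors the listing above.
module MetricsSketch
  module_function

  def mrr(retrieved, relevant)
    relevant_set = relevant.to_set
    retrieved.each_with_index do |id, idx|
      return 1.0 / (idx + 1) if relevant_set.include?(id)
    end
    0.0
  end

  # Assumed helper (not in the source): averages the per-query
  # reciprocal rank over [retrieved, relevant] pairs.
  def mean_mrr(queries)
    return 0.0 if queries.empty?

    queries.sum { |retrieved, relevant| mrr(retrieved, relevant) } / queries.size
  end
end

ranked = %w[a b c]

p MetricsSketch.mrr(ranked, %w[b]) # first hit at rank 2 => 0.5
p MetricsSketch.mrr(ranked, %w[a]) # first hit at rank 1 => 1.0
p MetricsSketch.mrr(ranked, %w[z]) # no hit             => 0.0

queries = [[ranked, %w[b]], [ranked, %w[a]], [ranked, %w[z]]]
p MetricsSketch.mean_mrr(queries)  # (0.5 + 1.0 + 0.0) / 3 => 0.5
```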

.precision_at_k(retrieved, relevant, cutoff: 5) ⇒ Float

Fraction of top-k results that are relevant.

Parameters:

  • retrieved (Array<String>)

    Retrieved unit identifiers (ordered)

  • relevant (Array<String>)

    Ground-truth relevant identifiers

  • cutoff (Integer) (defaults to: 5)

    Number of top results to consider

Returns:

  • (Float)

    0.0 to 1.0



# File 'lib/woods/evaluation/metrics.rb', line 19

def precision_at_k(retrieved, relevant, cutoff: 5)
  return 0.0 if retrieved.empty? || relevant.empty?

  top_k = retrieved.first(cutoff)
  relevant_set = relevant.to_set
  hits = top_k.count { |id| relevant_set.include?(id) }
  # Divide by actual slice size, not the cutoff — when fewer than
  # `cutoff` items are retrieved, dividing by `cutoff` understates
  # precision (returns 0.2 for 1-of-1 at cutoff=5 instead of 1.0).
  hits.to_f / top_k.size
end
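
The sketch below illustrates how the cutoff and the short-list behavior described in the code comment interact. It copies the method body above into a hypothetical MetricsSketch module so it runs standalone; the identifiers are invented.

```ruby
require 'set'

# Hypothetical stand-in module; the body mirrors the listing above.
module MetricsSketch
  module_function

  def precision_at_k(retrieved, relevant, cutoff: 5)
    return 0.0 if retrieved.empty? || relevant.empty?

    top_k = retrieved.first(cutoff)
    relevant_set = relevant.to_set
    hits = top_k.count { |id| relevant_set.include?(id) }
    hits.to_f / top_k.size
  end
end

retrieved = %w[a b c d e f]
relevant  = %w[a c f]

# Top 5 is a..e; "a" and "c" are relevant, "f" falls outside the cutoff.
p MetricsSketch.precision_at_k(retrieved, relevant)            # => 0.4

# Top 2 is a, b; only "a" is relevant.
p MetricsSketch.precision_at_k(retrieved, relevant, cutoff: 2) # => 0.5

# Fewer results than the cutoff: the divisor is the slice size (1), not 5.
p MetricsSketch.precision_at_k(%w[a], %w[a])                   # => 1.0
```

One edge case to be aware of: with non-empty inputs and cutoff: 0, the slice is empty and the final division is 0.0 / 0, which Ruby evaluates to NaN rather than 0.0. Callers that accept user-supplied cutoffs may want to validate that the value is positive.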

.recall(retrieved, relevant) ⇒ Float

Fraction of relevant items that were retrieved.

Parameters:

  • retrieved (Array<String>)

    Retrieved identifiers

  • relevant (Array<String>)

    Ground-truth relevant identifiers

Returns:

  • (Float)

    0.0 to 1.0



# File 'lib/woods/evaluation/metrics.rb', line 36

def recall(retrieved, relevant)
  return 0.0 if relevant.empty?

  retrieved_set = retrieved.to_set
  found = relevant.count { |id| retrieved_set.include?(id) }
  found.to_f / relevant.size
end
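
A short usage sketch follows, again using a hypothetical MetricsSketch module that copies the method body above so the example runs without the gem.

```ruby
require 'set'

# Hypothetical stand-in module; the body mirrors the listing above.
module MetricsSketch
  module_function

  def recall(retrieved, relevant)
    return 0.0 if relevant.empty?

    retrieved_set = retrieved.to_set
    found = relevant.count { |id| retrieved_set.include?(id) }
    found.to_f / relevant.size
  end
end

# "a" and "c" are found; "x" and "y" are missed: 2 of 4.
p MetricsSketch.recall(%w[a b c], %w[a c x y]) # => 0.5

# Order of retrieved does not matter for recall.
p MetricsSketch.recall(%w[c b a], %w[a c x y]) # => 0.5

# No ground truth yields 0.0.
p MetricsSketch.recall(%w[a b c], [])          # => 0.0
```

Note the asymmetry with context_completeness: an empty relevant list scores 0.0 here, whereas an empty required list scores 1.0 there.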

.token_efficiency(relevant_tokens, total_tokens) ⇒ Float

Ratio of relevant tokens to total tokens in context.

Parameters:

  • relevant_tokens (Integer)

    Tokens from relevant units

  • total_tokens (Integer)

    Total tokens in assembled context

Returns:

  • (Float)

    0.0 to 1.0



# File 'lib/woods/evaluation/metrics.rb', line 75

def token_efficiency(relevant_tokens, total_tokens)
  return 0.0 if total_tokens.zero?

  [relevant_tokens.to_f / total_tokens, 1.0].min
end