Class: Woods::Evaluation::Evaluator

Inherits:
Object
Defined in:
lib/woods/evaluation/evaluator.rb

Overview

Runs evaluation queries through a Retriever and scores the results against ground-truth annotations.

Takes a configured retriever and a query set, runs each query, and produces per-query and aggregate metrics.

Examples:

evaluator = Evaluator.new(retriever: retriever, query_set: query_set)
report = evaluator.evaluate
report.aggregates[:mean_mrr]  # => 0.75

Defined Under Namespace

Classes: EvaluationReport, QueryResult

Constant Summary

METRIC_KEYS =
%i[precision_at5 precision_at10 recall mrr context_completeness token_efficiency].freeze
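Of the keys in METRIC_KEYS, precision_at5, precision_at10, recall and mrr look like standard ranked-retrieval metrics, with mrr being the mean of the per-query reciprocal rank. The sketch below shows their textbook definitions. It assumes that each query's ground truth is a list of relevant unit identifiers and that the retriever returns identifiers in rank order. The module name MetricSketch and its method names are hypothetical. Woods' own scoring code may differ, and context_completeness and token_efficiency are not sketched because this page does not define them.

```ruby
# Hypothetical helpers showing textbook metric definitions; not Woods' own code.
module MetricSketch
  module_function

  # Precision@k: share of the top k retrieved ids that are relevant.
  # Divides by k even when fewer than k ids were retrieved.
  def precision_at_k(retrieved, relevant, k)
    return 0.0 if k <= 0

    (retrieved.first(k) & relevant).size.to_f / k
  end

  # Recall: share of all relevant ids that were retrieved at any rank.
  # Returns 0.0 when a query has no ground truth (an assumed convention).
  def recall(retrieved, relevant)
    return 0.0 if relevant.empty?

    (retrieved & relevant).size.to_f / relevant.size
  end

  # Reciprocal rank: 1 / (1-based rank of the first relevant id), or 0.0 if none.
  # Averaging this value over all queries gives MRR.
  def reciprocal_rank(retrieved, relevant)
    index = retrieved.index { |id| relevant.include?(id) }
    index ? 1.0 / (index + 1) : 0.0
  end
end

retrieved = %w[User Order Payment Invoice Cart]
relevant  = %w[Order Invoice Shipment]

MetricSketch.precision_at_k(retrieved, relevant, 5)   # => 0.4
MetricSketch.precision_at_k(retrieved, relevant, 10)  # => 0.2
MetricSketch.recall(retrieved, relevant)              # => 0.6666666666666666
MetricSketch.reciprocal_rank(retrieved, relevant)     # => 0.5
```

Dividing by k rather than by the number of retrieved ids penalizes a retriever that returns fewer than k results. That is the usual convention, but it is an assumption about how Woods computes precision_at5 and precision_at10.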

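Instance Method Summary

  • #evaluate ⇒ EvaluationReport

    Run all queries and produce an evaluation report.

  • #initialize(retriever:, query_set:, budget: 8000) ⇒ Evaluator (constructor)

    Returns a new instance of Evaluator.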

Constructor Details

#initialize(retriever:, query_set:, budget: 8000) ⇒ Evaluator

Returns a new instance of Evaluator.

Parameters:

  • retriever (Woods::Retriever)

    Configured retriever instance

  • query_set (QuerySet)

    Set of evaluation queries with ground truth

  • budget (Integer) (defaults to: 8000)

    Token budget per query

# File 'lib/woods/evaluation/evaluator.rb', line 31

def initialize(retriever:, query_set:, budget: 8000)
  @retriever = retriever
  @query_set = query_set
  @budget = budget
end

Instance Method Details

#evaluate ⇒ EvaluationReport

Run all queries and produce an evaluation report.

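Returns:

  • (EvaluationReport)

    Report containing the per-query results and the aggregate metrics computed from them
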
# File 'lib/woods/evaluation/evaluator.rb', line 40

def evaluate
  results = @query_set.queries.map { |q| evaluate_query(q) }
  aggregates = compute_aggregates(results)
  EvaluationReport.new(results: results, aggregates: aggregates)
end
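#evaluate maps every query through evaluate_query and then passes the per-query results to compute_aggregates. Neither private method is documented on this page. The following self-contained sketch mirrors that shape under several assumptions:

  • The retriever exposes a `retrieve(text, budget:)` method.
  • A query carries its text and a list of relevant ids.
  • Aggregate keys are the metric names prefixed with `mean_`, inferred from `:mean_mrr` in the example above.

StubRetriever, SketchEvaluator, its nested Structs, and the method bodies are hypothetical. Only two metrics are computed, to keep the sketch short.

```ruby
# Hypothetical stub: the real Woods::Retriever interface is not shown here.
class StubRetriever
  def initialize(answers)
    @answers = answers # query text => ranked list of unit ids
  end

  # Assumed method name and signature; the stub ignores the budget.
  def retrieve(text, budget:)
    @answers.fetch(text, [])
  end
end

# Sketch mirroring the shape of Evaluator#evaluate; not Woods' source.
class SketchEvaluator
  Query = Struct.new(:text, :relevant_ids)
  QuerySet = Struct.new(:queries)
  QueryResult = Struct.new(:query, :retrieved_ids, :metrics)
  Report = Struct.new(:results, :aggregates)

  def initialize(retriever:, query_set:, budget: 8000)
    @retriever = retriever
    @query_set = query_set
    @budget = budget
  end

  # Same flow as Evaluator#evaluate: score each query, then aggregate.
  def evaluate
    results = @query_set.queries.map { |q| evaluate_query(q) }
    Report.new(results, compute_aggregates(results))
  end

  private

  # Guessed body: retrieve ranked ids and score them against ground truth.
  def evaluate_query(query)
    retrieved = @retriever.retrieve(query.text, budget: @budget)
    relevant = query.relevant_ids
    first_hit = retrieved.index { |id| relevant.include?(id) }
    metrics = {
      # Per-query reciprocal rank; its mean across queries is MRR.
      mrr: first_hit ? 1.0 / (first_hit + 1) : 0.0,
      recall: relevant.empty? ? 0.0 : (retrieved & relevant).size.to_f / relevant.size
    }
    QueryResult.new(query, retrieved, metrics)
  end

  # Guessed body: average each metric across queries under a "mean_" key.
  def compute_aggregates(results)
    return {} if results.empty?

    results.first.metrics.keys.to_h do |key|
      [:"mean_#{key}", results.sum { |r| r.metrics[key] } / results.size]
    end
  end
end

answers = {
  "where are orders paid?" => %w[Order Payment],
  "what creates invoices?" => %w[User Cart Invoice]
}
queries = [
  SketchEvaluator::Query.new("where are orders paid?", %w[Payment]),
  SketchEvaluator::Query.new("what creates invoices?", %w[Invoice])
]
evaluator = SketchEvaluator.new(
  retriever: StubRetriever.new(answers),
  query_set: SketchEvaluator::QuerySet.new(queries)
)
report = evaluator.evaluate
report.aggregates[:mean_recall] # => 1.0
```

Keeping per-query scoring separate from aggregation means the individual results stay in the report for inspecting specific misses. The aggregates still give a single number per metric for comparing retriever configurations.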