Class: Riffer::Evals::EvaluatorRunner

Inherits:
Object
Defined in:
lib/riffer/evals/evaluator_runner.rb

Overview

Orchestrates running evaluators against an agent across multiple scenarios.

Accepts an agent class, a list of scenarios, and evaluator classes. Generates agent output for each scenario and runs all evaluators, returning a RunResult with per-scenario details and aggregate scores.

result = Riffer::Evals::EvaluatorRunner.run(
  agent: MyAgent,
  scenarios: [
    { input: "What is Ruby?", ground_truth: "A programming language" },
    { input: "What is Python?" }
  ],
  evaluators: [AnswerRelevancyEvaluator]
)

result.scores   # => { AnswerRelevancyEvaluator => 0.85 }
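
The example above shows one aggregate score per evaluator, but this page does not say how per-scenario scores are combined. The following is a minimal sketch that assumes the aggregate is the arithmetic mean. The helper name aggregate_scores, the string keys standing in for evaluator classes, and the score values are all hypothetical and are not taken from Riffer.

```ruby
# Hypothetical helper, not part of Riffer.
# Assumption: the aggregate score is the arithmetic mean of each evaluator's
# per-scenario scores. The library may combine scores differently.
def aggregate_scores(per_scenario_scores)
  grouped = Hash.new { |hash, key| hash[key] = [] }
  per_scenario_scores.each do |scores|
    scores.each { |evaluator, score| grouped[evaluator] << score }
  end
  grouped.transform_values { |values| values.sum / values.size.to_f }
end

# Made-up scores for two scenarios; strings stand in for evaluator classes.
per_scenario_scores = [
  { "AnswerRelevancyEvaluator" => 0.5 },
  { "AnswerRelevancyEvaluator" => 1.0 }
]

aggregate = aggregate_scores(per_scenario_scores)
puts aggregate.inspect
```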

Class Method Details

.run(agent:, scenarios:, evaluators:, context: nil) ⇒ Object

Runs evaluators against an agent for the given scenarios.

agent

an Agent subclass (not an instance).

scenarios

array of hashes with :input, optional :ground_truth, and optional :context.

evaluators

array of Evaluator subclasses to run against each scenario.

context

optional hash passed to agent.generate. Per-scenario :context takes precedence.
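
The precedence rule for context can be illustrated with a short sketch. It assumes that "takes precedence" means a scenario's :context replaces the run-level context outright rather than being merged with it, which this page does not confirm. The helper name resolve_context is hypothetical and is not a Riffer method.

```ruby
# Hypothetical helper, not part of Riffer: picks the context for one scenario.
# Assumption: a scenario's :context replaces the run-level context entirely
# (no merging). Scenarios without :context fall back to the run-level value.
def resolve_context(scenario, run_context)
  scenario.fetch(:context, run_context)
end

run_context = { locale: "en" }
scenarios = [
  { input: "What is Ruby?", context: { locale: "fr" } },  # has its own context
  { input: "What is Python?" }                            # uses the run-level one
]

scenarios.each do |scenario|
  puts "#{scenario[:input]} -> #{resolve_context(scenario, run_context).inspect}"
end
```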

Raises Riffer::ArgumentError if agent is not a Riffer::Agent subclass or if any evaluator is not a Riffer::Evals::Evaluator subclass.
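
The sketch below shows one way such a subclass check could work. The Sketch module, its stand-in classes, and the subclass_of? and validate! helpers are hypothetical and exist only so the example runs without the gem. The real validate_agent! and validate_evaluators! are not shown on this page and may differ, for instance in whether the base class itself is accepted (this sketch rejects it).

```ruby
# Stand-ins so the sketch runs without the gem; none of this is Riffer code.
module Sketch
  # Assumption: the error is a kind of ::ArgumentError.
  class ArgumentError < ::ArgumentError; end
  class Agent; end
  class Evaluator; end

  # True only when candidate is a class that strictly inherits from base.
  # Class#< returns nil for unrelated classes and false for the same class.
  def self.subclass_of?(candidate, base)
    candidate.is_a?(Class) && candidate < base ? true : false
  end

  def self.validate!(agent:, evaluators:)
    raise ArgumentError, "agent must be an Agent subclass" unless subclass_of?(agent, Agent)

    evaluators.each do |evaluator|
      next if subclass_of?(evaluator, Evaluator)

      raise ArgumentError, "#{evaluator.inspect} is not an Evaluator subclass"
    end
    true
  end
end

class MyAgent < Sketch::Agent; end
class MyEvaluator < Sketch::Evaluator; end

# A class passes; an instance is rejected.
Sketch.validate!(agent: MyAgent, evaluators: [MyEvaluator])
begin
  Sketch.validate!(agent: MyAgent.new, evaluators: [MyEvaluator])
rescue Sketch::ArgumentError => e
  puts "rejected: #{e.message}"
end
```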

– : (agent: singleton(Riffer::Agent), scenarios: Array[Hash[Symbol, untyped]], evaluators: Array, ?context: Hash[Symbol, untyped]?) -> Riffer::Evals::RunResult



# File 'lib/riffer/evals/evaluator_runner.rb', line 34

def self.run(agent:, scenarios:, evaluators:, context: nil)
  validate_agent!(agent)
  validate_evaluators!(evaluators)

  scenario_results = scenarios.map do |scenario|
    run_scenario(agent: agent, scenario: scenario, evaluators: evaluators, context: context)
  end

  Riffer::Evals::RunResult.new(scenario_results: scenario_results)
end