Module: Riffer::Evals::EvaluatorRunner

Extended by:
EvaluatorRunner
Included in:
EvaluatorRunner
Defined in:
lib/riffer/evals/evaluator_runner.rb

Overview

Orchestrates running evaluators against an agent across multiple scenarios.

result = Riffer::Evals::EvaluatorRunner.run(
  agent: MyAgent,
  scenarios: [
    { input: "What is Ruby?", ground_truth: "A programming language" },
    { input: "What is Python?" }
  ],
  evaluators: [AnswerRelevancyEvaluator]
)

result.scores   # => { AnswerRelevancyEvaluator => 0.85 }

Instance Method Details

#run(agent:, scenarios:, evaluators:, context: nil) ⇒ Object

Runs each evaluator against the agent for every given scenario and returns the collected results.

Raises Riffer::ArgumentError if the agent or any evaluator is invalid.

Type signature (RBS):

(agent: singleton(Riffer::Agent), scenarios: Array[Hash[Symbol, untyped]], evaluators: Array, ?context: Hash[Symbol, untyped]?) -> Riffer::Evals::RunResult



# File 'lib/riffer/evals/evaluator_runner.rb', line 24

def run(agent:, scenarios:, evaluators:, context: nil)
  validate_agent!(agent)
  validate_evaluators!(evaluators)

  scenario_results = scenarios.map do |scenario|
    run_scenario(agent: agent, scenario: scenario, evaluators: evaluators, context: context)
  end

  Riffer::Evals::RunResult.new(scenario_results: scenario_results)
end
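
The source above shows a validate-then-map flow, but the private helpers (validate_agent!, validate_evaluators!, run_scenario) are not part of this page. The following is a minimal, self-contained sketch of that flow under stated assumptions, not Riffer's actual implementation. SketchRunner, EchoAgent, ContainsEvaluator, SketchArgumentError, and the .generate / .evaluate interfaces are all hypothetical stand-ins, and the sketch returns a plain array of hashes rather than a Riffer::Evals::RunResult.

```ruby
# Hypothetical stand-ins for illustration only; none of this is Riffer's real code.
class SketchArgumentError < ArgumentError; end

# Stand-in agent (assumed interface): echoes the input in upper case.
class EchoAgent
  def self.generate(input)
    input.upcase
  end
end

# Stand-in evaluator (assumed interface): 1.0 if the output contains the
# ground truth (case-insensitive), 0.0 otherwise or when no ground truth is given.
class ContainsEvaluator
  def self.evaluate(output:, ground_truth:)
    return 0.0 if ground_truth.nil?

    output.downcase.include?(ground_truth.downcase) ? 1.0 : 0.0
  end
end

module SketchRunner
  extend self # lets SketchRunner.run be called directly on the module

  # Same shape as the documented #run: validate first, then map over scenarios.
  def run(agent:, scenarios:, evaluators:, context: nil)
    validate_agent!(agent)
    validate_evaluators!(evaluators)

    scenarios.map do |scenario|
      run_scenario(agent: agent, scenario: scenario, evaluators: evaluators, context: context)
    end
  end

  private

  # Assumed validation rule: an agent is anything that responds to .generate.
  def validate_agent!(agent)
    return if agent.respond_to?(:generate)

    raise SketchArgumentError, "agent must respond to .generate"
  end

  # Assumed validation rule: every evaluator must respond to .evaluate.
  def validate_evaluators!(evaluators)
    invalid = evaluators.reject { |evaluator| evaluator.respond_to?(:evaluate) }
    return if invalid.empty?

    raise SketchArgumentError, "invalid evaluators: #{invalid.inspect}"
  end

  # Runs the agent once, then scores its output with each evaluator.
  # The context argument is accepted but unused in this sketch.
  def run_scenario(agent:, scenario:, evaluators:, context:)
    output = agent.generate(scenario[:input])
    scores = evaluators.to_h do |evaluator|
      [evaluator, evaluator.evaluate(output: output, ground_truth: scenario[:ground_truth])]
    end
    { input: scenario[:input], output: output, scores: scores }
  end
end

results = SketchRunner.run(
  agent: EchoAgent,
  scenarios: [
    { input: "ruby is fun", ground_truth: "ruby" },
    { input: "hello" } # no ground truth, as in the overview example
  ],
  evaluators: [ContainsEvaluator]
)

results.each { |result| puts "#{result[:input]}: #{result[:scores].values.inspect}" }
```

Validating before the map means an invalid agent or evaluator fails fast, before any scenario runs, which matches the order of calls in the documented method.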