Class: Riffer::Evals::EvaluatorRunner
- Inherits: Object
- Defined in: lib/riffer/evals/evaluator_runner.rb
Overview
Orchestrates running evaluators against an agent across multiple scenarios.
Accepts an agent class, a list of scenarios, and a list of evaluator classes. For each scenario it generates the agent's output and runs every evaluator against that output, then returns a RunResult containing per-scenario details and aggregate scores.
```ruby
result = Riffer::Evals::EvaluatorRunner.run(
  agent: MyAgent,
  scenarios: [
    { input: "What is Ruby?", ground_truth: "A programming language" },
    { input: "What is Python?" }
  ],
  evaluators: [AnswerRelevancyEvaluator]
)
result.scores # => { AnswerRelevancyEvaluator => 0.85 }
```
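The example reports one aggregate score per evaluator. This page does not say how RunResult derives that number; the sketch below assumes it is the arithmetic mean of each evaluator's per-scenario scores. The `aggregate_scores` helper, the shape of its input, and the second evaluator name `FaithfulnessEvaluator` are hypothetical, and plain strings stand in for the evaluator classes.

```ruby
# Hypothetical per-scenario scores: one hash per scenario, keyed by evaluator.
# Strings stand in for evaluator classes such as AnswerRelevancyEvaluator.
scenario_scores = [
  { "AnswerRelevancyEvaluator" => 1.0, "FaithfulnessEvaluator" => 0.25 },
  { "AnswerRelevancyEvaluator" => 0.5, "FaithfulnessEvaluator" => 0.75 }
]

# Assumption: the aggregate is the mean score of each evaluator across scenarios.
def aggregate_scores(scenario_scores)
  collected = Hash.new { |hash, key| hash[key] = [] }
  scenario_scores.each do |scores|
    scores.each { |evaluator, score| collected[evaluator] << score }
  end
  collected.transform_values { |values| values.sum / values.size }
end

aggregate_scores(scenario_scores)
# AnswerRelevancyEvaluator: (1.0 + 0.5) / 2 = 0.75
# FaithfulnessEvaluator:    (0.25 + 0.75) / 2 = 0.5
```

A mean keeps each evaluator's aggregate on the same scale as its individual scores; if the library weights or filters scenarios differently, only the `transform_values` step would change.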
Class Method Summary
- .run(agent:, scenarios:, evaluators:, context: nil) ⇒ Riffer::Evals::RunResult
  Runs evaluators against an agent for the given scenarios.
Class Method Details
.run(agent:, scenarios:, evaluators:, context: nil) ⇒ Riffer::Evals::RunResult
Runs evaluators against an agent for the given scenarios.
Parameters:
- agent: an Agent subclass (not an instance).
- scenarios: array of hashes with :input, optional :ground_truth, and optional :context.
- evaluators: array of Evaluator subclasses to run against each scenario.
- context: optional hash passed to agent.generate. A per-scenario :context takes precedence.
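The parameter notes say a scenario's own :context takes precedence over the runner-level context. They do not say whether the two hashes are merged or the scenario value replaces the default outright; the sketch below assumes outright replacement whenever the scenario supplies a non-nil :context. The helper name `resolve_context` is hypothetical.

```ruby
# Hypothetical helper: choose the context handed to agent.generate for one scenario.
# Assumption: a non-nil per-scenario :context replaces the runner-level context
# entirely (no key-by-key merge).
def resolve_context(scenario, default_context)
  scenario[:context].nil? ? default_context : scenario[:context]
end

shared = { locale: "en" }

resolve_context({ input: "What is Ruby?" }, shared)
# => { locale: "en" }   (no scenario context, so the runner-level one is used)
resolve_context({ input: "Was ist Ruby?", context: { locale: "de" } }, shared)
# => { locale: "de" }   (scenario context wins)
```

If the library merges instead, the scenario keys would override matching runner-level keys while the remaining keys are kept; this page does not settle which behavior applies.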
Raises Riffer::ArgumentError if agent is not a Riffer::Agent subclass or if any entry in evaluators is not a Riffer::Evals::Evaluator subclass.
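The check described above amounts to confirming that each argument is a class (not an instance) that descends from the expected base class. A minimal sketch, using hypothetical stand-ins `BaseAgent` and `BaseEvaluator` in place of Riffer::Agent and Riffer::Evals::Evaluator, and Ruby's built-in ArgumentError in place of Riffer::ArgumentError. It also assumes the base class itself is rejected; the library may accept it.

```ruby
# Stand-ins for Riffer::Agent and Riffer::Evals::Evaluator (hypothetical).
class BaseAgent; end
class BaseEvaluator; end

class MyAgent < BaseAgent; end
class RelevancyEvaluator < BaseEvaluator; end

# True only for a class that strictly inherits from base.
# Module#< returns true for a proper subclass, false for the same class,
# and nil when the two are unrelated.
def subclass_of?(candidate, base)
  !!(candidate.is_a?(Class) && candidate < base)
end

def validate!(agent, evaluators)
  unless subclass_of?(agent, BaseAgent)
    raise ArgumentError, "agent must be a BaseAgent subclass, got #{agent.inspect}"
  end

  invalid = evaluators.reject { |evaluator| subclass_of?(evaluator, BaseEvaluator) }
  raise ArgumentError, "not BaseEvaluator subclasses: #{invalid.inspect}" unless invalid.empty?
end

validate!(MyAgent, [RelevancyEvaluator])        # passes
# validate!(MyAgent.new, [RelevancyEvaluator])  # raises: an instance, not a class
```

Checking `is_a?(Class)` first matters because passing an agent instance is the most likely mistake, and `<` is not defined on ordinary instances.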
Type signature (RBS): (agent: singleton(Riffer::Agent), scenarios: Array[Hash[Symbol, untyped]], evaluators: Array, ?context: Hash[Symbol, untyped]?) -> Riffer::Evals::RunResult
```ruby
# File 'lib/riffer/evals/evaluator_runner.rb', line 34

def self.run(agent:, scenarios:, evaluators:, context: nil)
  validate_agent!(agent)
  validate_evaluators!(evaluators)

  scenario_results = scenarios.map do |scenario|
    run_scenario(agent: agent, scenario: scenario, evaluators: evaluators, context: context)
  end

  Riffer::Evals::RunResult.new(scenario_results: scenario_results)
end
```
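The source delegates the per-scenario work to `run_scenario`, whose body is not shown on this page. The sketch below illustrates one plausible shape for that step: generate the agent's output once, then pass it to every evaluator. The class-level `generate(input, context:)` call, the `evaluate(input:, output:, ground_truth:)` method, the returned hash, and the stub classes `EchoAgent` and `ContainsGroundTruthEvaluator` are all assumptions made so the example runs; they are not Riffer's API.

```ruby
# Stub agent (hypothetical): returns a canned answer instead of calling a model.
class EchoAgent
  # Assumed class-level entry point, mirroring "passed to agent.generate".
  def self.generate(input, context: nil)
    "Answer to: #{input}"
  end
end

# Stub evaluator (hypothetical): 1.0 if the output contains the ground truth, else 0.0.
class ContainsGroundTruthEvaluator
  def evaluate(input:, output:, ground_truth:)
    return 0.0 if ground_truth.nil?

    output.include?(ground_truth) ? 1.0 : 0.0
  end
end

# Hypothetical per-scenario step: one generation, then one evaluation per evaluator.
def run_scenario(agent:, scenario:, evaluators:, context:)
  # Assumption: a non-nil per-scenario :context replaces the runner-level one.
  scenario_context = scenario[:context].nil? ? context : scenario[:context]
  output = agent.generate(scenario[:input], context: scenario_context)

  scores = evaluators.to_h do |evaluator|
    score = evaluator.new.evaluate(
      input: scenario[:input], output: output, ground_truth: scenario[:ground_truth]
    )
    [evaluator, score]
  end

  { input: scenario[:input], output: output, scores: scores }
end

result = run_scenario(
  agent: EchoAgent,
  scenario: { input: "Ruby", ground_truth: "Ruby" },
  evaluators: [ContainsGroundTruthEvaluator],
  context: nil
)
result[:scores] # => { ContainsGroundTruthEvaluator => 1.0 }
```

Generating the output once per scenario and reusing it across evaluators keeps every evaluator scoring the same response, which is what makes their per-scenario scores comparable.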