Class: SkillBench::Evaluation::Runner
- Inherits: Object
- Defined in: lib/skill_bench/evaluation/runner.rb
Overview
Orchestrates the evaluation pipeline.
Coordinates blind judging of baseline and context agent outputs, then computes deltas and determines the final verdict.
Class Method Summary
- .call(task:, criteria:, skill_context:, baseline_output:, context_output:, judge_params: {}) ⇒ Hash
  Runs the evaluation pipeline.
Instance Method Summary
- #call ⇒ Hash
  Orchestrates judging and delta computation.
- #initialize(task:, criteria:, skill_context:, baseline_output:, context_output:, judge_params: {}) ⇒ Runner (constructor)
  A new instance of Runner.
Constructor Details
#initialize(task:, criteria:, skill_context:, baseline_output:, context_output:, judge_params: {}) ⇒ Runner
Returns a new instance of Runner.
    # File 'lib/skill_bench/evaluation/runner.rb', line 29
    def initialize(task:, criteria:, skill_context:, baseline_output:, context_output:, judge_params: {})
      @task = task
      @criteria = criteria
      @skill_context = skill_context
      @baseline_output = baseline_output
      @context_output = context_output
      @judge_params = judge_params.is_a?(Hash) ? judge_params : {}
    end
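The last assignment in the constructor guards against a non-Hash judge_params by falling back to an empty Hash. A minimal sketch of that guard in isolation, where normalize_judge_params is a hypothetical helper name used only for illustration and is not part of SkillBench:

```ruby
# Hypothetical helper mirroring the constructor's guard; not part of SkillBench.
def normalize_judge_params(judge_params)
  # Keep a Hash as-is; replace anything else (nil, String, Array, ...) with {}.
  judge_params.is_a?(Hash) ? judge_params : {}
end

from_hash   = normalize_judge_params({ temperature: 0 })
from_nil    = normalize_judge_params(nil)
from_string = normalize_judge_params('temperature=0')
```

With this guard a caller that passes nil explicitly still ends up with a usable Hash, at the cost of silently discarding malformed input instead of raising.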
Class Method Details
.call(task:, criteria:, skill_context:, baseline_output:, context_output:, judge_params: {}) ⇒ Hash
Runs the evaluation pipeline.
    # File 'lib/skill_bench/evaluation/runner.rb', line 19
    def self.call(task:, criteria:, skill_context:, baseline_output:, context_output:, judge_params: {})
      new(task:, criteria:, skill_context:, baseline_output:, context_output:, judge_params:).call
    end
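The class-level .call is a thin convenience wrapper: it builds an instance with the same keyword arguments and immediately invokes #call. The sketch below shows the same shape on a toy class. Greeter and its arguments are hypothetical and unrelated to SkillBench. It spells out `name: name` explicitly, whereas the source uses the Ruby 3.1+ shorthand `name:`.

```ruby
# Toy class (hypothetical, not from SkillBench) with the same
# "class-level call delegates to new(...).call" shape.
class Greeter
  def self.call(name:, greeting: 'Hello')
    # Equivalent to `new(name:, greeting:).call` on Ruby 3.1 and later.
    new(name: name, greeting: greeting).call
  end

  def initialize(name:, greeting: 'Hello')
    @name = name
    @greeting = greeting
  end

  # Returns a result Hash, as the runner's #call does.
  def call
    { success: true, response: "#{@greeting}, #{@name}" }
  end
end

result = Greeter.call(name: 'Ada')
```

Callers get a one-line entry point while the instance keeps its state in instance variables, which keeps the class method free of logic.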
Instance Method Details
#call ⇒ Hash
Orchestrates judging and delta computation.
    # File 'lib/skill_bench/evaluation/runner.rb', line 41
    def call
      baseline_judge = judge_run(baseline_output, nil)
      return baseline_judge unless baseline_judge[:success]

      context_judge = judge_run(context_output, skill_context)
      return context_judge unless context_judge[:success]

      compute_deltas(baseline_judge, context_judge)
    rescue StandardError => e
      SkillBench::ErrorLogger.log_error(e, 'Evaluation::Runner Error')
      { success: false, response: { error: { message: e. } } }
    end
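The control flow of #call can be tried out in isolation with a stand-in class. This is a sketch under stated assumptions, not the library's implementation: SketchRunner, the injected judge callable, the :score key, and the delta arithmetic are all hypothetical placeholders, because judge_run and compute_deltas are not shown on this page. It also assumes the error hash carries the exception's message via e.message. Only the early returns on failure and the rescue-to-error-Hash shape follow the source.

```ruby
# Hypothetical stand-in for the runner's control flow. The judge and the
# delta logic are invented placeholders, not SkillBench code.
class SketchRunner
  def initialize(baseline_output:, context_output:, skill_context:, judge:)
    @baseline_output = baseline_output
    @context_output = context_output
    @skill_context = skill_context
    @judge = judge # any callable: (output, context) -> Hash with :success
  end

  def call
    # The baseline is judged without the skill context.
    baseline = @judge.call(@baseline_output, nil)
    return baseline unless baseline[:success]

    context = @judge.call(@context_output, @skill_context)
    return context unless context[:success]

    # Placeholder delta: difference of made-up numeric scores.
    { success: true, response: { delta: context[:score] - baseline[:score] } }
  rescue StandardError => e
    # Assumption: the error hash reports the exception's message.
    { success: false, response: { error: { message: e.message } } }
  end
end

# Toy judge: scores by output length and fails on empty output.
length_judge = lambda do |output, _context|
  return { success: false, response: { error: { message: 'empty output' } } } if output.empty?

  { success: true, score: output.length }
end

ok = SketchRunner.new(baseline_output: 'abc', context_output: 'abcdef',
                      skill_context: 'ctx', judge: length_judge).call
failed = SketchRunner.new(baseline_output: '', context_output: 'abcdef',
                          skill_context: 'ctx', judge: length_judge).call
crashed = SketchRunner.new(baseline_output: 'abc', context_output: 'abcdef',
                           skill_context: 'ctx',
                           judge: ->(_output, _context) { raise 'boom' }).call
```

In this sketch a failed baseline judgement is returned unchanged and the context judgement never runs, while an unexpected exception is converted into a result Hash with success set to false, so callers can always branch on :success.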