Class: SkillBench::Services::RunnerService
Inherits: Object
- Object
- SkillBench::Services::RunnerService
Defined in: lib/skill_bench/services/runner_service.rb
Overview
Orchestrates the execution of an eval with baseline and context runs. Coordinates multiple services to resolve entities, spawn agents, and evaluate results.
Defined Under Namespace
Classes: EvaluationContext
Class Method Summary
- .call(eval_name:, skill_names:, pack: nil, registry_manifest: nil) ⇒ Hash
  Runs an eval with the given parameters.
Instance Method Summary
- #call ⇒ Hash
  Executes the eval: resolves entities, runs baseline and context, evaluates.
- #initialize(eval_name:, skill_names:, pack: nil, registry_manifest: nil) ⇒ RunnerService (constructor)
  A new instance of RunnerService.
Constructor Details
#initialize(eval_name:, skill_names:, pack: nil, registry_manifest: nil) ⇒ RunnerService
Returns a new instance of RunnerService.
# File 'lib/skill_bench/services/runner_service.rb', line 42

def initialize(eval_name:, skill_names:, pack: nil, registry_manifest: nil)
  @eval_name = eval_name
  @skill_names = skill_names
  @pack = pack
  @registry_manifest = registry_manifest
end
Class Method Details
.call(eval_name:, skill_names:, pack: nil, registry_manifest: nil) ⇒ Hash
Runs an eval with the given parameters.
# File 'lib/skill_bench/services/runner_service.rb', line 29

def self.call(eval_name:, skill_names:, pack: nil, registry_manifest: nil)
  new(
    eval_name: eval_name,
    skill_names: skill_names,
    pack: pack,
    registry_manifest: registry_manifest
  ).call
end
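
The class-level .call above forwards its keyword arguments to new and then invokes #call on the instance. A minimal, self-contained sketch of that service-object pattern follows. GreetingService and its parameters are hypothetical stand-ins for illustration and are not part of SkillBench.

```ruby
# Hypothetical stand-in for illustration; not part of SkillBench.
class GreetingService
  # Class-level entry point: builds an instance and runs it immediately.
  def self.call(name:, punctuation: '!')
    new(name: name, punctuation: punctuation).call
  end

  def initialize(name:, punctuation: '!')
    @name = name
    @punctuation = punctuation
  end

  # Does the actual work and returns a Hash.
  def call
    { success: true, message: "Hello, #{name}#{punctuation}" }
  end

  private

  # Private readers let #call use bare names instead of instance variables.
  attr_reader :name, :punctuation
end

result = GreetingService.call(name: 'eval')
puts result[:message]
```

Callers get a one-line entry point, while the instance keeps its inputs in state that private helper methods can share.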
Instance Method Details
#call ⇒ Hash
Executes the eval: resolves entities, runs baseline and context, evaluates.
# File 'lib/skill_bench/services/runner_service.rb', line 54

def call
  evaluation = EvalResolver.call(eval_name)
  skills = SkillResolverService.call(skill_names, pack: pack, registry_manifest: registry_manifest)
  provider_result = ProviderResolver.call
  return config_error_result(provider_result[:error], evaluation, provider_result[:provider]) unless provider_result[:success]

  provider = provider_result[:provider]
  config = provider_result[:config]

  baseline_output = run_baseline_agent(evaluation, provider, config)
  return agent_error_result(baseline_output, 'baseline', evaluation, provider) if baseline_output[:status] == :error

  skill_context = ContextLoaderService.call(skills)
  return empty_context_error_result(evaluation, provider) if skill_context.strip.empty?

  context_output = run_context_agent(evaluation, skills, skill_context, provider, config)
  return agent_error_result(context_output, 'context', evaluation, provider) if context_output[:status] == :error

  context = EvaluationContext.new(
    evaluation: evaluation,
    skill_context: skill_context,
    baseline_output: baseline_output,
    context_output: context_output,
    provider: provider,
    config: config
  )

  evaluate_and_record_trend(context)
end
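
The method above is a chain of guard clauses: each stage returns an error result early, so later stages run only when the earlier ones succeed. The sketch below mirrors that order with injected lambdas in place of the real resolvers and agents. The run_pipeline name, the lambda parameters, and the result-hash keys (:success, :stage, :error) are all assumptions made for illustration; the actual shape of the Hash that RunnerService returns is not shown in this page.

```ruby
# Hypothetical sketch of the guard-clause flow; the step lambdas and the
# result-hash keys are assumptions, not SkillBench's real API.
def run_pipeline(resolve_provider:, run_baseline:, load_context:, run_context:)
  provider_result = resolve_provider.call
  return { success: false, stage: :config, error: provider_result[:error] } unless provider_result[:success]

  baseline = run_baseline.call
  return { success: false, stage: :baseline, error: baseline[:error] } if baseline[:status] == :error

  context_text = load_context.call
  return { success: false, stage: :empty_context, error: 'no skill context' } if context_text.strip.empty?

  contextual = run_context.call(context_text)
  return { success: false, stage: :context, error: contextual[:error] } if contextual[:status] == :error

  { success: true, baseline: baseline, context: contextual }
end

ok_provider = -> { { success: true } }
ok_agent = -> { { status: :ok, output: 'baseline answer' } }
ok_context_agent = ->(text) { { status: :ok, output: "answer using #{text}" } }

# A whitespace-only context stops the run before the context agent is called.
result = run_pipeline(
  resolve_provider: ok_provider,
  run_baseline: ok_agent,
  load_context: -> { "  \n" },
  run_context: ok_context_agent
)
puts result[:stage]
```

Injecting the stages as callables keeps the sketch runnable without the library and makes each early-return path easy to exercise on its own.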