Class: SkillBench::Services::RunnerService

Inherits:
Object
Defined in:
lib/skill_bench/services/runner_service.rb

Overview

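Orchestrates a single eval: a baseline agent run, a second run with the loaded skill context, and an evaluation of the two outputs. Coordinates EvalResolver, SkillResolverService, ProviderResolver, and ContextLoaderService to resolve entities, spawn agents, and evaluate results.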

Defined Under Namespace

Classes: EvaluationContext

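Class Method Summary

  • .call(eval_name:, skill_names:, pack: nil, registry_manifest: nil) ⇒ Hash

    Runs an eval with the given parameters.

Instance Method Summary

  • #call ⇒ Hash

    Executes the eval: resolves entities, runs baseline and context, evaluates.

  • #initialize(eval_name:, skill_names:, pack: nil, registry_manifest: nil) ⇒ RunnerService

    Returns a new instance of RunnerService.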

Constructor Details

#initialize(eval_name:, skill_names:, pack: nil, registry_manifest: nil) ⇒ RunnerService

Returns a new instance of RunnerService.

Parameters:

  • eval_name (String)

    Name or path of the eval

  • skill_names (Array<String>)

    Names of the skills

  • pack (String, nil) (defaults to: nil)

    Optional pack name

  • registry_manifest (String, nil) (defaults to: nil)

    Optional registry.json path



# File 'lib/skill_bench/services/runner_service.rb', line 42

def initialize(eval_name:, skill_names:, pack: nil, registry_manifest: nil)
  @eval_name = eval_name
  @skill_names = skill_names
  @pack = pack
  @registry_manifest = registry_manifest
end

Class Method Details

.call(eval_name:, skill_names:, pack: nil, registry_manifest: nil) ⇒ Hash

Runs an eval with the given parameters.

Parameters:

  • eval_name (String)

    Name or path of the eval to run

  • skill_names (Array<String>)

    Names of the skills to use

  • pack (String, nil) (defaults to: nil)

    Optional pack name for registry-based skill resolution

  • registry_manifest (String, nil) (defaults to: nil)

    Optional path to registry.json manifest

Returns:

  • (Hash)

    Result from EvaluationRunner



# File 'lib/skill_bench/services/runner_service.rb', line 29

def self.call(eval_name:, skill_names:, pack: nil, registry_manifest: nil)
  new(
    eval_name: eval_name,
    skill_names: skill_names,
    pack: pack,
    registry_manifest: registry_manifest
  ).call
end
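
The class-level .call above only builds an instance and forwards to #call. A minimal sketch of that delegation pattern follows, using a hypothetical stand-in class (GreeterService and its keywords are illustrative and not part of SkillBench). The private attr_reader is an assumption about how #call can read its inputs as bare method names.

```ruby
# Hypothetical stand-in illustrating `.call` -> `new(...).call` delegation.
# GreeterService is NOT part of SkillBench.
class GreeterService
  # Class-level entry point: builds an instance and runs it.
  def self.call(name:, greeting: 'Hello')
    new(name: name, greeting: greeting).call
  end

  def initialize(name:, greeting: 'Hello')
    @name = name
    @greeting = greeting
  end

  # Instance-level work; reads inputs through the private readers below.
  def call
    { message: "#{greeting}, #{name}" }
  end

  private

  # Assumed: readers like these let #call use bare names instead of ivars.
  attr_reader :name, :greeting
end

result = GreeterService.call(name: 'world')
custom = GreeterService.call(name: 'world', greeting: 'Hi')
```

Callers then never touch new directly, which keeps the public surface to a single method.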

Instance Method Details

#call ⇒ Hash

Executes the eval: resolves entities, runs baseline and context, evaluates.

Returns:

  • (Hash)

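    Evaluation result with deltas and verdict. An error result is returned instead when provider resolution fails, when the baseline or context agent reports status :error, or when the loaded skill context is blank.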

Raises:

  • (Errno::ENOENT)

    when the eval directory does not exist.

  • (ArgumentError)

    when a skill cannot be resolved.
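
A caller may want to tell the two documented exceptions apart. The sketch below is one hedged way to do that. The helper run_eval_safely, the result keys (:ok, :reason, :message), and the stand-in lambdas are all hypothetical. In real use the runner argument would be SkillBench::Services::RunnerService.

```ruby
# Hypothetical wrapper: `runner` is any object responding to `call(**kwargs)`.
# The returned hash shape is an assumption made for this example.
def run_eval_safely(runner, **kwargs)
  { ok: true, result: runner.call(**kwargs) }
rescue Errno::ENOENT => e
  # Documented: raised when the eval directory does not exist.
  { ok: false, reason: :missing_eval, message: e.message }
rescue ArgumentError => e
  # Documented: raised when a skill cannot be resolved.
  { ok: false, reason: :unresolved_skill, message: e.message }
end

# Stand-in runners that simulate each documented failure.
missing_eval = ->(**_kwargs) { raise Errno::ENOENT, 'evals/unknown' }
bad_skill = ->(**_kwargs) { raise ArgumentError, 'skill not found: nope' }

missing = run_eval_safely(missing_eval, eval_name: 'unknown', skill_names: [])
unresolved = run_eval_safely(bad_skill, eval_name: 'demo', skill_names: ['nope'])
```

Ruby also raises ArgumentError for a missing or unknown keyword, so the :unresolved_skill label here is a simplification.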



# File 'lib/skill_bench/services/runner_service.rb', line 54

def call
  evaluation = EvalResolver.call(eval_name)
  skills = SkillResolverService.call(skill_names, pack: pack, registry_manifest: registry_manifest)
  provider_result = ProviderResolver.call

  return config_error_result(provider_result[:error], evaluation, provider_result[:provider]) unless provider_result[:success]

  provider = provider_result[:provider]
  config = provider_result[:config]

  baseline_output = run_baseline_agent(evaluation, provider, config)
  return agent_error_result(baseline_output, 'baseline', evaluation, provider) if baseline_output[:status] == :error

  skill_context = ContextLoaderService.call(skills)
  return empty_context_error_result(evaluation, provider) if skill_context.strip.empty?

  context_output = run_context_agent(evaluation, skills, skill_context, provider, config)
  return agent_error_result(context_output, 'context', evaluation, provider) if context_output[:status] == :error

  context = EvaluationContext.new(
    evaluation: evaluation,
    skill_context: skill_context,
    baseline_output: baseline_output,
    context_output: context_output,
    provider: provider,
    config: config
  )
  evaluate_and_record_trend(context)
end
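
The method above is a chain of guard clauses: each stage either returns an error result or lets the next stage run. The sketch below reproduces that shape with plain lambdas. The function run_pipeline, its keyword names, and the hash shapes it returns are assumptions for illustration. The real error results come from private helpers not shown on this page.

```ruby
# Hypothetical sketch of the guard-clause flow; names and result shapes are
# assumptions, not the actual SkillBench helpers.
def run_pipeline(baseline:, load_context:, context_run:)
  baseline_output = baseline.call
  return { status: :error, stage: 'baseline' } if baseline_output[:status] == :error

  # Context is loaded only after the baseline run has succeeded.
  skill_context = load_context.call
  return { status: :error, stage: 'empty_context' } if skill_context.strip.empty?

  context_output = context_run.call(skill_context)
  return { status: :error, stage: 'context' } if context_output[:status] == :error

  { status: :ok, baseline: baseline_output, context: context_output }
end

ok_step = -> { { status: :ok, text: 'baseline answer' } }

happy = run_pipeline(
  baseline: ok_step,
  load_context: -> { "Use snake_case.\n" },
  context_run: ->(ctx) { { status: :ok, text: "answer using #{ctx.strip}" } }
)

# A whitespace-only context stops the pipeline before the context run.
blank = run_pipeline(
  baseline: ok_step,
  load_context: -> { "  \n" },
  context_run: ->(_ctx) { raise 'never reached' }
)
```

Because the baseline runs before the context is loaded, a blank skill context is only detected after a baseline agent run has already completed.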