Class: SkillBench::Agent::Runner

Inherits:
Object
Defined in:
lib/skill_bench/agent/runner.rb

Overview

Executes a single scenario (baseline or context-hydrated) inside an isolated sandbox. It builds the system prompt, runs the agent, and returns the agent's final answer together with a diff of the sandbox working directory.

Constructor Details

#initialize(params) ⇒ Runner

Returns a new instance of Runner.

Parameters:

  • params (Hash)

    The configuration parameters for the run.



# File 'lib/skill_bench/agent/runner.rb', line 27

def initialize(params)
  @mode = validate_mode(params.fetch(:mode))
  @full_eval_path = params.fetch(:full_eval_path)
  @task_content = params.fetch(:task_content)
  @client_params = params.fetch(:client_params, {})

  @source_path = params[:source_path]
  @base_path = params[:base_path]
end
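
The constructor reads required keys with `Hash#fetch` and optional keys with `Hash#[]`, so a missing `:mode`, `:full_eval_path`, or `:task_content` raises `KeyError`, while a missing `:source_path` or `:base_path` is simply `nil`. The sketch below reproduces only that key handling on a plain hash. The helper name `read_runner_params` and the sample values are hypothetical, and `validate_mode` is left out because its body is not shown here.

```ruby
# Hypothetical helper mirroring the key access in Runner#initialize.
# It does not call validate_mode, whose implementation is not shown.
def read_runner_params(params)
  {
    mode: params.fetch(:mode),                       # required: KeyError if absent
    full_eval_path: params.fetch(:full_eval_path),   # required
    task_content: params.fetch(:task_content),       # required
    client_params: params.fetch(:client_params, {}), # optional, defaults to {}
    source_path: params[:source_path],               # optional, nil if absent
    base_path: params[:base_path]                    # optional, nil if absent
  }
end

# Sample values are placeholders, not paths from the real project.
minimal = {
  mode: :baseline,
  full_eval_path: 'evals/example',
  task_content: 'Fix the failing test'
}

parsed = read_runner_params(minimal)
puts parsed[:client_params].inspect
puts parsed[:source_path].inspect
```

Because the optional keys fall back to `nil`, any check that `:source_path` and `:base_path` are present in `:context` mode would have to happen elsewhere in the class.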

Class Method Details

.call(params) ⇒ Array<String, String>

Executes the agent run scenario.

Parameters:

  • params (Hash)

    The configuration parameters for the run.

Options Hash (params):

  • :mode (Symbol)

    The mode to run in (`:baseline` or `:context`).

  • :full_eval_path (Pathname)

    The path to the evaluation directory.

  • :task_content (String)

    The task description.

  • :client_params (Hash)

    Parameters for the LLM client.

  • :source_path (String)

    Required if mode is `:context`.

  • :base_path (Pathname)

    Required if mode is `:context`.

Returns:

  • (Array<String, String>)

    The agent’s final answer and the git diff.



# File 'lib/skill_bench/agent/runner.rb', line 22

def self.call(params)
  new(params).call
end
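
The class-level `.call` is a thin wrapper that builds an instance and invokes `#call`, so callers can destructure the returned pair in one line. The following is a minimal sketch of that calling convention using a stand-in class. `ExampleRunner` and its canned return values are hypothetical and only imitate the shape of the interface; no sandbox or agent is involved.

```ruby
# Hypothetical stand-in that imitates the Runner interface shape only.
class ExampleRunner
  # Same delegation as Runner.call: build an instance, then run it.
  def self.call(params)
    new(params).call
  end

  def initialize(params)
    @task_content = params.fetch(:task_content)
  end

  # Returns a two-element array, like [final_answer, diff].
  def call
    ["Handled: #{@task_content}", '']
  end
end

# Destructure the pair the same way a caller of Runner.call would.
final_answer, diff = ExampleRunner.call({ task_content: 'Add a README' })
puts final_answer
puts diff.empty?

# With the real class the call would presumably look like:
#   final_answer, diff = SkillBench::Agent::Runner.call(params)
```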

Instance Method Details

#call ⇒ Array<String, String>

Runs the evaluation scenario and captures the results.

Returns:

  • (Array<String, String>)

    A tuple containing the final answer and the diff.



# File 'lib/skill_bench/agent/runner.rb', line 40

def call
  Execution::Sandbox.run(@full_eval_path) do |sandbox|
    working_dir = sandbox.path
    agent_result = ReactAgent.call(
      client_params: @client_params,
      working_dir: working_dir,
      container_id: sandbox.container_id,
      system_prompt: build_system_prompt,
      initial_prompt: @task_content
    )

    response = agent_result[:response]
    final_answer = if agent_result[:success]
                     response&.dig(:content) || 'Error: Empty response from agent'
                   else
                     error_msg = response&.dig(:error, :message) || 'Unknown error'
                     "Error: #{error_msg}"
                   end
    [final_answer, Execution::Sandbox.capture_diff(working_dir)]
  end
end
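
The way `#call` turns the agent result into a final answer can be tried in isolation. The sketch below lifts that branch into a standalone method and feeds it hand-written hashes. The method name `resolve_final_answer` and the sample inputs are hypothetical; the hash keys (`:success`, `:response`, `:content`, `:error`, `:message`) are the ones read in the listing above.

```ruby
# Hypothetical extraction of the final-answer branch from Runner#call.
def resolve_final_answer(agent_result)
  response = agent_result[:response]
  if agent_result[:success]
    # Successful run: use the content, or a fixed message if it is missing.
    response&.dig(:content) || 'Error: Empty response from agent'
  else
    # Failed run: surface the nested error message when present.
    error_msg = response&.dig(:error, :message) || 'Unknown error'
    "Error: #{error_msg}"
  end
end

puts resolve_final_answer({ success: true, response: { content: 'Done' } })
# prints "Done"
puts resolve_final_answer({ success: true, response: nil })
# prints "Error: Empty response from agent"
puts resolve_final_answer({ success: false, response: { error: { message: 'timeout' } } })
# prints "Error: timeout"
puts resolve_final_answer({ success: false, response: {} })
# prints "Error: Unknown error"
```

In every case the method returns a String, which is consistent with the documented `Array<String, String>` return type: failures are reported as `"Error: ..."` text rather than raised.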