Class: SkillBench::Agent::Runner
Inherits: Object
- Object
- SkillBench::Agent::Runner
Defined in: lib/skill_bench/agent/runner.rb
Overview
Executes a single evaluation scenario (baseline or context-hydrated) inside an isolated sandbox. The runner builds the system prompt, runs the agent, and returns the agent's final answer together with a diff of the sandbox working directory.
Class Method Summary
- .call(params) ⇒ Array<String, String>
  Executes the agent run scenario.
Instance Method Summary
- #call ⇒ Array<String, String>
  Runs the evaluation scenario and captures the results.
- #initialize(params) ⇒ Runner (constructor)
  A new instance of Runner.
Constructor Details
#initialize(params) ⇒ Runner
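Returns a new instance of Runner. The params hash must contain :mode (checked by validate_mode), :full_eval_path, and :task_content; a missing required key raises KeyError. The :client_params key is optional and defaults to an empty hash, and :source_path and :base_path are optional and default to nil.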
    # File 'lib/skill_bench/agent/runner.rb', line 27
    def initialize(params)
      @mode = validate_mode(params.fetch(:mode))
      @full_eval_path = params.fetch(:full_eval_path)
      @task_content = params.fetch(:task_content)
      @client_params = params.fetch(:client_params, {})
      @source_path = params[:source_path]
      @base_path = params[:base_path]
    end
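The constructor relies on the difference between Hash#fetch (raises on a missing key, or returns a supplied default) and Hash#[] (returns nil). The following is a minimal standalone sketch of that behavior. The params values are placeholders invented for illustration, and the :baseline mode symbol is an assumption, since the set of values accepted by validate_mode is not shown here.

```ruby
# Hypothetical params hash; the values are placeholders, not real SkillBench data.
params = {
  mode: :baseline, # assumed mode value; validate_mode's accepted set is not documented here
  full_eval_path: '/tmp/evals/example',
  task_content: 'Fix the failing test.'
}

# Optional key with a default: fetch returns {} when :client_params is absent.
client_params = params.fetch(:client_params, {})

# Optional key without a default: [] returns nil when :source_path is absent.
source_path = params[:source_path]

# Required key: fetch without a default raises KeyError when the key is absent.
missing_key_error =
  begin
    params.reject { |key, _| key == :mode }.fetch(:mode)
  rescue KeyError => e
    e.message
  end
```

Using fetch for required keys makes a malformed params hash fail at construction time rather than later inside the sandbox run.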
Class Method Details
.call(params) ⇒ Array<String, String>
Executes the agent run scenario by building a Runner from params and invoking #call on it.
    # File 'lib/skill_bench/agent/runner.rb', line 22
    def self.call(params)
      new(params).call
    end
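The class-level .call that delegates to new(params).call is a common Ruby service-object shape. A minimal sketch of the same shape, using a hypothetical EchoService class that is not part of SkillBench, might look like this:

```ruby
# Hypothetical stand-in class illustrating the pattern; not part of SkillBench.
class EchoService
  # Class-level entry point: build an instance and run it in one step.
  def self.call(params)
    new(params).call
  end

  def initialize(params)
    @message = params.fetch(:message)
  end

  # Returns a two-element array, mirroring the Array<String, String> shape above.
  def call
    [@message, @message.upcase]
  end
end

result = EchoService.call({ message: 'hello' })
```

Callers then need only one expression to configure and run the object, and the instance state stays private to that single invocation.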
Instance Method Details
#call ⇒ Array<String, String>
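Runs the evaluation scenario inside a sandbox and returns a two-element array: the agent's final answer (or a message prefixed with "Error: " if the agent failed) and the diff captured from the sandbox working directory.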
    # File 'lib/skill_bench/agent/runner.rb', line 40
    def call
      Execution::Sandbox.run(@full_eval_path) do |sandbox|
        working_dir = sandbox.path
        agent_result = ReactAgent.call(
          client_params: @client_params,
          working_dir: working_dir,
          container_id: sandbox.container_id,
          system_prompt: build_system_prompt,
          initial_prompt: @task_content
        )

        response = agent_result[:response]
        final_answer = if agent_result[:success]
                         response&.dig(:content) || 'Error: Empty response from agent'
                       else
                         error_msg = response&.dig(:error, :message) || 'Unknown error'
                         "Error: #{error_msg}"
                       end
        [final_answer, Execution::Sandbox.capture_diff(working_dir)]
      end
    end
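The final-answer branch in #call can be exercised in isolation. The sketch below copies that logic into a hypothetical helper, extract_final_answer (a name introduced here for illustration, not a method of Runner), and assumes agent_result is a plain hash with the :success and :response keys used in the listing above.

```ruby
# Hypothetical helper mirroring the final_answer logic in Runner#call.
def extract_final_answer(agent_result)
  response = agent_result[:response]

  if agent_result[:success]
    # A successful run with no content falls back to an explicit error string.
    response&.dig(:content) || 'Error: Empty response from agent'
  else
    # A failed run surfaces the nested error message, or a generic fallback.
    error_msg = response&.dig(:error, :message) || 'Unknown error'
    "Error: #{error_msg}"
  end
end

ok      = extract_final_answer({ success: true,  response: { content: 'done' } })
empty   = extract_final_answer({ success: true,  response: nil })
failed  = extract_final_answer({ success: false, response: { error: { message: 'timeout' } } })
unknown = extract_final_answer({ success: false, response: {} })
```

The safe-navigation operator (&.) keeps a nil response from raising NoMethodError, so every path yields a string for the first element of the returned array.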