Class: SkillBench::Tools::RunCommand

Inherits:
Object
Defined in:
lib/skill_bench/tools/run_command.rb

Overview

Executes a shell command within the working directory.

Constant Summary

DANGEROUS_COMMANDS =

Commands that are always blocked even if listed in allowed_commands, because they can be used to escape the sandbox or execute arbitrary code.

%w[
  bash sh zsh fish dash ksh csh tcsh
  python python3 python2 ruby perl node
  php lua tcl wish
  curl wget nc ncat socat
  eval exec
  sudo su doas
  chmod chown mount umount
  dd mkfs fdisk parted
  insmod rmmod modprobe
  systemctl service
  passwd useradd userdel groupadd groupdel
].freeze
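
The precedence described above (the blocklist wins over the allowlist) can be sketched as follows. This is a minimal illustration, not the library's code: `BLOCKED`, `classify`, the returned symbols, and the sample allowlist are hypothetical names, and the blocklist is shortened for brevity.

```ruby
require 'shellwords'

# Hypothetical, shortened stand-in for DANGEROUS_COMMANDS.
BLOCKED = %w[bash sh python curl sudo].freeze

# Hypothetical helper mirroring the documented order of checks:
# the blocklist is consulted first, the allowlist second.
def classify(command, allowed)
  base_cmd = command.shellsplit.first
  return :empty if base_cmd.nil?
  return :blocked if BLOCKED.include?(base_cmd)
  return :not_configured if allowed.nil?
  return :not_permitted unless allowed.include?(base_cmd)

  :ok
end

allowed = %w[rspec bash] # "bash" is allow-listed here on purpose

classify('rspec spec/models', allowed) # :ok
classify('bash -c "ls"', allowed)      # :blocked, despite the allowlist entry
classify('ls -la', allowed)            # :not_permitted
classify('/bin/bash -c "ls"', allowed) # :not_permitted (exact-name match only)
```

Note that the comparison is against the exact first token, so a path such as `/bin/bash` does not match the `bash` entry; in this sketch it is rejected only because that path is absent from the allowlist.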

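Class Method Summary

  • .call(command, working_dir_path, container_id = nil) ⇒ String

    Executes a shell command within the working directory (host or container).

  • .definition ⇒ Hash

    The tool definition for the LLM API.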

Class Method Details

.call(command, working_dir_path, container_id = nil) ⇒ String

Executes a shell command within the working directory (host or container).

Tokenizes the command string before execution so that arguments are passed directly to the OS without shell interpretation, preventing shell injection.
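
As a minimal sketch of the tokenization described above, using only Ruby's standard `shellwords` and `open3` libraries (the sample command string is an illustrative assumption), shell metacharacters survive as literal argument text instead of being interpreted:

```ruby
require 'shellwords'
require 'open3'

# Illustrative input: the ';' would chain a second command if a shell parsed it.
command = 'echo hello; echo injected'

# String#shellsplit (from shellwords) splits on whitespace and honours quotes,
# but gives no special meaning to ';', '|', '&&' or '$(...)'.
argv = command.shellsplit
# argv == ["echo", "hello;", "echo", "injected"]

# Passing the tokens as separate arguments makes Open3 spawn the program
# directly, without a shell, so only a single `echo` process runs.
stdout_str, _stderr_str, status = Open3.capture3(*argv)
# stdout_str == "hello; echo injected\n"

# Unbalanced quotes cannot be tokenized and raise ArgumentError.
begin
  'echo "unterminated'.shellsplit
rescue ArgumentError => e
  unmatched = e.message
end
```

Since the method as listed rescues only `Timeout::Error`, a command string with an unmatched quote would presumably surface as an `ArgumentError` from `shellsplit` and not as an error string.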

Parameters:

  • command (String)

    The command to run (e.g. “rspec spec/models”).

  • working_dir_path (Pathname)

    The host directory in which to run the command (ignored when container_id is given; the command then runs in /sandbox inside the container).

  • container_id (String, nil) (defaults to: nil)

    The Docker container ID for isolated execution.

Returns:

  • (String)

    A formatted string containing the exit status, STDOUT, and STDERR, or an error message string if the command is empty, blocked, not permitted, or timed out.

Raises:

  • (Timeout::Error)

    Never propagates to the caller: it is rescued internally and a timeout message string is returned instead.
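
The shape of the returned string can be illustrated with a standalone sketch. It assumes nothing about the library beyond the format documented here: `run_and_format` is a hypothetical helper, and the Ruby interpreter itself stands in for a real tool such as rspec.

```ruby
require 'open3'
require 'rbconfig'
require 'timeout'

# Hypothetical helper reproducing the documented result format.
def run_and_format(argv, dir, max_time)
  Timeout.timeout(max_time) do
    stdout_str, stderr_str, status = Open3.capture3(*argv, chdir: dir)
    <<~RESULT
      Exit Status: #{status.exitstatus}
      STDOUT:
      #{stdout_str}
      STDERR:
      #{stderr_str}
    RESULT
  end
rescue Timeout::Error
  "Error: Command execution timed out after #{max_time} seconds."
end

# Run the current Ruby interpreter as a stand-in for a real tool.
result = run_and_format([RbConfig.ruby, '-e', 'puts 1 + 1'], Dir.pwd, 10)

# A caller can recover the exit code from the first line of the report.
exit_code = result[/\AExit Status: (\d+)/, 1].to_i
```

Because every outcome, including errors and timeouts, comes back as a plain string, a caller that needs to distinguish success from failure has to inspect the text, for example by checking for the `Exit Status:` prefix as above.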



# File 'lib/skill_bench/tools/run_command.rb', line 57

def self.call(command, working_dir_path, container_id = nil)
  argv = command.shellsplit
  return 'Error: Empty command.' if argv.empty?

  base_cmd = argv.first
  return "Error: Command '#{base_cmd}' is blocked for security reasons." if DANGEROUS_COMMANDS.include?(base_cmd)

  allowed = SkillBench::Config.allowed_commands
  return 'Error: No allowed commands configured. Set allowed_commands in skill-bench.json or use --mode mock.' if allowed.nil?
  return "Error: Command '#{base_cmd}' is not permitted." unless allowed.include?(base_cmd)

  max_time = SkillBench::Config.max_execution_time
  Timeout.timeout(max_time) do
    stdout_str, stderr_str, status = if container_id
                                       docker_cmd = ['docker', 'exec', '-w', '/sandbox', container_id] + argv
                                       Open3.capture3(*docker_cmd)
                                     else
                                       Open3.capture3(*argv, chdir: working_dir_path.to_s)
                                     end
    <<~RESULT
      Exit Status: #{status.exitstatus}
      STDOUT:
      #{stdout_str}
      STDERR:
      #{stderr_str}
    RESULT
  end
rescue Timeout::Error
  "Error: Command execution timed out after #{max_time} seconds."
end

.definition ⇒ Hash

Returns the tool definition for the LLM API.

Returns:

  • (Hash)

    The tool definition for the LLM API.



# File 'lib/skill_bench/tools/run_command.rb', line 29

def self.definition
  {
    type: 'function',
    function: {
      name: 'run_command',
      description: 'Execute a shell command (e.g., rspec).',
      parameters: {
        type: 'object',
        properties: {
          command: { type: 'string', description: 'The shell command to run.' }
        },
        required: ['command'],
        additionalProperties: false
      }
    }
  }
end
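
To make the schema's constraints concrete, the sketch below checks an arguments object against them: `command` is required, must be a string, and no additional properties are allowed. The helper `valid_arguments?` and the sample payload are hypothetical, and the arguments are assumed to arrive as a JSON string.

```ruby
require 'json'

# Hypothetical validator for the arguments object described by the schema:
# "command" is required, must be a string, and no other keys are allowed.
def valid_arguments?(args)
  args.is_a?(Hash) && args.keys == ['command'] && args['command'].is_a?(String)
end

# Hypothetical payload, assumed to be delivered as a JSON string.
raw_arguments = '{"command": "rspec spec/models"}'
args = JSON.parse(raw_arguments)

# Only extract the command once the arguments match the schema.
command = args['command'] if valid_arguments?(args)
```

The extracted `command` string is what would then be passed to `.call` together with a working directory.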