Class: SkillBench::Commands::EvalNew

Inherits:

Object

Object
SkillBench::Commands::EvalNew

show all

Defined in:: lib/skill_bench/commands/eval_new.rb

Overview

Handles the ‘skill-bench eval new` command

Constant Summary collapse

ALLOWED_RUNTIMES = Allowed runtime values for eval scaffolding.

%w[ruby rails].freeze

Class Method Summary collapse

.create_criteria_json(path, runtime) ⇒ void

Create criteria.json for the eval.
.create_rails_files(path, _name) ⇒ void

Create Rails-specific files for the eval.
.create_task_md(path, name) ⇒ void

Create task.md for the eval.
.default_criteria(runtime) ⇒ Hash

Generate default criteria hash.
.run(name:, runtime: 'ruby') ⇒ void

Run the eval new command.
.task_template(name) ⇒ String

Generate task.md template.

Class Method Details

.create_criteria_json(path, runtime) ⇒ `void`

This method returns an undefined value.

Create criteria.json for the eval

Parameters:

path (String) —

Eval directory path
runtime (String) —

Runtime type

# File 'lib/skill_bench/commands/eval_new.rb', line 41

def self.create_criteria_json(path, runtime)
  criteria = default_criteria(runtime)
  File.write(File.join(path, 'criteria.json'), JSON.pretty_generate(criteria))
end

.create_rails_files(path, _name) ⇒ `void`

This method returns an undefined value.

Create Rails-specific files for the eval

Parameters:

path (String) —

Eval directory path
_name (String) —

Eval name



84
85
86

# File 'lib/skill_bench/commands/eval_new.rb', line 84

def self.create_rails_files(path, _name)
  File.write(File.join(path, 'rails_helper.rb'), "require 'rails_helper'\n")
end

.create_task_md(path, name) ⇒ `void`

This method returns an undefined value.

Create task.md for the eval

Parameters:

path (String) —

Eval directory path
name (String) —

Eval name



33
34
35

# File 'lib/skill_bench/commands/eval_new.rb', line 33

def self.create_task_md(path, name)
  File.write(File.join(path, 'task.md'), task_template(name))
end

.default_criteria(runtime) ⇒ `Hash`

Generate default criteria hash.

Parameters:

runtime (String) —

Runtime type.

Returns:

(Hash) —

Criteria configuration in the new format.

# File 'lib/skill_bench/commands/eval_new.rb', line 65

def self.default_criteria(runtime)
  {
    context: "Evaluate #{runtime} task",
    dimensions: [
      { name: 'correctness', max_score: 30 },
      { name: 'skill_adherence', max_score: 25 },
      { name: 'code_quality', max_score: 20 },
      { name: 'test_coverage', max_score: 15 },
      { name: 'documentation', max_score: 10 }
    ],
    pass_threshold: 70,
    minimum_delta: 10
  }
end

.run(name:, runtime: 'ruby') ⇒ `void`

This method returns an undefined value.

Run the eval new command

Parameters:

name (String) —

Eval name
runtime (String) (defaults to: 'ruby') —

“ruby” or “rails” (default: ruby)

Raises:

(ArgumentError) —

if runtime is not in ALLOWED_RUNTIMES.

# File 'lib/skill_bench/commands/eval_new.rb', line 18

def self.run(name:, runtime: 'ruby')
  raise ArgumentError, "Unsupported runtime '#{runtime}'. Allowed: #{ALLOWED_RUNTIMES.join(', ')}" unless ALLOWED_RUNTIMES.include?(runtime)

  eval_path = File.join('evals', name)
  FileUtils.mkdir_p(eval_path)

  create_task_md(eval_path, name)
  create_criteria_json(eval_path, runtime)
  create_rails_files(eval_path, name) if runtime == 'rails'
end

.task_template(name) ⇒ `String`

Generate task.md template

Parameters:

name (String) —

Eval name

Returns:

(String) —

Markdown template

# File 'lib/skill_bench/commands/eval_new.rb', line 49

def self.task_template(name)
  <<~MARKDOWN
    # Eval: #{name}

    ## Task
    Describe the task for the agent here.

    ## Success Criteria
    Define what constitutes a successful completion.
  MARKDOWN
end

Class: SkillBench::Commands::EvalNew

Overview

Constant Summary collapse

Class Method Summary collapse

Class Method Details

.create_criteria_json(path, runtime) ⇒ void

.create_rails_files(path, _name) ⇒ void

.create_task_md(path, name) ⇒ void

.default_criteria(runtime) ⇒ Hash

.run(name:, runtime: 'ruby') ⇒ void

.task_template(name) ⇒ String

.create_criteria_json(path, runtime) ⇒ `void`

.create_rails_files(path, _name) ⇒ `void`

.create_task_md(path, name) ⇒ `void`

.default_criteria(runtime) ⇒ `Hash`

.run(name:, runtime: 'ruby') ⇒ `void`

.task_template(name) ⇒ `String`