Class: SkillBench::Commands::EvalNew

Inherits:
Object
  • Object
show all
Defined in:
lib/skill_bench/commands/eval_new.rb

Overview

Handles the ‘skill-bench eval new` command

Constant Summary collapse

ALLOWED_RUNTIMES =

Allowed runtime values for eval scaffolding.

%w[ruby rails].freeze

Class Method Summary collapse

Class Method Details

.create_criteria_json(path, runtime) ⇒ void

This method returns an undefined value.

Create criteria.json for the eval

Parameters:

  • path (String)

    Eval directory path

  • runtime (String)

    Runtime type



41
42
43
44
# File 'lib/skill_bench/commands/eval_new.rb', line 41

def self.create_criteria_json(path, runtime)
  criteria = default_criteria(runtime)
  File.write(File.join(path, 'criteria.json'), JSON.pretty_generate(criteria))
end

.create_rails_files(path, _name) ⇒ void

This method returns an undefined value.

Create Rails-specific files for the eval

Parameters:

  • path (String)

    Eval directory path

  • _name (String)

    Eval name



84
85
86
# File 'lib/skill_bench/commands/eval_new.rb', line 84

def self.create_rails_files(path, _name)
  File.write(File.join(path, 'rails_helper.rb'), "require 'rails_helper'\n")
end

.create_task_md(path, name) ⇒ void

This method returns an undefined value.

Create task.md for the eval

Parameters:

  • path (String)

    Eval directory path

  • name (String)

    Eval name



33
34
35
# File 'lib/skill_bench/commands/eval_new.rb', line 33

def self.create_task_md(path, name)
  File.write(File.join(path, 'task.md'), task_template(name))
end

.default_criteria(runtime) ⇒ Hash

Generate default criteria hash.

Parameters:

  • runtime (String)

    Runtime type.

Returns:

  • (Hash)

    Criteria configuration in the new format.



65
66
67
68
69
70
71
72
73
74
75
76
77
78
# File 'lib/skill_bench/commands/eval_new.rb', line 65

def self.default_criteria(runtime)
  {
    context: "Evaluate #{runtime} task",
    dimensions: [
      { name: 'correctness', max_score: 30 },
      { name: 'skill_adherence', max_score: 25 },
      { name: 'code_quality', max_score: 20 },
      { name: 'test_coverage', max_score: 15 },
      { name: 'documentation', max_score: 10 }
    ],
    pass_threshold: 70,
    minimum_delta: 10
  }
end

.run(name:, runtime: 'ruby') ⇒ void

This method returns an undefined value.

Run the eval new command

Parameters:

  • name (String)

    Eval name

  • runtime (String) (defaults to: 'ruby')

    “ruby” or “rails” (default: ruby)

Raises:

  • (ArgumentError)

    if runtime is not in ALLOWED_RUNTIMES.



18
19
20
21
22
23
24
25
26
27
# File 'lib/skill_bench/commands/eval_new.rb', line 18

def self.run(name:, runtime: 'ruby')
  raise ArgumentError, "Unsupported runtime '#{runtime}'. Allowed: #{ALLOWED_RUNTIMES.join(', ')}" unless ALLOWED_RUNTIMES.include?(runtime)

  eval_path = File.join('evals', name)
  FileUtils.mkdir_p(eval_path)

  create_task_md(eval_path, name)
  create_criteria_json(eval_path, runtime)
  create_rails_files(eval_path, name) if runtime == 'rails'
end

.task_template(name) ⇒ String

Generate task.md template

Parameters:

  • name (String)

    Eval name

Returns:

  • (String)

    Markdown template



49
50
51
52
53
54
55
56
57
58
59
# File 'lib/skill_bench/commands/eval_new.rb', line 49

def self.task_template(name)
  <<~MARKDOWN
    # Eval: #{name}

    ## Task
    Describe the task for the agent here.

    ## Success Criteria
    Define what constitutes a successful completion.
  MARKDOWN
end