Class: SkillBench::Commands::EvalNew
- Inherits:
- Object
  - Object
    - SkillBench::Commands::EvalNew
- Defined in:
- lib/skill_bench/commands/eval_new.rb
Overview
Handles the `skill-bench eval new` command.
Constant Summary
- ALLOWED_RUNTIMES =
  Allowed runtime values for eval scaffolding.
  %w[ruby rails].freeze
Class Method Summary
- .create_criteria_json(path, runtime) ⇒ void
  Create criteria.json for the eval.
- .create_rails_files(path, _name) ⇒ void
  Create Rails-specific files for the eval.
- .create_task_md(path, name) ⇒ void
  Create task.md for the eval.
- .default_criteria(runtime) ⇒ Hash
  Generate default criteria hash.
- .run(name:, runtime: 'ruby') ⇒ void
  Run the eval new command.
- .task_template(name) ⇒ String
  Generate task.md template.
Class Method Details
.create_criteria_json(path, runtime) ⇒ void
This method returns an undefined value.
Create criteria.json for the eval.

# File 'lib/skill_bench/commands/eval_new.rb', line 41

def self.create_criteria_json(path, runtime)
  criteria = default_criteria(runtime)
  File.write(File.join(path, 'criteria.json'), JSON.pretty_generate(criteria))
end
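A minimal sketch of what reading the file back looks like, assuming a small sample hash rather than the real default criteria. `JSON.pretty_generate` writes symbol keys as JSON strings, so a plain `JSON.parse` returns string keys unless `symbolize_names: true` is passed. The helper name `round_trip` is introduced here for illustration and is not part of SkillBench.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helper: write a criteria hash the same way the listing does,
# then read it back with and without symbolized keys.
def round_trip(criteria)
  Dir.mktmpdir do |dir|
    file = File.join(dir, 'criteria.json')
    File.write(file, JSON.pretty_generate(criteria))
    text = File.read(file)
    [JSON.parse(text), JSON.parse(text, symbolize_names: true)]
  end
end

# Sample data only; the real hash comes from default_criteria.
string_keyed, symbol_keyed = round_trip({ context: 'Evaluate ruby task', pass_threshold: 70 })

puts string_keyed['pass_threshold'] # string keys after a plain parse
puts symbol_keyed[:context]         # symbol keys with symbolize_names: true
```

Code that consumes criteria.json therefore needs to pick one key style and stay consistent with it.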
.create_rails_files(path, _name) ⇒ void
This method returns an undefined value.
Create Rails-specific files for the eval.

# File 'lib/skill_bench/commands/eval_new.rb', line 84

def self.create_rails_files(path, _name)
  File.write(File.join(path, 'rails_helper.rb'), "require 'rails_helper'\n")
end
.create_task_md(path, name) ⇒ void
This method returns an undefined value.
Create task.md for the eval.

# File 'lib/skill_bench/commands/eval_new.rb', line 33

def self.create_task_md(path, name)
  File.write(File.join(path, 'task.md'), task_template(name))
end
.default_criteria(runtime) ⇒ Hash
Generate default criteria hash.

# File 'lib/skill_bench/commands/eval_new.rb', line 65

def self.default_criteria(runtime)
  {
    context: "Evaluate #{runtime} task",
    dimensions: [
      { name: 'correctness', max_score: 30 },
      { name: 'skill_adherence', max_score: 25 },
      { name: 'code_quality', max_score: 20 },
      { name: 'test_coverage', max_score: 15 },
      { name: 'documentation', max_score: 10 }
    ],
    pass_threshold: 70,
    minimum_delta: 10
  }
end
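The five `max_score` values above sum to 100, so `pass_threshold: 70` can plausibly be read as 70 out of 100 points. That reading is an inference from the numbers, not something the source states. A quick check, with the dimensions copied into a standalone constant (`DIMENSIONS` and `total_max_score` are names introduced here for illustration):

```ruby
# Dimensions copied from the default_criteria listing into a standalone constant.
# DIMENSIONS and total_max_score are illustrative names, not part of SkillBench.
DIMENSIONS = [
  { name: 'correctness', max_score: 30 },
  { name: 'skill_adherence', max_score: 25 },
  { name: 'code_quality', max_score: 20 },
  { name: 'test_coverage', max_score: 15 },
  { name: 'documentation', max_score: 10 }
].freeze

# Sum the maximum score available across all dimensions.
def total_max_score(dimensions)
  dimensions.sum { |dimension| dimension[:max_score] }
end

puts total_max_score(DIMENSIONS) # prints 100
```

If the dimensions in a generated criteria.json are edited by hand, a check like this may help keep the weights and the threshold on the same scale.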
.run(name:, runtime: 'ruby') ⇒ void
This method returns an undefined value.
Run the eval new command.

# File 'lib/skill_bench/commands/eval_new.rb', line 18

def self.run(name:, runtime: 'ruby')
  raise ArgumentError,
        "Unsupported runtime '#{runtime}'. Allowed: #{ALLOWED_RUNTIMES.join(', ')}" unless ALLOWED_RUNTIMES.include?(runtime)

  eval_path = File.join('evals', name)
  FileUtils.mkdir_p(eval_path)

  create_task_md(eval_path, name)
  create_criteria_json(eval_path, runtime)
  create_rails_files(eval_path, name) if runtime == 'rails'
end
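The guard clause compares `runtime` against an array of strings, so a Symbol such as `:rails` is rejected even though its name matches. The sketch below reproduces only that check in a stand-in method. `validate_runtime!` is a hypothetical name used for illustration, and the constant is a copy of the documented value rather than the real class constant.

```ruby
# Stand-in copy of the documented constant; not the real SkillBench class.
ALLOWED_RUNTIMES = %w[ruby rails].freeze

# Hypothetical helper mirroring the guard clause at the top of .run.
def validate_runtime!(runtime)
  return runtime if ALLOWED_RUNTIMES.include?(runtime)

  raise ArgumentError,
        "Unsupported runtime '#{runtime}'. Allowed: #{ALLOWED_RUNTIMES.join(', ')}"
end

validate_runtime!('rails') # a String from the allowed list passes

begin
  validate_runtime!(:rails) # a Symbol is not equal to the String 'rails'
rescue ArgumentError => e
  puts e.message
end
```

Callers that build the runtime from parsed options may want to call `to_s` on it before passing it in.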
.task_template(name) ⇒ String
Generate task.md template.

# File 'lib/skill_bench/commands/eval_new.rb', line 49

def self.task_template(name)
  <<~MARKDOWN
    # Eval: #{name}

    ## Task
    Describe the task for the agent here.

    ## Success Criteria
    Define what constitutes a successful completion.
  MARKDOWN
end
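The squiggly heredoc (`<<~`) strips the common leading indentation, so the generated task.md starts at column zero even though the template is indented in the source. A stand-in copy of the method, defined at top level here purely for illustration, shows the effect:

```ruby
# Stand-in copy of the documented method, defined at top level for illustration.
def task_template(name)
  <<~MARKDOWN
    # Eval: #{name}

    ## Task
    Describe the task for the agent here.

    ## Success Criteria
    Define what constitutes a successful completion.
  MARKDOWN
end

template = task_template('demo')

# Every non-blank line begins at column zero once <<~ removes the indentation.
puts template.lines.first
puts template.lines.none? { |line| line.start_with?(' ') }
```

Without that stripping, the indented heading lines would not render as Markdown headings in most renderers.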