Class: SkillBench::Evaluation::Generator
- Inherits: Object
- Defined in: lib/skill_bench/evaluation/generator.rb
Overview
Generates an eval (task.md + criteria.json) from a skill’s documentation.
Constant Summary
- GENERATION_PROMPT =

  Prompt template used to generate evals from skill documentation via an LLM.

  ```ruby
  <<~PROMPT
    You are an evaluation designer for a skill-benchmarking tool.

    Given a skill's documentation, create an eval scenario that tests
    whether an AI agent can apply the skill correctly.

    Output ONLY a JSON object with this exact structure:

    {
      "task": "A detailed task description for the agent to perform. Be specific about what the agent should build or do.",
      "context": "A brief description of what this eval measures.",
      "dimensions": [
        { "name": "correctness", "max_score": 30 },
        { "name": "skill_adherence", "max_score": 25 },
        { "name": "code_quality", "max_score": 20 },
        { "name": "test_coverage", "max_score": 15 },
        { "name": "documentation", "max_score": 10 }
      ],
      "pass_threshold": 70,
      "minimum_delta": 10
    }

    Rules:
    - dimension max_scores MUST sum to exactly 100
    - pass_threshold should be between 60 and 80
    - minimum_delta should be between 5 and 15
    - task should be specific enough that an agent can attempt it in under 5 minutes
    - the eval should test whether the agent follows the patterns from the skill

    Skill documentation:
  PROMPT
  ```
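Because the template demands a strict JSON shape, a downstream consumer can sanity-check a model response against the stated rules. The sketch below is illustrative only: the sample response is hand-written, and the gem's real parsing lives elsewhere.

```ruby
require 'json'

# Hand-written sample standing in for a real LLM response.
raw = <<~JSON
  {
    "task": "Build a small Rack app that follows the skill's routing pattern.",
    "context": "Measures whether the agent applies the skill's conventions.",
    "dimensions": [
      { "name": "correctness", "max_score": 30 },
      { "name": "skill_adherence", "max_score": 25 },
      { "name": "code_quality", "max_score": 20 },
      { "name": "test_coverage", "max_score": 15 },
      { "name": "documentation", "max_score": 10 }
    ],
    "pass_threshold": 70,
    "minimum_delta": 10
  }
JSON

spec = JSON.parse(raw, symbolize_names: true)

# Enforce the three numeric rules the prompt states.
total = spec[:dimensions].sum { |d| d[:max_score] }
raise 'dimension max_scores must sum to exactly 100' unless total == 100
raise 'pass_threshold must be 60..80' unless (60..80).cover?(spec[:pass_threshold])
raise 'minimum_delta must be 5..15'   unless (5..15).cover?(spec[:minimum_delta])
```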
Instance Method Summary
- #call ⇒ Hash
  Generates the eval files.
- #initialize(skill_name:, eval_name:) ⇒ Generator (constructor)
  A new instance of Generator.
Constructor Details
#initialize(skill_name:, eval_name:) ⇒ Generator
Returns a new instance of Generator.
```ruby
# File 'lib/skill_bench/evaluation/generator.rb', line 47

def initialize(skill_name:, eval_name:)
  @skill_name = skill_name
  @eval_name = eval_name
end
```
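As a usage sketch, construction takes only the two keyword arguments; the skill and eval names below are hypothetical:

```ruby
# Hypothetical names; the skill must exist where the tool resolves skills from.
generator = SkillBench::Evaluation::Generator.new(
  skill_name: 'service-objects',
  eval_name:  'service-objects-basics'
)
```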
Instance Method Details
#call ⇒ Hash
Generates the eval files and returns a result hash keyed by :success and :response.
```ruby
# File 'lib/skill_bench/evaluation/generator.rb', line 55

def call
  sanitized = sanitize_eval_name(eval_name)
  return invalid_name_result unless sanitized

  skill = resolve_skill
  return skill_not_found_result unless skill

  skill_content = read_skill_content(skill.path)
  generated = generate_eval(skill_content)
  return generated unless generated[:success]

  write_eval_files(sanitized, generated[:response][:data])

  criteria_path = File.join('evals', sanitized, 'criteria.json')
  validation = SkillBench::Models::CriteriaValidator.call(path: criteria_path)
  unless validation[:success]
    FileUtils.rm_rf(File.join('evals', sanitized))
    return validation
  end

  { success: true, response: { eval_path: "evals/#{sanitized}" } }
rescue StandardError => e
  SkillBench::ErrorLogger.log_error(e, 'Evaluation::Generator Error')
  { success: false, response: { error: { message: e.message } } }
end
```
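Putting it together, a caller might consume the result hash like this; a sketch assuming the `generator` instance from the constructor example above:

```ruby
result = generator.call

if result[:success]
  # e.g. "evals/service-objects-basics"
  puts "Eval generated at #{result[:response][:eval_path]}"
else
  # Failure paths include an invalid eval name, an unknown skill,
  # a failed LLM generation, or criteria that do not validate.
  warn "Generation failed: #{result[:response].inspect}"
end
```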