Class: SkillBench::Evaluation::Generator

Inherits:
Object
Defined in:
lib/skill_bench/evaluation/generator.rb

Overview

Generates an eval (task.md + criteria.json) from a skill’s documentation.

Constant Summary

GENERATION_PROMPT =

Prompt template used to generate evals from skill documentation via an LLM.

<<~PROMPT
  You are an evaluation designer for a skill-benchmarking tool.

  Given a skill's documentation, create an eval scenario that tests whether an AI agent
  can apply the skill correctly. Output ONLY a JSON object with this exact structure:

  {
    "task": "A detailed task description for the agent to perform. Be specific about what the agent should build or do.",
    "context": "A brief description of what this eval measures.",
    "dimensions": [
      { "name": "correctness", "max_score": 30 },
      { "name": "skill_adherence", "max_score": 25 },
      { "name": "code_quality", "max_score": 20 },
      { "name": "test_coverage", "max_score": 15 },
      { "name": "documentation", "max_score": 10 }
    ],
    "pass_threshold": 70,
    "minimum_delta": 10
  }

  Rules:
  - dimension max_scores MUST sum to exactly 100
  - pass_threshold should be between 60 and 80
  - minimum_delta should be between 5 and 15
  - task should be specific enough that an agent can attempt it in under 5 minutes
  - the eval should test whether the agent follows the patterns from the skill

  Skill documentation:
PROMPT
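
The scoring rules above can be checked mechanically once the LLM's JSON output is parsed. A minimal sketch, assuming parsed output; the `criteria` hash below is an illustrative sample, not real generator output:

```ruby
require 'json'

# Illustrative sample of the JSON shape the prompt asks for.
criteria = JSON.parse(<<~JSON)
  {
    "dimensions": [
      { "name": "correctness", "max_score": 30 },
      { "name": "skill_adherence", "max_score": 25 },
      { "name": "code_quality", "max_score": 20 },
      { "name": "test_coverage", "max_score": 15 },
      { "name": "documentation", "max_score": 10 }
    ],
    "pass_threshold": 70,
    "minimum_delta": 10
  }
JSON

# Enforce the rules stated in the prompt.
total = criteria['dimensions'].sum { |d| d['max_score'] }
raise "max_scores sum to #{total}, expected 100" unless total == 100
raise 'pass_threshold out of range' unless (60..80).cover?(criteria['pass_threshold'])
raise 'minimum_delta out of range'   unless (5..15).cover?(criteria['minimum_delta'])
```

In the gem itself this validation is handled by SkillBench::Models::CriteriaValidator (see #call below); the snippet only illustrates the constraints the prompt imposes.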

Instance Method Summary

Constructor Details

#initialize(skill_name:, eval_name:) ⇒ Generator

Returns a new instance of Generator.

Parameters:

  • skill_name (String)

    Name of the skill to base the eval on.

  • eval_name (String)

    Name for the new eval directory.



# File 'lib/skill_bench/evaluation/generator.rb', line 47

def initialize(skill_name:, eval_name:)
  @skill_name = skill_name
  @eval_name = eval_name
end
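
For illustration, a stand-in class mirroring the keyword-argument interface above (`GeneratorStub` is hypothetical; the real Generator also resolves the skill and writes files):

```ruby
# Hypothetical stand-in; mirrors only the keyword-argument constructor.
class GeneratorStub
  attr_reader :skill_name, :eval_name

  def initialize(skill_name:, eval_name:)
    @skill_name = skill_name
    @eval_name = eval_name
  end
end

generator = GeneratorStub.new(skill_name: 'markdown-tables', eval_name: 'tables-basic')
# Omitting either keyword raises ArgumentError, so callers must supply both names.
```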

Instance Method Details

#call ⇒ Hash

Generates the eval files.

Returns:

  • (Hash)

    Service response hash with :success and :response keys.



# File 'lib/skill_bench/evaluation/generator.rb', line 55

def call
  sanitized = sanitize_eval_name(eval_name)
  return invalid_name_result unless sanitized

  skill = resolve_skill
  return skill_not_found_result unless skill

  skill_content = read_skill_content(skill.path)
  generated = generate_eval(skill_content)
  return generated unless generated[:success]

  write_eval_files(sanitized, generated[:response][:data])

  criteria_path = File.join('evals', sanitized, 'criteria.json')
  validation = SkillBench::Models::CriteriaValidator.call(path: criteria_path)
  unless validation[:success]
    FileUtils.rm_rf(File.join('evals', sanitized))
    return validation
  end

  { success: true, response: { eval_path: "evals/#{sanitized}" } }
rescue StandardError => e
  SkillBench::ErrorLogger.log_error(e, 'Evaluation::Generator Error')
  { success: false, response: { error: { message: e.message } } }
end
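
The sanitize_eval_name helper guarding the top of #call is private and not shown here. A plausible sketch of such a helper (an assumption about its behavior, not the gem's actual implementation) that returns nil for unusable names, matching the `return invalid_name_result unless sanitized` guard:

```ruby
# Hypothetical sketch: lowercases the name, collapses runs of disallowed
# characters into hyphens, trims edge hyphens, and returns nil when
# nothing usable survives.
def sanitize_eval_name(name)
  slug = name.to_s.strip.downcase
             .gsub(/[^a-z0-9_-]+/, '-')
             .gsub(/\A-+|-+\z/, '')
  slug.empty? ? nil : slug
end

sanitize_eval_name('My Eval!')  # => "my-eval"
sanitize_eval_name('***')       # => nil
```

Returning nil (rather than raising) lets #call short-circuit with a structured error result, consistent with the service-response style used throughout the method.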