Class: SkillBench::Evaluation::Generator

Inherits:
Object
Defined in:
lib/skill_bench/evaluation/generator.rb

Overview

Generates an eval (task.md + criteria.json) from a skill’s documentation.

Constant Summary

GENERATION_PROMPT =

Prompt template used to generate evals from skill documentation via an LLM.

<<~PROMPT
  You are an evaluation designer for a skill-benchmarking tool.

  Given a skill's documentation, create an eval scenario that tests whether an AI agent
  can apply the skill correctly. Output ONLY a JSON object with this exact structure:

  {
    "task": "A detailed task description for the agent to perform. Be specific about what the agent should build or do.",
    "context": "A brief description of what this eval measures.",
    "dimensions": [
      { "name": "correctness", "max_score": 30 },
      { "name": "skill_adherence", "max_score": 25 },
      { "name": "code_quality", "max_score": 20 },
      { "name": "test_coverage", "max_score": 15 },
      { "name": "documentation", "max_score": 10 }
    ],
    "pass_threshold": 70,
    "minimum_delta": 10
  }

  Rules:
  - dimension max_scores MUST sum to exactly 100
  - pass_threshold should be between 60 and 80
  - minimum_delta should be between 5 and 15
  - task should be specific enough that an agent can attempt it in under 5 minutes
  - the eval should test whether the agent follows the patterns from the skill

  Skill documentation:
PROMPT
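
The scoring rules above can be checked mechanically once the LLM's JSON output is parsed. A minimal sketch, assuming parsed output; the `criteria` hash below is an illustrative sample, not real generator output:

```ruby
require 'json'

# Illustrative sample of the JSON shape the prompt asks for.
criteria = JSON.parse(<<~JSON)
  {
    "dimensions": [
      { "name": "correctness", "max_score": 30 },
      { "name": "skill_adherence", "max_score": 25 },
      { "name": "code_quality", "max_score": 20 },
      { "name": "test_coverage", "max_score": 15 },
      { "name": "documentation", "max_score": 10 }
    ],
    "pass_threshold": 70,
    "minimum_delta": 10
  }
JSON

# Enforce the rules stated in the prompt.
total = criteria['dimensions'].sum { |d| d['max_score'] }
raise "max_scores sum to #{total}, expected 100" unless total == 100
raise 'pass_threshold out of range' unless (60..80).cover?(criteria['pass_threshold'])
raise 'minimum_delta out of range'   unless (5..15).cover?(criteria['minimum_delta'])
```

In the gem itself this validation is handled by SkillBench::Models::CriteriaValidator (see #call below); the snippet only illustrates the constraints the prompt imposes.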

Instance Method Summary

Constructor Details

#initialize(skill_name:, eval_name:) ⇒ Generator

Returns a new instance of Generator.

Parameters:

  • skill_name (String)

    Name of the skill to base the eval on.

  • eval_name (String)

    Name for the new eval directory.



# File 'lib/skill_bench/evaluation/generator.rb', line 47

def initialize(skill_name:, eval_name:)
  @skill_name = skill_name
  @eval_name = eval_name
end
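
For illustration, a stand-in class mirroring the keyword-argument interface above (`GeneratorStub` is hypothetical; the real Generator also resolves the skill and writes files):

```ruby
# Hypothetical stand-in; mirrors only the keyword-argument constructor.
class GeneratorStub
  attr_reader :skill_name, :eval_name

  def initialize(skill_name:, eval_name:)
    @skill_name = skill_name
    @eval_name = eval_name
  end
end

generator = GeneratorStub.new(skill_name: 'markdown-tables', eval_name: 'tables-basic')
# Omitting either keyword raises ArgumentError, so callers must supply both names.
```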

Instance Method Details

#call ⇒ Hash

Generates the eval files.

Returns:

  • (Hash)

    Service response hash with :success and :response keys.



# File 'lib/skill_bench/evaluation/generator.rb', line 55

def call
  sanitized = sanitize_eval_name(eval_name)
  return invalid_name_result unless sanitized

  skill = resolve_skill
  return skill_not_found_result unless skill

  skill_content = read_skill_content(skill.path)
  generated = generate_eval(skill_content)
  return generated unless generated[:success]

  write_eval_files(sanitized, generated[:response][:data])

  criteria_path = File.join('evals', sanitized, 'criteria.json')
  validation = SkillBench::Models::CriteriaValidator.call(path: criteria_path)
  unless validation[:success]
    FileUtils.rm_rf(File.join('evals', sanitized))
    return validation
  end

  { success: true, response: { eval_path: "evals/#{sanitized}" } }
rescue StandardError => e
  SkillBench::ErrorLogger.log_error(e, 'Evaluation::Generator Error')
  { success: false, response: { error: { message: e.message } } }
end
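
The sanitize_eval_name helper guarding the top of #call is private and not shown here. A plausible sketch of such a helper (an assumption about its behavior, not the gem's actual implementation) that returns nil for unusable names, matching the `return invalid_name_result unless sanitized` guard:

```ruby
# Hypothetical sketch: lowercases the name, collapses runs of disallowed
# characters into hyphens, trims edge hyphens, and returns nil when
# nothing usable survives.
def sanitize_eval_name(name)
  slug = name.to_s.strip.downcase
             .gsub(/[^a-z0-9_-]+/, '-')
             .gsub(/\A-+|-+\z/, '')
  slug.empty? ? nil : slug
end

sanitize_eval_name('My Eval!')  # => "my-eval"
sanitize_eval_name('***')       # => nil
```

Returning nil (rather than raising) lets #call short-circuit with a structured error result, consistent with the service-response style used throughout the method.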