Class: SkillBench::Judge::Judge
- Inherits:
- Object (hierarchy: Object > SkillBench::Judge::Judge)
- Defined in:
- lib/skill_bench/judge/judge.rb
Overview
Responsible for evaluating AI-generated code modifications.
Accepts a structured judge prompt, calls the LLM client, and parses the response into a Judge::Response with per-dimension scores.
Constant Summary
- SYSTEM_PROMPT =
System prompt sent to the LLM judge defining its role and output format.
    'You are an objective judge evaluating AI coding models. ' \
    'Your goal is to score responses based strictly on the provided criteria. ' \
    'Return only valid JSON.'
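SYSTEM_PROMPT is written as adjacent string literals joined by trailing backslashes, which Ruby concatenates into a single string with no embedded newlines. A minimal sketch of that behavior follows; the constant name SYSTEM_PROMPT_SKETCH is hypothetical and stands in for the real constant.

```ruby
# Hypothetical copy of the constant, used only to illustrate literal concatenation.
SYSTEM_PROMPT_SKETCH = 'You are an objective judge evaluating AI coding models. ' \
                       'Your goal is to score responses based strictly on the provided criteria. ' \
                       'Return only valid JSON.'

# The three literals become one line of text: the trailing space inside each
# literal is the only separator between sentences.
puts SYSTEM_PROMPT_SKETCH.include?("\n")
puts SYSTEM_PROMPT_SKETCH.end_with?('Return only valid JSON.')
```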
Class Method Summary
- .call(prompt:, client_params: {}) ⇒ Hash
Evaluates agent output via the LLM judge.
Instance Method Summary
- #call ⇒ Hash
Executes the evaluation process via the LLM client.
- #initialize(prompt:, client_params:) ⇒ Judge (constructor)
A new instance of Judge.
Constructor Details
#initialize(prompt:, client_params:) ⇒ Judge
Returns a new instance of Judge.
    # File 'lib/skill_bench/judge/judge.rb', line 29
    def initialize(prompt:, client_params:)
      @prompt = prompt
      @client_params = client_params
    end
Class Method Details
.call(prompt:, client_params: {}) ⇒ Hash
Evaluates agent output via the LLM judge.
    # File 'lib/skill_bench/judge/judge.rb', line 23
    def self.call(prompt:, client_params: {})
      new(prompt:, client_params:).call
    end
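The class-level .call is a thin shortcut: it builds an instance and immediately invokes #call on it. The argument list new(prompt:, client_params:) uses the Ruby 3.1+ shorthand for new(prompt: prompt, client_params: client_params). The sketch below shows the same shape on a hypothetical EchoService class (not part of SkillBench), written with the explicit form so it also runs on older Rubies.

```ruby
# Hypothetical service object; only the .call -> new(...).call shape mirrors Judge.
class EchoService
  # Class-level entry point: construct an instance, then run it.
  def self.call(prompt:, client_params: {})
    new(prompt: prompt, client_params: client_params).call
  end

  def initialize(prompt:, client_params:)
    @prompt = prompt
    @client_params = client_params
  end

  # Returns a result hash instead of calling an LLM.
  def call
    { success: true, response: { prompt: @prompt, params: @client_params } }
  end
end

result = EchoService.call(prompt: 'Score this diff')
puts result[:success]
```

Callers never touch the instance directly, and client_params falls back to an empty hash when omitted.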
Instance Method Details
#call ⇒ Hash
Executes the evaluation process via the LLM client.
    # File 'lib/skill_bench/judge/judge.rb', line 37
    def call
      judge_result = Client.call(
        system_prompt: SYSTEM_PROMPT,
        messages: [{ role: 'user', content: prompt }],
        **client_params
      )
      return judge_result unless judge_result[:success]

      content = extract_content(judge_result)
      return empty_response_result unless content

      Response.call(json: content)
    rescue StandardError => e
      SkillBench::ErrorLogger.log_error(e, 'Judge Evaluation Error')
      { success: false, response: { error: { message: e.message } } }
    end
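The control flow of #call has three outcomes: a failed client result is passed through unchanged, a successful one is parsed, and any exception is converted into a failure hash. The sketch below reproduces that flow under stated assumptions. JudgeSketch and FakeClient are hypothetical stand-ins, the response: { content: ... } shape of a successful client result is assumed, and JSON.parse substitutes for Response.call. The real Client, Response, extract_content, and empty_response_result are not shown on this page.

```ruby
require 'json'

# Hypothetical module illustrating the pass-through / parse / rescue flow.
module JudgeSketch
  SYSTEM_PROMPT = 'Return only valid JSON.'

  # Fake client: raises, fails, or succeeds depending on the prompt text.
  class FakeClient
    def self.call(system_prompt:, messages:, **_params)
      text = messages.first[:content]
      raise 'network down' if text == 'raise'
      return { success: false, response: { error: { message: 'rate limited' } } } if text == 'fail'

      # Assumed success shape; the real client's shape is not documented here.
      { success: true, response: { content: '{"score": 4}' } }
    end
  end

  def self.judge(prompt, client_params = {})
    result = FakeClient.call(
      system_prompt: SYSTEM_PROMPT,
      messages: [{ role: 'user', content: prompt }],
      **client_params
    )
    # Failed client results are returned to the caller untouched.
    return result unless result[:success]

    # Stand-in for Response.call(json: content).
    { success: true, response: JSON.parse(result[:response][:content]) }
  rescue StandardError => e
    # Any exception becomes a failure hash rather than propagating.
    { success: false, response: { error: { message: e.message } } }
  end
end

%w[ok fail raise].each do |prompt|
  outcome = JudgeSketch.judge(prompt)
  puts "#{prompt}: success=#{outcome[:success]}"
end
```

Because every path returns a hash with a :success key, callers can branch on result[:success] without wrapping the call in their own rescue.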