Class: SkillBench::Judge::Judge

Inherits:
Object
Defined in:
lib/skill_bench/judge/judge.rb

Overview

Responsible for evaluating AI-generated code modifications.

Accepts a structured judge prompt, calls the LLM client, and parses the response into a Judge::Response with per-dimension scores.

Constant Summary

SYSTEM_PROMPT

System prompt sent to the LLM judge defining its role and output format.

SYSTEM_PROMPT =
  'You are an objective judge evaluating AI coding models. ' \
  'Your goal is to score responses based strictly on the provided criteria. ' \
  'Return only valid JSON.'

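Class Method Summary

  • .call(prompt:, client_params: {}) ⇒ Hash

    Evaluates agent output via the LLM judge.

Instance Method Summary

  • #call ⇒ Hash

    Executes the evaluation process via the LLM client.

  • #initialize(prompt:, client_params:) ⇒ Judge (constructor)

    Returns a new instance of Judge.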

Constructor Details

#initialize(prompt:, client_params:) ⇒ Judge

Returns a new instance of Judge.

Parameters:

  • prompt (String)

    The structured judge prompt.

  • client_params (Hash)

    Parameters forwarded to the LLM client. The constructor requires this keyword; only .call supplies a default of {}.



# File 'lib/skill_bench/judge/judge.rb', line 29

def initialize(prompt:, client_params:)
  @prompt = prompt
  @client_params = client_params
end

Class Method Details

.call(prompt:, client_params: {}) ⇒ Hash

Evaluates agent output via the LLM judge.

Parameters:

  • prompt (String)

    The structured judge prompt.

  • client_params (Hash) (defaults to: {})

    Optional parameters to pass to the client.

Returns:

  • (Hash)

    A hash with :success (Boolean) and :response, which holds a Judge::Response on success or an error hash on failure.



# File 'lib/skill_bench/judge/judge.rb', line 23

def self.call(prompt:, client_params: {})
  new(prompt:, client_params:).call
end
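
A caller typically branches on :success before touching :response. The sketch below illustrates that pattern under stated assumptions: ok_judge and failing_judge are hypothetical lambdas standing in for SkillBench::Judge::Judge.call so the example runs without the gem or an LLM client, the scores hash is an invented placeholder for a Judge::Response, and 'example-model' is not a real parameter value. Only the { success:, response: } shape and the { error: { message: } } failure shape come from the documentation on this page.

```ruby
# Hypothetical stand-ins for SkillBench::Judge::Judge.call. They accept the
# same keywords (prompt:, client_params:) and return the documented shape.
ok_judge = lambda do |prompt:, client_params: {}|
  # The real :response would be a Judge::Response; this hash is a placeholder.
  { success: true, response: { scores: { correctness: 4 } } }
end

failing_judge = lambda do |prompt:, client_params: {}|
  { success: false, response: { error: { message: 'timeout' } } }
end

# Turns a judge result hash into a one-line summary.
def summarize(result)
  if result[:success]
    "judged: #{result[:response].inspect}"
  else
    # Fall back to a generic message if the error shape differs.
    message = result.dig(:response, :error, :message) || 'unknown error'
    "judge failed: #{message}"
  end
end

puts summarize(ok_judge.call(prompt: 'Score this diff'))
puts summarize(
  failing_judge.call(prompt: 'Score this diff', client_params: { model: 'example-model' })
)
```

Checking :success first keeps the caller independent of which layer produced the failure (the client, the parser, or the rescue clause in #call).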

Instance Method Details

#call ⇒ Hash

Executes the evaluation process via the LLM client.

Returns:

  • (Hash)

    Service response with Judge::Response or error.



# File 'lib/skill_bench/judge/judge.rb', line 37

def call
  judge_result = Client.call(
    system_prompt: SYSTEM_PROMPT,
    messages: [{ role: 'user', content: prompt }],
    **client_params
  )

  return judge_result unless judge_result[:success]

  content = extract_content(judge_result)
  return empty_response_result unless content

  Response.call(json: content)
rescue StandardError => e
  SkillBench::ErrorLogger.log_error(e, 'Judge Evaluation Error')
  { success: false, response: { error: { message: e.message } } }
end
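
The listing above has four exits: a failed client result is returned unchanged, missing content yields an empty-response result, valid content is handed to the parser, and any StandardError is converted into a failure hash. The following sketch re-creates that flow with injectable collaborators so each path can be exercised in isolation. It is an illustration, not the library's implementation: judge_flow, the client and parser lambdas, the 'empty response' message, and the use of :response as the content field are all assumptions, since extract_content and empty_response_result are not shown on this page.

```ruby
require 'json'

# Hypothetical re-creation of the control flow in #call.
# `client` and `parser` are stand-ins, not SkillBench APIs.
def judge_flow(client:, parser:)
  result = client.call
  return result unless result[:success] # client failure passes through as-is

  content = result[:response] # assumed location of the raw LLM output
  unless content
    # Assumed shape; the real empty_response_result is not documented here.
    return { success: false, response: { error: { message: 'empty response' } } }
  end

  parser.call(content)
rescue StandardError => e
  # Covers errors raised by the client or the parser (e.g. JSON::ParserError).
  { success: false, response: { error: { message: e.message } } }
end

parser = ->(json) { { success: true, response: JSON.parse(json) } }

failing_client = -> { { success: false, response: { error: { message: 'rate limited' } } } }
empty_client   = -> { { success: true, response: nil } }
valid_client   = -> { { success: true, response: '{"correctness": 4}' } }
raising_client = -> { raise IOError, 'connection reset' }

[failing_client, empty_client, valid_client, raising_client].each do |client|
  puts judge_flow(client: client, parser: parser)[:success]
end
```

Because every exit returns a hash with the same two keys, callers never need their own rescue around the judge call.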