Class: Phronomy::Eval::Scorer::LlmJudge
- Defined in:
- lib/phronomy/eval/scorer/llm_judge.rb
Overview
LLM-as-a-Judge scorer. Sends a structured prompt to an LLM and interprets its numeric reply as a quality score in [0.0, 1.0].
The prompt template accepts three named placeholders:
%<input>s — the original input question
%<expected>s — the reference (expected) answer
%<actual>s — the model's actual response
The LLM is expected to reply with a single decimal number; any extra text is stripped and the value is clamped to [0.0, 1.0]. If parsing fails the scorer returns 0.0 rather than raising.
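The parse-and-clamp step described above can be sketched in isolation. `parse_score` below is a hypothetical helper, not part of the class's API; it reproduces the same regex extraction the scorer uses:

```ruby
# Standalone sketch of the reply-parsing step (hypothetical helper,
# not a method of LlmJudge).
def parse_score(reply)
  # Take the first decimal number in the reply and clamp it into [0.0, 1.0].
  # scan(...).first is nil when no number is found, and nil.to_f == 0.0,
  # so unparseable replies fall back to 0.0 without raising.
  reply.to_s.strip.scan(/-?\d+\.?\d*/).first.to_f.clamp(0.0, 1.0)
end

parse_score("0.85")        # => 0.85
parse_score("Score: 1.7")  # => 1.0 (clamped)
parse_score("I refuse.")   # => 0.0 (no number found)
```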
Constant Summary collapse
- DEFAULT_PROMPT =
<<~PROMPT
  You are an impartial judge evaluating the quality of an AI assistant response.
  Rate the response on a scale from 0.0 (completely wrong or unhelpful) to 1.0 (perfect).
  Respond with ONLY a single decimal number between 0.0 and 1.0 — no other text.

  Question: %<input>s
  Expected answer: %<expected>s
  Actual response: %<actual>s

  Score:
PROMPT
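The prompt-construction step can be exercised on its own with `Kernel#format` and named references. The trimmed template below is illustrative, not the full DEFAULT_PROMPT:

```ruby
# Sketch of the format() call used by #score, with a shortened template.
template = <<~PROMPT
  Question: %<input>s
  Expected answer: %<expected>s
  Actual response: %<actual>s
  Score:
PROMPT

prompt = format(template,
                input:    "What is 2 + 2?",
                expected: "4",
                actual:   "The answer is 4.")
puts prompt
```

Because the placeholders are named rather than positional, keyword order in the `format` call does not matter, and a custom `prompt_template` may omit any of them.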
Instance Method Summary collapse
-
#initialize(model:, prompt_template: DEFAULT_PROMPT) ⇒ LlmJudge
constructor
A new instance of LlmJudge.
-
#score(actual:, expected:, input: nil) ⇒ Float
Score in [0.0, 1.0]; 0.0 on any error.
Constructor Details
#initialize(model:, prompt_template: DEFAULT_PROMPT) ⇒ LlmJudge
Returns a new instance of LlmJudge.
# File 'lib/phronomy/eval/scorer/llm_judge.rb', line 37

def initialize(model:, prompt_template: DEFAULT_PROMPT)
  @model = model
  @prompt_template = prompt_template
end
Instance Method Details
#score(actual:, expected:, input: nil) ⇒ Float
Returns score in [0.0, 1.0]; 0.0 on any error.
# File 'lib/phronomy/eval/scorer/llm_judge.rb', line 43

def score(actual:, expected:, input: nil)
  prompt = format(@prompt_template,
                  input: input.to_s,
                  expected: expected.to_s,
                  actual: actual.to_s)
  response = RubyLLM.chat(model: @model).ask(prompt)
  response.content.to_s.strip.scan(/-?\d+\.?\d*/).first.to_f.clamp(0.0, 1.0)
rescue => e
  warn "[LlmJudge] Scoring failed: #{e.message}"
  0.0
end
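An offline sketch of the end-to-end flow, with RubyLLM stubbed so the example runs without network access or API keys. The stub, the canned "0.9" reply, and the trimmed template are all assumptions for illustration; the class body mirrors the documented implementation:

```ruby
# Stub RubyLLM with a canned judge reply so the flow is runnable offline.
module RubyLLM
  Reply = Struct.new(:content)
  Chat  = Struct.new(:model) do
    def ask(_prompt)
      Reply.new("0.9") # canned judge reply for the sketch
    end
  end
  def self.chat(model:) = Chat.new(model)
end

class LlmJudge
  # Trimmed template for the sketch, not the full DEFAULT_PROMPT.
  DEFAULT_PROMPT = "Q: %<input>s E: %<expected>s A: %<actual>s Score:"

  def initialize(model:, prompt_template: DEFAULT_PROMPT)
    @model = model
    @prompt_template = prompt_template
  end

  def score(actual:, expected:, input: nil)
    prompt = format(@prompt_template, input: input.to_s,
                    expected: expected.to_s, actual: actual.to_s)
    response = RubyLLM.chat(model: @model).ask(prompt)
    response.content.to_s.strip.scan(/-?\d+\.?\d*/).first.to_f.clamp(0.0, 1.0)
  rescue => e
    warn "[LlmJudge] Scoring failed: #{e.message}"
    0.0
  end
end

judge = LlmJudge.new(model: "fake-model")
judge.score(input: "2+2?", expected: "4", actual: "4") # => 0.9 (canned reply)
```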