Class: Riffer::Evals::Judge
- Inherits: Object
  - Object
    - Riffer::Evals::Judge
- Defined in: lib/riffer/evals/judge.rb
Overview
Executes LLM-as-judge evaluations using the provider infrastructure.
The Judge class handles calling an LLM to evaluate agent outputs and parsing the structured response. It uses tool calling internally to get guaranteed structured output from the judge model.
    judge = Riffer::Evals::Judge.new(model: "anthropic/claude-opus-4-5-20251101")
    result = judge.evaluate(
      instructions: "Assess answer relevancy...",
      input: "What is Ruby?",
      output: "Ruby is a programming language."
    )
    result[:score]  # => 0.85
    result[:reason] # => "The response is relevant..."
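As a rough illustration of consuming the result hash, the sketch below passes the optional `ground_truth:` keyword and applies a pass/fail threshold to `result[:score]`. `StubJudge` and `passing?` are hypothetical names introduced here so the example runs without a provider. They are not part of Riffer, and the 0.7 threshold is an arbitrary assumption.

```ruby
# Hypothetical stand-in for Riffer::Evals::Judge so the example runs offline.
# It only mirrors the keyword signature and result keys shown in the overview.
class StubJudge
  def evaluate(instructions:, input:, output:, ground_truth: nil)
    # A real judge would call an LLM here; this stub returns a fixed verdict.
    { score: 0.85, reason: "The response is relevant..." }
  end
end

# Hypothetical helper: a result passes when its score meets the threshold.
def passing?(result, threshold: 0.7)
  result[:score] >= threshold
end

judge = StubJudge.new
result = judge.evaluate(
  instructions: "Assess answer relevancy...",
  input: "What is Ruby?",
  output: "Ruby is a programming language.",
  ground_truth: "Ruby is a dynamic, object-oriented programming language."
)

puts passing?(result)                  # true  (0.85 >= 0.7)
puts passing?(result, threshold: 0.9)  # false (0.85 < 0.9)
```

With the real class, only the `StubJudge.new` line would change to `Riffer::Evals::Judge.new(model: ...)`.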
Defined Under Namespace
Classes: EvaluationTool
Instance Attribute Summary
- #model ⇒ Object (readonly)
  The model string (provider/model format).
Instance Method Summary
- #evaluate(instructions:, input:, output:, ground_truth: nil) ⇒ Object
  Evaluates using the configured LLM.
- #initialize(model:, provider_options: {}) ⇒ Judge (constructor)
  Initializes a new judge.
Constructor Details
#initialize(model:, provider_options: {}) ⇒ Judge
Initializes a new judge.
– : (model: String, ?provider_options: Hash[Symbol, untyped]) -> void
    # File 'lib/riffer/evals/judge.rb', line 51
    def initialize(model:, provider_options: {})
      provider_name, model_name = model.split("/", 2)
      unless [provider_name, model_name].all? { |part| part.is_a?(String) && !part.strip.empty? }
        raise Riffer::ArgumentError, "Invalid model string: #{model}"
      end

      @model = model
      @provider_options = provider_options
    end
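To see which model strings the constructor accepts, the following sketch reproduces the same split-and-validate logic as a standalone method. `split_model_string` is a hypothetical name, and Ruby's built-in `ArgumentError` stands in for `Riffer::ArgumentError`, which is only available inside the library.

```ruby
# Hypothetical standalone version of the check in Judge#initialize.
# Raises ::ArgumentError as a stand-in for Riffer::ArgumentError.
def split_model_string(model)
  # Limit of 2 keeps any further slashes inside the model name.
  provider_name, model_name = model.split("/", 2)
  unless [provider_name, model_name].all? { |part| part.is_a?(String) && !part.strip.empty? }
    raise ArgumentError, "Invalid model string: #{model}"
  end
  [provider_name, model_name]
end

p split_model_string("anthropic/claude-opus-4-5-20251101")
# => ["anthropic", "claude-opus-4-5-20251101"]

# Missing or blank parts are rejected.
["anthropic", "anthropic/", "/model"].each do |bad|
  begin
    split_model_string(bad)
  rescue ArgumentError => e
    puts e.message
  end
end
```

Because the split uses a limit of 2, a string such as `"a/b/c"` yields the provider `"a"` and the model name `"b/c"`.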
Instance Attribute Details
#model ⇒ Object (readonly)
The model string (provider/model format).
    # File 'lib/riffer/evals/judge.rb', line 45
    def model
      @model
    end
Instance Method Details
#evaluate(instructions:, input:, output:, ground_truth: nil) ⇒ Object
Evaluates using the configured LLM.
Composes system and user messages from the semantic fields:
- instructions: evaluation criteria and scoring rubric.
- input: the original input/question.
- output: the agent's response to evaluate.
- ground_truth: optional reference answer for comparison.
– : (instructions: String, input: String, output: String, ?ground_truth: String?) -> Hash[Symbol, untyped]
    # File 'lib/riffer/evals/judge.rb', line 71
    def evaluate(instructions:, input:, output:, ground_truth: nil)
       = (instructions)
       = (input: input, output: output, ground_truth: ground_truth)

      response = provider_instance.generate_text(
        system: ,
        prompt: ,
        model: model_name,
        tools: [EvaluationTool]
      )

      parse_tool_response(response)
    end
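The composition of messages from the semantic fields might look roughly like the sketch below. Every helper name here (`build_system_message`, `build_user_message`, `parse_evaluation`) is hypothetical, as are the section labels and the JSON shape of the tool arguments. The actual private helpers and payload format in Riffer may differ.

```ruby
require "json"

# Hypothetical helper: the instructions are used as the system message as-is.
def build_system_message(instructions)
  instructions
end

# Hypothetical helper: label each field; include ground truth only when given.
def build_user_message(input:, output:, ground_truth: nil)
  sections = ["Input:\n#{input}", "Output:\n#{output}"]
  sections << "Ground truth:\n#{ground_truth}" unless ground_truth.nil?
  sections.join("\n\n")
end

# Hypothetical parser: assumes the tool-call arguments arrive as a JSON string
# carrying "score" and "reason" keys.
def parse_evaluation(arguments_json)
  args = JSON.parse(arguments_json, symbolize_names: true)
  { score: Float(args.fetch(:score)), reason: args.fetch(:reason).to_s }
end

system_message = build_system_message("Assess answer relevancy...")
user_message = build_user_message(
  input: "What is Ruby?",
  output: "Ruby is a programming language."
)
puts system_message
puts user_message

result = parse_evaluation('{"score": 0.85, "reason": "The response is relevant..."}')
p result
```

Keeping `ground_truth` out of the prompt when it is `nil` avoids presenting the judge model with an empty reference answer.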