Class: Riffer::Evals::Evaluator
- Inherits: Object
- Defined in: lib/riffer/evals/evaluator.rb
Overview
Base class for all evaluators in the Riffer framework.
Provides a class-level DSL for declaring evaluator metadata, plus a default evaluate method. Simple evaluators only need to set instructions; the base class then calls the judge automatically.
See examples/evaluators/ for reference implementations.
  class MyEvaluator < Riffer::Evals::Evaluator
    instructions "Assess medical accuracy of the response..."
    higher_is_better true
    judge_model "anthropic/claude-opus-4-5-20251101"
  end
Class Method Summary
- .higher_is_better(value = nil) ⇒ Object: Gets or sets whether higher scores are better.
- .instructions(value = nil) ⇒ Object: Gets or sets the evaluation instructions (criteria and scoring rubric).
- .judge_model(value = nil) ⇒ Object: Gets or sets the judge model for LLM-as-judge evaluations.
Instance Method Summary
- #evaluate(input:, output:, ground_truth: nil, messages: []) ⇒ Object: Evaluates an input/output pair.
Class Method Details
.higher_is_better(value = nil) ⇒ Object
Gets or sets whether higher scores are better. Called without an argument, it returns the stored value, or true if no value has been set.
– : (?bool?) -> bool
  # File 'lib/riffer/evals/evaluator.rb', line 33
  def higher_is_better(value = nil)
    return @higher_is_better.nil? || @higher_is_better if value.nil?
    @higher_is_better = value
  end
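The combined getter/setter above can be tried in isolation. The following is a minimal sketch, not the Riffer class itself: SketchEvaluator, AccuracySketch, and ToxicitySketch are hypothetical names, and only the method body is copied from the source shown above.

```ruby
# Stand-in for the class-level DSL; this is not the real Riffer class.
class SketchEvaluator
  class << self
    # Same body as the .higher_is_better method shown above.
    def higher_is_better(value = nil)
      return @higher_is_better.nil? || @higher_is_better if value.nil?
      @higher_is_better = value
    end
  end
end

# Hypothetical evaluator that never calls the setter.
class AccuracySketch < SketchEvaluator
end

# Hypothetical evaluator where a lower score is the better outcome.
class ToxicitySketch < SketchEvaluator
  higher_is_better false
end

puts AccuracySketch.higher_is_better # unset, so the getter falls back to true
puts ToxicitySketch.higher_is_better # explicitly set, so the getter returns false
```

The `nil?` check is what lets an explicit false survive: a plain `@higher_is_better || true` would turn false back into true.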
.instructions(value = nil) ⇒ Object
Gets or sets the evaluation instructions (criteria and scoring rubric).
– : (?String?) -> String?
  # File 'lib/riffer/evals/evaluator.rb', line 24
  def instructions(value = nil)
    return @instructions if value.nil?
    @instructions = value.to_s
  end
.judge_model(value = nil) ⇒ Object
Gets or sets the judge model for LLM-as-judge evaluations.
– : (?String?) -> String?
  # File 'lib/riffer/evals/evaluator.rb', line 42
  def judge_model(value = nil)
    return @judge_model if value.nil?
    @judge_model = value.to_s
  end
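The two string-valued accessors, .instructions and .judge_model, share one pattern: a nil argument means "read", anything else is coerced with to_s and stored. Here is a small sketch of that behavior, assuming only the method bodies shown above; SketchEvaluator, ToneSketch, and UnsetSketch are hypothetical names used for illustration.

```ruby
# Stand-in for the class-level DSL; this is not the real Riffer class.
class SketchEvaluator
  class << self
    # Same body as the .instructions method shown above.
    def instructions(value = nil)
      return @instructions if value.nil?
      @instructions = value.to_s
    end

    # Same body as the .judge_model method shown above.
    def judge_model(value = nil)
      return @judge_model if value.nil?
      @judge_model = value.to_s
    end
  end
end

# Hypothetical evaluator; a Symbol is passed on purpose to show the to_s coercion.
class ToneSketch < SketchEvaluator
  instructions :"Rate the politeness of the response."
  judge_model "anthropic/claude-opus-4-5-20251101"
end

# Hypothetical evaluator that sets nothing.
class UnsetSketch < SketchEvaluator
end

p ToneSketch.instructions.class # the Symbol was stored as a String
p ToneSketch.judge_model
p UnsetSketch.instructions      # nothing stored, so the getter returns nil
```

One consequence of this shape is that passing nil cannot clear a stored value: the call is treated as a read and returns whatever was set before.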
Instance Method Details
#evaluate(input:, output:, ground_truth: nil, messages: []) ⇒ Object
Evaluates an input/output pair.
The default implementation calls the judge with the class-level instructions. Override this method for custom evaluation logic (e.g. rule-based evaluators).
- input: the input to evaluate; String or Array of message hashes/Message objects.
- output: the agent's response to evaluate.
- ground_truth: optional reference answer for comparison.
- messages: the full message history from the agent conversation.
Raises NotImplementedError if instructions is not set and evaluate has not been overridden.
– : (input: String | Array[Hash[Symbol, untyped] | Riffer::Messages::Base], output: String, ?ground_truth: String?, ?messages: Array) -> Riffer::Evals::Result
  # File 'lib/riffer/evals/evaluator.rb', line 62
  def evaluate(input:, output:, ground_truth: nil, messages: [])
    instr = self.class.instructions
    raise NotImplementedError, "#{self.class} must set instructions or implement #evaluate" unless instr

    evaluation = judge.evaluate(
      instructions: instr,
      input: format_input(input),
      output: output,
      ground_truth: ground_truth
    )

    result(score: evaluation[:score], reason: evaluation[:reason])
  end
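The three paths through #evaluate (judge-backed, overridden, and unconfigured) can be illustrated with a self-contained sketch. Everything below is a stand-in under stated assumptions: SketchResult, StubJudge, and the *Sketch classes are hypothetical, and the judge, format_input, and result helpers are guesses at what the source calls but does not show. Only the control flow of #evaluate is taken from the listing above.

```ruby
# Hypothetical result type; the real Riffer::Evals::Result is not shown here.
SketchResult = Struct.new(:score, :reason, keyword_init: true)

# Hypothetical judge that returns a fixed verdict instead of calling an LLM.
class StubJudge
  def evaluate(instructions:, input:, output:, ground_truth: nil)
    { score: 1.0, reason: "stubbed verdict for: #{instructions}" }
  end
end

class SketchEvaluator
  def self.instructions(value = nil)
    return @instructions if value.nil?
    @instructions = value.to_s
  end

  # Same control flow as the default #evaluate shown above.
  def evaluate(input:, output:, ground_truth: nil, messages: [])
    instr = self.class.instructions
    raise NotImplementedError, "#{self.class} must set instructions or implement #evaluate" unless instr

    evaluation = judge.evaluate(
      instructions: instr, input: format_input(input),
      output: output, ground_truth: ground_truth
    )
    result(score: evaluation[:score], reason: evaluation[:reason])
  end

  private

  # Assumed helpers: the source calls these but does not show their bodies.
  def judge
    StubJudge.new
  end

  def format_input(input)
    input.is_a?(Array) ? input.map(&:to_s).join("\n") : input.to_s
  end

  def result(score:, reason:)
    SketchResult.new(score: score, reason: reason)
  end
end

# Instructions only: the inherited #evaluate delegates to the judge.
class PolitenessSketch < SketchEvaluator
  instructions "Score 1.0 if the response is polite, 0.0 otherwise."
end

# Rule-based: overrides #evaluate and never touches the judge.
class ExactMatchSketch < SketchEvaluator
  def evaluate(input:, output:, ground_truth: nil, messages: [])
    matched = !ground_truth.nil? && output.strip == ground_truth.strip
    result(score: matched ? 1.0 : 0.0, reason: matched ? "exact match" : "no match")
  end
end

# Neither instructions nor an override: the default #evaluate raises.
class EmptySketch < SketchEvaluator
end

judged = PolitenessSketch.new.evaluate(input: "Hi", output: "Hello, how can I help?")
exact = ExactMatchSketch.new.evaluate(input: "2+2?", output: "4 ", ground_truth: "4")

begin
  EmptySketch.new.evaluate(input: "Hi", output: "Hello")
rescue NotImplementedError => e
  missing = e.message
end

puts judged.reason
puts exact.reason
puts missing
```

In Ruby, NotImplementedError descends from ScriptError rather than StandardError, so a bare `rescue` or `rescue => e` will not catch it. Callers that want to handle an unconfigured evaluator need to name the class explicitly, as the sketch does.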