Class: Riffer::Evals::Evaluator

Inherits:
Object
Defined in:
lib/riffer/evals/evaluator.rb

Overview

Base class for all evaluators in the Riffer framework.

Provides a class-level DSL for declaring evaluator metadata, plus a default evaluate method. Simple evaluators only need to set instructions; the base class then calls the judge automatically.

See examples/evaluators/ for reference implementations.

class MyEvaluator < Riffer::Evals::Evaluator
  instructions "Assess medical accuracy of the response..."
  higher_is_better true
  judge_model "anthropic/claude-opus-4-5-20251101"
end

Class Method Details

.higher_is_better(value = nil) ⇒ Object

Gets or sets whether higher scores are better. Returns true when no value has been set.

Type signature: (?bool?) -> bool



# File 'lib/riffer/evals/evaluator.rb', line 33

def higher_is_better(value = nil)
  return @higher_is_better.nil? || @higher_is_better if value.nil?
  @higher_is_better = value
end
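
The default and the per-class storage of this flag can be sketched with a stand-in class. DirectionSketch and its subclasses are hypothetical names; only the accessor body is copied from the listing above, and the real class may handle inheritance differently (for example through an inherited hook that this page does not show).

```ruby
# Stand-in class that copies the accessor shown above; it is not the Riffer class.
class DirectionSketch
  def self.higher_is_better(value = nil)
    # Reading: an unset (nil) flag counts as true, otherwise return the stored value.
    return @higher_is_better.nil? || @higher_is_better if value.nil?
    # Writing: store the flag in a class-level instance variable.
    @higher_is_better = value
  end
end

# Never sets the flag, so it reads as true.
class Accuracy < DirectionSketch; end

# Explicitly marks lower scores as better.
class ErrorRate < DirectionSketch
  higher_is_better false
end

# Class-level instance variables belong to the class that set them,
# so in this stand-in a subclass starts from the default again.
class StrictErrorRate < ErrorRate; end

p Accuracy.higher_is_better         # true
p ErrorRate.higher_is_better        # false
p StrictErrorRate.higher_is_better  # true
```

Under these assumptions, a subclass of an evaluator that sets higher_is_better false would need to repeat the declaration to keep it.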

.instructions(value = nil) ⇒ Object

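Gets or sets the evaluation instructions (criteria and scoring rubric). A non-nil argument is stored as a String via to_s; the getter returns nil when nothing has been set.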

Type signature: (?String?) -> String?



# File 'lib/riffer/evals/evaluator.rb', line 24

def instructions(value = nil)
  return @instructions if value.nil?
  @instructions = value.to_s
end

.judge_model(value = nil) ⇒ Object

Gets or sets the judge model for LLM-as-judge evaluations. A non-nil argument is stored as a String via to_s.

Type signature: (?String?) -> String?



# File 'lib/riffer/evals/evaluator.rb', line 42

def judge_model(value = nil)
  return @judge_model if value.nil?
  @judge_model = value.to_s
end
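
The two String-valued accessors follow the same read-or-write pattern, which the following sketch exercises on a stand-in class. MetadataSketch, ToneEvaluator, and Unconfigured are hypothetical names, and the Symbol passed to instructions is only there to show the to_s coercion.

```ruby
# Stand-in class that copies the two accessors shown above; it is not the Riffer class.
class MetadataSketch
  class << self
    def instructions(value = nil)
      return @instructions if value.nil?
      @instructions = value.to_s
    end

    def judge_model(value = nil)
      return @judge_model if value.nil?
      @judge_model = value.to_s
    end
  end
end

class ToneEvaluator < MetadataSketch
  instructions :be_polite # a Symbol goes in, a String is stored
  judge_model "anthropic/claude-opus-4-5-20251101"
end

class Unconfigured < MetadataSketch; end

p ToneEvaluator.instructions # "be_polite"
p Unconfigured.judge_model   # nil

# Passing nil is treated as a read, so it returns the stored value
# and does not clear it.
p ToneEvaluator.judge_model(nil) # "anthropic/claude-opus-4-5-20251101"
```

Because nil doubles as the "read" signal, a value set through these accessors cannot be reset to nil through the same call.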

Instance Method Details

#evaluate(input:, output:, ground_truth: nil, messages: []) ⇒ Object

Evaluates an input/output pair.

The default implementation calls the judge with the class-level instructions. Override this method for custom evaluation logic (e.g. rule-based evaluators).

  • input: the input to evaluate; a String or an Array of message hashes or Message objects.

  • output: the agent’s response to evaluate.

  • ground_truth: optional reference answer for comparison.

  • messages: the full message history from the agent conversation.

Raises NotImplementedError when instructions has not been set and evaluate has not been overridden.

Type signature: (input: String | Array[Hash[Symbol, untyped] | Riffer::Messages::Base], output: String, ?ground_truth: String?, ?messages: Array) -> Riffer::Evals::Result

Raises:

  • (NotImplementedError)


# File 'lib/riffer/evals/evaluator.rb', line 62

def evaluate(input:, output:, ground_truth: nil, messages: [])
  instr = self.class.instructions
  raise NotImplementedError, "#{self.class} must set instructions or implement #evaluate" unless instr

  evaluation = judge.evaluate(
    instructions: instr,
    input: format_input(input),
    output: output,
    ground_truth: ground_truth
  )

  result(score: evaluation[:score], reason: evaluation[:reason])
end
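
The three paths through evaluate (judge-backed, overridden, and unconfigured) can be sketched without the library. In the block below, only the body of evaluate is copied from the listing above. SketchEvaluator, SketchResult, FakeJudge, and the bodies of judge, format_input, and result are hypothetical stand-ins, since their real implementations are not shown on this page.

```ruby
# Hypothetical result type; the real one is Riffer::Evals::Result.
SketchResult = Struct.new(:score, :reason, keyword_init: true)

# Hypothetical judge that returns a fixed verdict instead of calling an LLM.
class FakeJudge
  def evaluate(instructions:, input:, output:, ground_truth:)
    { score: 0.5, reason: "stubbed verdict" }
  end
end

# Stand-in base class; it is not the Riffer class.
class SketchEvaluator
  def self.instructions(value = nil)
    return @instructions if value.nil?
    @instructions = value.to_s
  end

  # Body copied from the listing above.
  def evaluate(input:, output:, ground_truth: nil, messages: [])
    instr = self.class.instructions
    raise NotImplementedError, "#{self.class} must set instructions or implement #evaluate" unless instr

    evaluation = judge.evaluate(
      instructions: instr,
      input: format_input(input),
      output: output,
      ground_truth: ground_truth
    )

    result(score: evaluation[:score], reason: evaluation[:reason])
  end

  private

  # The three helpers below are assumptions made for this sketch.
  def judge
    FakeJudge.new
  end

  def format_input(input)
    input.is_a?(Array) ? input.map { |m| m[:content] }.join("\n") : input.to_s
  end

  def result(score:, reason:)
    SketchResult.new(score: score, reason: reason)
  end
end

# Judge-backed: setting instructions is enough.
class PolitenessEvaluator < SketchEvaluator
  instructions "Rate the politeness of the response from 0 to 1."
end

# Rule-based: overrides evaluate, so no instructions and no judge call.
class ExactMatchEvaluator < SketchEvaluator
  def evaluate(input:, output:, ground_truth: nil, messages: [])
    matched = !ground_truth.nil? && output.strip == ground_truth.strip
    result(score: matched ? 1.0 : 0.0, reason: matched ? "exact match" : "no match")
  end
end

# Neither instructions nor an override: evaluate raises.
class BareEvaluator < SketchEvaluator; end

judged = PolitenessEvaluator.new.evaluate(input: [{ role: "user", content: "hi" }], output: "hello")
exact = ExactMatchEvaluator.new.evaluate(input: "What is 2 + 2?", output: "4 ", ground_truth: "4")
p judged.score # 0.5
p exact.score  # 1.0

begin
  BareEvaluator.new.evaluate(input: "x", output: "y")
rescue NotImplementedError => e
  puts e.message # BareEvaluator must set instructions or implement #evaluate
end
```

In Ruby, NotImplementedError descends from ScriptError rather than StandardError, so a bare rescue clause does not catch it; callers need to name the class explicitly, as the last lines do. In the listing above, messages is accepted but not passed to the judge, so an evaluator that needs the full conversation history would have to override evaluate.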