Class: Riffer::Evals::Evaluator

Inherits: Object

Defined in: lib/riffer/evals/evaluator.rb

Overview

Base class for all evaluators in the Riffer framework.

Provides a DSL for defining evaluator metadata, plus a default evaluate method. Simple evaluators only need to set instructions; the base class then calls the judge automatically.

See examples/evaluators/ for reference implementations.

class MyEvaluator < Riffer::Evals::Evaluator
  instructions "Assess medical accuracy of the response..."
  higher_is_better true
  judge_model "anthropic/claude-opus-4-5-20251101"
end

Class Method Summary

.higher_is_better(value = nil) ⇒ Object

Gets or sets whether higher scores are better.

.instructions(value = nil) ⇒ Object

Gets or sets the evaluation instructions (criteria and scoring rubric).

.judge_model(value = nil) ⇒ Object

Gets or sets the judge model for LLM-as-judge evaluations.

Instance Method Summary

#evaluate(input:, output:, ground_truth: nil, messages: []) ⇒ Object

Evaluates an input/output pair.

Class Method Details

.higher_is_better(value = nil) ⇒ `Object`

Gets or sets whether higher scores are better.

Type signature: (?bool?) -> bool

# File 'lib/riffer/evals/evaluator.rb', line 33

def higher_is_better(value = nil)
  return @higher_is_better.nil? || @higher_is_better if value.nil?
  @higher_is_better = value
end
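
The getter's default can be checked in isolation. The following is a minimal sketch that copies only the accessor shown above into a stand-in class; `DirectionSketch`, `AccuracySketch`, and `ToxicitySketch` are hypothetical names used for illustration and are not part of Riffer.

```ruby
# Stand-in class that reproduces only the accessor listed above.
# It is NOT the real Riffer::Evals::Evaluator.
class DirectionSketch
  def self.higher_is_better(value = nil)
    # Getter: an unset value reads as true; otherwise return the stored flag.
    return @higher_is_better.nil? || @higher_is_better if value.nil?
    @higher_is_better = value
  end
end

# Hypothetical evaluator that never sets the flag.
class AccuracySketch < DirectionSketch
end

# Hypothetical evaluator where lower scores are better.
class ToxicitySketch < DirectionSketch
  higher_is_better false
end

puts "AccuracySketch: #{AccuracySketch.higher_is_better}"
puts "ToxicitySketch: #{ToxicitySketch.higher_is_better}"
```

Because `nil` selects the getter branch, calling `higher_is_better(nil)` reads the flag rather than resetting it.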

.instructions(value = nil) ⇒ `Object`

Gets or sets the evaluation instructions (criteria and scoring rubric).

Type signature: (?String?) -> String?

# File 'lib/riffer/evals/evaluator.rb', line 24

def instructions(value = nil)
  return @instructions if value.nil?
  @instructions = value.to_s
end

.judge_model(value = nil) ⇒ `Object`

Gets or sets the judge model for LLM-as-judge evaluations.

Type signature: (?String?) -> String?

# File 'lib/riffer/evals/evaluator.rb', line 42

def judge_model(value = nil)
  return @judge_model if value.nil?
  @judge_model = value.to_s
end
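
Both `instructions` and `judge_model` store `value.to_s`, so a non-String argument is coerced on assignment, and an unset value reads back as `nil`. A minimal sketch of that behaviour, using a stand-in class that copies the two accessors as listed above; `AccessorSketch`, its subclasses, and the model id `provider/model-name` are placeholders, not Riffer names.

```ruby
# Stand-in class that reproduces only the two accessors listed above.
# It is NOT the real Riffer::Evals::Evaluator.
class AccessorSketch
  def self.instructions(value = nil)
    return @instructions if value.nil?
    @instructions = value.to_s
  end

  def self.judge_model(value = nil)
    return @judge_model if value.nil?
    @judge_model = value.to_s
  end
end

# Hypothetical evaluator with nothing configured.
class UnconfiguredSketch < AccessorSketch
end

# Hypothetical evaluator configured through the DSL.
class ConfiguredSketch < AccessorSketch
  instructions "Score factual accuracy from 0 to 1."
  judge_model :"provider/model-name" # placeholder id, passed as a Symbol
end

puts UnconfiguredSketch.instructions.inspect
puts ConfiguredSketch.instructions.inspect
puts ConfiguredSketch.judge_model.inspect
```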

Instance Method Details

#evaluate(input:, output:, ground_truth: nil, messages: []) ⇒ `Object`

Evaluates an input/output pair.

The default implementation calls the judge with the class-level instructions. Override this method for custom evaluation logic (e.g. rule-based evaluators).

input: the input to evaluate; String or Array of message hashes/Message objects.
output: the agent’s response to evaluate.
ground_truth: optional reference answer for comparison.
messages: the full message history from the agent conversation. The default implementation shown below does not pass it to the judge.

Raises NotImplementedError if instructions is not set and evaluate is not overridden.

Type signature: (input: String | Array[Hash[Symbol, untyped] | Riffer::Messages::Base], output: String, ?ground_truth: String?, ?messages: Array) -> Riffer::Evals::Result

Raises:

(NotImplementedError)

# File 'lib/riffer/evals/evaluator.rb', line 62

def evaluate(input:, output:, ground_truth: nil, messages: [])
  instr = self.class.instructions
  raise NotImplementedError, "#{self.class} must set instructions or implement #evaluate" unless instr

  evaluation = judge.evaluate(
    instructions: instr,
    input: format_input(input),
    output: output,
    ground_truth: ground_truth
  )

  result(score: evaluation[:score], reason: evaluation[:reason])
end
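
The two paths described above (override `evaluate` for rule-based logic, or hit the `NotImplementedError` guard) can be sketched without the library. Everything here is a hypothetical stand-in: `SketchBase`, `SketchResult`, `ExactMatchSketch`, and `EmptySketch` are not Riffer classes, the `result` helper is an assumed shape because the source calls it but does not show it, and the judge call is replaced by a fixed value.

```ruby
# Hypothetical result type; the real Riffer::Evals::Result is not shown in the source.
SketchResult = Struct.new(:score, :reason, keyword_init: true)

# Stand-in base class; NOT the real Riffer::Evals::Evaluator.
class SketchBase
  def self.instructions(value = nil)
    return @instructions if value.nil?
    @instructions = value.to_s
  end

  # Mirrors the guard in the listing above. The judge call is stubbed with a
  # fixed score because no real judge is available in this sketch.
  def evaluate(input:, output:, ground_truth: nil, messages: [])
    instr = self.class.instructions
    raise NotImplementedError, "#{self.class} must set instructions or implement #evaluate" unless instr

    result(score: 1.0, reason: "stubbed judge: #{instr}")
  end

  private

  # Assumed helper: builds a result object from a score and a reason.
  def result(score:, reason:)
    SketchResult.new(score: score, reason: reason)
  end
end

# Rule-based evaluator: overrides #evaluate, so no instructions are needed.
class ExactMatchSketch < SketchBase
  def evaluate(input:, output:, ground_truth: nil, messages: [])
    matched = !ground_truth.nil? && output.strip == ground_truth.strip
    result(score: matched ? 1.0 : 0.0, reason: matched ? "exact match" : "no match")
  end
end

# Neither instructions nor an override: the inherited #evaluate raises.
class EmptySketch < SketchBase
end

hit = ExactMatchSketch.new.evaluate(input: "2 + 2?", output: "4 ", ground_truth: "4")
puts "#{hit.score} (#{hit.reason})"

begin
  EmptySketch.new.evaluate(input: "hi", output: "hello")
rescue NotImplementedError => e
  puts e.message
end
```

In Ruby, `NotImplementedError` descends from `ScriptError` rather than `StandardError`, so a bare `rescue` does not catch it. Callers that want to handle an unconfigured evaluator need to name the class explicitly, as in the sketch.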