Class: Riffer::Evals::Judge

Inherits: Object
Defined in:
lib/riffer/evals/judge.rb

Overview

Executes LLM-as-judge evaluations using the provider infrastructure.

The Judge class handles calling an LLM to evaluate agent outputs and parsing the structured response. It uses tool calling internally to get guaranteed structured output from the judge model.

judge = Riffer::Evals::Judge.new(model: "anthropic/claude-opus-4-5-20251101")
result = judge.evaluate(
  instructions: "Assess answer relevancy...",
  input: "What is Ruby?",
  output: "Ruby is a programming language."
)
result[:score]  # => 0.85
result[:reason] # => "The response is relevant..."

Defined Under Namespace

Classes: EvaluationTool

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(model:, provider_options: {}) ⇒ Judge

Initializes a new judge.

– : (model: String, ?provider_options: Hash[Symbol, untyped]) -> void



# File 'lib/riffer/evals/judge.rb', line 46

def initialize(model:, provider_options: {})
  provider_name, model_name = model.split("/", 2)
  unless [provider_name, model_name].all? { |part| part.is_a?(String) && !part.strip.empty? }
    raise Riffer::ArgumentError, "Invalid model string: #{model}"
  end

  @model = model
  @provider_options = provider_options
end
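The constructor splits the model string on the first slash only, so a model name may itself contain further slashes while a missing segment is rejected. A minimal sketch of the parsing it relies on, in plain Ruby and independent of the library:

```ruby
# split("/", 2) limits the result to two parts, so only the first "/"
# separates provider from model; later slashes stay in the model name.
provider, model = "anthropic/claude-opus-4-5-20251101".split("/", 2)
provider # => "anthropic"
model    # => "claude-opus-4-5-20251101"

# With no "/" at all, the second part is missing, which the
# constructor's validation rejects with Riffer::ArgumentError.
"claude-opus".split("/", 2) # => ["claude-opus"]
```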

Instance Attribute Details

#model ⇒ Object (readonly)

The model string (provider/model format).



# File 'lib/riffer/evals/judge.rb', line 40

def model
  @model
end

Instance Method Details

#evaluate(instructions:, input:, output:, ground_truth: nil) ⇒ Object

Evaluates using the configured LLM.

Composes system and user messages from the semantic fields:

instructions: evaluation criteria and scoring rubric.
input: the original input/question.
output: the agent's response to evaluate.
ground_truth: optional reference answer for comparison.

– : (instructions: String, input: String, output: String, ?ground_truth: String?) -> Hash[Symbol, untyped]



# File 'lib/riffer/evals/judge.rb', line 66

def evaluate(instructions:, input:, output:, ground_truth: nil)
  system_message = build_system_message(instructions)
  user_message = build_user_message(input: input, output: output, ground_truth: ground_truth)

  response = provider_instance.generate_text(
    system: system_message,
    prompt: user_message,
    model: model_name,
    tools: [EvaluationTool]
  )

  parse_tool_response(response)
end
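The message composition is internal to the class. As a rough sketch of how the semantic fields might be combined into a single user message, here is a hypothetical helper in plain Ruby; the section labels are illustrative assumptions, not the library's actual build_user_message:

```ruby
# Hypothetical sketch: joins the labeled fields with blank lines,
# appending the ground truth section only when one is provided.
def build_user_message(input:, output:, ground_truth: nil)
  parts = ["Input:\n#{input}", "Output:\n#{output}"]
  parts << "Ground truth:\n#{ground_truth}" if ground_truth
  parts.join("\n\n")
end

build_user_message(input: "What is Ruby?", output: "A language.")
# => "Input:\nWhat is Ruby?\n\nOutput:\nA language."
```

Keeping ground_truth optional in this way matches the evaluate signature, where a reference answer is supplied only for comparison-style evaluations.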