Class: Riffer::Evals::Judge

Inherits:
Object
Defined in:
lib/riffer/evals/judge.rb

Overview

Executes LLM-as-judge evaluations using the provider infrastructure.

The Judge class handles calling an LLM to evaluate agent outputs and parsing the structured response. It uses tool calling internally to get guaranteed structured output from the judge model.

judge = Riffer::Evals::Judge.new(model: "anthropic/claude-opus-4-5-20251101")
result = judge.evaluate(
  system_prompt: "You are an evaluation assistant...",
  user_prompt: "Evaluate this response..."
)
result[:score]  # => 0.85
result[:reason] # => "The response is relevant..."
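
To make the "parsing the structured response" step more concrete, here is a minimal sketch of turning tool-call arguments into the result hash read above. It is only an illustration: the helper name parse_evaluation_arguments is hypothetical, and the assumption that the arguments arrive as a JSON string with "score" and "reason" keys is not confirmed by this page. Riffer's actual response shape may differ.

```ruby
require "json"

# Hypothetical parser, not part of Riffer. Assumes the judge model's tool call
# carries its arguments as a JSON string with "score" and "reason" keys.
def parse_evaluation_arguments(arguments_json)
  data = JSON.parse(arguments_json, symbolize_names: true)

  # fetch raises KeyError when the model omitted a field, so a malformed
  # tool call fails loudly instead of yielding nil values.
  {
    score: Float(data.fetch(:score)),
    reason: data.fetch(:reason).to_s
  }
end

result = parse_evaluation_arguments('{"score": 0.85, "reason": "The response is relevant..."}')
result[:score]  # => 0.85
result[:reason] # => "The response is relevant..."
```

Because the tool schema fixes the field names, the caller can read the hash by key rather than scraping free-form text.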

Defined Under Namespace

Classes: EvaluationTool

Constructor Details

#initialize(model:, provider_options: {}) ⇒ Judge

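Initializes a new judge. The model string must use provider/model format (for example "anthropic/claude-opus-4-5-20251101"); Riffer::ArgumentError is raised if either part is missing or blank.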

: (model: String, ?provider_options: Hash[Symbol, untyped]) -> void



# File 'lib/riffer/evals/judge.rb', line 43

def initialize(model:, provider_options: {})
  # Split on the first "/" only, so a model name may itself contain slashes.
  provider_name, model_name = model.split("/", 2)
  # Both parts must be non-blank; a string without "/" leaves model_name nil.
  unless [provider_name, model_name].all? { |part| part.is_a?(String) && !part.strip.empty? }
    raise Riffer::ArgumentError, "Invalid model string: #{model}"
  end

  @model = model
  @provider_options = provider_options
end
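
The validation in the constructor can be tried in isolation. The sketch below copies the same split-and-check logic into a standalone helper. The name split_model_string is hypothetical, it raises Ruby's built-in ArgumentError instead of Riffer::ArgumentError so that it runs without the gem, and the "openrouter/meta/llama" string is an invented example.

```ruby
# Hypothetical helper mirroring the constructor's validation; not part of Riffer.
# Uses the standard ArgumentError so the snippet runs without the gem loaded.
def split_model_string(model)
  # A limit of 2 splits on the first "/" only.
  provider_name, model_name = model.split("/", 2)
  valid = [provider_name, model_name].all? { |part| part.is_a?(String) && !part.strip.empty? }
  raise ArgumentError, "Invalid model string: #{model}" unless valid

  [provider_name, model_name]
end

split_model_string("anthropic/claude-opus-4-5-20251101")
# => ["anthropic", "claude-opus-4-5-20251101"]

# Invented model string: everything after the first "/" stays in the model name.
split_model_string("openrouter/meta/llama")
# => ["openrouter", "meta/llama"]

# "claude-opus", "/claude-opus" and "anthropic/" all raise ArgumentError,
# because one of the two parts is nil or blank.
```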

Instance Attribute Details

#model ⇒ Object (readonly)

The model string (provider/model format).



# File 'lib/riffer/evals/judge.rb', line 38

def model
  @model
end

Instance Method Details

#evaluate(messages: nil, system_prompt: nil, user_prompt: nil) ⇒ Object

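Sends an evaluation request to the configured LLM with EvaluationTool attached and returns the parsed tool response. Input is either a full messages array or a system_prompt/user_prompt pair; system_prompt is optional.

Raises Riffer::ArgumentError if both messages and system_prompt/user_prompt are provided, or if user_prompt is missing when messages is not provided.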

: (?messages: Array[Hash[Symbol, untyped]]?, ?system_prompt: String?, ?user_prompt: String?) -> Hash[Symbol, untyped]



# File 'lib/riffer/evals/judge.rb', line 59

def evaluate(messages: nil, system_prompt: nil, user_prompt: nil)
  response = if messages
    # Message-array form: prompts must not be passed alongside it.
    raise Riffer::ArgumentError, "cannot provide both messages and system_prompt/user_prompt" if system_prompt || user_prompt
    provider_instance.generate_text(messages: messages, model: model_name, tools: [EvaluationTool])
  else
    # Prompt form: user_prompt is mandatory, system_prompt is optional.
    raise Riffer::ArgumentError, "user_prompt is required when messages is not provided" unless user_prompt
    provider_instance.generate_text(system: system_prompt, prompt: user_prompt, model: model_name, tools: [EvaluationTool])
  end

  # Parse the tool-call response into the result hash.
  parse_tool_response(response)
end
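
The argument rules of #evaluate can be exercised without a provider. The following sketch reproduces only the two checks from the listing above in a standalone method. The name build_request is hypothetical, the returned hashes are placeholders for the real generate_text call, the built-in ArgumentError stands in for Riffer::ArgumentError, and the role/content keys in the sample message are an assumed format.

```ruby
# Hypothetical stand-in that mirrors only the argument checks of #evaluate.
# It never calls a provider and is not part of Riffer.
def build_request(messages: nil, system_prompt: nil, user_prompt: nil)
  if messages
    if system_prompt || user_prompt
      raise ArgumentError, "cannot provide both messages and system_prompt/user_prompt"
    end
    {messages: messages}
  else
    raise ArgumentError, "user_prompt is required when messages is not provided" unless user_prompt
    {system: system_prompt, prompt: user_prompt}
  end
end

# Prompt form: system_prompt may be omitted and is then passed along as nil.
build_request(user_prompt: "Evaluate this response...")

# Message form: the role/content keys are an assumed message shape.
build_request(messages: [{role: "user", content: "Evaluate this response..."}])

# Mixing both forms, or passing only system_prompt, raises ArgumentError.
```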