Class: Riffer::Evals::Judge

Inherits: Object

Defined in: lib/riffer/evals/judge.rb

Overview

Executes LLM-as-judge evaluations, using tool calling internally to get structured output from the judge model.

Defined Under Namespace

Classes: EvaluationTool

Instance Attribute Summary

#model ⇒ Object readonly

The model string (provider/model format).

Instance Method Summary

#evaluate(instructions:, input:, output:, ground_truth: nil) ⇒ Object

Evaluates an input/output pair using the configured LLM.

#initialize(model:, provider_options: {}) ⇒ Judge constructor

Raises Riffer::ArgumentError unless model is in “provider/model” format.

Constructor Details

#initialize(model:, provider_options: {}) ⇒ `Judge`

Raises Riffer::ArgumentError unless model is in “provider/model” format.

Type signature: (model: String, ?provider_options: Hash[Symbol, untyped]) -> void

# File 'lib/riffer/evals/judge.rb', line 37

def initialize(model:, provider_options: {})
  provider_name, model_name = model.split("/", 2)
  unless [provider_name, model_name].all? { |part| part.is_a?(String) && !part.strip.empty? }
    raise Riffer::ArgumentError, "Invalid model string: #{model}"
  end

  @model = model
  @provider_options = provider_options
end
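
The validation above can be tried in isolation. The following is a minimal sketch that copies the split-and-check logic into a standalone helper; `valid_model_string?` is a hypothetical name used only for illustration and is not part of Riffer, and the model strings are example values.

```ruby
# Hypothetical helper (not part of Riffer) mirroring the check in #initialize.
def valid_model_string?(model)
  # Limit of 2 splits on the first "/" only, so the model part may contain slashes.
  provider_name, model_name = model.split("/", 2)
  [provider_name, model_name].all? { |part| part.is_a?(String) && !part.strip.empty? }
end

valid_model_string?("openai/gpt-4o")   # => true  (both parts present)
valid_model_string?("provider/a/b")    # => true  (model part is "a/b")
valid_model_string?("gpt-4o")          # => false (no "/" so model_name is nil)
valid_model_string?("openai/")         # => false (empty model part)
valid_model_string?("/gpt-4o")         # => false (empty provider part)
valid_model_string?(" / ")             # => false (whitespace-only parts)
```

Note that the constructor calls `model.split` directly, so a non-String argument such as `nil` would presumably fail with a `NoMethodError` before the `Riffer::ArgumentError` branch is reached.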

Instance Attribute Details

#model ⇒ `Object` (readonly)

The model string (provider/model format).



# File 'lib/riffer/evals/judge.rb', line 32

def model
  @model
end

Instance Method Details

#evaluate(instructions:, input:, output:, ground_truth: nil) ⇒ `Object`

Evaluates an input/output pair using the configured LLM. The optional ground_truth is passed to the judge alongside the input and output.

Type signature: (instructions: String, input: String, output: String, ?ground_truth: String?) -> Hash[Symbol, untyped]

# File 'lib/riffer/evals/judge.rb', line 50

def evaluate(instructions:, input:, output:, ground_truth: nil)
  system_message = build_system_message(instructions)
  user_message = build_user_message(input: input, output: output, ground_truth: ground_truth)

  response = provider_instance.generate_text(
    system: system_message,
    prompt: user_message,
    model: model_name,
    tools: [EvaluationTool]
  )

  parse_tool_response(response)
end
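
To illustrate the call shape without a live provider, here is a rough sketch using a stand-in class. `StubJudge` is hypothetical and only mirrors the keyword signatures shown above; the `:score` and `:reasoning` keys are assumptions, since the source states only that the result is a `Hash[Symbol, untyped]`.

```ruby
# Hypothetical stand-in for Riffer::Evals::Judge so the example runs offline.
class StubJudge
  attr_reader :model

  def initialize(model:, provider_options: {})
    @model = model
    @provider_options = provider_options
  end

  # Same keyword signature as Judge#evaluate. Returns a canned Hash;
  # the keys used here are assumptions for illustration only.
  def evaluate(instructions:, input:, output:, ground_truth: nil)
    matches = ground_truth.nil? || output == ground_truth
    { score: matches ? 1.0 : 0.0, reasoning: "stubbed result" }
  end
end

judge = StubJudge.new(model: "openai/gpt-4o")

# ground_truth is optional and defaults to nil.
result = judge.evaluate(
  instructions: "Score 1.0 if the answer is correct, otherwise 0.0.",
  input: "What is 2 + 2?",
  output: "4",
  ground_truth: "4"
)

result[:score] # the real key names depend on EvaluationTool, which is not shown here
```

With the real class, the same keyword arguments would go to `Riffer::Evals::Judge#evaluate`, and the returned Hash would come from the judge model's tool call rather than a canned value.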