Class: Riffer::Evals::Judge
- Inherits:
- Object (inheritance chain: Object → Riffer::Evals::Judge)
- Defined in:
- lib/riffer/evals/judge.rb
Overview
Executes LLM-as-judge evaluations using the provider infrastructure.
The Judge class handles calling an LLM to evaluate agent outputs and parsing the structured response. It uses tool calling internally to get guaranteed structured output from the judge model.
    judge = Riffer::Evals::Judge.new(model: "anthropic/claude-opus-4-5-20251101")
    result = judge.evaluate(
      system_prompt: "You are an evaluation assistant...",
      user_prompt: "Evaluate this response..."
    )
    result[:score]  # => 0.85
    result[:reason] # => "The response is relevant..."
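The overview says the judge relies on tool calling to get structured output. The source does not show the tool-call response or the private parsing step, so the following is only a sketch of the general idea. The JSON payload and the `parse_evaluation` helper are hypothetical, not part of Riffer.

```ruby
require "json"

# Hypothetical payload: the arguments a judge model might pass when it
# "calls" an evaluation tool. The real response shape is not documented here.
tool_call_arguments = '{"score": 0.85, "reason": "The response is relevant..."}'

# Hypothetical helper: turn the tool-call arguments into a symbol-keyed Hash.
def parse_evaluation(arguments_json)
  parsed = JSON.parse(arguments_json, symbolize_names: true)
  {
    score: Float(parsed.fetch(:score)),  # KeyError if the model omitted the score
    reason: parsed.fetch(:reason).to_s
  }
end

result = parse_evaluation(tool_call_arguments)
result[:score]  # => 0.85
result[:reason] # => "The response is relevant..."
```

Because the model has to fill in named tool arguments, the caller can read fixed keys instead of scraping free-form text.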
Defined Under Namespace
Classes: EvaluationTool
Instance Attribute Summary
- #model ⇒ Object (readonly)
  The model string (provider/model format).
Instance Method Summary
- #evaluate(messages: nil, system_prompt: nil, user_prompt: nil) ⇒ Object
  Evaluates using the configured LLM.
- #initialize(model:, provider_options: {}) ⇒ Judge (constructor)
  Initializes a new judge.
Constructor Details
#initialize(model:, provider_options: {}) ⇒ Judge
Initializes a new judge.
: (model: String, ?provider_options: Hash[Symbol, untyped]) -> void
    # File 'lib/riffer/evals/judge.rb', line 43
    def initialize(model:, provider_options: {})
      provider_name, model_name = model.split("/", 2)
      unless [provider_name, model_name].all? { |part| part.is_a?(String) && !part.strip.empty? }
        raise Riffer::ArgumentError, "Invalid model string: #{model}"
      end

      @model = model
      @provider_options = provider_options
    end
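To see which model strings the constructor's check accepts, the split-and-validate logic can be tried in isolation. This is a minimal sketch: `split_model_string` is a hypothetical helper, the standard `ArgumentError` stands in for `Riffer::ArgumentError`, and `"acme/team/model-x"` is a made-up model string.

```ruby
# Hypothetical helper mirroring the constructor's validation; not part of Riffer.
def split_model_string(model)
  # Split on the first "/" only, so a model name containing "/" stays intact.
  provider_name, model_name = model.split("/", 2)

  valid = [provider_name, model_name].all? do |part|
    part.is_a?(String) && !part.strip.empty?
  end
  # The real constructor raises Riffer::ArgumentError; plain ArgumentError
  # keeps this sketch dependency-free.
  raise ArgumentError, "Invalid model string: #{model}" unless valid

  [provider_name, model_name]
end

split_model_string("anthropic/claude-opus-4-5-20251101")
# => ["anthropic", "claude-opus-4-5-20251101"]
split_model_string("acme/team/model-x")
# => ["acme", "team/model-x"]

# Each of these is rejected: missing separator, empty provider,
# empty model name, whitespace-only parts.
["anthropic", "/claude", "anthropic/", " / "].each do |bad|
  begin
    split_model_string(bad)
  rescue ArgumentError => e
    puts e.message
  end
end
```

The limit of 2 passed to `split` is what keeps everything after the first slash as the model name.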
Instance Attribute Details
#model ⇒ Object (readonly)
The model string (provider/model format).
    # File 'lib/riffer/evals/judge.rb', line 38
    def model
      @model
    end
Instance Method Details
#evaluate(messages: nil, system_prompt: nil, user_prompt: nil) ⇒ Object
Evaluates using the configured LLM.
Raises Riffer::ArgumentError if both messages and system_prompt/user_prompt are provided, or if user_prompt is missing when messages is not provided.
: (?messages: Array[Hash[Symbol, untyped]]?, ?system_prompt: String?, ?user_prompt: String?) -> Hash[Symbol, untyped]
    # File 'lib/riffer/evals/judge.rb', line 59
    def evaluate(messages: nil, system_prompt: nil, user_prompt: nil)
      response = if messages
        raise Riffer::ArgumentError, "cannot provide both messages and system_prompt/user_prompt" if system_prompt || user_prompt
        provider_instance.generate_text(messages: messages, model: model_name, tools: [EvaluationTool])
      else
        raise Riffer::ArgumentError, "user_prompt is required when messages is not provided" unless user_prompt
        provider_instance.generate_text(system: system_prompt, prompt: user_prompt, model: model_name, tools: [EvaluationTool])
      end

      parse_tool_response(response)
    end
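The two calling styles and the two error cases described above can be exercised without a real provider. In this sketch, `StubProvider` and `dispatch` are hypothetical stand-ins, `ArgumentError` replaces `Riffer::ArgumentError`, and the `{ role:, content: }` message shape is an assumption, since the source only types messages as an array of hashes.

```ruby
# Hypothetical stand-in for a provider: it returns the keyword arguments it
# was given so the chosen branch is visible.
class StubProvider
  def generate_text(**kwargs)
    kwargs
  end
end

# Sketch of the branching in #evaluate, minus the model and tools arguments.
def dispatch(provider, messages: nil, system_prompt: nil, user_prompt: nil)
  if messages
    if system_prompt || user_prompt
      raise ArgumentError, "cannot provide both messages and system_prompt/user_prompt"
    end
    provider.generate_text(messages: messages)
  else
    raise ArgumentError, "user_prompt is required when messages is not provided" unless user_prompt
    provider.generate_text(system: system_prompt, prompt: user_prompt)
  end
end

provider = StubProvider.new

# Prompt style: system_prompt is optional, user_prompt is required.
by_prompt = dispatch(provider, user_prompt: "Evaluate this response...")

# Messages style (assumed message shape).
by_messages = dispatch(provider, messages: [{ role: "user", content: "Evaluate this response..." }])

# Mixing the two styles, or passing nothing, raises.
errors = []
[{ messages: [], user_prompt: "x" }, {}].each do |args|
  begin
    dispatch(provider, **args)
  rescue ArgumentError => e
    errors << e.message
  end
end
```

In Ruby only `nil` and `false` are falsy, so an empty `messages` array still selects the messages branch, which is why the first error case above reports a conflict.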