Class: Riffer::Evals::Evaluators::AnswerRelevancy
- Inherits: Riffer::Evals::Evaluator
  - Object
  - Riffer::Evals::Evaluator
  - Riffer::Evals::Evaluators::AnswerRelevancy
- Defined in: lib/riffer/evals/evaluators/answer_relevancy.rb
Overview
Evaluates how well a response addresses the input question.
Uses an LLM as a judge to assess whether the response is relevant, on-topic, and directly addresses what was asked.
evaluator = Riffer::Evals::Evaluators::AnswerRelevancy.new
result = evaluator.evaluate(
  input: "What is the capital of France?",
  output: "The capital of France is Paris."
)
result.score # => 0.95
Constant Summary
- SYSTEM_PROMPT =
: String
<<~PROMPT #: String
  You are an evaluation assistant that assesses answer relevancy.

  Your task is to evaluate how well a response addresses the given input/question.

  Consider the following criteria:
  1. Does the response directly address what was asked?
  2. Is the response on-topic and relevant?
  3. Does the response provide the type of information requested?
  4. Does the response avoid going off on tangents?

  Use the evaluation tool to submit your score and reasoning.

  The score should be a float between 0.0 and 1.0 where:
  - 1.0 = Perfectly relevant, directly addresses the question
  - 0.7-0.9 = Mostly relevant with minor tangents
  - 0.4-0.6 = Partially relevant, some off-topic content
  - 0.1-0.3 = Mostly irrelevant
  - 0.0 = Completely irrelevant
PROMPT
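The scoring rubric in SYSTEM_PROMPT can be mirrored in code when reporting results. The following is a minimal sketch, not part of Riffer: `RELEVANCY_BANDS` and `relevancy_band` are hypothetical names, and the rubric does not say how to treat scores that fall between the listed ranges (for example 0.95 or 0.65). As an assumption, each band here starts at its lower bound and extends up to the next band.

```ruby
# Hypothetical helper, not part of Riffer: maps a relevancy score to the
# band named in SYSTEM_PROMPT. The prompt leaves gaps between bands
# (e.g. 0.95 or 0.65); as an assumption, each band starts at its lower
# bound and runs up to the next band's lower bound.
RELEVANCY_BANDS = [
  [1.0, "Perfectly relevant"],
  [0.7, "Mostly relevant"],
  [0.4, "Partially relevant"],
  [0.1, "Mostly irrelevant"],
  [0.0, "Completely irrelevant"]
].freeze

def relevancy_band(score)
  unless (0.0..1.0).cover?(score)
    raise ArgumentError, "score must be between 0.0 and 1.0"
  end

  # Bands are ordered from highest to lowest lower bound, so the first
  # match is the tightest band that contains the score.
  RELEVANCY_BANDS.find { |lower_bound, _label| score >= lower_bound }.last
end

puts relevancy_band(0.95)
puts relevancy_band(0.5)
```

A different gap policy (for example rounding to one decimal place before the lookup) would be equally defensible; the rubric itself does not settle the question.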
Instance Method Summary
- #evaluate(input:, output:, context: nil) ⇒ Object
: (input: String, output: String, ?context: Hash[Symbol, untyped]?) -> Riffer::Evals::Result
Methods inherited from Riffer::Evals::Evaluator
description, higher_is_better, judge_model
Instance Method Details
#evaluate(input:, output:, context: nil) ⇒ Object
: (input: String, output: String, ?context: Hash[Symbol, untyped]?) -> Riffer::Evals::Result
# File 'lib/riffer/evals/evaluators/answer_relevancy.rb', line 41

def evaluate(input:, output:, context: nil)
  user_prompt = build_user_prompt(input: input, output: output)
  evaluation = judge.evaluate(system_prompt: SYSTEM_PROMPT, user_prompt: user_prompt)
  result(score: evaluation[:score], reason: evaluation[:reason])
end
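The method body builds a user prompt, hands it to a judge together with SYSTEM_PROMPT, and wraps the returned `:score` and `:reason` in a result. To exercise that flow without calling a model, the sketch below uses stand-ins throughout: `StubJudge`, `SketchResult`, `SketchEvaluator`, the shortened system prompt, and the prompt layout in `build_user_prompt` are all hypothetical. Only the call shape `judge.evaluate(system_prompt:, user_prompt:)` returning a hash with `:score` and `:reason` is taken from the source shown above.

```ruby
# Stand-in for Riffer::Evals::Result (hypothetical; the real class may differ).
SketchResult = Struct.new(:score, :reason, keyword_init: true)

# Stand-in judge: returns a canned verdict instead of calling an LLM.
class StubJudge
  attr_reader :last_user_prompt

  def initialize(score:, reason:)
    @verdict = { score: score, reason: reason }
  end

  def evaluate(system_prompt:, user_prompt:)
    @last_user_prompt = user_prompt
    @verdict
  end
end

# Stand-in evaluator that follows the same three steps as the documented method.
class SketchEvaluator
  # Shortened placeholder, not the real prompt text.
  SYSTEM_PROMPT = "You are an evaluation assistant that assesses answer relevancy."

  def initialize(judge:)
    @judge = judge
  end

  # As in the source shown, context is accepted but not forwarded
  # to the prompt builder.
  def evaluate(input:, output:, context: nil)
    user_prompt = build_user_prompt(input: input, output: output)
    evaluation = @judge.evaluate(system_prompt: SYSTEM_PROMPT, user_prompt: user_prompt)
    SketchResult.new(score: evaluation[:score], reason: evaluation[:reason])
  end

  private

  # Assumed prompt layout; the real build_user_prompt is not shown on this page.
  def build_user_prompt(input:, output:)
    "Input: #{input}\nResponse: #{output}"
  end
end

judge = StubJudge.new(score: 0.95, reason: "Directly answers the question.")
evaluator = SketchEvaluator.new(judge: judge)
result = evaluator.evaluate(
  input: "What is the capital of France?",
  output: "The capital of France is Paris."
)
puts result.score
puts result.reason
```

Swapping the judge for a stub like this keeps tests deterministic, since the score no longer depends on a model call.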