Class: Riffer::Evals::Evaluators::AnswerRelevancy

Inherits:
Riffer::Evals::Evaluator
Defined in:
lib/riffer/evals/evaluators/answer_relevancy.rb

Overview

Evaluates how well a response addresses the input question.

Uses an LLM as judge to assess whether the response is relevant and on-topic and directly addresses what was asked.

evaluator = Riffer::Evals::Evaluators::AnswerRelevancy.new
result = evaluator.evaluate(
  input: "What is the capital of France?",
  output: "The capital of France is Paris."
)
result.score  # => 0.95

Constant Summary

SYSTEM_PROMPT = <<~PROMPT #: String
    You are an evaluation assistant that assesses answer relevancy.

    Your task is to evaluate how well a response addresses the given input/question.

    Consider the following criteria:
    1. Does the response directly address what was asked?
    2. Is the response on-topic and relevant?
    3. Does the response provide the type of information requested?
    4. Does the response avoid going off on tangents?

    Use the evaluation tool to submit your score and reasoning. The score should be
    a float between 0.0 and 1.0 where:
    - 1.0 = Perfectly relevant, directly addresses the question
    - 0.7-0.9 = Mostly relevant with minor tangents
    - 0.4-0.6 = Partially relevant, some off-topic content
    - 0.1-0.3 = Mostly irrelevant
    - 0.0 = Completely irrelevant
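
The rubric above can be turned back into a label when reading results. The following is a minimal sketch, not part of Riffer: `relevancy_band` is a hypothetical helper, and the handling of scores that fall between the listed bands (for example 0.95 or 0.65) is an assumption, since the prompt does not define them.

```ruby
# Hypothetical helper, not part of Riffer: maps a score to the rubric band
# named in SYSTEM_PROMPT. Scores between the listed bands (e.g. 0.95, 0.65)
# are assigned to the band below them; that choice is an assumption.
def relevancy_band(score)
  unless score.is_a?(Numeric) && (0.0..1.0).cover?(score)
    raise ArgumentError, "score must be a number between 0.0 and 1.0"
  end

  if score >= 1.0
    :perfectly_relevant
  elsif score >= 0.7
    :mostly_relevant
  elsif score >= 0.4
    :partially_relevant
  elsif score > 0.0
    :mostly_irrelevant
  else
    :completely_irrelevant
  end
end

relevancy_band(0.95) # => :mostly_relevant
```

Because the judge is an LLM, it may return scores that sit between the listed bands, so any thresholding of this kind should be treated as a convention chosen by the caller.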

Constants included from Helpers::ClassNameConverter

Helpers::ClassNameConverter::DEFAULT_SEPARATOR

Instance Method Summary

Methods inherited from Riffer::Evals::Evaluator

description, higher_is_better, identifier, judge_model

Methods included from Helpers::ClassNameConverter

#class_name_to_path

Instance Method Details

#evaluate(input:, output:, context: nil) ⇒ Object

: (input: String, output: String, ?context: Hash[Symbol, untyped]?) -> Riffer::Evals::Result



# File 'lib/riffer/evals/evaluators/answer_relevancy.rb', line 42

def evaluate(input:, output:, context: nil)
  user_prompt = build_user_prompt(input: input, output: output)
  evaluation = judge.evaluate(system_prompt: SYSTEM_PROMPT, user_prompt: user_prompt)
  result(score: evaluation[:score], reason: evaluation[:reason])
end
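
The flow in `#evaluate` (build a user prompt, ask the judge, wrap the returned score and reason) can be exercised without an LLM by substituting a stub judge. The sketch below is an illustration only: `StubJudge`, `StubResult`, `SketchEvaluator`, and the prompt format in `build_user_prompt` are all hypothetical stand-ins, since this page does not show how Riffer implements `judge`, `result`, or `build_user_prompt`. As in the source above, `context` is accepted but not passed on to the prompt.

```ruby
# Hypothetical stand-ins; none of these are Riffer's real classes.
StubResult = Struct.new(:score, :reason, keyword_init: true)

class StubJudge
  # Mirrors the call shape used in #evaluate: takes both prompts and
  # returns a Hash with :score and :reason. The scoring rule is a toy.
  def evaluate(system_prompt:, user_prompt:)
    on_topic = user_prompt.include?("Paris")
    {
      score: on_topic ? 1.0 : 0.0,
      reason: on_topic ? "Names the capital." : "Does not answer the question."
    }
  end
end

class SketchEvaluator
  SYSTEM_PROMPT = "You are an evaluation assistant that assesses answer relevancy."

  def initialize(judge)
    @judge = judge
  end

  # Same three steps as the documented method body.
  def evaluate(input:, output:, context: nil)
    user_prompt = build_user_prompt(input: input, output: output)
    evaluation = @judge.evaluate(system_prompt: SYSTEM_PROMPT, user_prompt: user_prompt)
    StubResult.new(score: evaluation[:score], reason: evaluation[:reason])
  end

  private

  # Assumed prompt layout; the real build_user_prompt is not shown here.
  def build_user_prompt(input:, output:)
    "Input: #{input}\n\nResponse: #{output}"
  end
end

evaluator = SketchEvaluator.new(StubJudge.new)
result = evaluator.evaluate(
  input: "What is the capital of France?",
  output: "The capital of France is Paris."
)
result.score # => 1.0
```

Injecting the judge through the constructor is a choice made for this sketch so the stub can be swapped in; the inherited `judge_model` method suggests Riffer configures its judge differently.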