Class: Riffer::Evals::Evaluators::AnswerRelevancy

Inherits: Riffer::Evals::Evaluator

Inheritance chain: Object > Riffer::Evals::Evaluator > Riffer::Evals::Evaluators::AnswerRelevancy

Defined in: lib/riffer/evals/evaluators/answer_relevancy.rb

Overview

Evaluates how well a response addresses the input question.

Uses LLM-as-judge to assess whether the response is relevant, on-topic, and directly addresses what was asked.

evaluator = Riffer::Evals::Evaluators::AnswerRelevancy.new
result = evaluator.evaluate(
  input: "What is the capital of France?",
  output: "The capital of France is Paris."
)
result.score  # => 0.95 (example value; actual scores depend on the judge model)

Constant Summary

SYSTEM_PROMPT = <<~PROMPT #: String
    You are an evaluation assistant that assesses answer relevancy.

    Your task is to evaluate how well a response addresses the given input/question.

    Consider the following criteria:
    1. Does the response directly address what was asked?
    2. Is the response on-topic and relevant?
    3. Does the response provide the type of information requested?
    4. Does the response avoid going off on tangents?

    Use the evaluation tool to submit your score and reasoning. The score should be
    a float between 0.0 and 1.0 where:
    - 1.0 = Perfectly relevant, directly addresses the question
    - 0.7-0.9 = Mostly relevant with minor tangents
    - 0.4-0.6 = Partially relevant, some off-topic content
    - 0.1-0.3 = Mostly irrelevant
    - 0.0 = Completely irrelevant
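
The score bands in the prompt can be turned into labels on the caller's side. The following is a minimal sketch, not part of Riffer: `relevancy_band` is a hypothetical helper, and the cut-off values between bands (0.95, 0.65, 0.35, 0.05) are assumptions, since the prompt leaves gaps such as 0.9 to 1.0 undefined.

```ruby
# Hypothetical helper (not part of Riffer): maps a relevancy score to the
# nearest band of the rubric in SYSTEM_PROMPT. The cut-offs between bands
# are assumptions chosen to close the gaps the rubric leaves open.
def relevancy_band(score)
  unless score.is_a?(Numeric) && score.between?(0.0, 1.0)
    raise ArgumentError, "score must be a number between 0.0 and 1.0"
  end

  if score >= 0.95
    :perfectly_relevant
  elsif score >= 0.65
    :mostly_relevant
  elsif score >= 0.35
    :partially_relevant
  elsif score > 0.05
    :mostly_irrelevant
  else
    :completely_irrelevant
  end
end

puts relevancy_band(0.8)
```

A caller could pass `result.score` to such a helper to decide, for example, whether a response needs manual review.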

Instance Method Summary

#evaluate(input:, output:, context: nil) ⇒ Object

Type signature: (input: String, output: String, ?context: Hash[Symbol, untyped]?) -> Riffer::Evals::Result

Methods inherited from Riffer::Evals::Evaluator

description, higher_is_better, judge_model

Instance Method Details

#evaluate(input:, output:, context: nil) ⇒ Object

Type signature: (input: String, output: String, ?context: Hash[Symbol, untyped]?) -> Riffer::Evals::Result

# File 'lib/riffer/evals/evaluators/answer_relevancy.rb', line 41

def evaluate(input:, output:, context: nil)
  user_prompt = build_user_prompt(input: input, output: output)
  evaluation = judge.evaluate(system_prompt: SYSTEM_PROMPT, user_prompt: user_prompt)
  result(score: evaluation[:score], reason: evaluation[:reason])
end
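
To make the data flow of `#evaluate` concrete, here is a self-contained sketch that mirrors the three steps above with stand-ins. Everything in it is hypothetical: `StubJudge`, `SketchResult`, and `SketchAnswerRelevancy` are not Riffer classes, the shortened system prompt is a placeholder, and the layout produced by `build_user_prompt` is an assumption because the real helper is not shown on this page. As in the listing above, `context` is accepted but not used when building the prompt.

```ruby
# Stand-in for Riffer::Evals::Result; carries only the two fields used here.
SketchResult = Struct.new(:score, :reason, keyword_init: true)

# Stand-in judge: returns a canned evaluation instead of calling an LLM,
# and records the user prompt it received so it can be inspected.
class StubJudge
  attr_reader :last_user_prompt

  def initialize(score:, reason:)
    @score = score
    @reason = reason
  end

  def evaluate(system_prompt:, user_prompt:)
    @last_user_prompt = user_prompt
    { score: @score, reason: @reason }
  end
end

# Hypothetical evaluator that follows the same three steps as #evaluate.
class SketchAnswerRelevancy
  # Placeholder; the real SYSTEM_PROMPT is the longer rubric listed earlier.
  SYSTEM_PROMPT = "You are an evaluation assistant that assesses answer relevancy."

  def initialize(judge:)
    @judge = judge
  end

  def evaluate(input:, output:, context: nil)
    user_prompt = build_user_prompt(input: input, output: output)
    evaluation = @judge.evaluate(system_prompt: SYSTEM_PROMPT, user_prompt: user_prompt)
    SketchResult.new(score: evaluation[:score], reason: evaluation[:reason])
  end

  private

  # Assumed prompt layout; the real format is not documented here.
  def build_user_prompt(input:, output:)
    "Input: #{input}\n\nResponse: #{output}"
  end
end

judge = StubJudge.new(score: 0.95, reason: "Directly answers the question.")
sketch = SketchAnswerRelevancy.new(judge: judge).evaluate(
  input: "What is the capital of France?",
  output: "The capital of France is Paris."
)
puts "#{sketch.score}: #{sketch.reason}"
```

Injecting the judge through the constructor is a choice made for this sketch so the flow can be exercised without a model call; how the real class obtains its judge is not shown here.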