Module: Legion::LLM::ConfidenceScorer

Extended by:
Legion::Logging::Helper
Defined in:
lib/legion/llm/confidence_scorer.rb

Overview

Computes a ConfidenceScore for an LLM response using available signals.

Strategy selection (in priority order, matching the checks in .score):

1. caller    - caller-provided score passed via options[:confidence_score]
2. logprobs  - native model confidence from token log-probabilities (when available)
3. heuristic - derived from response content characteristics

Band boundaries are read from Legion::Settings[:confidence] when Legion::Settings is available; otherwise the DEFAULT_BANDS values are used. Per-call overrides can be passed via options[:confidence_bands].

Constant Summary

DEFAULT_BANDS =

Default band boundaries. Each key is the lower boundary of the band it names:

score <  :low       -> :very_low
score <  :medium    -> :low
score <  :high      -> :medium
score <  :very_high -> :high
score >= :very_high -> :very_high
{
  low:       0.3,
  medium:    0.5,
  high:      0.7,
  very_high: 0.9
}.freeze
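
The boundary table above can be read as a simple cascade of comparisons. The following is a minimal sketch of that mapping, not the module's own code: band_for and EXAMPLE_BANDS are hypothetical names introduced for illustration, and the merge-based override is only an assumption about how per-call bands might be combined with the defaults.

```ruby
# Illustrative copy of the documented defaults (hypothetical constant name).
EXAMPLE_BANDS = { low: 0.3, medium: 0.5, high: 0.7, very_high: 0.9 }.freeze

# Hypothetical helper: maps a score to a band name using the documented rules.
# Each key is the lower boundary of the band it names.
def band_for(score, bands = EXAMPLE_BANDS)
  return :very_low if score < bands[:low]
  return :low      if score < bands[:medium]
  return :medium   if score < bands[:high]
  return :high     if score < bands[:very_high]

  :very_high
end

band_for(0.25) # => :very_low
band_for(0.5)  # => :medium
band_for(0.9)  # => :very_high

# Assumed override style: raise the :medium boundary for one call.
band_for(0.5, EXAMPLE_BANDS.merge(medium: 0.6)) # => :low
```

Note that a score exactly equal to a boundary falls into the band named by that key, because every comparison is strict.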
HEURISTIC_WEIGHTS =

Penalty weights used in heuristic scoring.

{
  refusal:            -0.8,
  empty:              -1.0,
  truncated:          -0.4,
  repetition:         -0.5,
  json_parse_failure: -0.6,
  too_short:          -0.3
}.freeze
STRUCTURED_OUTPUT_BONUS =

Bonus applied when structured output parse succeeds.

0.1
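
This page does not show how the heuristic strategy combines these values, so the following is only a sketch of one plausible scheme. It assumes a base score of 1.0, adds the penalty for each triggered signal, adds the bonus when structured output parsed, and clamps the result to 0.0..1.0. The names combine_signals, EXAMPLE_WEIGHTS, and EXAMPLE_BONUS, the base value, and the clamping are all assumptions rather than the module's actual behavior.

```ruby
# Illustrative copies of the documented constants (hypothetical names).
EXAMPLE_WEIGHTS = {
  refusal:            -0.8,
  empty:              -1.0,
  truncated:          -0.4,
  repetition:         -0.5,
  json_parse_failure: -0.6,
  too_short:          -0.3
}.freeze
EXAMPLE_BONUS = 0.1

# Hypothetical combination rule (an assumption, not the library's algorithm):
# base score + sum of triggered penalties + optional bonus, clamped to 0.0..1.0.
def combine_signals(triggered, structured_ok: false, base: 1.0)
  total = triggered.sum(base) { |name| EXAMPLE_WEIGHTS.fetch(name) }
  total += EXAMPLE_BONUS if structured_ok
  total.clamp(0.0, 1.0).round(2)
end

combine_signals([:truncated])                      # => 0.6
combine_signals([:empty])                          # => 0.0
combine_signals([:too_short], structured_ok: true) # => 0.8
```

Under this assumed scheme, an empty response alone is enough to drive the score to zero, which is consistent with :empty carrying the largest penalty.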
HEDGING_PATTERNS =

Hedging language patterns that reduce confidence.

[
  /\b(?:I think|I believe|I'm not sure|I'm uncertain|it seems|it appears|maybe|perhaps|possibly|probably|I guess|I assume)\b/i,
  /\bnot (?:certain|sure|definite|confirmed)\b/i,
  /\bunclear\b/i,
  /\bcould be\b/i
].freeze
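
To see what these patterns catch, the sketch below counts matches across all four expressions. hedging_hits and EXAMPLE_HEDGING_PATTERNS are hypothetical names used for illustration; how the scorer turns a match count into a penalty is not documented here and is not modeled.

```ruby
# Illustrative copy of the documented patterns (hypothetical constant name).
EXAMPLE_HEDGING_PATTERNS = [
  /\b(?:I think|I believe|I'm not sure|I'm uncertain|it seems|it appears|maybe|perhaps|possibly|probably|I guess|I assume)\b/i,
  /\bnot (?:certain|sure|definite|confirmed)\b/i,
  /\bunclear\b/i,
  /\bcould be\b/i
].freeze

# Hypothetical helper: total number of hedging matches found in the text.
def hedging_hits(text)
  EXAMPLE_HEDGING_PATTERNS.sum { |pattern| text.scan(pattern).size }
end

hedging_hits("I think it could be a caching issue, but that is unclear.") # => 3
hedging_hits("The capital of France is Paris.")                          # => 0
```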

Class Method Summary

Class Method Details

.score(raw_response, **options) ⇒ Object

Compute a ConfidenceScore for the given raw_response.

raw_response - the RubyLLM response object (must respond to #content)
options      - Hash:

:confidence_score  - Float  caller-provided score (bypasses logprobs and heuristics)
:confidence_bands  - Hash   per-call band overrides
:json_expected     - Boolean whether JSON output was expected
:quality_result    - QualityResult from QualityChecker (optional, avoids re-running checks)

Returns a ConfidenceScore.



# File 'lib/legion/llm/confidence_scorer.rb', line 65

def score(raw_response, **options)
  bands = resolve_bands(options[:confidence_bands])

  # A caller-provided score takes precedence over every other signal.
  if (caller_score = options[:confidence_score])
    return ConfidenceScore.build(
      score:   caller_score.to_f,
      bands:   bands,
      source:  :caller_provided,
      signals: { caller_provided: caller_score.to_f }
    )
  end

  # Otherwise use token log-probabilities when they can be extracted.
  if (lp = extract_logprobs(raw_response))
    return ConfidenceScore.build(
      score:   lp,
      bands:   bands,
      source:  :logprobs,
      signals: { avg_logprob: lp }
    )
  end

  # Fall back to content-based heuristics.
  heuristic_score(raw_response, bands: bands, options: options)
end