Class: LlmConductor::Eval::Spec

Inherits:
Object
Defined in:
lib/llm_conductor/eval/spec.rb

Overview

The public extension seam. Subclass (or duck-type) this to describe how to evaluate one LLM-powered feature: how to turn a caller-supplied input into a prompt payload, how to parse the output, and what the judge should grade.

The engine itself is generic and feature-agnostic; everything feature-specific lives here. Unlike the Rails prototype’s Feature::Base, there is no select_cases — selecting which inputs to evaluate is the caller’s job, done before calling LlmConductor::Eval.run and passed via inputs:. The engine never queries a database.
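As a rough illustration of the duck-typed route, the sketch below implements every method documented on this page for a made-up sentiment feature. The class name `SentimentSpec`, the Hash-shaped inputs, the prompt wording, and the Array return shape of `judge_dimensions` are all assumptions for illustration; only the method names and their documented roles come from this page.

```ruby
require "json"

# Hypothetical duck-typed spec for a sentiment-classification feature.
# Inputs are assumed to be plain Hashes such as { id: 7, text: "..." }.
class SentimentSpec
  # nil means build_data returns a full prompt String (passed as prompt:).
  def prompt_type
    nil
  end

  def input_id(input)
    input.fetch(:id)
  end

  def input_label(input)
    input_id(input).to_s
  end

  def build_data(input)
    "Classify the sentiment of this review. Reply with JSON " \
      "containing a 0-100 score and a label.\n#{input.fetch(:text)}"
  end

  # Returns a Hash, or nil when the text is not a JSON object.
  def parse(raw)
    parsed = JSON.parse(raw.to_s)
    parsed.is_a?(Hash) ? parsed : nil
  rescue JSON::ParserError
    nil
  end

  def vendor_params(vendor:, input_id:)
    {}
  end

  def output_summary(parsed)
    { score: parsed["score"], bucket: parsed["label"] }
  end

  def judge_rubric_excerpt
    "Score 0-100; label positive when score >= 50, otherwise negative."
  end

  # Assumed shape: an Array with one { key:, description: } Hash per dimension.
  def judge_dimensions
    [{ key: :accuracy, description: "Does the label match the review's tone?" }]
  end

  def extra_columns(_parsed)
    {}
  end
end
```

An instance of such a class would then be handed to LlmConductor::Eval.run together with inputs:. How the spec itself is passed is not shown on this page, so that call is left out here.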

Instance Method Details

#build_data(_input) ⇒ Object

Build the prompt payload for one input. When #prompt_type is set, the return value is passed as data:; when #prompt_type is nil, it must be a full prompt String, which is passed as prompt:. In the Rails prototype this was build_data(record).

Raises:

  • (NotImplementedError)


# File 'lib/llm_conductor/eval/spec.rb', line 37

def build_data(_input)
  raise NotImplementedError
end
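
The two modes described above can be sketched as follows. `generate_args` is a hypothetical helper that only mirrors the documented behaviour (data: when a prompt type is set, prompt: otherwise); it is not the engine's code. The spec class names, the `:summarize_ticket` prompt type, and the input fields are likewise invented for illustration.

```ruby
# Hypothetical helper mirroring the documented dispatch; not the engine's code.
def generate_args(spec, input)
  payload = spec.build_data(input)
  type = spec.prompt_type
  type ? { type: type, data: payload } : { prompt: payload }
end

# Mode 1: a registered prompt type, so build_data returns structured data.
class RegisteredPromptSpec
  def prompt_type
    :summarize_ticket # assumed to match a registered prompt
  end

  def build_data(input)
    { subject: input[:subject], body: input[:body] }
  end
end

# Mode 2: no prompt type, so build_data returns the full prompt String.
class RawPromptSpec
  def prompt_type
    nil
  end

  def build_data(input)
    "Summarize this ticket:\n#{input[:body]}"
  end
end
```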

#extra_columns(_parsed) ⇒ Object

Extra per-row CSV columns beyond the base set. Keys become headers.



# File 'lib/llm_conductor/eval/spec.rb', line 73

def extra_columns(_parsed)
  {}
end
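
A minimal sketch of an override, assuming the parsed Hash carries hypothetical "category" and "tags" fields. Whether the engine ever calls this with nil after a failed parse is not stated here, so the nil guard is purely defensive, and the use of String keys for the headers is an assumption.

```ruby
class TicketSpec
  # Adds two hypothetical columns; the keys become the CSV headers.
  def extra_columns(parsed)
    return {} if parsed.nil? # defensive: parse may have failed

    {
      "category" => parsed["category"],
      "tag_count" => Array(parsed["tags"]).size
    }
  end
end
```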

#input_id(_input) ⇒ Object

Stable id for an input, used for output grouping and paths. In the Rails prototype this was record.id.

Raises:

  • (NotImplementedError)


# File 'lib/llm_conductor/eval/spec.rb', line 25

def input_id(_input)
  raise NotImplementedError
end

#input_label(input) ⇒ Object

Human-readable label for an input. Defaults to the id as a String. In the Rails prototype this was record.name.



# File 'lib/llm_conductor/eval/spec.rb', line 30

def input_label(input)
  input_id(input).to_s
end
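
For illustration, a spec might prefer a display name and fall back to the id, matching the default shown above. The `AccountSpec` class and the `:id` / `:name` keys are hypothetical.

```ruby
class AccountSpec
  def input_id(input)
    input.fetch(:id)
  end

  # Prefer a human-friendly name; fall back to the id like the default does.
  def input_label(input)
    input[:name] || input_id(input).to_s
  end
end
```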

#judge_dimensions ⇒ Object

The dimensions the judge scores, 0-100 each, with one { key:, description: } entry per dimension.

Raises:

  • (NotImplementedError)


# File 'lib/llm_conductor/eval/spec.rb', line 68

def judge_dimensions
  raise NotImplementedError
end

#judge_rubric_excerpt ⇒ Object

Text inlined into the judge prompt describing the rubric the candidate was asked to follow.

Raises:

  • (NotImplementedError)


# File 'lib/llm_conductor/eval/spec.rb', line 63

def judge_rubric_excerpt
  raise NotImplementedError
end
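
The two judge-facing methods, #judge_dimensions and #judge_rubric_excerpt, might be implemented together along these lines. The rubric text and dimension names are invented, and returning an Array of Hashes from `judge_dimensions` is an assumption based on the { key:, description: } shape given above.

```ruby
class SummarySpec
  # Hypothetical rubric: the same instructions the candidate prompt contained.
  RUBRIC = <<~TEXT
    Summaries must be at most three sentences, mention the customer's
    main problem, and must not invent details absent from the ticket.
  TEXT

  def judge_rubric_excerpt
    RUBRIC
  end

  # One entry per dimension; the judge scores each from 0 to 100.
  def judge_dimensions
    [
      { key: :faithfulness, description: "No details beyond the ticket text." },
      { key: :concision, description: "At most three sentences." }
    ]
  end
end
```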

#output_summary(_parsed) ⇒ Object

Returns { score: Numeric|nil, bucket: String|nil }. This powers the CSV columns and the bucket-disagreement detection. bucket may be any discrete label.

Raises:

  • (NotImplementedError)


# File 'lib/llm_conductor/eval/spec.rb', line 57

def output_summary(_parsed)
  raise NotImplementedError
end
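
A minimal sketch of an implementation that derives the bucket from the score. The "risk_score" field and the low/medium/high thresholds are hypothetical, and the nil guard assumes the method may see a failed parse.

```ruby
class RiskSpec
  def output_summary(parsed)
    return { score: nil, bucket: nil } if parsed.nil?

    score = parsed["risk_score"]
    { score: score, bucket: bucket_for(score) }
  end

  private

  # Maps a numeric score onto a discrete label; nil when no score is present.
  def bucket_for(score)
    return nil unless score.is_a?(Numeric)

    if score < 34
      "low"
    elsif score < 67
      "medium"
    else
      "high"
    end
  end
end
```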

#parse(raw) ⇒ Object

Parse the LLM’s raw text into a Hash, or nil on failure. Defaults to the gem’s conservative JsonParser; override for tuned/feature-specific parsing.



# File 'lib/llm_conductor/eval/spec.rb', line 43

def parse(raw)
  JsonParser.parse(raw)
end
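
As one example of feature-specific parsing, an override might strip a Markdown code fence that some models wrap around their JSON. This sketch uses only Ruby's standard JSON library rather than the gem's JsonParser, whose exact behaviour is not described here; the class name is hypothetical.

```ruby
require "json"

class FencedJsonSpec
  FENCE = "`" * 3 # a Markdown code fence marker

  # Returns a Hash, or nil when the text is not a JSON object.
  def parse(raw)
    text = raw.to_s.strip
    text = text.delete_prefix("#{FENCE}json").delete_prefix(FENCE)
    text = text.delete_suffix(FENCE).strip
    parsed = JSON.parse(text)
    parsed.is_a?(Hash) ? parsed : nil
  rescue JSON::ParserError
    nil
  end
end
```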

#prompt_type ⇒ Object

Symbol passed to LlmConductor.generate as type: (must match a registered prompt). Return nil if you instead build a full prompt String in #build_data, in which case the engine passes it as prompt:.

Raises:

  • (NotImplementedError)


# File 'lib/llm_conductor/eval/spec.rb', line 20

def prompt_type
  raise NotImplementedError
end

#vendor_params(vendor:, input_id:) ⇒ Object

Vendor-specific generation params (e.g. a deterministic Ollama seed). Return {} for vendors that don’t expose such params.



# File 'lib/llm_conductor/eval/spec.rb', line 50

def vendor_params(vendor:, input_id:)
  {}
end
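
A sketch of the deterministic-seed idea mentioned above: derive a stable integer from the input id so repeated runs of the same input use the same seed. The `:seed` key, the way the vendor is identified (`:ollama` or "ollama"), and how the engine forwards this Hash are assumptions not confirmed by this page.

```ruby
require "zlib"

class SeededSpec
  def vendor_params(vendor:, input_id:)
    # Only vendors assumed to accept a seed get one; everyone else gets {}.
    return {} unless vendor.to_s == "ollama"

    # CRC32 of the id is stable across runs and processes.
    { seed: Zlib.crc32(input_id.to_s) }
  end
end
```

A checksum is used here instead of Object#hash because Ruby randomizes String hashes per process, which would defeat the purpose of a reproducible seed.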