Class: LlmConductor::Eval::Spec

Inherits: Object

Defined in: lib/llm_conductor/eval/spec.rb

Overview

The public extension seam. Subclass (or duck-type) this to describe how to evaluate one LLM-powered feature: how to turn a caller-supplied input into a prompt payload, how to parse the output, and what the judge should grade.

The engine itself is generic and feature-agnostic; everything feature-specific lives here. Unlike the Rails prototype’s Feature::Base, there is no select_cases — selecting which inputs to evaluate is the caller’s job, done before calling LlmConductor::Eval.run and passed via inputs:. The engine never queries a database.
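As a rough illustration of the duck-typed route mentioned above, the sketch below defines a plain Ruby class that responds to the same ten methods without loading the gem. The class name, the input Hash shape (:id, :title, :body), the string keys of the parsed Hash, and the Array return value of judge_dimensions are all assumptions made for the example, not behavior confirmed by this page.

```ruby
require 'json'

# Hypothetical duck-typed spec. It does not inherit from
# LlmConductor::Eval::Spec; it only responds to the same method names.
class SentimentSpec
  # Assumption: no registered prompt, so build_data returns a full prompt String.
  def prompt_type
    nil
  end

  def input_id(input)
    input.fetch(:id)
  end

  def input_label(input)
    input.fetch(:title) { input_id(input).to_s }
  end

  def build_data(input)
    "Reply with JSON containing score and bucket for this ticket:\n#{input.fetch(:body)}"
  end

  # Returns a Hash, or nil when the text is not a JSON object.
  def parse(raw)
    parsed = JSON.parse(raw.to_s)
    parsed.is_a?(Hash) ? parsed : nil
  rescue JSON::ParserError
    nil
  end

  def output_summary(parsed)
    { score: parsed&.dig('score'), bucket: parsed&.dig('bucket') }
  end

  def judge_rubric_excerpt
    'Score 0-100; bucket is one of negative, neutral, positive.'
  end

  # Assumption: an Array of { key:, description: } hashes.
  def judge_dimensions
    [{ key: :accuracy, description: 'Does the bucket match the ticket tone?' }]
  end

  def extra_columns(_parsed)
    {}
  end

  def vendor_params(vendor:, input_id:)
    {}
  end
end

spec = SentimentSpec.new
input = { id: 42, title: 'Refund request', body: 'I was charged twice.' }
puts spec.input_label(input) # prints "Refund request"
```

An object like this would be handed to LlmConductor::Eval.run together with the preselected inputs:; the exact keyword used to pass the spec is not shown on this page, so it is left out here.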

Instance Method Summary

#build_data(_input) ⇒ Object

Build the prompt payload for one input.
#extra_columns(_parsed) ⇒ Object

Extra per-row CSV columns beyond the base set.
#input_id(_input) ⇒ Object

Stable id for an input (was record.id).
#input_label(input) ⇒ Object

Human label for an input (was record.name).
#judge_dimensions ⇒ Object

{ key:, description: } entries: the dimensions the judge scores, 0-100 each.
#judge_rubric_excerpt ⇒ Object

Text inlined into the judge prompt describing the rubric the candidate was asked to follow.
#output_summary(_parsed) ⇒ Object

{ score: Numeric|nil, bucket: String|nil } — powers CSV columns and the bucket-disagreement detection.
#parse(raw) ⇒ Object

Parse the LLM’s raw text into a Hash, or nil on failure.
#prompt_type ⇒ Object

Symbol passed to LlmConductor.generate as type: (must match a registered prompt).
#vendor_params(vendor:, input_id:) ⇒ Object

Vendor-specific generation params (e.g. a deterministic Ollama seed).

Instance Method Details

#build_data(_input) ⇒ `Object`

Build the prompt payload for one input. When #prompt_type is set this is passed as data:; otherwise it must be a full prompt String passed as prompt: (was build_data(record)).

Raises:

(NotImplementedError)



# File 'lib/llm_conductor/eval/spec.rb', line 37

def build_data(_input)
  raise NotImplementedError
end
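The two modes described above (a data: payload when #prompt_type is set, a full prompt: String otherwise) might look like the following. Both classes are duck-typed stand-ins, the :summarize_ticket prompt name and the Hash payload shape are assumptions, and generate_kwargs is an illustrative helper rather than anything the gem provides.

```ruby
# Hypothetical spec that relies on a registered prompt, so build_data
# returns a data payload. The prompt name is an assumption.
class RegisteredPromptSpec
  def prompt_type
    :summarize_ticket
  end

  def build_data(input)
    { title: input.fetch(:title), body: input.fetch(:body) }
  end
end

# Hypothetical spec with no registered prompt: build_data returns the
# complete prompt text.
class InlinePromptSpec
  def prompt_type
    nil
  end

  def build_data(input)
    "Summarize this ticket in one sentence.\n\n" \
      "Title: #{input.fetch(:title)}\nBody: #{input.fetch(:body)}"
  end
end

# Illustrative helper (not part of the gem) mirroring the documented rule:
# data: when prompt_type is set, prompt: otherwise.
def generate_kwargs(spec, input)
  payload = spec.build_data(input)
  type = spec.prompt_type
  type ? { type: type, data: payload } : { prompt: payload }
end

input = { title: 'Refund request', body: 'I was charged twice.' }
p generate_kwargs(RegisteredPromptSpec.new, input).keys # prints [:type, :data]
p generate_kwargs(InlinePromptSpec.new, input).keys     # prints [:prompt]
```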

#extra_columns(_parsed) ⇒ `Object`

Extra per-row CSV columns beyond the base set. Keys become headers.



# File 'lib/llm_conductor/eval/spec.rb', line 73

def extra_columns(_parsed)
  {}
end
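An override could surface a couple of parsed fields as additional columns. This is a sketch only: the class is a duck-typed stand-in, and the 'language' and 'reasons' fields, the string header keys, and the nil handling are assumptions about a hypothetical feature.

```ruby
# Hypothetical spec fragment adding two CSV columns per row.
class ColumnsSpec
  def extra_columns(parsed)
    # A failed parse yields nil, so emit no extra columns for that row.
    return {} if parsed.nil?

    {
      'language' => parsed['language'],
      'reason_count' => Array(parsed['reasons']).size
    }
  end
end

row = ColumnsSpec.new.extra_columns('language' => 'en', 'reasons' => %w[tone length])
puts row.keys.join(',') # prints "language,reason_count"
```

Returning the same keys for every row keeps the header set stable; whether the engine requires that is not stated here.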

#input_id(_input) ⇒ `Object`

Stable id for an input (was record.id). Used for output grouping/paths.

Raises:

(NotImplementedError)



# File 'lib/llm_conductor/eval/spec.rb', line 25

def input_id(_input)
  raise NotImplementedError
end

#input_label(input) ⇒ `Object`

Human label for an input (was record.name). Defaults to the id.



# File 'lib/llm_conductor/eval/spec.rb', line 30

def input_label(input)
  input_id(input).to_s
end
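When inputs are plain Hashes rather than database records, #input_id and #input_label might be implemented along these lines. The class is a duck-typed stand-in, and the :id, :title, and :text keys and the digest fallback are assumptions for illustration. A content digest is used because the id is described as stable and feeds output paths.

```ruby
require 'digest'

# Hypothetical spec fragment deriving ids and labels from Hash inputs.
class ArticleSpec
  # Prefer an explicit id; otherwise derive a short, path-safe id from
  # the text so the same input always maps to the same id.
  def input_id(input)
    input[:id] || Digest::SHA256.hexdigest(input.fetch(:text))[0, 12]
  end

  # Mirror the documented default: fall back to the id when no title exists.
  def input_label(input)
    input[:title] || input_id(input).to_s
  end
end

spec = ArticleSpec.new
anonymous = { text: 'Quarterly results were flat.' }
puts spec.input_id(anonymous) == spec.input_id(anonymous.dup) # prints true
```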

#judge_dimensions ⇒ `Object`

{ key:, description: } entries: the dimensions the judge scores, 0-100 each.

Raises:

(NotImplementedError)



# File 'lib/llm_conductor/eval/spec.rb', line 68

def judge_dimensions
  raise NotImplementedError
end

#judge_rubric_excerpt ⇒ `Object`

Text inlined into the judge prompt describing the rubric the candidate was asked to follow.

Raises:

(NotImplementedError)



# File 'lib/llm_conductor/eval/spec.rb', line 63

def judge_rubric_excerpt
  raise NotImplementedError
end
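The two judge-facing methods, #judge_dimensions and #judge_rubric_excerpt, could be backed by constants as sketched below. The class is a duck-typed stand-in, the rubric wording and dimension names are invented for the example, and returning an Array of Hashes with Symbol keys is an assumption about the expected shape.

```ruby
# Hypothetical spec fragment supplying the judge's rubric and dimensions.
class SummaryJudgeSpec
  RUBRIC = <<~TEXT
    Summaries must be one sentence, mention the customer's request,
    and must not invent facts absent from the ticket.
  TEXT

  # Assumption: each dimension is a Hash with :key and :description.
  DIMENSIONS = [
    { key: :faithfulness, description: 'No claims beyond the ticket text.' },
    { key: :brevity, description: 'One sentence, no filler.' }
  ].freeze

  def judge_rubric_excerpt
    RUBRIC
  end

  def judge_dimensions
    DIMENSIONS
  end
end

SummaryJudgeSpec.new.judge_dimensions.each do |dimension|
  puts "#{dimension[:key]}: #{dimension[:description]}"
end
```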

#output_summary(_parsed) ⇒ `Object`

{ score: Numeric|nil, bucket: String|nil } — powers CSV columns and the bucket-disagreement detection. bucket may be any discrete label.

Raises:

(NotImplementedError)



# File 'lib/llm_conductor/eval/spec.rb', line 57

def output_summary(_parsed)
  raise NotImplementedError
end
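One way to produce the { score:, bucket: } pair is to read a numeric field from the parsed Hash and map it onto discrete labels. The sketch below is a duck-typed stand-in; the 'risk_score' field name, the thresholds, and the low/medium/high labels are assumptions chosen for illustration.

```ruby
# Hypothetical spec fragment mapping a numeric score onto a bucket label.
class RiskSpec
  def output_summary(parsed)
    # A failed parse (nil) yields an all-nil summary.
    return { score: nil, bucket: nil } unless parsed.is_a?(Hash)

    score = parsed['risk_score']
    score = nil unless score.is_a?(Numeric)
    { score: score, bucket: bucket_for(score) }
  end

  private

  # Thresholds are illustrative only.
  def bucket_for(score)
    return nil if score.nil?
    return 'low' if score < 34
    return 'medium' if score < 67

    'high'
  end
end

summary = RiskSpec.new.output_summary('risk_score' => 80)
puts summary[:bucket] # prints "high"
```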

#parse(raw) ⇒ `Object`

Parse the LLM’s raw text into a Hash, or nil on failure. Defaults to the gem’s conservative JsonParser; override for tuned/feature-specific parsing.



# File 'lib/llm_conductor/eval/spec.rb', line 43

def parse(raw)
  JsonParser.parse(raw)
end
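A feature-specific override might, for example, tolerate models that wrap their JSON in a Markdown code fence. The sketch below is a duck-typed stand-in that uses Ruby's standard JSON library rather than the gem's JsonParser, whose exact behavior is not documented on this page; the fence handling is an assumption about what a given model emits.

```ruby
require 'json'

# Hypothetical spec fragment with a parser that strips a surrounding
# Markdown code fence before decoding.
class FencedJsonSpec
  FENCE = '`' * 3

  # Returns a Hash, or nil when the text is not a JSON object.
  def parse(raw)
    text = raw.to_s.strip
    text = text.delete_prefix("#{FENCE}json").delete_prefix(FENCE)
    text = text.delete_suffix(FENCE).strip
    parsed = JSON.parse(text)
    parsed.is_a?(Hash) ? parsed : nil
  rescue JSON::ParserError
    nil
  end
end

fence = FencedJsonSpec::FENCE
raw = "#{fence}json\n{\"score\": 70}\n#{fence}"
puts FencedJsonSpec.new.parse(raw)['score'] # prints 70
```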

#prompt_type ⇒ `Object`

Symbol passed to LlmConductor.generate as type: (must match a registered prompt). Return nil if you instead build a full prompt String in #build_data; the engine then passes it as prompt:.

Raises:

(NotImplementedError)



# File 'lib/llm_conductor/eval/spec.rb', line 20

def prompt_type
  raise NotImplementedError
end

#vendor_params(vendor:, input_id:) ⇒ `Object`

Vendor-specific generation params (e.g. a deterministic Ollama seed). Return {} for vendors that don’t expose one.



# File 'lib/llm_conductor/eval/spec.rb', line 50

def vendor_params(vendor:, input_id:)
  {}
end
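An override could derive a per-input seed so that repeated runs of the same input are reproducible. This is a sketch under several assumptions: the class is a duck-typed stand-in, the vendor is assumed to arrive as something whose to_s is "ollama", and the { seed: ... } Hash shape is a guess at what the generation call accepts.

```ruby
require 'zlib'

# Hypothetical spec fragment returning a deterministic seed for one vendor.
class SeededSpec
  def vendor_params(vendor:, input_id:)
    # Assumption: only this vendor exposes a seed parameter.
    return {} unless vendor.to_s == 'ollama'

    # CRC32 is stable across processes, unlike String#hash, which Ruby
    # randomizes per run.
    { seed: Zlib.crc32(input_id.to_s) }
  end
end

spec = SeededSpec.new
first = spec.vendor_params(vendor: :ollama, input_id: 42)
second = spec.vendor_params(vendor: :ollama, input_id: 42)
puts first == second # prints true
```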
