Class: LlmConductor::Eval::Runner

Inherits:
Object
Defined in:
lib/llm_conductor/eval/runner.rb

Overview

Top-level orchestrator. For each input, builds the prompt data once, runs every candidate (input, model) pair through ModelRunner, judges it, and rewrites the manifest after each pair so the run stays resumable / reportable mid-flight.

Unlike the Rails prototype, it does no data selection: the caller passes the inputs: keyword directly. See LlmConductor::Eval.run for the public entrypoint.
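The loop described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: MemoryStore, run_pairs, and the stand-in model and judge outputs are all hypothetical, and only the overall shape (prompt data built once per input, manifest rewritten after every pair) comes from the overview.

```ruby
require 'time'

# Hypothetical in-memory store. Only read_manifest/write_manifest appear on
# this page; everything else about the real store is an assumption.
class MemoryStore
  attr_reader :writes

  def initialize
    @manifests = {}
    @writes = 0
  end

  def write_manifest(run_id, manifest)
    @writes += 1
    # Deep copy so each snapshot reflects the state at write time.
    @manifests[run_id] = Marshal.load(Marshal.dump(manifest))
  end

  def read_manifest(run_id)
    @manifests[run_id]
  end
end

# Hypothetical stand-in for the orchestration loop.
def run_pairs(inputs:, models:, store:, run_id:)
  manifest = { run_id: run_id, started_at: Time.now.utc.iso8601, rows: [] }
  inputs.each do |input|
    prompt_data = { text: input.to_s } # built once per input
    models.each do |model|
      output = "#{model[:model]} saw #{prompt_data[:text]}" # stand-in for ModelRunner
      verdict = { pass: true }                               # stand-in for the judge
      manifest[:rows] << { model_result: { input_id: input, model: model[:model], output: output },
                           judge_verdict: verdict }
      store.write_manifest(run_id, manifest) # checkpoint after every pair
    end
  end
  manifest[:finished_at] = Time.now.utc.iso8601
  store.write_manifest(run_id, manifest)
  manifest
end

store = MemoryStore.new
result = run_pairs(inputs: %w[a b], models: [{ model: 'm1' }, { model: 'm2' }],
                   store: store, run_id: 'demo')
```

Because the manifest is written after each pair, a process that dies partway through still leaves a manifest containing every completed row, which is what makes a later report or resume possible.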

Constructor Details

#initialize(spec:, inputs:, models:, judge:, store:, logger:, run_id:) ⇒ Runner

Returns a new instance of Runner.



# File 'lib/llm_conductor/eval/runner.rb', line 20

def initialize(spec:, inputs:, models:, judge:, store:, logger:, run_id:)
  @spec = spec
  @inputs = inputs.to_a
  @models = models
  @judge_config = self.class.normalize_judge(judge)
  @store = store
  @logger = logger
  @run_id = run_id
end

Class Method Details

.judge_only(run_id:, spec:, store:, judge:, logger:) ⇒ Object

Re-run the judge against stored candidate outputs (e.g. after changing the judge model). Fully self-contained: input data is read from the store.



# File 'lib/llm_conductor/eval/runner.rb', line 50

def self.judge_only(run_id:, spec:, store:, judge:, logger:)
  config = normalize_judge(judge)
  # Fail fast when the run was never persisted.
  manifest = store.read_manifest(run_id) or raise ArgumentError, "No manifest for run_id=#{run_id}"
  judge_obj = Judge.new(spec:, store:, run_id:, logger:,
                        judge_model: config[:model], judge_vendor: config[:vendor])
  # rejudge_row overwrites each raw row's 'judge_verdict' in place, so the
  # manifest written below already carries the new verdicts.
  rows = manifest['rows'].map { |raw| rejudge_row(raw, judge_obj, store, run_id) }
  manifest['judge_model'] = config[:model]
  manifest['rejudged_at'] = Time.now.utc.iso8601
  store.write_manifest(run_id, manifest)
  ReportBuilder.new(rows:, run_id:, judge_model: config[:model], spec:).build
end
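The in-place update that judge_only relies on can be illustrated in isolation. In this sketch the manifest contents, the 'score' field, and fake_judge are hypothetical; only the 'rows', 'model_result', 'judge_verdict', 'judge_model', and 'rejudged_at' keys come from the source above.

```ruby
require 'time'

# A stored manifest as it might look after a JSON round trip (string keys).
manifest = {
  'judge_model' => 'old-judge',
  'rows' => [
    { 'model_result' => { 'input_id' => 'in-1', 'output' => 'hello' },
      'judge_verdict' => { 'score' => 1 } }
  ]
}

# Hypothetical stand-in for Judge#judge: scores by output length.
fake_judge = ->(model_result) { { 'score' => model_result['output'].length } }

# Each raw row is mutated in place, so the manifest itself is updated.
manifest['rows'].each do |raw|
  raw['judge_verdict'] = fake_judge.call(raw['model_result'])
end
manifest['judge_model'] = 'new-judge'
manifest['rejudged_at'] = Time.now.utc.iso8601
```

The candidate outputs under 'model_result' are left untouched; only the verdict and the judge metadata change.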

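.normalize_judge(judge) ⇒ Object

Returns a Hash with :model and :vendor keys, falling back to Judge::DEFAULT_MODEL and Judge::DEFAULT_VENDOR for missing values and coercing the vendor to a Symbol. A nil judge is treated as an empty Hash.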



# File 'lib/llm_conductor/eval/runner.rb', line 62

def self.normalize_judge(judge)
  judge ||= {}
  { model: judge[:model] || Judge::DEFAULT_MODEL,
    vendor: (judge[:vendor] || Judge::DEFAULT_VENDOR).to_sym }
end
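A standalone copy of this logic shows how the fallbacks behave. The JudgeDefaults module and its values are placeholders, since the real Judge::DEFAULT_MODEL and Judge::DEFAULT_VENDOR are not shown on this page.

```ruby
# Placeholder defaults (assumed values, not the library's).
module JudgeDefaults
  DEFAULT_MODEL = 'placeholder-judge-model'
  DEFAULT_VENDOR = :placeholder_vendor
end

# Same logic as Runner.normalize_judge, pointed at the placeholder defaults.
def normalize_judge(judge)
  judge ||= {}
  { model: judge[:model] || JudgeDefaults::DEFAULT_MODEL,
    vendor: (judge[:vendor] || JudgeDefaults::DEFAULT_VENDOR).to_sym }
end

from_nil      = normalize_judge(nil)                                    # all defaults
partial       = normalize_judge({ model: 'custom-model' })              # default vendor
string_vendor = normalize_judge({ model: 'custom-model', vendor: 'acme' })
string_keys   = normalize_judge({ 'model' => 'custom-model' })          # string keys are not looked up
```

Note the last case: the method reads judge[:model] and judge[:vendor], so a Hash with string keys silently falls back to the defaults.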

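.rejudge_row(raw, judge_obj, store, run_id) ⇒ Object

Restores the stored model result for one manifest row, reads its input data from the store, runs the judge, and writes the new verdict back into raw. Returns a Hash with :model_result and :judge_verdict.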



# File 'lib/llm_conductor/eval/runner.rb', line 81

def self.rejudge_row(raw, judge_obj, store, run_id)
  result = restore_result(raw['model_result'])
  input_data = store.read_input_data(run_id, result.input_id)
  verdict = judge_obj.judge(model_result: result, input_data:)
  raw['judge_verdict'] = verdict&.to_h
  { model_result: result, judge_verdict: verdict }
end

.report_only(run_id:, spec:, store:) ⇒ Object

Rebuild a Report from a stored manifest without calling the models or the judge again.



# File 'lib/llm_conductor/eval/runner.rb', line 42

def self.report_only(run_id:, spec:, store:)
  manifest = store.read_manifest(run_id) or raise ArgumentError, "No manifest for run_id=#{run_id}"
  rows = manifest['rows'].map { |raw| restore_row(raw) }
  ReportBuilder.new(rows:, run_id:, judge_model: manifest['judge_model'], spec:).build
end
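Both report_only and judge_only guard the manifest lookup with the `or raise` idiom. A minimal sketch of that guard, using a hypothetical HashStore and a hypothetical load_manifest! helper (neither exists in the library as far as this page shows):

```ruby
# Hypothetical store backed by a plain Hash; returns nil for unknown runs.
class HashStore
  def initialize(manifests)
    @manifests = manifests
  end

  def read_manifest(run_id)
    @manifests[run_id]
  end
end

# Hypothetical helper wrapping the guard used in the source above.
# `or` binds more loosely than the method call, so the raise only fires
# when read_manifest returns nil or false.
def load_manifest!(store, run_id)
  store.read_manifest(run_id) or raise ArgumentError, "No manifest for run_id=#{run_id}"
end

store = HashStore.new('run-1' => { 'rows' => [], 'judge_model' => 'some-judge' })
found = load_manifest!(store, 'run-1')

missing_error =
  begin
    load_manifest!(store, 'run-2')
    nil
  rescue ArgumentError => e
    e.message
  end
```

Callers therefore see an ArgumentError naming the run id rather than a NoMethodError on nil further down.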

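.restore_result(raw) ⇒ Object

Rebuilds a Result from a stored Hash by converting its top-level keys to Symbols and passing them as keyword arguments.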



# File 'lib/llm_conductor/eval/runner.rb', line 68

def self.restore_result(raw)
  Result.new(**raw.transform_keys(&:to_sym))
end
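The key conversion matters because a manifest that has been serialized comes back with string keys, while keyword arguments need Symbols. A small sketch, assuming JSON storage and using a hypothetical DemoResult struct (the real Result's fields and the store's serialization format are not shown here):

```ruby
require 'json'

# Hypothetical value object standing in for Result.
DemoResult = Struct.new(:input_id, :model, :output, keyword_init: true)

original = DemoResult.new(input_id: 'in-1', model: 'm1', output: 'hello')

stored = JSON.generate(original.to_h) # symbol keys become JSON strings
raw = JSON.parse(stored)              # keys come back as Strings

# Same conversion as restore_result: symbolize top-level keys, then splat.
restored = DemoResult.new(**raw.transform_keys(&:to_sym))
```

transform_keys only touches the top level, so any nested Hash inside a field keeps its string keys after restoration.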

.restore_row(raw) ⇒ Object

Rebuilds one manifest row as a Hash with a restored :model_result and :judge_verdict.



# File 'lib/llm_conductor/eval/runner.rb', line 76

def self.restore_row(raw)
  { model_result: restore_result(raw['model_result']),
    judge_verdict: restore_verdict(raw['judge_verdict']) }
end

.restore_verdict(raw) ⇒ Object

Rebuilds a Verdict from a stored Hash, or returns nil when the row has no verdict.



# File 'lib/llm_conductor/eval/runner.rb', line 72

def self.restore_verdict(raw)
  raw ? Verdict.new(**raw.transform_keys(&:to_sym)) : nil
end

Instance Method Details

#run ⇒ Object



# File 'lib/llm_conductor/eval/runner.rb', line 30

def run
  @logger.info("LLM eval run=#{@run_id} models=#{@models.map { |m| m[:model] }.join(',')} " \
               "judge=#{@judge_config[:model]}")
  warn_self_judge
  manifest = base_manifest
  rows = run_all_pairs(manifest)
  manifest[:finished_at] = Time.now.utc.iso8601
  @store.write_manifest(@run_id, manifest)
  build_report(rows)
end