Class: LlmConductor::Eval::Runner

Inherits: Object

Defined in: lib/llm_conductor/eval/runner.rb

Overview

Top-level orchestrator. For each input, builds the prompt data once, runs every candidate (input, model) pair through ModelRunner, judges it, and rewrites the manifest after each pair so the run stays resumable / reportable mid-flight.

Unlike the Rails prototype, it performs no data selection: the caller passes inputs: directly. See LlmConductor::Eval.run for the public entrypoint.
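The per-pair manifest rewrite described above can be sketched roughly as follows. This is a simplified illustration, not the gem's code: `MemoryStore`, `run_pairs`, the row fields, and the stand-in model and judge calls are all hypothetical. Only the overall shape (prompt data built once per input, manifest written after every pair) comes from the overview.

```ruby
require 'json'
require 'time'

# Hypothetical in-memory store. Only read_manifest/write_manifest appear on this
# page; everything else about the real store is an assumption.
class MemoryStore
  attr_reader :writes

  def initialize
    @manifests = {}
    @writes = 0
  end

  def write_manifest(run_id, manifest)
    # Round-trip through JSON so that reads return string keys,
    # as a file-backed store would.
    @manifests[run_id] = JSON.generate(manifest)
    @writes += 1
  end

  def read_manifest(run_id)
    raw = @manifests[run_id]
    raw && JSON.parse(raw)
  end
end

def run_pairs(inputs:, models:, store:, run_id:)
  manifest = { run_id: run_id, rows: [] }
  inputs.each do |input|
    prompt_data = { text: input[:text].strip } # built once per input
    models.each do |model|
      output = "#{model[:model]}: #{prompt_data[:text].upcase}" # stand-in for a model call
      verdict = { score: output.length }                          # stand-in for a judge call
      manifest[:rows] << {
        model_result: { input_id: input[:id], model: model[:model], output: output },
        judge_verdict: verdict
      }
      store.write_manifest(run_id, manifest) # persist after every (input, model) pair
    end
  end
  manifest[:finished_at] = Time.now.utc.iso8601
  store.write_manifest(run_id, manifest)
  manifest[:rows]
end

store = MemoryStore.new
rows = run_pairs(inputs: [{ id: 'a', text: ' hi ' }, { id: 'b', text: 'yo' }],
                 models: [{ model: 'm1' }, { model: 'm2' }],
                 store: store, run_id: 'r1')
```

Because the manifest is rewritten after each pair, a process that dies partway through still leaves a readable manifest containing every pair completed so far.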

Class Method Summary

.judge_only(run_id:, spec:, store:, judge:, logger:) ⇒ Object

Re-run the judge against stored candidate outputs (e.g. after changing the judge model).
.normalize_judge(judge) ⇒ Object
.rejudge_row(raw, judge_obj, store, run_id) ⇒ Object
.report_only(run_id:, spec:, store:) ⇒ Object

Rebuild a Report from a stored manifest without calling the models or the judge again.
.restore_result(raw) ⇒ Object
.restore_row(raw) ⇒ Object
.restore_verdict(raw) ⇒ Object

Instance Method Summary

#initialize(spec:, inputs:, models:, judge:, store:, logger:, run_id:) ⇒ Runner constructor

A new instance of Runner.
#run ⇒ Object

Constructor Details

#initialize(spec:, inputs:, models:, judge:, store:, logger:, run_id:) ⇒ `Runner`

Returns a new instance of Runner.

# File 'lib/llm_conductor/eval/runner.rb', line 20

def initialize(spec:, inputs:, models:, judge:, store:, logger:, run_id:)
  @spec = spec
  @inputs = inputs.to_a
  @models = models
  @judge_config = self.class.normalize_judge(judge)
  @store = store
  @logger = logger
  @run_id = run_id
end

Class Method Details

.judge_only(run_id:, spec:, store:, judge:, logger:) ⇒ `Object`

Re-run the judge against stored candidate outputs (e.g. after changing the judge model). Fully self-contained: input data is read from the store.

# File 'lib/llm_conductor/eval/runner.rb', line 50

def self.judge_only(run_id:, spec:, store:, judge:, logger:)
  config = normalize_judge(judge)
  manifest = store.read_manifest(run_id) or raise ArgumentError, "No manifest for run_id=#{run_id}"
  judge_obj = Judge.new(spec:, store:, run_id:, logger:,
                        judge_model: config[:model], judge_vendor: config[:vendor])
  rows = manifest['rows'].map { |raw| rejudge_row(raw, judge_obj, store, run_id) }
  manifest['judge_model'] = config[:model]
  manifest['rejudged_at'] = Time.now.utc.iso8601
  store.write_manifest(run_id, manifest)
  ReportBuilder.new(rows:, run_id:, judge_model: config[:model], spec:).build
end

.normalize_judge(judge) ⇒ `Object`

# File 'lib/llm_conductor/eval/runner.rb', line 62

def self.normalize_judge(judge)
  judge ||= {}
  { model: judge[:model] || Judge::DEFAULT_MODEL,
    vendor: (judge[:vendor] || Judge::DEFAULT_VENDOR).to_sym }
end
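A quick way to see what this normalization does is to run the same logic against placeholder defaults. The constants below are hypothetical stand-ins for `Judge::DEFAULT_MODEL` and `Judge::DEFAULT_VENDOR`, whose real values are not shown on this page.

```ruby
# Hypothetical defaults; the real Judge constants may hold different values.
DEFAULT_MODEL = 'example-judge-model'
DEFAULT_VENDOR = 'example_vendor'

def normalize_judge(judge)
  judge ||= {} # nil means "use defaults for everything"
  { model: judge[:model] || DEFAULT_MODEL,
    vendor: (judge[:vendor] || DEFAULT_VENDOR).to_sym }
end

from_nil     = normalize_judge(nil)
from_partial = normalize_judge({ model: 'custom-model' })
from_strings = normalize_judge({ model: 'custom-model', vendor: 'other_vendor' })
string_keyed = normalize_judge({ 'model' => 'custom-model' })
```

Note that only symbol keys are read, so in this sketch a string-keyed hash such as `{ 'model' => ... }` silently falls back to the defaults.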

.rejudge_row(raw, judge_obj, store, run_id) ⇒ `Object`

# File 'lib/llm_conductor/eval/runner.rb', line 81

def self.rejudge_row(raw, judge_obj, store, run_id)
  result = restore_result(raw['model_result'])
  input_data = store.read_input_data(run_id, result.input_id)
  verdict = judge_obj.judge(model_result: result, input_data:)
  raw['judge_verdict'] = verdict&.to_h
  { model_result: result, judge_verdict: verdict }
end
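One detail worth noticing is that `rejudge_row` assigns into `raw`, which is a row of the manifest itself, so the later `write_manifest` call in `.judge_only` persists the new verdicts. A minimal sketch of that in-place update follows, assuming a hypothetical `LengthJudge` and `FakeVerdict` and a plain hash in place of the store's `read_input_data`.

```ruby
# Hypothetical verdict type; the real Verdict class is defined elsewhere.
FakeVerdict = Struct.new(:score, keyword_init: true)

# Hypothetical judge: scores by length, or abstains (returns nil) on empty output.
class LengthJudge
  def judge(model_result:, input_data:)
    return nil if model_result['output'].empty?

    FakeVerdict.new(score: model_result['output'].length + input_data.length)
  end
end

def rejudge_row(raw, judge, inputs)
  result = raw['model_result']
  verdict = judge.judge(model_result: result, input_data: inputs.fetch(result['input_id']))
  raw['judge_verdict'] = verdict&.to_h # mutates the manifest row in place
  { model_result: result, judge_verdict: verdict }
end

manifest = { 'rows' => [
  { 'model_result' => { 'input_id' => 'a', 'output' => 'hello' }, 'judge_verdict' => { 'score' => 1 } },
  { 'model_result' => { 'input_id' => 'b', 'output' => '' }, 'judge_verdict' => { 'score' => 2 } }
] }
inputs = { 'a' => 'xy', 'b' => 'z' } # stand-in for store.read_input_data

rows = manifest['rows'].map { |raw| rejudge_row(raw, LengthJudge.new, inputs) }
```

In this sketch an abstaining judge overwrites the old verdict with `nil`, which mirrors the `verdict&.to_h` assignment in the source.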

.report_only(run_id:, spec:, store:) ⇒ `Object`

Rebuild a Report from a stored manifest without calling the models or the judge again.

# File 'lib/llm_conductor/eval/runner.rb', line 42

def self.report_only(run_id:, spec:, store:)
  manifest = store.read_manifest(run_id) or raise ArgumentError, "No manifest for run_id=#{run_id}"
  rows = manifest['rows'].map { |raw| restore_row(raw) }
  ReportBuilder.new(rows:, run_id:, judge_model: manifest['judge_model'], spec:).build
end
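Both `.report_only` and `.judge_only` guard the manifest lookup with `x = expr or raise ...`. This works because `or` binds more loosely than `=`, so the assignment happens first and the `raise` fires only when the assigned value is `nil` or `false`. A small sketch, using a plain hash as a hypothetical stand-in for the store:

```ruby
def load_manifest(manifests, run_id)
  # Parsed as: (manifest = manifests[run_id]) or raise(...)
  manifest = manifests[run_id] or raise ArgumentError, "No manifest for run_id=#{run_id}"
  manifest
end

manifests = { 'r1' => { 'rows' => [] } } # hypothetical stored manifests

found = load_manifest(manifests, 'r1')

error_message =
  begin
    load_manifest(manifests, 'missing')
    nil
  rescue ArgumentError => e
    e.message
  end
```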

.restore_result(raw) ⇒ `Object`



# File 'lib/llm_conductor/eval/runner.rb', line 68

def self.restore_result(raw)
  Result.new(**raw.transform_keys(&:to_sym))
end

.restore_row(raw) ⇒ `Object`

# File 'lib/llm_conductor/eval/runner.rb', line 76

def self.restore_row(raw)
  { model_result: restore_result(raw['model_result']),
    judge_verdict: restore_verdict(raw['judge_verdict']) }
end

.restore_verdict(raw) ⇒ `Object`



# File 'lib/llm_conductor/eval/runner.rb', line 72

def self.restore_verdict(raw)
  raw ? Verdict.new(**raw.transform_keys(&:to_sym)) : nil
end
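The three `restore_*` helpers turn string-keyed hashes (as produced by parsing stored JSON) back into value objects. The sketch below assumes `Result` and `Verdict` are keyword-initialized structs with the fields shown. That shape is a guess for illustration; the real classes are defined elsewhere in the gem and may differ.

```ruby
require 'json'

# Hypothetical shapes for illustration only.
Result  = Struct.new(:input_id, :model, :output, keyword_init: true)
Verdict = Struct.new(:score, :reasoning, keyword_init: true)

def restore_result(raw)
  # JSON parsing yields string keys; keyword arguments need symbols.
  Result.new(**raw.transform_keys(&:to_sym))
end

def restore_verdict(raw)
  # A row that was never judged is stored with a null verdict.
  raw ? Verdict.new(**raw.transform_keys(&:to_sym)) : nil
end

def restore_row(raw)
  { model_result: restore_result(raw['model_result']),
    judge_verdict: restore_verdict(raw['judge_verdict']) }
end

stored = JSON.parse(JSON.generate({
  rows: [
    { model_result: { input_id: 'a', model: 'm1', output: 'hello' },
      judge_verdict: { score: 4, reasoning: 'clear' } },
    { model_result: { input_id: 'b', model: 'm1', output: 'hi' },
      judge_verdict: nil }
  ]
}))

rows = stored['rows'].map { |raw| restore_row(raw) }
```

With keyword-initialized structs, an unexpected key in the stored hash raises `ArgumentError`, so a manifest written by a different schema version would fail loudly under this assumption rather than being silently truncated.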

Instance Method Details

#run ⇒ `Object`

# File 'lib/llm_conductor/eval/runner.rb', line 30

def run
  @logger.info("LLM eval run=#{@run_id} models=#{@models.map { |m| m[:model] }.join(',')} " \
               "judge=#{@judge_config[:model]}")
  warn_self_judge
  manifest = base_manifest
  rows = run_all_pairs(manifest)
  manifest[:finished_at] = Time.now.utc.iso8601
  @store.write_manifest(@run_id, manifest)
  build_report(rows)
end
