Class: LlmConductor::Eval::Runner
- Inherits: Object
  - Object
  - LlmConductor::Eval::Runner
- Defined in: lib/llm_conductor/eval/runner.rb
Overview
Top-level orchestrator. For each input, builds the prompt data once, runs every candidate (input, model) pair through ModelRunner, judges it, and rewrites the manifest after each pair so the run stays resumable and reportable mid-flight.
Unlike the Rails prototype, it does no data selection: the caller passes inputs: directly. See LlmConductor::Eval.run for the public entrypoint.
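The loop described above can be sketched as follows. This is a minimal illustration, not the library's implementation: MemoryStore, the run_all_pairs body, the string-building "model call" and the hash "verdict" are all hypothetical stand-ins for the real store, ModelRunner and Judge, whose internals are not shown on this page.

```ruby
require 'time'

# Hypothetical in-memory store; records a deep copy of every manifest write
# so that mid-run snapshots can be inspected afterwards.
class MemoryStore
  attr_reader :writes

  def initialize
    @writes = []
  end

  def write_manifest(run_id, manifest)
    @writes << [run_id, Marshal.load(Marshal.dump(manifest))]
  end
end

# Hypothetical sketch of the per-pair loop (not the real private method).
def run_all_pairs(inputs:, models:, store:, run_id:)
  manifest = { 'run_id' => run_id, 'rows' => [] }
  inputs.each do |input|
    prompt_data = { text: input[:text].strip } # built once per input
    models.each do |model|
      output  = "#{model[:model]}: #{prompt_data[:text].upcase}" # stand-in for a model call
      verdict = { 'pass' => !output.empty? }                     # stand-in for the judge
      result  = { 'input_id' => input[:id], 'model' => model[:model], 'output' => output }
      manifest['rows'] << { 'model_result' => result, 'judge_verdict' => verdict }
      store.write_manifest(run_id, manifest) # rewritten after every pair
    end
  end
  manifest['finished_at'] = Time.now.utc.iso8601
  store.write_manifest(run_id, manifest)
  manifest
end

store = MemoryStore.new
run_all_pairs(
  inputs: [{ id: 'a', text: ' hello ' }, { id: 'b', text: 'world' }],
  models: [{ model: 'model-x' }, { model: 'model-y' }],
  store: store, run_id: 'demo'
)
puts store.writes.size
```

Because the manifest is written after each pair, a process that stops partway through leaves behind a manifest holding every completed row, which is what a later report or rejudge pass would read.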
Class Method Summary
- .judge_only(run_id:, spec:, store:, judge:, logger:) ⇒ Object
Re-run the judge against stored candidate outputs (e.g. after changing the judge model).
- .normalize_judge(judge) ⇒ Object
- .rejudge_row(raw, judge_obj, store, run_id) ⇒ Object
- .report_only(run_id:, spec:, store:) ⇒ Object
Rebuild a Report from a stored manifest without calling the models or the judge again.
- .restore_result(raw) ⇒ Object
- .restore_row(raw) ⇒ Object
- .restore_verdict(raw) ⇒ Object
Instance Method Summary
- #initialize(spec:, inputs:, models:, judge:, store:, logger:, run_id:) ⇒ Runner (constructor)
A new instance of Runner.
- #run ⇒ Object
Constructor Details
#initialize(spec:, inputs:, models:, judge:, store:, logger:, run_id:) ⇒ Runner
Returns a new instance of Runner.
    # File 'lib/llm_conductor/eval/runner.rb', line 20
    def initialize(spec:, inputs:, models:, judge:, store:, logger:, run_id:)
      @spec = spec
      @inputs = inputs.to_a
      @models = models
      @judge_config = self.class.normalize_judge(judge)
      @store = store
      @logger = logger
      @run_id = run_id
    end
Class Method Details
.judge_only(run_id:, spec:, store:, judge:, logger:) ⇒ Object
Re-run the judge against stored candidate outputs (e.g. after changing the judge model). Fully self-contained: input data is read from the store.
    # File 'lib/llm_conductor/eval/runner.rb', line 50
    def self.judge_only(run_id:, spec:, store:, judge:, logger:)
      config = normalize_judge(judge)
      manifest = store.read_manifest(run_id) or raise ArgumentError, "No manifest for run_id=#{run_id}"
      judge_obj = Judge.new(spec:, store:, run_id:, logger:,
                            judge_model: config[:model], judge_vendor: config[:vendor])
      rows = manifest['rows'].map { |raw| rejudge_row(raw, judge_obj, store, run_id) }
      manifest['judge_model'] = config[:model]
      manifest['rejudged_at'] = Time.now.utc.iso8601
      store.write_manifest(run_id, manifest)
      ReportBuilder.new(rows:, run_id:, judge_model: config[:model], spec:).build
    end
.normalize_judge(judge) ⇒ Object
    # File 'lib/llm_conductor/eval/runner.rb', line 62
    def self.normalize_judge(judge)
      judge ||= {}
      { model: judge[:model] || Judge::DEFAULT_MODEL,
        vendor: (judge[:vendor] || Judge::DEFAULT_VENDOR).to_sym }
    end
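To illustrate how the normalization behaves, here is a standalone copy of the method with placeholder defaults. DEFAULT_MODEL and DEFAULT_VENDOR below are hypothetical values chosen for the example; the real Judge::DEFAULT_MODEL and Judge::DEFAULT_VENDOR are not shown on this page.

```ruby
# Hypothetical defaults, for illustration only.
DEFAULT_MODEL  = 'example-judge-model'
DEFAULT_VENDOR = :example_vendor

def normalize_judge(judge)
  judge ||= {} # nil means "use the defaults for everything"
  { model: judge[:model] || DEFAULT_MODEL,
    vendor: (judge[:vendor] || DEFAULT_VENDOR).to_sym }
end

p normalize_judge(nil)                                   # both defaults
p normalize_judge({ model: 'custom-judge' })             # default vendor only
p normalize_judge({ model: 'custom-judge', vendor: 'acme' }) # string vendor becomes a symbol
```

Note that the lookup uses symbol keys, so a hash with string keys (for example one parsed from JSON) would fall through to the defaults in this sketch.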
.rejudge_row(raw, judge_obj, store, run_id) ⇒ Object
    # File 'lib/llm_conductor/eval/runner.rb', line 81
    def self.rejudge_row(raw, judge_obj, store, run_id)
      result = restore_result(raw['model_result'])
      input_data = store.read_input_data(run_id, result.input_id)
      verdict = judge_obj.judge(model_result: result, input_data:)
      raw['judge_verdict'] = verdict&.to_h
      { model_result: result, judge_verdict: verdict }
    end
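A rough sketch of the rejudge step, using in-memory stand-ins, may help show how the stored row is mutated in place while a restored row is returned. Result, Verdict, LengthJudge and InputStore here are hypothetical; the field names and the assumption that a judge may return nil are inferred from the safe-navigation call above, not confirmed by this page.

```ruby
# Hypothetical value objects; the real Result and Verdict fields are not shown here.
Result  = Struct.new(:input_id, :model, :output, keyword_init: true)
Verdict = Struct.new(:score, :reason, keyword_init: true)

# Hypothetical judge that scores by output length and returns nil for empty output.
class LengthJudge
  def judge(model_result:, input_data:)
    return nil if model_result.output.to_s.empty?

    Verdict.new(score: model_result.output.length, reason: "input=#{input_data[:text]}")
  end
end

# Hypothetical store holding input data keyed by input id.
class InputStore
  def initialize(inputs)
    @inputs = inputs
  end

  def read_input_data(_run_id, input_id)
    @inputs.fetch(input_id)
  end
end

def rejudge_row(raw, judge_obj, store, run_id)
  result = Result.new(**raw['model_result'].transform_keys(&:to_sym))
  input_data = store.read_input_data(run_id, result.input_id)
  verdict = judge_obj.judge(model_result: result, input_data:)
  raw['judge_verdict'] = verdict&.to_h # a nil verdict is stored as nil
  { model_result: result, judge_verdict: verdict }
end

raw = { 'model_result' => { 'input_id' => 'a', 'model' => 'model-x', 'output' => 'hello' } }
row = rejudge_row(raw, LengthJudge.new, InputStore.new({ 'a' => { text: 'hi' } }), 'demo')
p raw['judge_verdict']
p row[:judge_verdict]
```

The raw hash is updated in place so that the caller can write the same manifest back, while the returned row carries the restored objects for report building.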
.report_only(run_id:, spec:, store:) ⇒ Object
Rebuild a Report from a stored manifest without calling the models or the judge again.
    # File 'lib/llm_conductor/eval/runner.rb', line 42
    def self.report_only(run_id:, spec:, store:)
      manifest = store.read_manifest(run_id) or raise ArgumentError, "No manifest for run_id=#{run_id}"
      rows = manifest['rows'].map { |raw| restore_row(raw) }
      ReportBuilder.new(rows:, run_id:, judge_model: manifest['judge_model'], spec:).build
    end
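Both report_only and judge_only guard the manifest lookup with the "assign or raise" idiom. The sketch below isolates it; read_manifest! is a hypothetical helper, and a plain Hash stands in for the store.

```ruby
# Hypothetical helper: `store` is a plain Hash standing in for the real store.
def read_manifest!(store, run_id)
  # `or` binds more loosely than `=`, so the assignment happens first and
  # the raise only fires when the looked-up value is nil or false.
  manifest = store[run_id] or raise ArgumentError, "No manifest for run_id=#{run_id}"
  manifest
end

p read_manifest!({ 'r1' => { 'rows' => [] } }, 'r1')

begin
  read_manifest!({}, 'missing')
rescue ArgumentError => e
  puts e.message
end
```

An equivalent spelling with the higher-precedence operator would be `manifest = store[run_id] || raise(ArgumentError, "...")`; the parentheses around the raise arguments are needed in that form.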
.restore_result(raw) ⇒ Object
    # File 'lib/llm_conductor/eval/runner.rb', line 68
    def self.restore_result(raw)
      Result.new(**raw.transform_keys(&:to_sym))
    end
.restore_row(raw) ⇒ Object
    # File 'lib/llm_conductor/eval/runner.rb', line 76
    def self.restore_row(raw)
      { model_result: restore_result(raw['model_result']),
        judge_verdict: restore_verdict(raw['judge_verdict']) }
    end
.restore_verdict(raw) ⇒ Object
    # File 'lib/llm_conductor/eval/runner.rb', line 72
    def self.restore_verdict(raw)
      raw ? Verdict.new(**raw.transform_keys(&:to_sym)) : nil
    end
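The three restore helpers turn string-keyed hashes back into keyword-initialized objects. The sketch below shows the round trip under two assumptions that this page does not confirm: that the manifest is serialized as JSON, and that Result and Verdict accept keyword arguments (modeled here as Structs with made-up fields).

```ruby
require 'json'

# Hypothetical value objects; the real field lists are not shown on this page.
Result  = Struct.new(:input_id, :model, :output, keyword_init: true)
Verdict = Struct.new(:score, :reason, keyword_init: true)

def restore_result(raw)
  # JSON.parse yields string keys; keyword arguments need symbols.
  Result.new(**raw.transform_keys(&:to_sym))
end

def restore_verdict(raw)
  raw ? Verdict.new(**raw.transform_keys(&:to_sym)) : nil
end

def restore_row(raw)
  { model_result: restore_result(raw['model_result']),
    judge_verdict: restore_verdict(raw['judge_verdict']) }
end

stored = JSON.generate({
  'model_result' => { 'input_id' => 'a', 'model' => 'model-x', 'output' => 'hi' },
  'judge_verdict' => nil # a row that was never judged
})
row = restore_row(JSON.parse(stored))
p row[:model_result]
p row[:judge_verdict]
```

With keyword-initialized Structs, an unexpected key in the stored hash raises ArgumentError, so a manifest written with a different field set would fail loudly on restore in this sketch.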
Instance Method Details
#run ⇒ Object
    # File 'lib/llm_conductor/eval/runner.rb', line 30
    def run
      @logger.info("LLM eval run=#{@run_id} models=#{@models.map { |m| m[:model] }.join(',')} " \
                   "judge=#{@judge_config[:model]}")
      warn_self_judge
      manifest = base_manifest
      rows = run_all_pairs(manifest)
      manifest[:finished_at] = Time.now.utc.iso8601
      @store.write_manifest(@run_id, manifest)
      build_report(rows)
    end