Class: Phronomy::Eval::Runner

Inherits:
Object
  • Object
show all
Defined in:
lib/phronomy/eval/runner.rb

Overview

Runs a Dataset through a callable and collects EvalResult objects.

The callable must respond to +#call(input)+ and may return either:

  • a plain +String+ — treated as the output; usage is nil
  • a +Hash+ with +:output+ and optional +:usage+ (TokenUsage) keys

Examples:

With a simple proc

runner  = Runner.new(scorer: Scorer::ExactMatch.new)
dataset = Dataset.from_array([{ input: "2+2", expected: "4" }])
results = runner.run(dataset, ->(input) { "4" })

With a Phronomy agent

agent   = MyAgent.new
results = runner.run(dataset, ->(input) { agent.invoke(input) })

Instance Method Summary collapse

Constructor Details

#initialize(scorer: Scorer::ExactMatch.new) ⇒ Runner

Returns a new instance of Runner.

Parameters:

  • scorer (Scorer::Base) (defaults to: Scorer::ExactMatch.new)

    scorer used to evaluate each result



21
22
23
# File 'lib/phronomy/eval/runner.rb', line 21

def initialize(scorer: Scorer::ExactMatch.new)
  @scorer = scorer
end

Instance Method Details

#run(dataset, callable, concurrency: 1) ⇒ Array<EvalResult>

Parameters:

  • dataset (Dataset)

    collection of EvalCase objects

  • callable (#call)

    accepts a single String argument

  • concurrency (Integer) (defaults to: 1)

    number of parallel threads (default: 1, sequential)

Returns:



29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
# File 'lib/phronomy/eval/runner.rb', line 29

def run(dataset, callable, concurrency: 1)
  cases = dataset.to_a
  return cases.map { |eval_case| run_one(eval_case, callable) } if concurrency <= 1

  # Run cases in slices of +concurrency+ threads. Each slice is joined
  # before the next starts, bounding peak thread count to +concurrency+.
  # Writing to pre-allocated slots (one per thread) is safe because each
  # thread writes to a unique index and all threads in a slice are joined
  # before the next slice begins.
  # Exceptions in worker threads are collected and re-raised after all
  # threads in the slice are joined, preventing orphaned threads.
  results = Array.new(cases.length)
  cases.each_with_index.each_slice(concurrency) do |batch|
    errors = []
    errors_mu = Mutex.new
    threads = batch.map do |eval_case, i|
      Thread.new do
        results[i] = run_one(eval_case, callable)
      rescue => e
        errors_mu.synchronize { errors << e }
      end
    end
    threads.each(&:join)
    raise errors.first if errors.any?
  end
  results
end