Class: Phronomy::Eval::Runner

Inherits:
Object
Defined in:
lib/phronomy/eval/runner.rb

Overview

Runs a Dataset through a callable and collects EvalResult objects.

The callable must respond to #call(input) and may return either:

  • a plain String: treated as the output; usage is nil
  • a Hash with an :output key and an optional :usage (TokenUsage) key
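
The two return shapes can be sketched as follows. This is a minimal illustration rather than the library's own code: UsageSketch is a hypothetical stand-in for TokenUsage, and normalize only approximates how each shape presumably ends up as an output/usage pair.

```ruby
# Hypothetical stand-in for TokenUsage; the real class and its fields may differ.
UsageSketch = Struct.new(:input_tokens, :output_tokens, keyword_init: true)

# Shape 1: a plain String. It is treated as the output and usage stays nil.
plain_callable = ->(input) { input.to_s.upcase }

# Shape 2: a Hash with :output and an optional :usage entry.
rich_callable = lambda do |input|
  {
    output: input.to_s.upcase,
    usage: UsageSketch.new(input_tokens: 3, output_tokens: 1)
  }
end

# Assumed normalisation into an [output, usage] pair (illustrative only,
# not the Runner's actual private helper).
def normalize(result)
  result.is_a?(Hash) ? [result[:output], result[:usage]] : [result, nil]
end

plain_output, plain_usage = normalize(plain_callable.call("ok"))
rich_output, rich_usage = normalize(rich_callable.call("ok"))
```

Returning the Hash form is only worthwhile when token usage is available; a callable with nothing to report can simply return the String.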

Examples:

With a simple proc

runner  = Runner.new(scorer: Scorer::ExactMatch.new)
dataset = Dataset.from_array([{ input: "2+2", expected: "4" }])
results = runner.run(dataset, ->(input) { "4" })

With a Phronomy agent

agent   = MyAgent.new
results = runner.run(dataset, ->(input) { agent.invoke(input) })

Constructor Details

#initialize(scorer: Scorer::ExactMatch.new) ⇒ Runner

Returns a new instance of Runner.

Parameters:

  • scorer (Scorer::Base) (defaults to: Scorer::ExactMatch.new)

    scorer used to evaluate each result



# File 'lib/phronomy/eval/runner.rb', line 21

def initialize(scorer: Scorer::ExactMatch.new)
  @scorer = scorer
end

Instance Method Details

#run(dataset, callable) ⇒ Array<EvalResult>

Parameters:

  • dataset (Dataset)

    collection of EvalCase objects

  • callable (#call)

    called once per case with that case's input (a single String argument); returns a String or a Hash as described in the Overview

Returns:

  • (Array<EvalResult>)

    one EvalResult per EvalCase in the dataset

# File 'lib/phronomy/eval/runner.rb', line 28

def run(dataset, callable)
  dataset.map do |eval_case|
    t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
    result = callable.call(eval_case.input)
    latency_ms = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond) - t0

    actual, usage = extract(result)
    score = @scorer.score(actual: actual, expected: eval_case.expected, input: eval_case.input)

    EvalResult.new(eval_case: eval_case, actual: actual, score: score, usage: usage, latency_ms: latency_ms)
  end
end
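
Because #run calls the scorer as score(actual:, expected:, input:), an object that accepts those keywords should be usable as a scorer. The sketch below is self-contained and runs without Phronomy; CaseInsensitiveMatch, CaseSketch, ResultSketch, and run_sketch are hypothetical names introduced for illustration, and it assumes a numeric score, which the source does not confirm. Whether real scorers must subclass Scorer::Base is likewise an assumption left open here.

```ruby
# Hypothetical scorer: responds to #score with the keywords used in #run.
# The 1.0 / 0.0 return values are an assumption about the score type.
class CaseInsensitiveMatch
  def score(actual:, expected:, input: nil)
    actual.to_s.strip.casecmp?(expected.to_s.strip) ? 1.0 : 0.0
  end
end

# Stand-ins for EvalCase and EvalResult, used only in this sketch.
CaseSketch = Struct.new(:input, :expected, keyword_init: true)
ResultSketch = Struct.new(:eval_case, :actual, :score, :latency_ms, keyword_init: true)

# Mirrors the shape of the #run loop: time the call, score it, collect a result.
def run_sketch(cases, callable, scorer)
  cases.map do |eval_case|
    t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
    actual = callable.call(eval_case.input)
    latency_ms = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond) - t0
    score = scorer.score(actual: actual, expected: eval_case.expected, input: eval_case.input)
    ResultSketch.new(eval_case: eval_case, actual: actual, score: score, latency_ms: latency_ms)
  end
end

cases = [
  CaseSketch.new(input: "capital of France", expected: "Paris"),
  CaseSketch.new(input: "2+2", expected: "4")
]
results = run_sketch(cases, ->(_input) { "paris" }, CaseInsensitiveMatch.new)
scores = results.map(&:score)
```

With the real library, the equivalent would presumably be passing such an object as the scorer: keyword to Runner.new. The monotonic clock is used for latency so that wall-clock adjustments during a run cannot produce negative or skewed timings.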