Class: Ask::Eval::Runner

Inherits:
Object
Defined in:
lib/ask/eval/runner.rb

Overview

Runs a set of evaluation assertions and collects results. Used for batch evaluation outside of Minitest tests.

Examples:

runner = Ask::Eval::Runner.new
runner.test("My Test", output: "output text", context: docs)
runner.assert(:faithful, context: docs)
runner.assert(:contains, value: "hello")
results = runner.run


Constructor Details

#initialize(track_cost: false) ⇒ Runner

Returns a new instance of Runner.



# File 'lib/ask/eval/runner.rb', line 21

def initialize(track_cost: false)
  @entries = []
  @track_cost = track_cost
  @cost_tracker = CostTracker.new
end

Instance Attribute Details

#cost_tracker ⇒ Ask::Eval::CostTracker (readonly)

Returns the cost tracker.

Returns:

  • (Ask::Eval::CostTracker)

    the cost tracker


# File 'lib/ask/eval/runner.rb', line 19

def cost_tracker
  @cost_tracker
end

#entries ⇒ Array<Hash> (readonly)

Returns all registered test cases and their assertions.

Returns:

  • (Array<Hash>)

    all registered test cases and their assertions



# File 'lib/ask/eval/runner.rb', line 16

def entries
  @entries
end

Instance Method Details

#assert(name, **kwargs) ⇒ Object

Adds an assertion to the most recently registered test case. Raises a RuntimeError if no test case has been registered yet.

Parameters:

  • name (Symbol)

    assertion name (:contains, :faithful, etc.)

  • kwargs (Hash)

    additional arguments for the assertion



# File 'lib/ask/eval/runner.rb', line 56

def assert(name, **kwargs)
  raise "No test case registered. Call #test first." if @entries.empty?
  @entries.last[:assertions] << { name: name, **kwargs }
end

#reset! ⇒ Object

Clears all registered entries and resets the cost tracker.



# File 'lib/ask/eval/runner.rb', line 91

def reset!
  @entries.clear
  @cost_tracker.reset!
end

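#run ⇒ Array<Hash>

Runs every assertion of every registered test case and returns the results as one flat array.

Returns:

  • (Array<Hash>)

    one hash per assertion, with the keys :test, :name, and :result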



# File 'lib/ask/eval/runner.rb', line 64

def run
  @entries.map do |entry|
    test_case = entry[:test_case]
    entry[:assertions].map do |assertion|
      name = assertion[:name]
      kwargs = assertion.reject { |k, _| k == :name }

      result = Assertions.evaluate(name, test_case.actual_output, **kwargs)
      { test: entry[:name], name: name, result: result }
    end
  end.flatten
end
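
The flattening in #run can be sketched in isolation. The snippet below is a minimal illustration, not the library's code: SketchAssertions is a hypothetical stand-in for Ask::Eval::Assertions (whose implementation is not shown on this page), and the entries are plain hashes rather than real TestCase objects.

```ruby
# Hypothetical stand-in for Ask::Eval::Assertions.evaluate.
# Assumption: only a :contains check is modelled, returning a { passed: } hash.
module SketchAssertions
  def self.evaluate(name, output, **kwargs)
    case name
    when :contains then { passed: output.include?(kwargs.fetch(:value)) }
    else raise ArgumentError, "unknown assertion: #{name}"
    end
  end
end

# Simplified entries: the output is stored directly instead of in a TestCase.
entries = [
  { name: "greeting", output: "hello world",
    assertions: [{ name: :contains, value: "hello" }, { name: :contains, value: "bye" }] },
  { name: "farewell", output: "goodbye",
    assertions: [{ name: :contains, value: "bye" }] }
]

# Same shape as #run: one result hash per assertion, flattened across test cases.
results = entries.flat_map do |entry|
  entry[:assertions].map do |assertion|
    kwargs = assertion.reject { |k, _| k == :name }
    { test: entry[:name], name: assertion[:name],
      result: SketchAssertions.evaluate(assertion[:name], entry[:output], **kwargs) }
  end
end
```

Because the array is flat, a test case with several assertions contributes several result hashes, each tagged with the same :test name.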

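#summary ⇒ Hash

Runs all registered evaluations via #run and returns a summary of the results.

Returns:

  • (Hash)

    the counts under :total, :passed, and :failed, plus the raw results under :results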



# File 'lib/ask/eval/runner.rb', line 78

def summary
  results = run
  passed = results.count { |r| r[:result].is_a?(Hash) ? r[:result][:passed] : r[:result].passed }
  total = results.size
  {
    total: total,
    passed: passed,
    failed: total - passed,
    results: results
  }
end
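
The pass count accepts results that are either hashes with a :passed key or objects that respond to #passed. A small sketch of that counting rule, using a hypothetical SketchResult struct in place of whatever result object the assertions actually return:

```ruby
# Hypothetical result object; the real result class is not shown on this page.
SketchResult = Struct.new(:passed)

results = [
  { test: "a", name: :contains, result: { passed: true } },
  { test: "a", name: :faithful, result: SketchResult.new(false) },
  { test: "b", name: :contains, result: SketchResult.new(true) }
]

# Same predicate as in #summary: read :passed from a Hash, call #passed otherwise.
passed = results.count do |r|
  r[:result].is_a?(Hash) ? r[:result][:passed] : r[:result].passed
end

summary = { total: results.size, passed: passed, failed: results.size - passed }
```

Note that #summary calls #run itself, so each call evaluates every assertion again.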

#test(name, output:, context: nil, expected: nil, input: nil) {|self| ... } ⇒ self

Registers a test case. Assertions are added afterwards with #assert, either inside the optional block or on the returned runner.

Parameters:

  • name (String)

    test case name

  • output (String)

    the LLM output to evaluate

  • context (String, Array<String>, nil) (defaults to: nil)

    source context

  • expected (String, nil) (defaults to: nil)

    expected output

  • input (String, nil) (defaults to: nil)

    input/prompt

Yields:

  • (self)

    yields the runner for adding assertions

Returns:

  • (self)


# File 'lib/ask/eval/runner.rb', line 36

def test(name, output:, context: nil, expected: nil, input: nil)
  entry = {
    name: name,
    test_case: TestCase.new(
      actual_output: output,
      context: context,
      expected_output: expected,
      input: input
    ),
    assertions: []
  }
  @entries << entry
  yield self if block_given?
  self
end
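
The block form of #test keeps each test case and its assertions together. The sketch below copies the registration logic from the listings above into a hypothetical SketchRunner so that it runs without the library; SketchTestCase is an assumed stand-in for Ask::Eval::TestCase, which is not shown on this page.

```ruby
# Assumed stand-in for Ask::Eval::TestCase.
SketchTestCase = Struct.new(:actual_output, :context, :expected_output, :input,
                            keyword_init: true)

# Hypothetical runner that mirrors only #test and #assert from the listings.
class SketchRunner
  attr_reader :entries

  def initialize
    @entries = []
  end

  def test(name, output:, context: nil, expected: nil, input: nil)
    @entries << {
      name: name,
      test_case: SketchTestCase.new(actual_output: output, context: context,
                                    expected_output: expected, input: input),
      assertions: []
    }
    yield self if block_given?
    self
  end

  def assert(name, **kwargs)
    raise "No test case registered. Call #test first." if @entries.empty?
    @entries.last[:assertions] << { name: name, **kwargs }
  end
end

runner = SketchRunner.new

# Each block's assertions attach to the test case registered just before it.
runner.test("greeting", output: "hello world") do |r|
  r.assert(:contains, value: "hello")
end
runner.test("farewell", output: "goodbye") { |r| r.assert(:contains, value: "bye") }
```

Since #assert always targets the last entry, calling it outside a block after several #test calls attaches the assertion to the most recent test case only.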