Module: Braintrust::Eval
Defined in:
lib/braintrust/eval.rb,
lib/braintrust/eval/case.rb,
lib/braintrust/eval/cases.rb,
lib/braintrust/eval/trace.rb,
lib/braintrust/eval/result.rb,
lib/braintrust/eval/runner.rb,
lib/braintrust/eval/scorer.rb,
lib/braintrust/eval/context.rb,
lib/braintrust/eval/summary.rb,
lib/braintrust/eval/evaluator.rb,
lib/braintrust/eval/formatter.rb,
lib/braintrust/eval/functions.rb
Overview
Evaluation framework for testing AI systems with custom test cases and scoring functions.
The Eval module provides tools for running systematic evaluations of your AI systems. An evaluation consists of:
- Cases: Test inputs with optional expected outputs
- Task: The code/model being evaluated (a Task or callable)
- Scorers: Functions that judge the quality of outputs (String name, Scorer, or callable)
Tasks and scorers use keyword arguments. Declare only the keywords you need; extra kwargs are automatically filtered out.
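As a rough illustration of what keyword filtering means in practice, the sketch below inspects a callable's declared keywords and passes only those. The helper call_with_declared_kwargs is hypothetical and is not part of the Braintrust API; the library's actual filtering may work differently.

```ruby
# Hypothetical helper (an assumption, not Braintrust code): call `callable`
# with only the keyword arguments its signature declares.
def call_with_declared_kwargs(callable, **kwargs)
  params = callable.respond_to?(:parameters) ? callable.parameters : callable.method(:call).parameters

  # A **rest parameter means the callable accepts every keyword.
  return callable.call(**kwargs) if params.any? { |type, _| type == :keyrest }

  # Keep only required (:keyreq) and optional (:key) keyword names.
  declared = params.select { |type, _| %i[key keyreq].include?(type) }.map { |_, name| name }
  callable.call(**kwargs.slice(*declared))
end

# The task only needs the input; the scorer only needs output and expected.
task = ->(input:) { input.upcase }
exact_match = ->(output:, expected:) { output == expected ? 1.0 : 0.0 }

available = {input: "hello", expected: "HELLO", metadata: {}}
output = call_with_declared_kwargs(task, **available)
score = call_with_declared_kwargs(exact_match, **available, output: output)
```

Neither lambda declares metadata:, so it is dropped before the call instead of raising an ArgumentError.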
When using multiple scorers, each must have a unique name: scores are keyed by name, so duplicates overwrite each other. Use Scorer.new("name") or a Scorer subclass to assign names. Anonymous lambdas default to "scorer".
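The effect of a name collision can be sketched with a plain Hash. The collect_scores helper below is a hypothetical stand-in for name-keyed score collection, not the SDK's code; it only shows why two scorers sharing the name "scorer" leave a single entry.

```ruby
# Hypothetical stand-in (an assumption, not Braintrust code): run each
# [name, callable] pair and store its score under the name.
def collect_scores(named_scorers, **kwargs)
  named_scorers.each_with_object({}) do |(name, fn), scores|
    scores[name] = fn.call(**kwargs) # a repeated name replaces the earlier score
  end
end

exact = ->(output:, expected:) { output == expected ? 1.0 : 0.0 }
length_ok = ->(output:, expected:) { output.length == expected.length ? 1.0 : 0.0 }

kwargs = {output: "HELLO", expected: "hello"}

# Two anonymous lambdas would both fall back to the name "scorer".
clashing = collect_scores([["scorer", exact], ["scorer", length_ok]], **kwargs)

# Distinct names keep both results.
distinct = collect_scores([["exact", exact], ["length_ok", length_ok]], **kwargs)
```

Here clashing holds one entry (the later scorer's result), while distinct holds both.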
Defined Under Namespace
Modules: Formatter, Functions, Scorer
Classes: Case, Cases, Context, Evaluator, ExperimentSummary, Result, Runner, ScorerStats, Trace
Class Method Summary
- .run(task:, scorers:, project: nil, experiment: nil, cases: nil, dataset: nil, on_progress: nil, parallelism: 1, tags: nil, metadata: nil, update: false, quiet: false, state: nil, tracer_provider: nil, project_id: nil, parent: nil, parameters: nil) ⇒ Result
  Run an evaluation.
- .scorer(name, callable = nil, &block) ⇒ Object
  Deprecated. Use Scorer.new instead.
Class Method Details
.run(task:, scorers:, project: nil, experiment: nil, cases: nil, dataset: nil, on_progress: nil, parallelism: 1, tags: nil, metadata: nil, update: false, quiet: false, state: nil, tracer_provider: nil, project_id: nil, parent: nil, parameters: nil) ⇒ Result
Run an evaluation
# File 'lib/braintrust/eval.rb', line 180

def run(task:, scorers:, project: nil, experiment: nil,
  cases: nil, dataset: nil, on_progress: nil,
  parallelism: 1, tags: nil, metadata: nil, update: false, quiet: false,
  state: nil, tracer_provider: nil, project_id: nil, parent: nil, parameters: nil)
  # Validate required parameters
  validate_params!(task: task, scorers: scorers, cases: cases, dataset: dataset)

  experiment_id = nil
  project_name = project

  # Full API mode: project name or project_id provided, resolve via API
  if project || project_id
    state ||= Braintrust.current_state
    state.login

    if dataset
      resolved = resolve_dataset(dataset, project, state)
      cases = resolved[:cases]
    end

    # Skip experiment creation for remote evals (parent present).
    # The OTLP backend creates experiments from ingested spans.
    unless parent
      project_id, project_name = resolve_project(state, project, project_id)
      experiment_id = create_experiment(
        state, experiment, project_id,
        update: update,
        tags: tags,
        metadata: metadata,
        dataset_id: resolved&.dig(:dataset_id),
        dataset_version: resolved&.dig(:dataset_version)
      )
      parent = {object_type: "experiment_id", object_id: experiment_id}
    end
  end

  # Build normalized context and run
  context = Context.build(
    task: task,
    scorers: scorers,
    cases: cases,
    experiment_id: experiment_id,
    experiment_name: experiment,
    project_id: project_id,
    project_name: project_name,
    state: state,
    tracer_provider: tracer_provider,
    on_progress: on_progress,
    parent: parent,
    parameters: parameters
  )
  result = Runner.new(context).run(parallelism: parallelism)

  # Print result summary unless quiet
  print_result(result) unless quiet

  result
end
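The branching in the .run listing can be summarized with a small, self-contained sketch. The run_plan helper is hypothetical and exists only to show which steps the listing appears to take for a given combination of arguments; it calls nothing in the Braintrust SDK, and the parent hash used for the remote case is a placeholder value.

```ruby
# Hypothetical summary helper (an assumption, not Braintrust code) that
# mirrors the conditionals in the .run listing.
def run_plan(project: nil, project_id: nil, dataset: nil, parent: nil)
  api_mode = !!(project || project_id) # "Full API mode" branch

  {
    login: api_mode,                           # state.login
    resolve_dataset: api_mode && !!dataset,    # resolve_dataset(...) replaces cases
    create_experiment: api_mode && !parent     # skipped when a parent is supplied
  }
end

# No project or project_id: none of the API steps run.
local = run_plan

# Project plus dataset: log in, resolve the dataset, create an experiment.
api = run_plan(project: "my-project", dataset: "my-dataset")

# Parent supplied (remote eval): log in, but do not create an experiment.
remote = run_plan(project_id: "proj-id", parent: {object_type: "experiment_id", object_id: "placeholder"})
```

One consequence visible in the listing: the dataset is only resolved inside the API-mode branch, so without project or project_id the run proceeds with whatever was passed as cases.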
.scorer(name, callable = nil, &block) ⇒ Object
Deprecated. Use Scorer.new instead.
# File 'lib/braintrust/eval.rb', line 149

def scorer(name, callable = nil, &block)
  Log.warn_once(:eval_scorer, "Braintrust::Eval.scorer is deprecated: use Braintrust::Scorer.new instead.")
  block = callable.method(:call) if callable && !block
  Scorer.new(name, &block)
end
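The callable-to-block conversion in this method relies on plain Ruby: callable.method(:call) returns a Method object, and passing it with & converts it to a block. The sketch below shows the same idiom in isolation; NamedBlock, ExactMatch, and wrap are hypothetical stand-ins rather than Braintrust classes.

```ruby
# Hypothetical stand-in for a constructor that takes a name and a block.
class NamedBlock
  attr_reader :name

  def initialize(name, &block)
    raise ArgumentError, "block required" unless block
    @name = name
    @block = block
  end

  def call(**kwargs)
    @block.call(**kwargs)
  end
end

# Hypothetical callable object: anything that responds to #call.
class ExactMatch
  def call(output:, expected:)
    output == expected ? 1.0 : 0.0
  end
end

# Same shape as the deprecated helper: accept either a callable or a block.
def wrap(name, callable = nil, &block)
  # Method objects respond to #to_proc, so `&block` below accepts them.
  block = callable.method(:call) if callable && !block
  NamedBlock.new(name, &block)
end

from_object = wrap("exact_match", ExactMatch.new)
from_block = wrap("always_one") { |**_opts| 1.0 }
```

If both a callable and a block are given, the block wins, because the callable is only consulted when no block is present.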