Class: Langfuse::ExperimentResult

Inherits:
Object
Defined in:
lib/langfuse/experiment_result.rb

Overview

Aggregate result of a full experiment run.

Collects all ItemResult instances and run-level evaluations produced by Langfuse::ExperimentRunner#execute. Provides convenience accessors for successes/failures and a human-readable summary via #format.

Examples:

Inspecting results

result = client.run_experiment(name: "qa-v1", dataset_name: "qa", task: my_task)
puts result.format
result.successes.size  # => 8
result.failures.size   # => 0

Constant Summary

SEPARATOR =
"\u2500" * 50
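
To make the constant's value concrete, the following sketch rebuilds the same string outside the class. The local variable `separator` is an illustrative name only and is not part of the library.

```ruby
# Rebuild the separator outside the class: U+2500 (BOX DRAWINGS LIGHT
# HORIZONTAL) repeated 50 times.
separator = "\u2500" * 50

puts separator
puts separator.length    # number of characters => 50
puts separator.bytesize  # UTF-8 bytes, 3 per character => 150
```

Because the character is multi-byte in UTF-8, width calculations on the report should use `length` rather than `bytesize`.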

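Instance Attribute Summary

  • #dataset_run_id ⇒ String, nil (readonly)
  • #dataset_run_url ⇒ String, nil (readonly)
  • #description ⇒ String, nil (readonly)
  • #item_results ⇒ Array<ItemResult> (readonly)
  • #name ⇒ String (readonly)
  • #run_evaluations ⇒ Array<Evaluation> (readonly)
  • #run_name ⇒ String, nil (readonly)

Instance Method Summary

  • #failures ⇒ Array<ItemResult>
  • #format(include_item_results: false) ⇒ String
  • #initialize(name:, item_results:, run_evaluations: [], run_name: nil, description: nil, dataset_run_id: nil, dataset_run_url: nil) ⇒ ExperimentResult
  • #successes ⇒ Array<ItemResult>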

Constructor Details

#initialize(name:, item_results:, run_evaluations: [], run_name: nil, description: nil, dataset_run_id: nil, dataset_run_url: nil) ⇒ ExperimentResult

Parameters:

  • name (String)

    experiment/run name

  • item_results (Array<ItemResult>)

    per-item results

  • run_evaluations (Array<Evaluation>) (defaults to: [])

    run-level evaluations

  • run_name (String, nil) (defaults to: nil)

    auto-generated run name

  • description (String, nil) (defaults to: nil)

    run description

  • dataset_run_id (String, nil) (defaults to: nil)

    dataset run ID from the server

  • dataset_run_url (String, nil) (defaults to: nil)

    URL to the dataset run in Langfuse UI



# File 'lib/langfuse/experiment_result.rb', line 34

def initialize(name:, item_results:, run_evaluations: [], run_name: nil, description: nil,
               dataset_run_id: nil, dataset_run_url: nil)
  @name = name
  @item_results = item_results
  @run_evaluations = run_evaluations
  @run_name = run_name
  @description = description
  @dataset_run_id = dataset_run_id
  @dataset_run_url = dataset_run_url
end
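
The keyword signature above means only `name:` and `item_results:` are required. The sketch below illustrates that behavior with a stand-in class, `SketchResult`, which is hypothetical and only mirrors the documented signature so the example runs without the gem installed.

```ruby
# Hypothetical stand-in that mirrors the documented keyword signature.
# It is NOT Langfuse::ExperimentResult, only an illustration of its arguments.
class SketchResult
  attr_reader :name, :item_results, :run_evaluations, :run_name,
              :description, :dataset_run_id, :dataset_run_url

  def initialize(name:, item_results:, run_evaluations: [], run_name: nil, description: nil,
                 dataset_run_id: nil, dataset_run_url: nil)
    @name = name
    @item_results = item_results
    @run_evaluations = run_evaluations
    @run_name = run_name
    @description = description
    @dataset_run_id = dataset_run_id
    @dataset_run_url = dataset_run_url
  end
end

# Only the two required keywords are supplied; the rest take their defaults.
minimal = SketchResult.new(name: "qa-v1", item_results: [])
p minimal.run_evaluations  # => []
p minimal.dataset_run_url  # => nil

# Omitting a required keyword raises ArgumentError.
begin
  SketchResult.new(name: "qa-v1")
rescue ArgumentError => e
  missing_keyword_message = e.message
end
puts missing_keyword_message
```

Ruby evaluates the `[]` default on each call, so two results built without `run_evaluations:` do not share the same array.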

Instance Attribute Details

#dataset_run_id ⇒ String, nil (readonly)

Returns:

  • (String, nil)

    dataset run ID from the server

# File 'lib/langfuse/experiment_result.rb', line 23

def dataset_run_id
  @dataset_run_id
end

#dataset_run_url ⇒ String, nil (readonly)

Returns:

  • (String, nil)

    URL to the dataset run in Langfuse UI

# File 'lib/langfuse/experiment_result.rb', line 23

def dataset_run_url
  @dataset_run_url
end

#description ⇒ String, nil (readonly)

Returns:

  • (String, nil)

    run description

# File 'lib/langfuse/experiment_result.rb', line 23

def description
  @description
end

#item_results ⇒ Array<ItemResult> (readonly)

Returns:

  • (Array<ItemResult>)

    per-item results (all items, including failures)

# File 'lib/langfuse/experiment_result.rb', line 23

def item_results
  @item_results
end

#name ⇒ String (readonly)

Returns:

  • (String)

    the experiment/run name

# File 'lib/langfuse/experiment_result.rb', line 23

def name
  @name
end

#run_evaluations ⇒ Array<Evaluation> (readonly)

Returns:

  • (Array<Evaluation>)

    run-level evaluation results

# File 'lib/langfuse/experiment_result.rb', line 23

def run_evaluations
  @run_evaluations
end

#run_name ⇒ String, nil (readonly)

Returns:

  • (String, nil)

    auto-generated run name (name + timestamp)

# File 'lib/langfuse/experiment_result.rb', line 23

def run_name
  @run_name
end

Instance Method Details

#failures ⇒ Array<ItemResult>

Returns items that raised an error.

Returns:

  • (Array<ItemResult>)

    items that raised an error



# File 'lib/langfuse/experiment_result.rb', line 50

def failures = item_results.select(&:failed?)

#format(include_item_results: false) ⇒ String

Returns multi-line formatted report.

Parameters:

  • include_item_results (Boolean) (defaults to: false)

    whether to show per-item detail

Returns:

  • (String)

    multi-line formatted report



# File 'lib/langfuse/experiment_result.rb', line 56

def format(include_item_results: false)
  lines = []
  append_item_section(lines, include_item_results)
  lines << SEPARATOR
  append_summary(lines)
  append_evaluation_names(lines)
  append_average_scores(lines)
  append_run_evaluation_lines(lines)
  lines.join("\n")
end
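
The method above builds the report by appending to a `lines` array and joining once at the end. The private `append_*` helpers are not shown on this page, so the following is only a sketch of that accumulate-and-join pattern: `append_summary_sketch`, `format_sketch`, and the summary wording are all hypothetical and do not reproduce the library's actual output.

```ruby
SEPARATOR_SKETCH = "\u2500" * 50

# Hypothetical helper: mutates the shared lines array in place, the way the
# append_* helpers in #format appear to. The wording is invented.
def append_summary_sketch(lines, total, failed)
  lines << "#{total} items"
  lines << "#{failed} failed" if failed.positive?
end

# Accumulate sections into one array, then join once with newlines.
def format_sketch(total:, failed:)
  lines = []
  lines << SEPARATOR_SKETCH
  append_summary_sketch(lines, total, failed)
  lines.join("\n")
end

report = format_sketch(total: 8, failed: 0)
puts report
```

Joining once at the end keeps each helper free of newline handling and yields a string with no trailing newline.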

#successes ⇒ Array<ItemResult>

Returns items that completed without error.

Returns:

  • (Array<ItemResult>)

    items that completed without error



# File 'lib/langfuse/experiment_result.rb', line 47

def successes = item_results.select(&:success?)
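
Both #successes and #failures filter the same `item_results` array with a predicate. The sketch below shows that filtering on a stand-in item type, `ItemSketch`, which is hypothetical: it assumes an item counts as failed when it carries an error, which may not match how the real ItemResult decides.

```ruby
# Hypothetical stand-in for ItemResult. Only the two predicates used by
# #successes and #failures are modelled; the error-based rule is an assumption.
ItemSketch = Struct.new(:id, :error) do
  def failed? = !error.nil?
  def success? = !failed?
end

item_results = [
  ItemSketch.new("a", nil),
  ItemSketch.new("b", RuntimeError.new("timeout")),
  ItemSketch.new("c", nil)
]

# Same filtering expressions as the documented one-line methods.
successes = item_results.select(&:success?)
failures  = item_results.select(&:failed?)

p successes.map(&:id)  # => ["a", "c"]
p failures.map(&:id)   # => ["b"]

# A success rate derived from the two lists, guarded against an empty run.
success_rate = item_results.empty? ? 0.0 : successes.size.fdiv(item_results.size)
puts success_rate.round(2)  # => 0.67
```

With predicates defined this way, every item lands in exactly one of the two lists, so their sizes add up to `item_results.size`.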