Class: SkillBench::HistoryRecorder::SummaryService

Inherits:
Object
Defined in:
lib/skill_bench/history_recorder/summary_service.rb

Overview

Service object that summarizes evaluation results. It normalizes raw judge scores into Hashes and computes the task count, the average baseline and context scores, and the average improvement. Follows the Single Responsibility Principle by isolating summary concerns.

Class Method Details

.calculate_summary(scores) ⇒ Hash

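Calculates a statistical summary from a list of normalized scores: the task count, the average baseline score, the average context score, and the average improvement (context minus baseline), each average rounded to two decimal places. A missing or nil score counts as 0. The method divides by scores.size, so the list is expected to be non-empty.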

Parameters:

  • scores (Array<Hash>)

    List of normalized scores, each a Hash that may carry :baseline_score and :context_score.

Returns:

  • (Hash)

    Summary statistics with the keys :task_count, :average_baseline, :average_context, and :improvement.



# File 'lib/skill_bench/history_recorder/summary_service.rb', line 42

def self.calculate_summary(scores)
  count = scores.size
  baseline_total = 0.0
  context_total = 0.0

  # Accumulate totals; a missing or nil score counts as 0.
  scores.each do |score|
    baseline_total += (score[:baseline_score] || 0).to_f
    context_total += (score[:context_score] || 0).to_f
  end

  # Averages are rounded to two decimal places. count must be non-zero;
  # summarize returns early for empty input before calling this method.
  {
    task_count: count,
    average_baseline: (baseline_total / count).round(2),
    average_context: (context_total / count).round(2),
    improvement: ((context_total - baseline_total) / count).round(2)
  }
end
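
The arithmetic above can be illustrated with a small standalone sketch. The helper name safe_summary is hypothetical and not part of SkillBench; it repeats the same calculation and adds one assumption of its own, an early return for an empty list, so that it never divides by zero.

```ruby
# Hypothetical standalone variant of calculate_summary (not part of SkillBench).
# Assumption: an empty list should yield {} instead of dividing by zero.
def safe_summary(scores)
  count = scores.size
  return {} if count.zero?

  # A missing or nil score counts as 0, as in the documented method.
  baseline_total = scores.sum { |s| (s[:baseline_score] || 0).to_f }
  context_total  = scores.sum { |s| (s[:context_score] || 0).to_f }

  {
    task_count: count,
    average_baseline: (baseline_total / count).round(2),
    average_context: (context_total / count).round(2),
    improvement: ((context_total - baseline_total) / count).round(2)
  }
end

sample_scores = [
  { baseline_score: 6, context_score: 8 },
  { baseline_score: 7, context_score: 9 },
  { context_score: 5 } # no :baseline_score, so it contributes 0
]

p safe_summary(sample_scores)
```

With these sample values the totals are 13.0 (baseline) and 22.0 (context) over three tasks, giving averages of 4.33 and 7.33 and an improvement of 3.0. Note that the entry without a baseline score still counts toward task_count, which lowers the baseline average.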

.normalize_score(raw_score) ⇒ Hash

Normalizes the raw judge score into a standardized Hash.

Parameters:

  • raw_score (String, Hash, nil)

    The raw score from the judge.

Returns:

  • (Hash)

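    The normalized score, expected to carry :baseline_score and :context_score. An empty Hash is returned when raw_score is nil or is a String that is not valid JSON. A Hash argument is returned unchanged.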

Raises:

  • (JSON::ParserError)

    Raised by JSON.parse when the raw_score string contains invalid JSON. The error is rescued internally and an empty Hash is returned, so it does not propagate to the caller.



# File 'lib/skill_bench/history_recorder/summary_service.rb', line 27

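def self.normalize_score(raw_score)
  # nil (or false) means no score was recorded.
  return {} unless raw_score
  # A Hash is returned unchanged; its keys are not symbolized.
  return raw_score if raw_score.is_a?(Hash)

  # Otherwise treat the value as a JSON string; invalid JSON yields {}.
  begin
    JSON.parse(raw_score, symbolize_names: true)
  rescue JSON::ParserError
    {}
  end
end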
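
Two edge cases follow from the listing above: a Hash with String keys is passed through without symbolizing them, and a String holding valid JSON that is not an object (for example an array) is returned as parsed. The sketch below shows one possible stricter variant that covers both. The name normalize_score_strict is hypothetical and not part of SkillBench, and the choice to return {} for non-object JSON is an assumption.

```ruby
require 'json'

# Hypothetical stricter variant of normalize_score (not part of SkillBench).
# Assumptions: Hash keys are Strings or Symbols, and anything that does not
# end up as a Hash should normalize to {}.
def normalize_score_strict(raw_score)
  parsed =
    case raw_score
    when Hash then raw_score
    when String
      begin
        JSON.parse(raw_score)
      rescue JSON::ParserError
        nil # invalid JSON falls through to the {} return below
      end
    end

  return {} unless parsed.is_a?(Hash)

  # Symbolize keys so lookups such as score[:baseline_score] always work.
  parsed.transform_keys(&:to_sym)
end

p normalize_score_strict('{"baseline_score": 6, "context_score": 8}')
p normalize_score_strict({ 'baseline_score' => 6 })
p normalize_score_strict('[1, 2]')
```

The first two calls return Hashes with Symbol keys; the third returns an empty Hash because the parsed value is an Array.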

.summarize(tasks) ⇒ Hash

Summarizes the results of multiple tasks.

Parameters:

  • tasks (Array<Hash>)

    The list of task results.

Returns:

  • (Hash)

    A summary of scores including averages and improvement, or an empty Hash when tasks is nil or empty.



# File 'lib/skill_bench/history_recorder/summary_service.rb', line 15

def self.summarize(tasks)
  return {} if Array(tasks).empty?

  scores = tasks.map { |task| normalize_score(task[:judge_score]) }
  calculate_summary(scores)
end
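
As an end-to-end illustration, the sketch below mirrors the three documented methods in a stand-in module so that it runs without the gem. The module name SummarySketch and the sample task data are assumptions made for this example; the real class is SkillBench::HistoryRecorder::SummaryService.

```ruby
require 'json'

# Stand-in that mirrors the documented behavior (hypothetical name).
module SummarySketch
  def self.normalize_score(raw_score)
    return {} unless raw_score
    return raw_score if raw_score.is_a?(Hash)

    JSON.parse(raw_score, symbolize_names: true)
  rescue JSON::ParserError
    {}
  end

  def self.summarize(tasks)
    return {} if Array(tasks).empty?

    scores = tasks.map { |task| normalize_score(task[:judge_score]) }
    count = scores.size
    baseline = scores.sum { |s| (s[:baseline_score] || 0).to_f }
    context = scores.sum { |s| (s[:context_score] || 0).to_f }

    {
      task_count: count,
      average_baseline: (baseline / count).round(2),
      average_context: (context / count).round(2),
      improvement: ((context - baseline) / count).round(2)
    }
  end
end

tasks = [
  { judge_score: '{"baseline_score": 4, "context_score": 7}' }, # JSON string
  { judge_score: { baseline_score: 6, context_score: 9 } },     # already a Hash
  { judge_score: 'not valid json' }                             # normalizes to {}
]

p SummarySketch.summarize(tasks)
```

For this input the summary reports three tasks, an average baseline of 3.33, an average context score of 5.33, and an improvement of 2.0. The unparseable entry contributes 0 to both totals but still counts as a task, so a failed parse pulls both averages down rather than being skipped.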