Class: SkillBench::Task::Evaluator Deprecated

Inherits:
Object
Defined in:
lib/skill_bench/task/evaluator.rb

Overview

Deprecated.

Use Evaluation::Runner instead.

Evaluates a single task by running it twice through Agent::Runner, once in baseline mode and once in context mode (with the resolved source path), then passing both code diffs to Judge::Judge for scoring.

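Class Method Summary

  • .call(full_eval_path:, base_path:, skill_path: nil, client_params: {}) ⇒ Hash

    Evaluates a single task.

Instance Method Summary

  • #call ⇒ Hash

    Executes the task evaluation.

  • #initialize(full_eval_path:, base_path:, skill_path:, client_params:) ⇒ Evaluator

    Returns a new instance of Evaluator.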

Constructor Details

#initialize(full_eval_path:, base_path:, skill_path:, client_params:) ⇒ Evaluator

Returns a new instance of Evaluator.

Parameters:

  • full_eval_path (Pathname)

    The path to the evaluation directory.

  • base_path (Pathname)

    The base path for relative file resolution.

  • skill_path (String, nil)

    Optional override for the source directory.

  • client_params (Hash)

    Parameters to pass to the LLM client.
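
As a small illustration of what "relative file resolution" means for base_path: the #call source on this page passes it to Pathname#relative_path_from to derive the path reported in the result. The directory names below are hypothetical and chosen only for the example.

```ruby
require 'pathname'

# Hypothetical directories, used only for illustration.
base_path      = Pathname.new('/repo/evals')
full_eval_path = Pathname.new('/repo/evals/rails/add_index')

# The same Pathname call the evaluator uses to compute its :path value.
relative_path_str = full_eval_path.relative_path_from(base_path).to_s

puts relative_path_str # prints "rails/add_index"
```

Both arguments need to be Pathname objects (or at least the receiver does), which is why the parameter types above are Pathname rather than String.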



# File 'lib/skill_bench/task/evaluator.rb', line 33

def initialize(full_eval_path:, base_path:, skill_path:, client_params:)
  @full_eval_path = full_eval_path
  @base_path = base_path
  @skill_path = skill_path
  @client_params = client_params
end

Class Method Details

.call(full_eval_path:, base_path:, skill_path: nil, client_params: {}) ⇒ Hash

Evaluates a single task.

Parameters:

  • full_eval_path (Pathname)

    The path to the evaluation directory.

  • base_path (Pathname)

    The base path for relative file resolution.

  • skill_path (String, nil) (defaults to: nil)

    Optional override for the source directory.

  • client_params (Hash) (defaults to: {})

    Parameters to pass to the LLM client.

Returns:

  • (Hash)

    The result of the task evaluation.



# File 'lib/skill_bench/task/evaluator.rb', line 25

def self.call(full_eval_path:, base_path:, skill_path: nil, client_params: {})
  new(full_eval_path:, base_path:, skill_path:, client_params:).call
end
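
The delegation above can be tried in isolation with a stand-in class. This is a minimal sketch, not the real SkillBench::Task::Evaluator: SketchEvaluator is a hypothetical name, and its #call merely echoes the arguments it received so that the defaults supplied by .call are visible. The paths and the :model key are likewise invented for the example.

```ruby
# Stand-in class (hypothetical): mirrors only the keyword signature and the
# .call -> new(...).call delegation shown in the source above.
class SketchEvaluator
  def self.call(full_eval_path:, base_path:, skill_path: nil, client_params: {})
    new(
      full_eval_path: full_eval_path,
      base_path: base_path,
      skill_path: skill_path,
      client_params: client_params
    ).call
  end

  # All four keywords are required here; only .call provides defaults.
  def initialize(full_eval_path:, base_path:, skill_path:, client_params:)
    @full_eval_path = full_eval_path
    @base_path = base_path
    @skill_path = skill_path
    @client_params = client_params
  end

  # Echoes the optional arguments instead of running an evaluation.
  def call
    { skill_path: @skill_path, client_params: @client_params }
  end
end

defaults = SketchEvaluator.call(full_eval_path: 'evals/a', base_path: 'evals')
explicit = SketchEvaluator.call(
  full_eval_path: 'evals/a',
  base_path: 'evals',
  skill_path: 'skills/custom',
  client_params: { model: 'example-model' }
)

p defaults
p explicit
```

Because the constructor declares no defaults, calling new directly without skill_path and client_params raises ArgumentError; going through .call avoids that.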

Instance Method Details

#call ⇒ Hash

Executes the task evaluation.

Returns:

  • (Hash)

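    On success, a hash with the keys :path, :baseline, :baseline_diff, :with_context, :context_diff and :judge_score. On failure, either the unsuccessful result returned by FileReader or Judge::Judge, or a hash of the form { success: false, response: { error: { message: ... } } }.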



# File 'lib/skill_bench/task/evaluator.rb', line 43

def call
  relative_path = @full_eval_path.relative_path_from(@base_path)
  relative_path_str = relative_path.to_s

  files_result = FileReader.call(@full_eval_path)
  return files_result unless files_result[:success]

  files_response = files_result[:response]
  task_content = files_response[:task]
  criteria_content = files_response[:criteria]

  source_path = Execution::SourcePathResolver.call(
    eval_folder_path: relative_path_str,
    skill_path: @skill_path
  )

  return { success: false, response: { error: { message: 'No source path inferred' } } } unless source_path

  baseline_result, baseline_code_diff = Agent::Runner.call(
    mode: :baseline,
    full_eval_path: @full_eval_path,
    task_content: task_content,
    client_params: @client_params
  )

  context_result, context_code_diff = Agent::Runner.call(
    mode: :context,
    full_eval_path: @full_eval_path,
    task_content: task_content,
    client_params: @client_params,
    source_path: source_path,
    base_path: @base_path
  )

  judge_score = Judge::Judge.call(task_content, criteria_content, baseline_code_diff, context_code_diff, @client_params)
  return judge_score unless judge_score[:success]

  {
    path: relative_path_str,
    baseline: baseline_result,
    baseline_diff: baseline_code_diff,
    with_context: context_result,
    context_diff: context_code_diff,
    judge_score: judge_score
  }
rescue StandardError => e
  SkillBench::ErrorLogger.log_error(e, 'Task::Evaluator Error')
  { success: false, response: { error: { message: "Error evaluating task: #{e.message}" } } }
end
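
One consequence of the source above is that the full result hash carries no top-level :success key, whereas the error hashes set success: false. A caller might therefore distinguish the two cases along the following lines. This is a sketch only: summarize_evaluation is a hypothetical helper that is not part of SkillBench, the sample hashes are trimmed stand-ins, and it assumes the early-return hashes from FileReader and Judge::Judge also set success: false.

```ruby
# Hypothetical helper, not part of SkillBench: classifies the hash returned
# by #call using only the keys visible in the source listing.
def summarize_evaluation(result)
  if result[:success] == false
    # Error hashes nest the message under response -> error -> message.
    "failed: #{result.dig(:response, :error, :message)}"
  else
    "ok: #{result[:path]}"
  end
end

# Trimmed sample hashes; real results carry more keys.
failure = { success: false, response: { error: { message: 'No source path inferred' } } }
success = { path: 'rails/add_index', judge_score: { success: true } }

puts summarize_evaluation(failure) # prints "failed: No source path inferred"
puts summarize_evaluation(success) # prints "ok: rails/add_index"
```

The explicit comparison with false matters here: result[:success] is nil for a completed evaluation, so a plain truthiness check would misreport it as a failure.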