Class: SkillBench::Task::Evaluator Deprecated
- Inherits: Object
  - Object
    - SkillBench::Task::Evaluator
- Defined in: lib/skill_bench/task/evaluator.rb
Overview
Deprecated.
Use Evaluation::Runner instead.
Evaluates a single task by running baseline and context-hydrated evaluations. Orchestrates Agent::Runner calls and Judge::Judge scoring.
Class Method Summary
- .call(full_eval_path:, base_path:, skill_path: nil, client_params: {}) ⇒ Hash
  Evaluates a single task.
Instance Method Summary
- #call ⇒ Hash
  Executes the task evaluation.
- #initialize(full_eval_path:, base_path:, skill_path:, client_params:) ⇒ Evaluator (constructor)
  A new instance of Evaluator.
Constructor Details
#initialize(full_eval_path:, base_path:, skill_path:, client_params:) ⇒ Evaluator
Returns a new instance of Evaluator.
# File 'lib/skill_bench/task/evaluator.rb', line 33

def initialize(full_eval_path:, base_path:, skill_path:, client_params:)
  @full_eval_path = full_eval_path
  @base_path = base_path
  @skill_path = skill_path
  @client_params = client_params
end
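Because #call invokes relative_path_from on the stored full_eval_path, both path arguments presumably need to be Pathname objects (or something that behaves like one) rather than plain strings. The following is a minimal sketch of that assumption using only the Ruby standard library; the paths are hypothetical and chosen purely for illustration.

```ruby
require 'pathname'

# Hypothetical paths, for illustration only.
base_path = Pathname.new('/repo/evals')
full_eval_path = Pathname.new('/repo/evals/demo_skill/task_1')

# This mirrors the first two lines of Evaluator#call.
relative_path_str = full_eval_path.relative_path_from(base_path).to_s
puts relative_path_str # prints "demo_skill/task_1"

# A plain String does not respond to relative_path_from, so passing one
# as full_eval_path would raise NoMethodError inside #call.
string_supported = '/repo/evals/demo_skill/task_1'.respond_to?(:relative_path_from)
puts string_supported # prints "false"
```

Note that #call rescues StandardError, so a NoMethodError caused by a String argument would surface as an "Error evaluating task" failure hash rather than as a raised exception.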
Class Method Details
.call(full_eval_path:, base_path:, skill_path: nil, client_params: {}) ⇒ Hash
Evaluates a single task.
# File 'lib/skill_bench/task/evaluator.rb', line 25

def self.call(full_eval_path:, base_path:, skill_path: nil, client_params: {})
  new(full_eval_path:, base_path:, skill_path:, client_params:).call
end
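The class-level .call supplies the defaults (skill_path: nil, client_params: {}), while the constructor requires all four keywords. The sketch below illustrates that split with DemoEvaluator, a hypothetical stand-in that mirrors the two signatures but does no real evaluation. It passes the keywords explicitly instead of using the shorthand form, so it also runs on Ruby versions older than 3.1.

```ruby
# DemoEvaluator is a hypothetical stand-in, not part of SkillBench.
class DemoEvaluator
  # Defaults live here, exactly as in Evaluator.call.
  def self.call(full_eval_path:, base_path:, skill_path: nil, client_params: {})
    new(
      full_eval_path: full_eval_path,
      base_path: base_path,
      skill_path: skill_path,
      client_params: client_params
    ).call
  end

  # All four keywords are required here, as in Evaluator#initialize.
  def initialize(full_eval_path:, base_path:, skill_path:, client_params:)
    @full_eval_path = full_eval_path
    @base_path = base_path
    @skill_path = skill_path
    @client_params = client_params
  end

  # Echoes the optional arguments so the defaults are visible.
  def call
    { skill_path: @skill_path, client_params: @client_params }
  end
end

defaults = DemoEvaluator.call(full_eval_path: 'evals/demo/task_1', base_path: 'evals')
p defaults

# Calling the constructor directly without the optional keywords fails.
direct_error =
  begin
    DemoEvaluator.new(full_eval_path: 'evals/demo/task_1', base_path: 'evals')
    nil
  rescue ArgumentError => e
    e.class
  end
p direct_error
```

In practice this means callers would normally go through .call and construct an instance directly only when they intend to pass every argument.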
Instance Method Details
#call ⇒ Hash
Executes the task evaluation.
# File 'lib/skill_bench/task/evaluator.rb', line 43

def call
  relative_path = @full_eval_path.relative_path_from(@base_path)
  relative_path_str = relative_path.to_s

  files_result = FileReader.call(@full_eval_path)
  return files_result unless files_result[:success]

  files_response = files_result[:response]
  task_content = files_response[:task]
  criteria_content = files_response[:criteria]

  source_path = Execution::SourcePathResolver.call(
    eval_folder_path: relative_path_str,
    skill_path: @skill_path
  )
  return { success: false, response: { error: { message: 'No source path inferred' } } } unless source_path

  baseline_result, baseline_code_diff = Agent::Runner.call(
    mode: :baseline,
    full_eval_path: @full_eval_path,
    task_content: task_content,
    client_params: @client_params
  )

  context_result, context_code_diff = Agent::Runner.call(
    mode: :context,
    full_eval_path: @full_eval_path,
    task_content: task_content,
    client_params: @client_params,
    source_path: source_path,
    base_path: @base_path
  )

  judge_score = Judge::Judge.call(task_content, criteria_content, baseline_code_diff, context_code_diff, @client_params)
  return judge_score unless judge_score[:success]

  {
    path: relative_path_str,
    baseline: baseline_result,
    baseline_diff: baseline_code_diff,
    with_context: context_result,
    context_diff: context_code_diff,
    judge_score: judge_score
  }
rescue StandardError => e
  SkillBench::ErrorLogger.log_error(e, 'Task::Evaluator Error')
  { success: false, response: { error: { message: "Error evaluating task: #{e.message}" } } }
end
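The listing above shows two return shapes: failure hashes carry success: false with a nested error message, while the hash built on the success path has no :success key at all. A caller therefore cannot rely on result[:success] being truthy. The helper below, summarize_evaluation, is a hypothetical sketch of how a caller might tell the two apart. It assumes that failures passed through from FileReader and Judge::Judge use the same success: false shape, which the listing does not confirm.

```ruby
# Hypothetical helper, not part of SkillBench: turns an Evaluator result
# into a one-line summary.
def summarize_evaluation(result)
  # Test for false explicitly: the success hash has no :success key,
  # so result[:success] is nil there, not true.
  if result[:success] == false
    message = result.dig(:response, :error, :message) || 'unknown error'
    return "failed: #{message}"
  end

  "evaluated #{result[:path]}"
end

# Sample hashes shaped like the two literals in Evaluator#call.
failure = { success: false, response: { error: { message: 'No source path inferred' } } }
success = { path: 'demo_skill/task_1', judge_score: { success: true } }

puts summarize_evaluation(failure) # prints "failed: No source path inferred"
puts summarize_evaluation(success) # prints "evaluated demo_skill/task_1"
```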