Class: RubynCode::Goal::Evaluator

Inherits:
Object
Defined in:
lib/rubyn_code/goal/evaluator.rb

Overview

Judges whether a goal condition has been satisfied.

The evaluator is deliberately conservative: it returns true only when the model is confident the goal is genuinely complete. Any error or ambiguous answer is treated as “not met” so the agent keeps working rather than stopping prematurely.

Constant Summary

SYSTEM_PROMPT =
<<~PROMPT
  You are a strict completion judge. Given a GOAL and a transcript of an
  AI coding agent's recent work, decide whether the goal is genuinely and
  fully satisfied. Be conservative: if there is any doubt, or the work is
  only partially done, answer NO. Answer with exactly one word on the
  first line: YES or NO. Optionally add a short reason on the next line.
PROMPT
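The prompt fixes the reply format: exactly one word, YES or NO, on the first line, optionally followed by a short reason. The private helper that interprets this reply (`verdict_yes?`) is not shown on this page. The following is only a hypothetical sketch of how a conservative parser for that format might look. The name `parse_verdict` and the tolerance for trailing punctuation are assumptions.

```ruby
# Hypothetical sketch: NOT the gem's actual verdict_yes? implementation.
# Only an exact "YES" on the first line counts as met; NO, empty,
# hedged, or nil replies all count as "not met".
def parse_verdict(text)
  return false if text.nil?

  first_line = text.to_s.lines.first.to_s.strip
  # Assumption: tolerate trailing punctuation such as "YES." but nothing else.
  first_line.sub(/[.!]+\z/, '').upcase == 'YES'
end

p parse_verdict("YES\nAll specs pass.")    # => true
p parse_verdict("NO\nTests still failing") # => false
p parse_verdict("Yes, mostly")             # => false
p parse_verdict(nil)                       # => false
```

Because a hedged answer such as "Yes, mostly" fails the exact match, ambiguity resolves to "not met", which is consistent with the conservative behaviour described in the Overview.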
TRANSCRIPT_WINDOW = 12

Number of trailing conversation messages to show the judge.
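TRANSCRIPT_WINDOW caps how much recent history the judge sees, which keeps the judging request small. The private `prompt` helper and the `Agent::Conversation` interface are not documented on this page. This sketch therefore assumes that messages are plain hashes with `:role` and `:content` keys. The name `build_judge_prompt` and the exact GOAL/TRANSCRIPT layout are hypothetical.

```ruby
TRANSCRIPT_WINDOW = 12

# Hypothetical sketch of assembling the judge's user message.
# Assumes each message is a Hash with :role and :content keys; the real
# Agent::Conversation API may differ.
def build_judge_prompt(condition, messages)
  # Keep only the trailing window; Array(nil) yields [] when no conversation is given.
  recent = Array(messages).last(TRANSCRIPT_WINDOW)
  transcript = recent.map { |m| "#{m[:role]}: #{m[:content]}" }.join("\n")
  transcript = '(no transcript available)' if transcript.empty?

  "GOAL:\n#{condition}\n\nTRANSCRIPT:\n#{transcript}"
end

history = (1..20).map { |i| { role: i.odd? ? 'user' : 'assistant', content: "step #{i}" } }
prompt = build_judge_prompt('All specs pass', history)
puts prompt.lines.count { |l| l.start_with?('user:', 'assistant:') } # => 12
```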

Instance Method Summary

Constructor Details

#initialize(llm_client:) ⇒ Evaluator

Returns a new instance of Evaluator.

Parameters:

  • llm_client

    the client used to query the judge; it must respond to #chat(messages:, system:)

# File 'lib/rubyn_code/goal/evaluator.rb', line 28

def initialize(llm_client:)
  @llm_client = llm_client
end

Instance Method Details

#call(condition:, conversation: nil) ⇒ Boolean

Returns true only when the goal is confidently complete.

Parameters:

  • condition (String)

    the goal condition

  • conversation (Agent::Conversation, nil) (defaults to: nil)

    recent work to judge

Returns:

  • (Boolean)

    true only when the goal is confidently complete



# File 'lib/rubyn_code/goal/evaluator.rb', line 35

def call(condition:, conversation: nil)
  response = @llm_client.chat(
    messages: [{ role: 'user', content: prompt(condition, conversation) }],
    system: SYSTEM_PROMPT
  )
  verdict_yes?(answer_text(response))
rescue StandardError => e
  RubynCode::Debug.warn("Goal evaluation failed: #{e.message}")
  false
end
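#call fails closed. Any exception raised by the client or by response handling is rescued, logged, and reported as `false`, so the agent keeps working. The standalone sketch below mirrors that structure with stub clients to show both outcomes. `StubClient`, `FailingClient`, and `SketchEvaluator` are hypothetical, and the response is assumed to be a plain String, which the real client may not return. The sketch also uses Kernel#warn in place of RubynCode::Debug.warn.

```ruby
# Hypothetical stand-ins; the real client and response types are not shown in the docs.
class StubClient
  def initialize(reply)
    @reply = reply
  end

  # Accepts the same keyword arguments #call passes to @llm_client.chat.
  def chat(messages:, system:)
    @reply
  end
end

class FailingClient
  def chat(messages:, system:)
    raise IOError, 'connection reset'
  end
end

# Minimal mirror of Evaluator#call's fail-closed structure.
class SketchEvaluator
  SYSTEM_PROMPT = 'Answer YES or NO on the first line.'

  def initialize(llm_client:)
    @llm_client = llm_client
  end

  def call(condition:)
    response = @llm_client.chat(
      messages: [{ role: 'user', content: "GOAL:\n#{condition}" }],
      system: SYSTEM_PROMPT
    )
    response.to_s.lines.first.to_s.strip.upcase == 'YES'
  rescue StandardError => e
    warn("Goal evaluation failed: #{e.message}")
    false # any error counts as "not met", so the agent keeps working
  end
end

p SketchEvaluator.new(llm_client: StubClient.new("YES\nDone.")).call(condition: 'ship it') # => true
p SketchEvaluator.new(llm_client: StubClient.new('NO')).call(condition: 'ship it')        # => false
p SketchEvaluator.new(llm_client: FailingClient.new).call(condition: 'ship it')           # => false
```

Treating errors as `false` trades a possible extra iteration for safety. A flaky network call can never make the agent stop early and claim success.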