Class: AgentHarness::ProviderHealthCheck

Inherits:
Object
Defined in:
lib/agent_harness/provider_health_check.rb

Overview

Performs health checks on configured providers

Validates provider setup, authentication status, and reachability. Returns per-provider status objects with name, status, message, and latency.

Examples:

Check all providers

results = AgentHarness::ProviderHealthCheck.check_all
results.each { |r| puts "#{r[:name]}: #{r[:status]}" }

Check a single provider

result = AgentHarness::ProviderHealthCheck.check(:claude)
puts result[:status] # => "ok", "error", or "degraded"

Constant Summary

DEFAULT_TIMEOUT =

Single source of truth: derive the fallback from HealthCheckConfig’s default so that the timeout isn’t duplicated here and in configuration.rb.

HealthCheckConfig.new.timeout
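
The derivation above can be sketched as follows. This is a minimal illustration rather than the library's code: SketchConfig is a hypothetical stand-in for HealthCheckConfig, its default of 5 seconds is an assumed value, and the body of configured_timeout is a guess at how the fallback might be consulted.

```ruby
# Hypothetical stand-in for HealthCheckConfig; the default of 5 is assumed.
class SketchConfig
  attr_accessor :timeout

  def initialize
    @timeout = 5
  end
end

class SketchHealthCheck
  # Read the fallback from a fresh config object instead of repeating the literal.
  DEFAULT_TIMEOUT = SketchConfig.new.timeout

  # Assumed behavior: prefer a configured value, else use the derived constant.
  def self.configured_timeout(config = nil)
    config&.timeout || DEFAULT_TIMEOUT
  end
end
```

Because the constant is computed from the config class, changing the default there changes the fallback here without a second edit.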

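Class Method Summary

  • .check(provider_name, timeout: configured_timeout, executor: nil, provider_runtime: nil) ⇒ Hash

    Check health of a single provider.

  • .check_all(timeout: configured_timeout, executor: nil, provider_runtime: nil) ⇒ Array<Hash>

    Check health of all configured providers.

  • .format_results(results) ⇒ String

    Format health check results for CLI output.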

Class Method Details

.check(provider_name, timeout: configured_timeout, executor: nil, provider_runtime: nil) ⇒ Hash

Check health of a single provider

Parameters:

  • provider_name (Symbol, String)

    the provider name

  • timeout (Integer) (defaults to: configured_timeout)

    timeout in seconds

Returns:

  • (Hash)

    health status with :name, :status, :message, :latency_ms keys



# File 'lib/agent_harness/provider_health_check.rb', line 47

def check(provider_name, timeout: configured_timeout, executor: nil, provider_runtime: nil)
  name = normalize_name(provider_name)
  start_time = monotonic_now
  timeout = validate_timeout(timeout)

  # Honor the provider smoke-test contract timeout when it exceeds
  # the health-check timeout, so real CLI round trips are not
  # falsely reported as timeouts.
  outer_timeout = effective_check_timeout(name, timeout)

  Timeout.timeout(outer_timeout) do
    perform_check(
      name,
      start_time,
      timeout: timeout,
      executor: executor,
      provider_runtime: provider_runtime
    )
  end
rescue Timeout::Error
  build_result(
    name: name,
    status: "error",
    message: "Health check timed out after #{outer_timeout || timeout}s",
    start_time: start_time || monotonic_now,
    error_category: :timeout,
    check: :timeout
  )
rescue NotImplementedError, ConfigurationError => e
  # NotImplementedError inherits from ScriptError, not StandardError,
  # so it must be rescued explicitly. Its messages are safe internal
  # setup errors (e.g., missing provider methods or malformed provider
  # contracts) that help users diagnose configuration problems.
  AgentHarness.logger&.error("ProviderHealthCheck error for #{name}: #{e.class}")
  build_result(
    name: name,
    status: "error",
    message: "Health check failed: #{e.class}: #{e.message}",
    start_time: start_time || monotonic_now,
    error_category: :configuration,
    check: :provider_health
  )
rescue => e
  # Return a generic message to avoid leaking sensitive details
  # (e.g., tokens embedded in exception messages). Log only the
  # exception class (not the message) to avoid leaking secrets.
  AgentHarness.logger&.error("ProviderHealthCheck error for #{name}: #{e.class}")
  build_result(
    name: name,
    status: "error",
    message: "Health check failed: #{e.class}",
    start_time: start_time || monotonic_now,
    error_category: :unknown,
    check: :provider_health
  )
end
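
The rescue ordering in the listing above can be tried in isolation. The sketch below is a simplified stand-in, not the library's implementation: guarded_check and result are hypothetical helpers, and the block passed in plays the role of perform_check. It shows three things from the listing: the call runs under Timeout.timeout, NotImplementedError needs its own rescue clause because it is not a StandardError, and the catch-all branch reports only the exception class.

```ruby
require "timeout"

# Monotonic clock: not affected by wall-clock adjustments, so safe for latency.
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Hypothetical helper: builds a hash with the documented result keys.
def result(name, status, message, start_time)
  latency_ms = ((monotonic_now - start_time) * 1000).round
  {name: name, status: status, message: message, latency_ms: latency_ms}
end

# Hypothetical helper: runs the block under a timeout and always returns a hash.
def guarded_check(name, timeout:)
  start_time = monotonic_now
  Timeout.timeout(timeout) { yield }
  result(name, "ok", "OK", start_time)
rescue Timeout::Error
  result(name, "error", "Health check timed out after #{timeout}s", start_time)
rescue NotImplementedError => e
  # A bare `rescue` would miss this: NotImplementedError < ScriptError < Exception.
  result(name, "error", "Health check failed: #{e.class}: #{e.message}", start_time)
rescue => e
  # Only the class is reported, so secrets in e.message cannot leak.
  result(name, "error", "Health check failed: #{e.class}", start_time)
end

slow = guarded_check(:slow, timeout: 0.05) { sleep 1 }
stub = guarded_check(:stub, timeout: 1) { raise NotImplementedError, "smoke test missing" }
leaky = guarded_check(:leaky, timeout: 1) { raise "token=abc123" }

puts slow[:message]
puts stub[:message]
puts leaky[:message]
```

Timeout::Error is listed first because it is itself a StandardError and would otherwise be swallowed by the generic branch.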

.check_all(timeout: configured_timeout, executor: nil, provider_runtime: nil) ⇒ Array<Hash>

Check health of all configured providers

Parameters:

  • timeout (Integer) (defaults to: configured_timeout)

    timeout in seconds for each check

Returns:

  • (Array<Hash>)

    health status for each provider

Raises:

  • (ArgumentError)

    if provider_runtime is given; it is only supported for single-provider health checks


# File 'lib/agent_harness/provider_health_check.rb', line 28

def check_all(timeout: configured_timeout, executor: nil, provider_runtime: nil)
  raise ArgumentError, "provider_runtime is only supported for single-provider health checks" unless provider_runtime.nil?

  provider_names = if AgentHarness.configuration.providers.empty?
    Providers::Registry.instance.all
  else
    enabled_provider_names
  end

  provider_names.map do |name|
    check(name, timeout: timeout, executor: executor, provider_runtime: provider_runtime)
  end
end
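
A caller will often reduce the returned array to a pass/fail decision. The following is a minimal sketch of one way to do that, assuming only the documented :name and :status keys. SAMPLE_RESULTS is hand-written illustration data (the :beta provider and all values are made up), and exit_code_for and unhealthy are hypothetical helpers, not part of AgentHarness.

```ruby
# Hand-written results in the documented shape; values are illustrative only.
SAMPLE_RESULTS = [
  {name: :claude, status: "ok", message: "OK", latency_ms: 120},
  {name: :beta, status: "error", message: "Health check failed: RuntimeError", latency_ms: 15}
].freeze

# Hypothetical helper: non-zero exit code only when a provider reports "error".
def exit_code_for(results)
  results.any? { |r| r[:status] == "error" } ? 1 : 0
end

# Hypothetical helper: every result that is not "ok" (errors and degraded).
def unhealthy(results)
  results.reject { |r| r[:status] == "ok" }
end

unhealthy(SAMPLE_RESULTS).each { |r| puts "#{r[:name]}: #{r[:message]}" }
puts exit_code_for(SAMPLE_RESULTS)
```

Treating "degraded" as a passing state is a design choice made for this sketch; a stricter script could fail on any non-"ok" status instead.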

.format_results(results) ⇒ String

Format health check results for CLI output

Parameters:

  • results (Array<Hash>)

    health check results

Returns:

  • (String)

    formatted output



# File 'lib/agent_harness/provider_health_check.rb', line 108

def format_results(results)
  lines = ["Checking providers..."]

  if results.empty?
    lines << ""
    lines << "No providers checked."
    return lines.join("\n")
  end

  results.each do |result|
    name = result[:name].to_s.ljust(16)
    case result[:status]
    when "ok"
      latency = result[:latency_ms] ? "(#{result[:latency_ms]}ms)" : ""
      lines << "#{name} OK #{latency}".rstrip
    when "degraded"
      lines << "  ~ #{name} #{result[:message]}"
    else
      lines << "#{name} #{result[:message]}"
    end
  end

  failed = results.count { |r| r[:status] == "error" }
  degraded = results.count { |r| r[:status] == "degraded" }
  total = results.size

  lines << ""
  summary_parts = []
  summary_parts << "#{failed} failed" if failed > 0
  summary_parts << "#{degraded} degraded" if degraded > 0

  provider_word = (total == 1) ? "provider" : "providers"
  lines << if summary_parts.any?
    "#{total} #{provider_word} checked: #{summary_parts.join(", ")}."
  else
    "All #{total} #{provider_word} healthy."
  end

  lines.join("\n")
end
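
To make the summary wording concrete, the sketch below re-implements only the final summary branch of the listing above as a standalone helper and runs it on hand-written data. summary_line is a hypothetical name, and the :alpha and :beta providers are made up for the example.

```ruby
# Standalone mirror of the summary logic shown in the listing above.
def summary_line(results)
  failed = results.count { |r| r[:status] == "error" }
  degraded = results.count { |r| r[:status] == "degraded" }
  total = results.size

  parts = []
  parts << "#{failed} failed" if failed > 0
  parts << "#{degraded} degraded" if degraded > 0

  word = (total == 1) ? "provider" : "providers"
  if parts.any?
    "#{total} #{word} checked: #{parts.join(", ")}."
  else
    "All #{total} #{word} healthy."
  end
end

# Illustrative inputs; only :name and :status matter to the summary.
mixed = [
  {name: :claude, status: "ok"},
  {name: :alpha, status: "error"},
  {name: :beta, status: "degraded"}
]

puts summary_line(mixed)                           # prints: 3 providers checked: 1 failed, 1 degraded.
puts summary_line([{name: :claude, status: "ok"}]) # prints: All 1 provider healthy.
```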