Module: HTM::Observability
- Defined in:
- lib/htm/observability.rb
Overview
Observability module for monitoring and metrics collection
Provides comprehensive monitoring of HTM components including:
-
Connection pool health monitoring with alerts
-
Query timing and performance metrics
-
Cache efficiency tracking
-
Service health checks
-
Memory usage statistics
Constant Summary collapse
- POOL_WARNING_THRESHOLD =
Connection pool utilization thresholds
0.75- POOL_CRITICAL_THRESHOLD =
75% utilization triggers warning
0.90
Class Method Summary collapse
-
.cache_stats ⇒ Hash?
Get query cache statistics.
-
.circuit_breaker_stats ⇒ Hash
Get circuit breaker states for all services.
-
.collect_all ⇒ Hash
Collect all observability metrics.
-
.connection_pool_stats ⇒ Hash
Get connection pool statistics with health status.
-
.health_check ⇒ Hash
Perform comprehensive health check.
-
.healthy? ⇒ Boolean
Quick health check - returns boolean.
-
.memory_stats ⇒ Hash
Get memory usage statistics.
-
.query_timing_stats ⇒ Hash
Get query timing statistics.
-
.record_embedding_timing(duration_ms) ⇒ Object
Record embedding generation timing.
-
.record_query_timing(duration_ms, query_type: :unknown) ⇒ Object
Record query timing for metrics.
-
.record_tag_timing(duration_ms) ⇒ Object
Record tag extraction timing.
-
.reset_metrics! ⇒ void
Clear all collected timing metrics.
-
.service_timing_stats ⇒ Hash
Get service timing statistics (embedding and tag extraction).
Class Method Details
.cache_stats ⇒ Hash?
Get query cache statistics
127 128 129 130 131 132 133 |
# File 'lib/htm/observability.rb', line 127 def cache_stats # Try to access LongTermMemory cache stats # Note: This requires access to an LTM instance { info: "Cache stats available via LongTermMemory#stats[:cache]" } end |
.circuit_breaker_stats ⇒ Hash
Get circuit breaker states for all services
141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 |
# File 'lib/htm/observability.rb', line 141 def circuit_breaker_stats stats = {} if defined?(HTM::EmbeddingService) cb = HTM::EmbeddingService.circuit_breaker stats[:embedding_service] = { state: cb.state, failure_count: cb.failure_count, last_failure_time: cb.last_failure_time } end if defined?(HTM::TagService) cb = HTM::TagService.circuit_breaker stats[:tag_service] = { state: cb.state, failure_count: cb.failure_count, last_failure_time: cb.last_failure_time } end stats rescue StandardError => e { error: e. } end |
.collect_all ⇒ Hash
Collect all observability metrics
53 54 55 56 57 58 59 60 61 62 63 |
# File 'lib/htm/observability.rb', line 53 def collect_all { connection_pool: connection_pool_stats, cache: cache_stats, circuit_breakers: circuit_breaker_stats, query_timings: query_timing_stats, service_timings: service_timing_stats, memory_usage: memory_stats, collected_at: Time.now } end |
.connection_pool_stats ⇒ Hash
Get connection pool statistics with health status
76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 |
# File 'lib/htm/observability.rb', line 76 def connection_pool_stats return { status: :unavailable, message: "Database not connected" } unless connected? pool = HTM.db.pool # Sequel's TimedQueueConnectionPool API: # - max_size: maximum pool size # - size: number of connections currently in pool (pre-allocated up to max_size) # - num_waiting: threads waiting for connections # # Unlike ActiveRecord, Sequel pre-allocates connections. The key health indicator # is num_waiting - if threads are waiting, the pool is under stress. max_size = pool.max_size current_size = pool.size waiting = pool.num_waiting # For Sequel's TimedQueueConnectionPool: # - Pool is healthy if no threads are waiting # - Pool is critical only if threads are waiting for connections # - size == max_size is normal (pre-allocated pool), not a problem status = if waiting.positive? waiting > max_size / 2 ? :exhausted : :critical else :healthy end # Utilization based on waiting threads (pool stress indicator) utilization = waiting.positive? ? ((waiting.to_f / max_size) * 100).round(2) : 0.0 stats = { size: max_size, connections: current_size, in_use: 0, # Sequel doesn't expose checked-out count; use waiting as stress indicator available: max_size, # All connections are available when not waiting waiting: waiting, utilization: utilization, status: status } # Log warnings if pool is stressed log_pool_status(stats) stats rescue StandardError => e { status: :error, message: e. } end |
.health_check ⇒ Hash
Perform comprehensive health check
252 253 254 255 256 257 258 259 260 |
# File 'lib/htm/observability.rb', line 252 def health_check checks = {} issues = [] check_database(checks, issues) check_pool(checks, issues) check_circuit_breakers(checks, issues) check_extensions(checks, issues) if connected? { healthy: issues.empty?, checks: checks, issues: issues, checked_at: Time.now } end |
.healthy? ⇒ Boolean
Quick health check - returns boolean
266 267 268 |
# File 'lib/htm/observability.rb', line 266 def healthy? health_check[:healthy] end |
.memory_stats ⇒ Hash
Get memory usage statistics
236 237 238 239 240 241 242 243 |
# File 'lib/htm/observability.rb', line 236 def memory_stats { process_rss_mb: process_memory_mb, gc_stats: GC.stat.slice(:count, :heap_allocated_pages, :heap_live_slots) } rescue StandardError { available: false } end |
.query_timing_stats ⇒ Hash
Get query timing statistics
217 218 219 |
# File 'lib/htm/observability.rb', line 217 def query_timing_stats calculate_timing_stats(@query_timings, :query) end |
.record_embedding_timing(duration_ms) ⇒ Object
Record embedding generation timing
189 190 191 192 193 194 195 196 197 |
# File 'lib/htm/observability.rb', line 189 def (duration_ms) @metrics_mutex.synchronize do @embedding_timings << { duration_ms: duration_ms, recorded_at: Time.now } @embedding_timings.shift if @embedding_timings.size > @max_timing_samples end end |
.record_query_timing(duration_ms, query_type: :unknown) ⇒ Object
Record query timing for metrics
172 173 174 175 176 177 178 179 180 181 182 183 |
# File 'lib/htm/observability.rb', line 172 def record_query_timing(duration_ms, query_type: :unknown) @metrics_mutex.synchronize do @query_timings << { duration_ms: duration_ms, query_type: query_type, recorded_at: Time.now } # Keep only recent samples @query_timings.shift if @query_timings.size > @max_timing_samples end end |
.record_tag_timing(duration_ms) ⇒ Object
Record tag extraction timing
203 204 205 206 207 208 209 210 211 |
# File 'lib/htm/observability.rb', line 203 def record_tag_timing(duration_ms) @metrics_mutex.synchronize do @tag_extraction_timings << { duration_ms: duration_ms, recorded_at: Time.now } @tag_extraction_timings.shift if @tag_extraction_timings.size > @max_timing_samples end end |
.reset_metrics! ⇒ void
This method returns an undefined value.
Clear all collected timing metrics
274 275 276 277 278 279 280 |
# File 'lib/htm/observability.rb', line 274 def reset_metrics! @metrics_mutex.synchronize do @query_timings.clear @embedding_timings.clear @tag_extraction_timings.clear end end |
.service_timing_stats ⇒ Hash
Get service timing statistics (embedding and tag extraction)
225 226 227 228 229 230 |
# File 'lib/htm/observability.rb', line 225 def service_timing_stats { embedding: calculate_timing_stats(@embedding_timings, :embedding), tag_extraction: calculate_timing_stats(@tag_extraction_timings, :tag) } end |