Module: Tracelit::Metrics
Defined in: lib/tracelit/metrics.rb
Class Method Summary
- .counter(name, description: "", unit: "") ⇒ Object
  Exposes a counter for manual instrumentation in user code: Tracelit::Metrics.counter("orders.placed").add(1).
- .gauge(name, description: "", unit: "") ⇒ Object
- .histogram(name, description: "", unit: "") ⇒ Object
- .install_connection_pool_poller ⇒ Object
  Polls ActiveRecord connection pool stats every 30 seconds on a daemon thread and records them as gauges.
- .install_cpu_poller ⇒ Object
  Polls process CPU utilisation every 30 seconds on a daemon thread.
- .install_memory_poller ⇒ Object
  Polls process RSS memory every 60 seconds on a daemon thread.
- .install_rails_subscriber ⇒ Object
  Subscribes to Rails process_action.action_controller to emit http.server.request.count (counter per request), http.server.request.duration (histogram in milliseconds), http.server.error.count (counter for 5xx responses), and db.query.duration (histogram for ActiveRecord time per request).
- .install_sidekiq_middleware ⇒ Object
  Installs a Sidekiq server middleware that emits per-job metrics.
- .meter ⇒ Object
- .read_cpu_time_s(pid, linux) ⇒ Object
  Returns cumulative CPU time (user + system) for this process in seconds.
- .restart_pollers(config) ⇒ Object
  Fix 5 (support): Called from the Process._fork hook in Instrumentation to restart background polling threads inside each forked Puma/Unicorn worker.
- .setup(config) ⇒ Object
  Sets up the OpenTelemetry MeterProvider with OTLP exporter.
Class Method Details
.counter(name, description: "", unit: "") ⇒ Object
# File 'lib/tracelit/metrics.rb', line 76

def self.counter(name, description: "", unit: "")
  @meter&.create_counter(
    name,
    description: description,
    unit: unit
  )
end
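Because the factory is nil-safe (`@meter&.create_counter`), it returns nil until setup has run, so callers should chain `&.` as well. A minimal sketch of that contract with stand-in objects (FakeMeter, FakeCounter, and MetricsSketch are illustrative only; the real objects come from the OpenTelemetry SDK once setup has run):

```ruby
# Stand-ins for illustration; not part of Tracelit or OpenTelemetry.
class FakeCounter
  attr_reader :total

  def initialize
    @total = 0
  end

  def add(value, attributes: {})
    @total += value
  end
end

class FakeMeter
  def create_counter(name, description: "", unit: "")
    FakeCounter.new
  end
end

module MetricsSketch
  class << self
    attr_accessor :meter

    # Mirrors the nil-safe factory above: returns nil before setup.
    def counter(name, description: "", unit: "")
      @meter&.create_counter(name, description: description, unit: unit)
    end
  end
end

MetricsSketch.meter = nil
MetricsSketch.counter("orders.placed")&.add(1)   # safe no-op before setup

MetricsSketch.meter = FakeMeter.new
counter = MetricsSketch.counter("orders.placed")
counter.add(1, attributes: { "channel" => "web" })
counter.total   # => 1
```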
.gauge(name, description: "", unit: "") ⇒ Object
# File 'lib/tracelit/metrics.rb', line 90

def self.gauge(name, description: "", unit: "")
  @meter&.create_gauge(
    name,
    description: description,
    unit: unit
  )
end
.histogram(name, description: "", unit: "") ⇒ Object
# File 'lib/tracelit/metrics.rb', line 83

def self.histogram(name, description: "", unit: "")
  @meter&.create_histogram(
    name,
    description: description,
    unit: unit
  )
end
.install_connection_pool_poller ⇒ Object
Polls ActiveRecord connection pool stats every 30 seconds on a daemon thread and records them as gauges. Does not require a live connection at install time — errors during polling are silently retried next cycle.
Fix 11: version-safe pool access that works on Rails 6.0–8.x.
# File 'lib/tracelit/metrics.rb', line 235

def self.install_connection_pool_poller
  return if @connection_pool_poller_installed
  @connection_pool_poller_installed = true

  pool_size = @meter.create_gauge(
    "db.connection_pool.size",
    description: "Maximum connections in the pool",
    unit: "{connections}"
  )
  pool_busy = @meter.create_gauge(
    "db.connection_pool.busy",
    description: "Connections currently checked out",
    unit: "{connections}"
  )
  pool_idle = @meter.create_gauge(
    "db.connection_pool.idle",
    description: "Connections available for checkout",
    unit: "{connections}"
  )
  pool_waiting = @meter.create_gauge(
    "db.connection_pool.waiting",
    description: "Threads waiting for a connection",
    unit: "{threads}"
  )

  thread = Thread.new do
    Thread.current[:tracelit_pool_poller] = true
    loop do
      sleep 30
      begin
        # Fix 11a: Rails 7.2 soft-deprecated connection_pool on the base
        # class. Use the connection handler when available; fall back for
        # Rails 6.0 compatibility.
        pool =
          if ActiveRecord::Base.respond_to?(:connection_handler)
            begin
              ActiveRecord::Base.connection_handler
                .retrieve_connection_pool(ActiveRecord::Base.connection_specification_name)
            rescue StandardError
              ActiveRecord::Base.connection_pool
            end
          else
            ActiveRecord::Base.connection_pool
          end
        next unless pool

        stat = pool.stat

        # Fix 11b: pool_config.db_config was added in Rails 6.1.
        # Fall back gracefully on older setups.
        adapter =
          if pool.respond_to?(:pool_config)
            pool.pool_config.db_config.adapter.to_s rescue "unknown"
          else
            "unknown"
          end

        attrs = { "db.system" => adapter }
        pool_size.record(stat[:size], attributes: attrs)
        pool_busy.record(stat[:busy], attributes: attrs)
        pool_idle.record(stat[:idle], attributes: attrs)
        pool_waiting.record(stat[:waiting], attributes: attrs)
      rescue StandardError
        # Pool may not be connected yet — retry next cycle
      end
    end
  end
  thread.abort_on_exception = false
  thread
rescue StandardError => e
  OpenTelemetry.logger.warn("[Tracelit] failed to install connection pool poller: #{e.message}")
end
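The Fix 11a fallback chain can be exercised without Rails. In this sketch, ModernBase and LegacyBase are fakes standing in for ActiveRecord::Base on newer and older Rails; only the method names (`connection_handler`, `retrieve_connection_pool`, `connection_specification_name`, `connection_pool`) are real ActiveRecord API:

```ruby
# Fakes standing in for ActiveRecord::Base; the objects are not Rails.
class ModernBase
  def self.connection_specification_name
    "primary"
  end

  def self.connection_handler
    handler = Object.new
    def handler.retrieve_connection_pool(name)
      { via: :handler, name: name }
    end
    handler
  end
end

class LegacyBase
  def self.connection_pool
    { via: :legacy }
  end
end

# The version-safe lookup from the poller, extracted as a helper.
def resolve_pool(base)
  if base.respond_to?(:connection_handler)
    begin
      base.connection_handler
          .retrieve_connection_pool(base.connection_specification_name)
    rescue StandardError
      base.connection_pool
    end
  else
    base.connection_pool
  end
end

resolve_pool(ModernBase)[:via]   # => :handler
resolve_pool(LegacyBase)[:via]   # => :legacy
```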
.install_cpu_poller ⇒ Object
Polls process CPU utilisation every 30 seconds on a daemon thread. Computes a percentage by tracking the delta in CPU time (user + system) against wall-clock elapsed time — same approach as the Go and Node SDKs.
On Linux: reads /proc/self/stat (utime + stime in jiffies at 100 Hz). On macOS: reads `ps -o %cpu= -p <pid>` as a direct percentage.
Emits: process.runtime.cpu.usage (%)
Attributes: process.pid, process.runtime
# File 'lib/tracelit/metrics.rb', line 365

def self.install_cpu_poller
  return if @cpu_poller_installed
  @cpu_poller_installed = true

  cpu_gauge = @meter.create_gauge(
    "process.runtime.cpu.usage",
    description: "Process CPU utilisation percentage",
    unit: "%"
  )

  pid = Process.pid
  linux = File.exist?("/proc/self/stat")
  interval = 30 # seconds

  thread = Thread.new do
    Thread.current[:tracelit_cpu_poller] = true
    last_cpu_time = read_cpu_time_s(pid, linux)
    last_wall_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    loop do
      sleep interval
      begin
        now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        elapsed = now - last_wall_time
        cpu_time = read_cpu_time_s(pid, linux)
        next if elapsed <= 0 || cpu_time.nil? || last_cpu_time.nil?

        delta = cpu_time - last_cpu_time
        last_cpu_time = cpu_time
        last_wall_time = now
        next if delta < 0

        pct = [[delta / elapsed * 100.0, 100.0].min, 0.0].max
        cpu_gauge.record(pct, attributes: {
          "process.pid" => pid.to_s,
          "process.runtime" => "ruby",
        })
      rescue StandardError
        # Retry next cycle — never crash on a metric poll failure
      end
    end
  end
  thread.abort_on_exception = false
  thread
rescue StandardError => e
  OpenTelemetry.logger.warn("[Tracelit] failed to install CPU poller: #{e.message}")
end
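The per-cycle arithmetic can be seen in isolation as a pure function: delta CPU-seconds divided by wall-seconds, clamped to 0 to 100. A sketch of just that calculation:

```ruby
# Percentage of one core used over the interval, clamped to [0, 100].
# Returns nil when the inputs cannot yield a meaningful reading.
def cpu_percent(last_cpu_s, cpu_s, wall_elapsed_s)
  return nil if wall_elapsed_s <= 0
  delta = cpu_s - last_cpu_s
  return nil if delta < 0   # e.g. counter reset after a fork
  [[delta / wall_elapsed_s * 100.0, 100.0].min, 0.0].max
end

cpu_percent(10.0, 13.0, 30.0)   # => 10.0 (3 CPU-seconds over 30 wall-seconds)
cpu_percent(0.0, 90.0, 30.0)    # => 100.0 (clamped; >1 busy core would exceed 100)
cpu_percent(13.0, 10.0, 30.0)   # => nil (negative delta discarded)
```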
.install_memory_poller ⇒ Object
Polls process RSS memory every 60 seconds on a daemon thread.
Fix 12: On Linux use /proc/self/status (always present, no subprocess). Fall back to `ps` on macOS/BSD. The previous implementation always shelled out with backticks, which spawns a child process and fails silently in minimal Docker containers that lack procps.
# File 'lib/tracelit/metrics.rb', line 313

def self.install_memory_poller
  return if @memory_poller_installed
  @memory_poller_installed = true

  memory_gauge = @meter.create_gauge(
    "process.memory.rss",
    description: "Process resident set size (RSS)",
    unit: "MB"
  )

  pid = Process.pid
  thread = Thread.new do
    Thread.current[:tracelit_memory_poller] = true
    loop do
      sleep 60
      begin
        rss_kb =
          if File.exist?("/proc/self/status")
            # Linux: read VmRSS from /proc — no subprocess, always available
            File.read("/proc/self/status")[/VmRSS:\s+(\d+)/, 1].to_i
          else
            # macOS / BSD fallback
            `ps -o rss= -p #{Integer(pid)} 2>/dev/null`.strip.to_i
          end
        next if rss_kb == 0

        rss_mb = rss_kb / 1024.0
        memory_gauge.record(rss_mb, attributes: {
          "process.pid" => pid.to_s,
          "process.runtime" => "ruby",
        })
      rescue StandardError
        # Ignore — environment may not support RSS polling
      end
    end
  end
  thread.abort_on_exception = false
  thread
rescue StandardError => e
  OpenTelemetry.logger.warn("[Tracelit] failed to install memory poller: #{e.message}")
end
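The VmRSS extraction is a one-regex parse. Shown here against a captured sample of /proc/self/status so it runs on any platform (the poller reads the live file on Linux; the sample values are made up):

```ruby
# Hand-written sample of the Linux /proc/self/status format.
sample = <<~STATUS
  Name:   ruby
  VmPeak:   131072 kB
  VmRSS:     98304 kB
  VmData:    65536 kB
STATUS

rss_kb = sample[/VmRSS:\s+(\d+)/, 1].to_i
rss_mb = rss_kb / 1024.0
rss_mb   # => 96.0
```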
.install_rails_subscriber ⇒ Object
Subscribes to Rails process_action.action_controller to emit:
http.server.request.count — counter per request
http.server.request.duration — histogram in milliseconds
http.server.error.count — counter for 5xx responses
db.query.duration — histogram for ActiveRecord time per request
Fix 6: guarded against double-registration so reset! + re-setup in tests or Rails code-reloading scenarios does not duplicate metric counts.
# File 'lib/tracelit/metrics.rb', line 105

def self.install_rails_subscriber
  return if @rails_subscriber_installed
  @rails_subscriber_installed = true

  request_counter = @meter.create_counter(
    "http.server.request.count",
    description: "Total HTTP requests processed",
    unit: "{requests}"
  )
  duration_histogram = @meter.create_histogram(
    "http.server.request.duration",
    description: "HTTP request duration",
    unit: "ms"
  )
  error_counter = @meter.create_counter(
    "http.server.error.count",
    description: "Total HTTP 5xx responses",
    unit: "{errors}"
  )
  db_duration_histogram = @meter.create_histogram(
    "db.query.duration",
    description: "Database query duration",
    unit: "ms"
  )

  ActiveSupport::Notifications.subscribe("process_action.action_controller") do |*args|
    event = ActiveSupport::Notifications::Event.new(*args)
    payload = event.payload

    attrs = {
      # Fix 7: use controller#action (stable, low-cardinality route template)
      # instead of payload[:path] which contains raw IDs and causes metric
      # cardinality explosion on apps with resource IDs in URLs.
      "http.route" => "#{payload[:controller]}##{payload[:action]}",
      "http.method" => payload[:method].to_s,
      "http.status_code" => payload[:status].to_s,
      "controller" => payload[:controller].to_s,
      "action" => payload[:action].to_s,
    }

    request_counter.add(1, attributes: attrs)
    duration_histogram.record(event.duration, attributes: attrs)
    error_counter.add(1, attributes: attrs) if payload[:status].to_i >= 500

    if payload[:db_runtime]
      db_duration_histogram.record(
        payload[:db_runtime].to_f,
        attributes: { "controller" => payload[:controller].to_s }
      )
    end
  rescue StandardError
    # Never let metric errors surface to the application
  end
end
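Why Fix 7 matters: the route key is built from controller and action, never from the raw path, so one metric series covers every ID. A sketch against a hand-built payload in the shape Rails emits for process_action (the values are made up):

```ruby
# A hand-built sample payload mirroring process_action.action_controller.
payload = {
  controller: "OrdersController",
  action:     "show",
  method:     "GET",
  status:     200,
  path:       "/orders/81723",   # raw ID: unbounded cardinality if used as a label
}

route = "#{payload[:controller]}##{payload[:action]}"
route   # => "OrdersController#show", regardless of the ID in :path
```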
.install_sidekiq_middleware ⇒ Object
Installs a Sidekiq server middleware that emits per-job metrics. Uses a dynamically defined class so the instrument references are captured in the closure without global state.
Fix 6: guarded against double-registration.
# File 'lib/tracelit/metrics.rb', line 169

def self.install_sidekiq_middleware
  return if @sidekiq_middleware_installed
  @sidekiq_middleware_installed = true

  job_counter = @meter.create_counter(
    "sidekiq.job.count",
    description: "Total Sidekiq jobs processed",
    unit: "{jobs}"
  )
  job_duration = @meter.create_histogram(
    "sidekiq.job.duration",
    description: "Sidekiq job execution duration",
    unit: "ms"
  )
  job_error_counter = @meter.create_counter(
    "sidekiq.job.error.count",
    description: "Total Sidekiq jobs that raised an error",
    unit: "{jobs}"
  )

  _job_counter = job_counter
  _job_duration = job_duration
  _job_error_counter = job_error_counter

  middleware_class = Class.new do
    define_method(:call) do |_worker, msg, queue, &block|
      start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      error_raised = false
      begin
        block.call
      rescue StandardError
        error_raised = true
        raise
      ensure
        elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
        attrs = {
          "sidekiq.job.class" => msg["class"].to_s,
          "sidekiq.queue" => queue.to_s,
          "sidekiq.status" => error_raised ? "error" : "success",
        }
        _job_counter.add(1, attributes: attrs)
        _job_duration.record(elapsed_ms, attributes: attrs)
        _job_error_counter.add(1, attributes: attrs) if error_raised
      end
    end
  end

  Sidekiq.configure_server do |config|
    config.server_middleware do |chain|
      chain.add middleware_class
    end
  end
rescue StandardError => e
  OpenTelemetry.logger.warn("[Tracelit] failed to install Sidekiq middleware: #{e.message}")
end
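The timing skeleton inside the middleware's call, shown in isolation: a monotonic clock plus ensure, so the duration and status are captured whether the job succeeds or raises. This sketch records into a global instead of real instruments (`run_timed` and `$last_job` are illustrative names, not Tracelit API):

```ruby
# Records duration and status for a block, even when the block raises.
def run_timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  error_raised = false
  begin
    yield
  rescue StandardError
    error_raised = true
    raise   # re-raise so Sidekiq's retry handling still sees the failure
  ensure
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
    $last_job = {
      "sidekiq.status" => error_raised ? "error" : "success",
      "elapsed_ms" => elapsed_ms,
    }
  end
end

run_timed { :ok }
$last_job["sidekiq.status"]   # => "success"

begin
  run_timed { raise "boom" }
rescue RuntimeError
  # the error propagates, but $last_job was still written in ensure
end
$last_job["sidekiq.status"]   # => "error"
```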
.meter ⇒ Object
# File 'lib/tracelit/metrics.rb', line 56

def self.meter
  @meter
end
.read_cpu_time_s(pid, linux) ⇒ Object
Returns cumulative CPU time (user + system) for this process in seconds. On Linux reads /proc/self/stat; on macOS/BSD falls back to ps %cpu which gives an instantaneous percentage instead (treated as fractional seconds over a 1-second window — good enough for a 30 s gauge).
# File 'lib/tracelit/metrics.rb', line 421

def self.read_cpu_time_s(pid, linux)
  if linux
    stat = begin
      File.read("/proc/self/stat")
    rescue
      return nil
    end
    # Format: pid (comm) state ppid ... utime stime ...
    # comm can contain spaces — find last ')' and split from there.
    after_comm = stat[stat.rindex(")").to_i + 1..]
    return nil unless after_comm

    fields = after_comm.split
    # After ')': state(0) ppid(1) ... utime(11) stime(12)
    utime = fields[11]&.to_i
    stime = fields[12]&.to_i
    return nil unless utime && stime

    # Jiffies at 100 Hz → seconds
    (utime + stime) / 100.0
  else
    # macOS/BSD: `ps` gives current CPU % directly.
    # Return it as a fractional "seconds per second" proxy so the
    # delta calculation above yields the right percentage.
    out = `ps -o %cpu= -p #{Integer(pid)} 2>/dev/null`.strip
    return nil if out.empty?
    out.to_f / 100.0
  end
end
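The rindex(")") trick matters because comm can itself contain spaces and parentheses, which would break a naive split. A sketch against a worst-case sample stat line, runnable anywhere (the field values are made up; field positions match proc(5)):

```ruby
# Sample /proc/<pid>/stat line with an awkward comm: "tmux: server)".
stat = "12345 (tmux: server)) S 1 12345 12345 0 -1 4194304 " \
       "100 0 0 0 250 125 0 0 20 0 1 0 8000000"

after_comm = stat[stat.rindex(")").to_i + 1..]
fields = after_comm.split
utime = fields[11].to_i   # 250 jiffies
stime = fields[12].to_i   # 125 jiffies
(utime + stime) / 100.0   # => 3.75 CPU-seconds at 100 Hz
```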
.restart_pollers(config) ⇒ Object
Fix 5 (support): Called from the Process._fork hook in Instrumentation to restart background polling threads inside each forked Puma/Unicorn worker. The parent-process threads are dead in the child; this revives them.
# File 'lib/tracelit/metrics.rb', line 63

def self.restart_pollers(config)
  @connection_pool_poller_installed = false
  @memory_poller_installed = false
  @cpu_poller_installed = false
  install_connection_pool_poller if defined?(::ActiveRecord)
  install_memory_poller
  install_cpu_poller
rescue StandardError => e
  OpenTelemetry.logger.warn("[Tracelit] failed to restart pollers after fork: #{e.message}")
end
.setup(config) ⇒ Object
Sets up the OpenTelemetry MeterProvider with OTLP exporter. Called once from Instrumentation.setup after trace setup.
# File 'lib/tracelit/metrics.rb', line 16

def self.setup(config)
  exporter = OpenTelemetry::Exporter::OTLP::Metrics::MetricsExporter.new(
    endpoint: "#{config.endpoint}/v1/metrics",
    headers: {
      "Authorization" => "Bearer #{config.api_key}",
      "X-Service-Name" => config.resolved_service_name,
      "X-Environment" => config.environment,
    }
  )

  reader = OpenTelemetry::SDK::Metrics::Export::PeriodicMetricReader.new(
    exporter: exporter,
    export_interval_millis: 60_000,
    export_timeout_millis: 10_000
  )

  tp = OpenTelemetry.tracer_provider
  resource = tp.respond_to?(:resource) ? tp.resource : OpenTelemetry::SDK::Resources::Resource.create({})

  provider = OpenTelemetry::SDK::Metrics::MeterProvider.new(resource: resource)
  provider.add_metric_reader(reader)
  OpenTelemetry.meter_provider = provider

  @meter = provider.meter(config.resolved_service_name, version: Tracelit::VERSION)

  install_rails_subscriber if defined?(::Rails)
  install_sidekiq_middleware if defined?(::Sidekiq)
  install_connection_pool_poller if defined?(::ActiveRecord)
  install_memory_poller
  install_cpu_poller
rescue StandardError => e
  OpenTelemetry.logger.warn("[Tracelit] failed to set up metrics: #{e.message}")
end