Class: Hyperion::Metrics
- Inherits: Object
  - Object
  - Hyperion::Metrics
- Defined in:
  - lib/hyperion/metrics.rb
  - lib/hyperion/metrics/path_templater.rb
Overview
Lock-free per-thread counters. Each worker thread mutates its own Hash on the hot path: no mutex acquire/release on every increment, no contention across the thread pool. `snapshot` aggregates lazily across all threads that have ever incremented (one short mutex section, taken only when the operator asks for stats).
Storage: counters live behind `Thread#thread_variable_*` (added in Ruby 2.0), which is the only true thread-local storage in Ruby. `Thread.current[]` is in fact fiber-local, so under an `Async::Scheduler` (TLS path, h2 streams, the 1.3.0+ `--async-io` plain HTTP/1.1 path) every handler fiber would get its own private counters Hash that `snapshot` could never find. Verified with hyperion-async-pg 0.4.0's bench round: before the fix, the dispatch counters dropped requests entirely under `--async-io`, and an external scrape (Prometheus exporter on a different fiber than the handler) saw the dispatch buckets at zero.
Cross-fiber races on the same OS thread: the `+=` is technically read-modify-write, but Ruby's fiber scheduler only switches fibers at IO boundaries (Fiber.scheduler-aware system calls), and `Hash#[]=` performs no IO, so there is no fiber switch mid-increment and no torn write. Two fibers cannot interleave a single `+=` on the same OS thread.
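The difference between the two storage mechanisms can be seen with plain Ruby and no scheduler at all. The following is a minimal sketch; the key names `:fiber_scoped` and `:thread_scoped` are made up for the illustration and are not Hyperion identifiers.

```ruby
# Both values are written from the thread's root fiber.
Thread.current[:fiber_scoped] = 'set in root fiber'
Thread.current.thread_variable_set(:thread_scoped, 'set in root fiber')

# Read both back from a second fiber running on the same OS thread.
seen_in_fiber = Fiber.new do
  {
    fiber_local: Thread.current[:fiber_scoped],                       # nil: a new fiber gets fresh storage
    thread_local: Thread.current.thread_variable_get(:thread_scoped)  # shared by every fiber on the thread
  }
end.resume

puts "Thread.current[] inside the fiber: #{seen_in_fiber[:fiber_local].inspect}"
puts "thread_variable_get inside the fiber: #{seen_in_fiber[:thread_local].inspect}"
```

A counters Hash stored via `Thread.current[]` would be invisible in the same way to any fiber other than the one that created it.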
Reset semantics: counters monotonically increase. Operators that want rate-of-change should snapshot, sleep, snapshot, diff.
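The snapshot/sleep/snapshot/diff recipe could look like the sketch below. `counter_rates` is a hypothetical helper, not part of Hyperion, and the two hashes stand in for snapshots taken some seconds apart.

```ruby
# Hypothetical helper: per-second rate of change between two counter snapshots.
# Keys absent from the earlier snapshot are treated as having started at 0.
def counter_rates(before, after, elapsed_seconds)
  after.each_with_object({}) do |(key, value), rates|
    rates[key] = (value - before.fetch(key, 0)) / elapsed_seconds.to_f
  end
end

# Stand-ins for two snapshots taken 2 seconds apart.
before = { requests: 100, responses_200: 95 }
after  = { requests: 160, responses_200: 150, responses_500: 2 }

rates = counter_rates(before, after, 2)
rates.each { |key, per_second| puts "#{key}: #{per_second}/s" }
```

Because the counters only ever increase, the difference between two snapshots is never negative unless `reset!` ran in between.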
Public API:
Hyperion.stats -> Hash with all current values across all threads.
Defined Under Namespace
Classes: HistogramAccumulator, PathTemplater
Constant Summary
- REQUESTS_DISPATCH_TOTAL =
2.12-E: labeled counter family that observes which worker process a given request landed on. Ticks once per dispatched request from every dispatch shape (Connection#serve, h2 streams, the C accept4 + io_uring loops; see PrometheusExporter for the C-loop fold-in at scrape time).
`worker_id` is conventionally `Process.pid.to_s`, matching the 2.4-C `hyperion_io_uring_workers_active` and `hyperion_per_conn_rejections_total` labeling convention; this lets operators correlate distribution rows with `ps`/`/proc` data without a separate worker_id <-> pid mapping table.
Hot-path cost: one `@hg_mutex` acquisition per tick. That's acceptable for the audit metric: contention shows up only on the `tick + render` overlap, never inside the C accept loop (which uses its own atomic counter folded in at scrape time). Worth the simplicity over an extra lock-free per-thread cache.
:hyperion_requests_dispatch_total
- WORKER_ID_LABEL_KEYS =
%w[worker_id].freeze
- EMPTY_LABELS =
Frozen empty Array used as the default label tuple. Reused across all label-less observations so we don't allocate a fresh `[]` per scrape; this keeps hot-path work allocation-free for the un-labeled gauge/histogram families.
[].freeze
Class Attribute Summary
- .default_path_templater ⇒ Object
Class Method Summary
- .reset_default_path_templater! ⇒ Object
Instance Method Summary
- #decrement(key, by = 1) ⇒ Object
- #decrement_gauge(name, label_values = EMPTY_LABELS, delta = 1) ⇒ Object
- #ensure_worker_request_family_registered! ⇒ Object
  2.12-E: Idempotently register the labeled-counter family.
- #gauge_meta(name) ⇒ Object
- #gauge_snapshot ⇒ Object
- #histogram_meta(name) ⇒ Object
  Register that a histogram/gauge family exists with this label ordering.
- #histogram_snapshot ⇒ Object
  Snapshot helpers: read-only views of the current histogram / gauge state.
- #increment(key, by = 1) ⇒ Object
  Hot path: one thread-variable lookup + one hash op.
- #increment_gauge(name, label_values = EMPTY_LABELS, delta = 1) ⇒ Object
  Increment a gauge by `delta` (default 1).
- #increment_labeled_counter(name, label_values = EMPTY_LABELS, by = 1) ⇒ Object
  Labeled counter, separate from the legacy thread-local counter surface (which is unlabeled and per-thread for hot-path contention-free increments).
- #increment_status(code) ⇒ Object
- #initialize ⇒ Metrics (constructor)
  A new instance of Metrics.
- #labeled_counter_snapshot ⇒ Object
  2.13-A: Snapshot merges per-thread shards.
- #observe_histogram(name, value, label_values = EMPTY_LABELS) ⇒ Object
  Observe `value` on a previously-registered histogram.
- #register_histogram(name, buckets:, label_keys: []) ⇒ Object
  Register a histogram family.
- #register_labeled_counter(name, label_keys: []) ⇒ Object
- #reset! ⇒ Object
  Tests can call .reset! between examples to avoid cross-spec leakage.
- #set_gauge(name, value = nil, label_values = EMPTY_LABELS, &block) ⇒ Object
  Set a gauge value.
- #snapshot ⇒ Object
- #tick_worker_request(worker_id) ⇒ Object
Constructor Details
#initialize ⇒ Metrics
Returns a new instance of Metrics.
# File 'lib/hyperion/metrics.rb', line 49

def initialize
  # Direct list of every per-thread counters Hash ever allocated through
  # this Metrics instance. We hold the Hash refs ourselves (instead of
  # holding Thread refs and looking the Hash up via thread-local
  # storage) so snapshot survives thread death - counters from a
  # short-lived worker that already exited still aggregate. Tiny
  # per-thread footprint (one Hash + one slot in this Array).
  @thread_counters = []
  @counters_mutex = Mutex.new
  # Per-instance thread-local key so spec runs that build fresh Metrics
  # objects don't share state across examples.
  @thread_key = :"__hyperion_metrics_#{object_id}__"

  # 2.4-C - observability enrichment. Histograms and gauges live as
  # separate keyed structures (vs counters) because the wire format
  # is different (per-bucket cumulative counts + sum/count for
  # histograms; a single instantaneous reading for gauges). Both are
  # mutex-guarded - these are scrape-rate operations (one observe per
  # request, one set per worker boot/shutdown), not per-syscall.
  #
  # Histograms: `{ name => { labels_tuple_array => HistogramAccumulator } }`.
  # Gauges: `{ name => { labels_tuple_array => Float } }`.
  # `labels_tuple_array` is a frozen Array<String> of label values
  # (stable order, supplied by the observer); it doubles as the Hash
  # key for cheap O(1) lookup.
  @histograms = {}
  @histograms_meta = {} # name => { buckets:, label_keys: }
  @gauges = {}
  @gauges_meta = {} # name => { label_keys: }
  @hg_mutex = Mutex.new
  # Snapshot block hooks for gauges whose value is read on demand
  # (ThreadPool queue depth, etc.). `{ name => { labels_tuple => Proc } }`.
  @gauge_blocks = {}

  # 2.13-A - per-thread shards for the hot-path metrics that USED to
  # take @hg_mutex on every observe / increment. The pre-2.13-A
  # comment in `increment_labeled_counter` claimed those paths were
  # "low-rate" - that turned out to be wrong: `tick_worker_request`
  # fires once per Rack request, and `observe_histogram` fires once
  # per request via the per-route duration histogram. Under -t 32
  # the single mutex serialised every worker thread on the
  # request-completion tail. Per-thread shards remove the
  # contention; snapshots merge across threads under the mutex
  # (snapshot is a low-rate operation - once per /-/metrics scrape).
  #
  # Thread-variable storage (NOT Thread.current[]) for the same
  # reason as the unlabeled counter path: under an Async scheduler
  # `Thread.current[:k]` is fiber-local, which would let snapshots
  # miss observations made on a fiber that already exited.
  @hg_thread_key = :"__hyperion_metrics_hg_#{object_id}__"
  @lc_thread_key = :"__hyperion_metrics_lc_#{object_id}__"
  # Holds direct references to every per-thread shard ever
  # allocated through this Metrics instance (mirrors @thread_counters)
  # so snapshots survive thread death.
  @thread_histograms = []
  @thread_labeled_counters = []
  @hg_thread_mutex = Mutex.new
end
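The shard-list design described in the constructor comments can be illustrated in isolation. This is a minimal sketch under stated assumptions, not Hyperion's implementation: `ShardedCounter` and its private `register_shard` helper are hypothetical names, and only the unlabeled counter path is modelled.

```ruby
# Hypothetical stand-alone model of per-thread counter shards.
class ShardedCounter
  def initialize
    @shards = []        # direct refs to every per-thread Hash ever allocated
    @mutex = Mutex.new  # guards @shards only, never the increment itself
    @key = :"__sharded_counter_#{object_id}__"
  end

  # Hot path: one thread-variable lookup plus one Hash update, no mutex.
  def increment(name, by = 1)
    thread = Thread.current
    shard = thread.thread_variable_get(@key) || register_shard(thread)
    shard[name] += by
  end

  # Cold path: copy the shard list under the mutex, then sum outside it.
  def snapshot
    shards = @mutex.synchronize { @shards.dup }
    shards.each_with_object(Hash.new(0)) do |shard, totals|
      shard.each { |name, value| totals[name] += value }
    end
  end

  private

  # Runs once per OS thread: allocate the shard and remember it.
  def register_shard(thread)
    shard = Hash.new(0)
    thread.thread_variable_set(@key, shard)
    @mutex.synchronize { @shards << shard }
    shard
  end
end

counter = ShardedCounter.new
workers = Array.new(4) { Thread.new { 1_000.times { counter.increment(:requests) } } }
workers.each(&:join)

# All four threads have exited, yet their shards are still reachable.
totals = counter.snapshot
puts "requests counted after thread death: #{totals[:requests]}"
```

Holding the Hash references rather than the Thread references is what lets the final snapshot include work done by threads that no longer exist.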
Class Attribute Details
.default_path_templater ⇒ Object
# File 'lib/hyperion/metrics.rb', line 40

def default_path_templater
  @default_path_templater ||= PathTemplater.new
end
Class Method Details
.reset_default_path_templater! ⇒ Object
# File 'lib/hyperion/metrics.rb', line 44

def reset_default_path_templater!
  @default_path_templater = nil
end
Instance Method Details
#decrement(key, by = 1) ⇒ Object
# File 'lib/hyperion/metrics.rb', line 132

def decrement(key, by = 1)
  increment(key, -by)
end
#decrement_gauge(name, label_values = EMPTY_LABELS, delta = 1) ⇒ Object
# File 'lib/hyperion/metrics.rb', line 293

def decrement_gauge(name, label_values = EMPTY_LABELS, delta = 1)
  increment_gauge(name, label_values, -delta)
end
#ensure_worker_request_family_registered! ⇒ Object
2.12-E: Idempotently register the labeled-counter family. Public so `Server#run_c_accept_loop` can register at boot: the PrometheusExporter's C-loop fold-in is gated on the family being in the snapshot, and a 100% C-loop worker never goes through `tick_worker_request` to register lazily.
# File 'lib/hyperion/metrics.rb', line 171

def ensure_worker_request_family_registered!
  return if @worker_request_family_registered

  register_labeled_counter(REQUESTS_DISPATCH_TOTAL, label_keys: WORKER_ID_LABEL_KEYS)
  @worker_request_family_registered = true
end
#gauge_meta(name) ⇒ Object
# File 'lib/hyperion/metrics.rb', line 304

def gauge_meta(name)
  @hg_mutex.synchronize { @gauges_meta[name]&.dup }
end
#gauge_snapshot ⇒ Object
# File 'lib/hyperion/metrics.rb', line 347

def gauge_snapshot
  out = {}
  @hg_mutex.synchronize do
    names = (@gauges.keys + @gauge_blocks.keys).uniq
    names.each do |name|
      per_labels = {}
      @gauges[name]&.each { |labels, value| per_labels[labels] = value.to_f }
      @gauge_blocks[name]&.each do |labels, block|
        # Block-evaluated gauges read live state at scrape time. We
        # release the mutex around the block call to avoid holding
        # while user code runs, BUT we currently hold @hg_mutex -
        # the contract is that the block is short and side-effect-
        # free (e.g., reads ThreadPool#queue_size). That's the only
        # use case we wire today; document if extended.
        per_labels[labels] = block.call.to_f
      rescue StandardError
        # Snapshot must never raise - a misbehaving block degrades
        # to "no reading" rather than a 500 on /-/metrics.
        next
      end
      out[name] = { meta: @gauges_meta[name] || { label_keys: [].freeze }, series: per_labels }
    end
  end
  out
end
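The "misbehaving block degrades to no reading" contract can be shown on its own. In this sketch `read_block_gauges` is a hypothetical helper and the two label tuples are invented; it mirrors only the rescue-and-skip behaviour, not the mutex handling.

```ruby
# Hypothetical helper: evaluate each on-demand gauge block, skipping any that raise.
def read_block_gauges(blocks)
  blocks.each_with_object({}) do |(labels, block), readings|
    readings[labels] = block.call.to_f
  rescue StandardError
    # A failing block contributes no series instead of failing the whole read.
    next
  end
end

queue = [:job_a, :job_b, :job_c]
blocks = {
  ['pool-a'].freeze => -> { queue.size },               # healthy: reads live state
  ['pool-b'].freeze => -> { raise 'backend unavailable' } # misbehaving
}

readings = read_block_gauges(blocks)
readings.each { |labels, value| puts "#{labels.inspect} => #{value}" }
```

Only the healthy series appears in the result, which is the behaviour a scrape endpoint needs to keep returning a body.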
#histogram_meta(name) ⇒ Object
Register that a histogram/gauge family exists with this label ordering. The PrometheusExporter calls `histogram_meta` / `gauge_meta` at scrape time to build the HELP/TYPE preamble.
# File 'lib/hyperion/metrics.rb', line 300

def histogram_meta(name)
  @hg_mutex.synchronize { @histograms_meta[name]&.dup }
end
#histogram_snapshot ⇒ Object
Snapshot helpers — read-only views of the current histogram / gauge state. The exporter uses these to render the scrape body.
2.13-A — Histograms merge across the per-thread shards on the snapshot path. The mutex is held only long enough to copy the shard list (every shard Hash is owned by one thread, so we can iterate its current contents safely while merging — torn reads of in-progress observations show as a slightly stale snapshot, never as a corrupted Accumulator).
# File 'lib/hyperion/metrics.rb', line 317

def histogram_snapshot
  out = {}
  # Pre-seed names from registered families so a histogram with
  # zero observations still appears in the scrape (matches the
  # pre-2.13-A behaviour where `register_histogram` populated the
  # `@histograms[name] = {}` slot eagerly).
  @hg_mutex.synchronize do
    @histograms_meta.each_key { |name| out[name] = { meta: @histograms_meta[name], series: {} } }
  end

  shards = @hg_thread_mutex.synchronize { @thread_histograms.dup }
  shards.each do |shard|
    shard.each do |name, family|
      slot = (out[name] ||= { meta: @histograms_meta[name], series: {} })
      series = slot[:series]
      family.each do |labels, accum|
        existing = series[labels]
        if existing.nil?
          series[labels] = accum.snapshot
        else
          merge_histogram_snapshot!(existing, accum)
        end
      end
    end
  end
  out
end
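Merging two per-thread readings of the same histogram series is element-wise addition. The sketch below assumes a snapshot shape of per-bucket counts plus sum and count; `HistogramShot` and `merge_shots` are hypothetical, and the real `HistogramAccumulator#snapshot` layout is not shown in this page.

```ruby
# Hypothetical snapshot shape: one count per bucket, plus running sum and count.
HistogramShot = Struct.new(:bucket_counts, :sum, :count)

# Combine two shards that observed the same (name, label tuple) with identical buckets.
def merge_shots(left, right)
  merged_counts = left.bucket_counts.zip(right.bucket_counts).map { |a, b| a + b }
  HistogramShot.new(merged_counts, left.sum + right.sum, left.count + right.count)
end

thread_a = HistogramShot.new([2, 5, 6], 1.5, 6)
thread_b = HistogramShot.new([1, 1, 4], 3.25, 4)

merged = merge_shots(thread_a, thread_b)
puts "buckets=#{merged.bucket_counts.inspect} sum=#{merged.sum} count=#{merged.count}"
```

Addition works whether the bucket counts are cumulative or per-bucket, as long as both shards use the same bucket boundaries, which registration guarantees per family.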
#increment(key, by = 1) ⇒ Object
Hot path: one thread-variable lookup + one hash op. No mutex on the increment fast path; the mutex is taken only on first allocation per OS thread (very rare) and on snapshot.
Storage uses `Thread#thread_variable_*` (added in Ruby 2.0), the only true thread-local storage in Ruby. `Thread.current[]` is in fact fiber-local, so under an Async::Scheduler (TLS path, h2 streams, the 1.3.0+ `--async-io` plain HTTP/1.1 path) every handler fiber would get its own private counters Hash that snapshot could never aggregate. Verified with hyperion-async-pg 0.4.0's bench round: before the fix, the dispatch counters dropped requests under `--async-io`.
Cross-fiber races on the same OS thread: the `+=` is read-modify-write, but Ruby's fiber scheduler only switches fibers at IO boundaries (Fiber.scheduler-aware system calls). `Hash#[]=` performs no IO, so there is no fiber switch mid-increment and no torn write. Two fibers cannot interleave a single `+=` on the same OS thread.
# File 'lib/hyperion/metrics.rb', line 125

def increment(key, by = 1)
  thread = Thread.current
  counters = thread.thread_variable_get(@thread_key)
  counters = register_thread_counters(thread) if counters.nil?
  counters[key] += by
end
#increment_gauge(name, label_values = EMPTY_LABELS, delta = 1) ⇒ Object
Increment a gauge by `delta` (default 1). Used for kTLS active connections, etc.; paired with `decrement_gauge` on close.
# File 'lib/hyperion/metrics.rb', line 284

def increment_gauge(name, label_values = EMPTY_LABELS, delta = 1)
  @hg_mutex.synchronize do
    @gauges_meta[name] ||= { label_keys: [].freeze }
    family = (@gauges[name] ||= {})
    key = label_values.frozen? ? label_values : label_values.dup.freeze
    family[key] = (family[key] || 0.0) + delta.to_f
  end
end
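The `label_values.frozen? ? label_values : label_values.dup.freeze` idiom in the listing above protects the series Hash from callers that later mutate their label Array. A small sketch, with `canonical_labels` as a hypothetical name for that expression and an invented label tuple:

```ruby
# Hypothetical wrapper around the freeze-or-copy idiom.
def canonical_labels(label_values)
  label_values.frozen? ? label_values : label_values.dup.freeze
end

family = {}
caller_labels = ['GET', '/users/:id']

# Store under a frozen private copy, not the caller's mutable Array.
family[canonical_labels(caller_labels)] = 1.0

# The caller mutating its own Array afterwards cannot disturb the stored key.
caller_labels << 'oops'

# Arrays hash by content, so a fresh Array with equal elements finds the series.
lookup = family[['GET', '/users/:id']]
puts "series value: #{lookup.inspect}"
puts "stored key frozen? #{family.keys.first.frozen?}"
```

Already-frozen tuples such as `EMPTY_LABELS` are reused as-is, so the label-less path allocates nothing.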
#increment_labeled_counter(name, label_values = EMPTY_LABELS, by = 1) ⇒ Object
Labeled counter — separate from the legacy thread-local counter surface (which is unlabeled and per-thread for hot-path contention-free increments).
2.13-A: moved to a per-thread shard for the same reason as `observe_histogram`: the previous "low-rate paths" claim was wrong (`tick_worker_request` is per-Rack-request), and at -t 32 the single mutex serialised every worker thread on the request-completion tail. Per-thread shards remove the contention; `labeled_counter_snapshot` merges shards under the mutex.
# File 'lib/hyperion/metrics.rb', line 389

def increment_labeled_counter(name, label_values = EMPTY_LABELS, by = 1)
  thread = Thread.current
  shard = thread.thread_variable_get(@lc_thread_key)
  shard = register_thread_labeled_counters(thread) if shard.nil?

  # Defensive: ensure the family meta exists so `register_labeled_counter`
  # is not strictly required for hot-path increments. Pre-2.13-A the
  # mutex'd path lazily registered an unlabeled meta; we mirror that
  # under @hg_mutex so the shape stays consistent across threads.
  unless @labeled_counters_meta && @labeled_counters_meta[name]
    @hg_mutex.synchronize do
      @labeled_counters_meta ||= {}
      @labeled_counters_meta[name] ||= { label_keys: [].freeze }
    end
  end

  family = (shard[name] ||= {})
  key = label_values.frozen? ? label_values : label_values.dup.freeze
  family[key] = (family[key] || 0) + by
end
#increment_status(code) ⇒ Object
# File 'lib/hyperion/metrics.rb', line 136

def increment_status(code)
  increment(:"responses_#{code}")
end
#labeled_counter_snapshot ⇒ Object
2.13-A: Snapshot merges per-thread shards. Pre-seeded with `@labeled_counters_meta` so registered-but-unticked families still show up in the scrape (matches pre-2.13-A behaviour where `register_labeled_counter` eagerly created the `[name] = {}` slot).
# File 'lib/hyperion/metrics.rb', line 423

def labeled_counter_snapshot
  out = {}
  @hg_mutex.synchronize do
    (@labeled_counters_meta || {}).each do |name, meta|
      out[name] = { meta: meta, series: {} }
    end
  end

  shards = @hg_thread_mutex.synchronize { @thread_labeled_counters.dup }
  shards.each do |shard|
    shard.each do |name, family|
      meta = (@labeled_counters_meta || {})[name] || { label_keys: [].freeze }
      slot = (out[name] ||= { meta: meta, series: {} })
      series = slot[:series]
      family.each do |labels, count|
        series[labels] = (series[labels] || 0) + count
      end
    end
  end
  out
end
#observe_histogram(name, value, label_values = EMPTY_LABELS) ⇒ Object
Observe `value` on a previously-registered histogram. `label_values` MUST be supplied in the same order as `label_keys` at registration.
2.13-A: Hot-path lock-free shard. Each thread keeps its own `{ name => { labels => HistogramAccumulator } }` map; observations never block on `@hg_mutex`. Snapshots merge across threads under the mutex (low-rate). Allocation footprint per observe: zero on the cached-key path; one frozen Array + one HistogramAccumulator on first observation for a given (name, label-set, thread).
Bench impact (generic Rack hello, -t 32 -c 100): contention on `@hg_mutex` was the dominant tail latency contributor, because the observation fires once per request via the per-route request-duration histogram, multiplied by N worker threads.
# File 'lib/hyperion/metrics.rb', line 246

def observe_histogram(name, value, label_values = EMPTY_LABELS)
  meta = @histograms_meta[name]
  return unless meta # silently skip unregistered observations

  thread = Thread.current
  shard = thread.thread_variable_get(@hg_thread_key)
  shard = register_thread_histograms(thread) if shard.nil?

  family = (shard[name] ||= {})
  accum = family[label_values]
  unless accum
    accum = HistogramAccumulator.new(meta[:buckets])
    # Freeze the label tuple so future identical-content tuples
    # hash to the same bucket - but we keep the original ref
    # provided by the caller as the canonical key so subsequent
    # observes with the same Array bypass the freeze step.
    family[label_values.frozen? ? label_values : label_values.dup.freeze] = accum
  end
  accum.observe(value)
end
#register_histogram(name, buckets:, label_keys: []) ⇒ Object
Register a histogram family. Idempotent — re-registering with the same buckets/label_keys is a no-op; mismatched re-register raises so a typo surfaces at boot rather than corrupting the scrape output.
# File 'lib/hyperion/metrics.rb', line 215

def register_histogram(name, buckets:, label_keys: [])
  @hg_mutex.synchronize do
    if (existing = @histograms_meta[name])
      unless existing[:buckets] == buckets && existing[:label_keys] == label_keys
        raise ArgumentError,
              "histogram #{name.inspect} re-registered with different shape " \
              "(was buckets=#{existing[:buckets]} labels=#{existing[:label_keys]}; " \
              "now buckets=#{buckets} labels=#{label_keys})"
      end
      return
    end

    @histograms_meta[name] = { buckets: buckets.dup.freeze, label_keys: label_keys.dup.freeze }
    @histograms[name] = {}
  end
end
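The idempotent-but-strict registration rule can be exercised with a stripped-down registry. `HistogramRegistry`, the family name and the bucket values below are all hypothetical; the sketch keeps only the shape comparison from the listing above.

```ruby
# Hypothetical registry that accepts identical re-registration and rejects shape changes.
class HistogramRegistry
  def initialize
    @meta = {}
    @mutex = Mutex.new
  end

  def register(name, buckets:, label_keys: [])
    @mutex.synchronize do
      existing = @meta[name]
      if existing
        same_shape = existing[:buckets] == buckets && existing[:label_keys] == label_keys
        raise ArgumentError, "histogram #{name.inspect} re-registered with different shape" unless same_shape
      else
        # Store frozen copies so later caller-side mutation cannot change the shape.
        @meta[name] = { buckets: buckets.dup.freeze, label_keys: label_keys.dup.freeze }
      end
    end
    nil
  end
end

registry = HistogramRegistry.new
registry.register(:request_seconds, buckets: [0.01, 0.1, 1.0], label_keys: %w[route])
registry.register(:request_seconds, buckets: [0.01, 0.1, 1.0], label_keys: %w[route]) # no-op

mismatch_error =
  begin
    registry.register(:request_seconds, buckets: [0.5, 5.0], label_keys: %w[route])
    nil
  rescue ArgumentError => e
    e
  end
puts "mismatched re-register raised: #{mismatch_error.class}"
```

Raising at registration time means a mistyped bucket list fails when the process boots, before any scrape output has been produced.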
#register_labeled_counter(name, label_keys: []) ⇒ Object
# File 'lib/hyperion/metrics.rb', line 410

def register_labeled_counter(name, label_keys: [])
  @hg_mutex.synchronize do
    @labeled_counters_meta ||= {}
    @labeled_counters_meta[name] = { label_keys: label_keys.dup.freeze }
    @labeled_counters ||= {}
    @labeled_counters[name] ||= {}
  end
end
#reset! ⇒ Object
Tests can call .reset! between examples to avoid cross-spec leakage.
2.13-A: also clear per-thread histogram and labeled-counter shards. Without this, an observation made on thread A in spec X would leak into spec Y's snapshot because the shard hashes are held alive by `@thread_histograms` / `@thread_labeled_counters` for the lifetime of the Metrics instance.
# File 'lib/hyperion/metrics.rb', line 195

def reset!
  @counters_mutex.synchronize do
    @thread_counters.each(&:clear)
  end
  @hg_mutex.synchronize do
    @histograms.each_value(&:clear)
    @gauges.each_value(&:clear)
    @gauge_blocks.each_value(&:clear)
  end
  @hg_thread_mutex.synchronize do
    @thread_histograms.each(&:clear)
    @thread_labeled_counters.each(&:clear)
  end
end
#set_gauge(name, value = nil, label_values = EMPTY_LABELS, &block) ⇒ Object
Set a gauge value. `label_values` follows the same convention as `observe_histogram`. Pass a block to register a callback that's evaluated lazily at snapshot time (ThreadPool queue depth, etc.); the callback's return value is the gauge's current reading.
# File 'lib/hyperion/metrics.rb', line 271

def set_gauge(name, value = nil, label_values = EMPTY_LABELS, &block)
  @hg_mutex.synchronize do
    @gauges_meta[name] ||= { label_keys: [].freeze }
    if block
      (@gauge_blocks[name] ||= {})[label_values.frozen? ? label_values : label_values.dup.freeze] = block
    else
      (@gauges[name] ||= {})[label_values.frozen? ? label_values : label_values.dup.freeze] = value.to_f
    end
  end
end
#snapshot ⇒ Object
# File 'lib/hyperion/metrics.rb', line 178

def snapshot
  result = Hash.new(0)
  counters_snapshot = @counters_mutex.synchronize { @thread_counters.dup }
  counters_snapshot.each do |counters|
    counters.each { |k, v| result[k] += v }
  end
  result.default = nil
  result
end
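The `result.default = nil` step in the listing above is easy to overlook: the Hash needs a default of 0 while summing, but callers should see `nil` for counters that were never incremented. A minimal sketch with a hypothetical `merge_counters` helper and invented counter names:

```ruby
# Hypothetical helper: sum per-thread counter Hashes into one result.
def merge_counters(per_thread_counters)
  result = Hash.new(0) # default 0 makes `+=` work for unseen keys
  per_thread_counters.each do |counters|
    counters.each { |key, value| result[key] += value }
  end
  result.default = nil # afterwards, unknown keys read as nil rather than 0
  result
end

merged = merge_counters([{ requests: 3, bytes_written: 10 }, { requests: 2 }])
puts "requests=#{merged[:requests]} bytes_written=#{merged[:bytes_written]}"
puts "never-incremented key reads as #{merged[:rejected].inspect}"
```

Resetting the default lets a caller distinguish "never incremented" from "incremented and currently zero", which the summing default would otherwise hide.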
#tick_worker_request(worker_id) ⇒ Object
# File 'lib/hyperion/metrics.rb', line 160

def tick_worker_request(worker_id)
  label = worker_id.nil? || worker_id.to_s.empty? ? '0' : worker_id.to_s
  ensure_worker_request_family_registered!
  increment_labeled_counter(REQUESTS_DISPATCH_TOTAL, [label])
end