Class: Hyperion::Metrics

Inherits: Object
Defined in:
lib/hyperion/metrics.rb,
lib/hyperion/metrics/path_templater.rb

Overview

Lock-free per-thread counters. Each worker thread mutates its own Hash on the hot path — no mutex acquire/release on every increment, no contention across the thread pool. `snapshot` aggregates lazily across all threads that have ever incremented (one short mutex section, only taken when the operator asks for stats).

Storage: counters live behind `Thread#thread_variable_*`, the only true thread-local storage in modern Ruby — `Thread.current[]` has been FIBER-local since 1.9, so under an `Async::Scheduler` (TLS path, h2 streams, the 1.3.0+ `--async-io` plain HTTP/1.1 path) every handler fiber would get its own private counters Hash that `snapshot` could never find. Verified with hyperion-async-pg 0.4.0's bench round; before the fix the dispatch counters dropped requests entirely under `--async-io`, and an external scrape (Prometheus exporter on a different fiber than the handler) saw the dispatch buckets at zero.
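
A standalone sketch of that distinction (plain Ruby, nothing Hyperion-specific): a fresh fiber sees its own empty `Thread.current[]` slot but shares the `thread_variable_*` slot with its carrier OS thread.

t = Thread.new do
  Thread.current[:counters] = { requests: 1 }                     # fiber-local slot
  Thread.current.thread_variable_set(:counters, { requests: 1 })  # true thread-local

  Fiber.new do
    p Thread.current[:counters]                     # => nil (new fiber, new slot)
    p Thread.current.thread_variable_get(:counters) # => {:requests=>1} (shared with the carrier thread)
  end.resume
end
t.join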

Cross-fiber races on the same OS thread: the `+=` is technically read-modify-write, but Ruby's fiber scheduler only yields at IO boundaries (Fiber.scheduler-aware blocking calls), and a Hash read/write never performs such IO — no preemption mid-increment, no torn writes. Two fibers cannot interleave a single `+=` on the same OS thread.

Reset semantics: counters monotonically increase. Operators that want rate-of-change should snapshot, sleep, snapshot, diff.

Public API:

Hyperion.stats -> Hash of every counter's current value, aggregated across all threads.
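
A rate-of-change sketch built on that entry point (the counter keys shown are illustrative; only `Hyperion.stats` itself is the documented surface):

window = 10 # seconds

before = Hyperion.stats          # e.g. { requests: 120_000, responses_200: 118_500, ... }
sleep window
after  = Hyperion.stats

# Counters only ever grow, so a diff over the window gives per-second rates.
rates = after.to_h { |key, value| [key, (value - before.fetch(key, 0)) / window.to_f] }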

Defined Under Namespace

Classes: HistogramAccumulator, PathTemplater

Constant Summary

REQUESTS_DISPATCH_TOTAL =

2.12-E — labeled counter family that observes which worker process a given request landed on. Ticks once per dispatched request from every dispatch shape (Connection#serve, h2 streams, the C accept4 + io_uring loops; see PrometheusExporter for the C-loop fold-in at scrape time).

`worker_id` is conventionally `Process.pid.to_s` — matches the 2.4-C `hyperion_io_uring_workers_active` and `hyperion_per_conn_rejections_total` labeling convention; lets operators correlate distribution rows with `ps`/`/proc` data without a separate worker_id <-> pid mapping table.

Hot-path cost: since 2.13-A a tick is one thread-variable lookup plus a Hash write on the calling thread's labeled-counter shard; `@hg_mutex` is only taken when the family meta is first registered (see `ensure_worker_request_family_registered!`). The C accept loop never goes through this path at all; it keeps its own atomic counter folded in at scrape time.

:hyperion_requests_dispatch_total
WORKER_ID_LABEL_KEYS =
%w[worker_id].freeze
EMPTY_LABELS =

Frozen empty Array used as the default label tuple. Reused across all label-less observations so we don't allocate a fresh `[]` per call — keeps the hot path allocation-free for the un-labeled gauge/histogram families.

[].freeze

Class Attribute Summary

Class Method Summary

Instance Method Summary

Constructor Details

#initialize ⇒ Metrics

Returns a new instance of Metrics.



# File 'lib/hyperion/metrics.rb', line 49

def initialize
  # Direct list of every per-thread counters Hash ever allocated through
  # this Metrics instance. We hold the Hash refs ourselves (instead of
  # holding Thread refs and looking the Hash up via thread-local
  # storage) so snapshot survives thread death — counters from a
  # short-lived worker that already exited still aggregate. Tiny per-
  # thread footprint (one Hash + one slot in this Array).
  @thread_counters = []
  @counters_mutex = Mutex.new
  # Per-instance thread-local key so spec runs that build fresh Metrics
  # objects don't share state across examples.
  @thread_key = :"__hyperion_metrics_#{object_id}__"

  # 2.4-C — observability enrichment. Histograms and gauges live as
  # separate keyed structures (vs counters) because the wire format
  # is different (per-bucket cumulative counts + sum/count for
  # histograms; a single instantaneous reading for gauges). Both are
  # mutex-guarded — these are scrape-rate operations (one observe per
  # request, one set per worker boot/shutdown), not per-syscall.
  #
  # Histograms: `{ name => { labels_tuple_array => HistogramAccumulator } }`.
  # Gauges:     `{ name => { labels_tuple_array => Float } }`.
  # `labels_tuple_array` is a frozen Array<String> of label values
  # (stable order, supplied by the observer); it doubles as the Hash
  # key for cheap O(1) lookup.
  @histograms      = {}
  @histograms_meta = {} # name => { buckets:, label_keys: }
  @gauges          = {}
  @gauges_meta     = {} # name => { label_keys: }
  @hg_mutex        = Mutex.new
  # Snapshot block hooks for gauges whose value is read on demand
  # (ThreadPool queue depth, etc.). `{ name => { labels_tuple => Proc } }`.
  @gauge_blocks    = {}

  # 2.13-A — per-thread shards for the hot-path metrics that USED to
  # take @hg_mutex on every observe / increment. The pre-2.13-A
  # comment in `increment_labeled_counter` claimed those paths were
  # "low-rate" — that turned out to be wrong: `tick_worker_request`
  # fires once per Rack request, and `observe_histogram` fires once
  # per request via the per-route duration histogram. Under -t 32
  # the single mutex serialised every worker thread on the
  # request-completion tail. Per-thread shards remove the
  # contention; snapshots merge across threads under the mutex
  # (snapshot is a low-rate operation — once per /-/metrics scrape).
  #
  # Thread-variable storage (NOT Thread.current[]) for the same
  # reason as the unlabeled counter path: under an Async scheduler
  # `Thread.current[:k]` is fiber-local, which would let snapshots
  # miss observations made on a fiber that already exited.
  @hg_thread_key   = :"__hyperion_metrics_hg_#{object_id}__"
  @lc_thread_key   = :"__hyperion_metrics_lc_#{object_id}__"
  # Holds direct references to every per-thread shard ever
  # allocated through this Metrics instance (mirrors @thread_counters)
  # so snapshots survive thread death.
  @thread_histograms = []
  @thread_labeled_counters = []
  @hg_thread_mutex         = Mutex.new
end

Class Attribute Details

.default_path_templater ⇒ Object



# File 'lib/hyperion/metrics.rb', line 40

def default_path_templater
  @default_path_templater ||= PathTemplater.new
end

Class Method Details

.reset_default_path_templater! ⇒ Object



# File 'lib/hyperion/metrics.rb', line 44

def reset_default_path_templater!
  @default_path_templater = nil
end

Instance Method Details

#decrement(key, by = 1) ⇒ Object



# File 'lib/hyperion/metrics.rb', line 132

def decrement(key, by = 1)
  increment(key, -by)
end

#decrement_gauge(name, label_values = EMPTY_LABELS, delta = 1) ⇒ Object



# File 'lib/hyperion/metrics.rb', line 293

def decrement_gauge(name, label_values = EMPTY_LABELS, delta = 1)
  increment_gauge(name, label_values, -delta)
end

#ensure_worker_request_family_registered! ⇒ Object

2.12-E — Idempotently register the labeled-counter family. Public so `Server#run_c_accept_loop` can register at boot — the PrometheusExporter's C-loop fold-in is gated on the family being in the snapshot, and a 100% C-loop worker never goes through `tick_worker_request` to register lazily.



# File 'lib/hyperion/metrics.rb', line 171

def ensure_worker_request_family_registered!
  return if @worker_request_family_registered

  register_labeled_counter(REQUESTS_DISPATCH_TOTAL, label_keys: WORKER_ID_LABEL_KEYS)
  @worker_request_family_registered = true
end

#gauge_meta(name) ⇒ Object



# File 'lib/hyperion/metrics.rb', line 304

def gauge_meta(name)
  @hg_mutex.synchronize { @gauges_meta[name]&.dup }
end

#gauge_snapshot ⇒ Object



# File 'lib/hyperion/metrics.rb', line 347

def gauge_snapshot
  out = {}
  @hg_mutex.synchronize do
    names = (@gauges.keys + @gauge_blocks.keys).uniq
    names.each do |name|
      per_labels = {}
      @gauges[name]&.each { |labels, value| per_labels[labels] = value.to_f }
      @gauge_blocks[name]&.each do |labels, block|
        # Block-evaluated gauges read live state at scrape time.
        # Ideally we would release @hg_mutex around the block call so
        # user code never runs under the lock, but today we hold it —
        # the contract is that the block is short and side-effect-
        # free (e.g., reads ThreadPool#queue_size). That's the only
        # use case we wire today; document if extended.
        per_labels[labels] = block.call.to_f
      rescue StandardError
        # Snapshot must never raise — a misbehaving block degrades
        # to "no reading" rather than a 500 on /-/metrics.
        next
      end
      out[name] = { meta: @gauges_meta[name] || { label_keys: [].freeze }, series: per_labels }
    end
  end
  out
end

#histogram_meta(name) ⇒ Object

Returns the registered shape (buckets / label ordering) for a histogram family. The PrometheusExporter calls `histogram_meta` / `gauge_meta` at scrape time to build the HELP/TYPE preamble.



# File 'lib/hyperion/metrics.rb', line 300

def histogram_meta(name)
  @hg_mutex.synchronize { @histograms_meta[name]&.dup }
end

#histogram_snapshot ⇒ Object

Snapshot helpers — read-only views of the current histogram / gauge state. The exporter uses these to render the scrape body.

2.13-A — Histograms merge across the per-thread shards on the snapshot path. The mutex is held only long enough to copy the shard list (every shard Hash is owned by one thread, so we can iterate its current contents safely while merging — torn reads of in-progress observations show as a slightly stale snapshot, never as a corrupted Accumulator).



# File 'lib/hyperion/metrics.rb', line 317

def histogram_snapshot
  out = {}

  # Pre-seed names from registered families so a histogram with
  # zero observations still appears in the scrape (matches the
  # pre-2.13-A behaviour where `register_histogram` populated the
  # `@histograms[name] = {}` slot eagerly).
  @hg_mutex.synchronize do
    @histograms_meta.each_key { |name| out[name] = { meta: @histograms_meta[name], series: {} } }
  end

  shards = @hg_thread_mutex.synchronize { @thread_histograms.dup }
  shards.each do |shard|
    shard.each do |name, family|
      slot = (out[name] ||= { meta: @histograms_meta[name], series: {} })
      series = slot[:series]
      family.each do |labels, accum|
        existing = series[labels]
        if existing.nil?
          series[labels] = accum.snapshot
        else
          merge_histogram_snapshot!(existing, accum)
        end
      end
    end
  end

  out
end
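
`merge_histogram_snapshot!` is a private helper not shown on this page. A minimal sketch of what it has to do, assuming `HistogramAccumulator#snapshot` returns `{ buckets: { upper_bound => cumulative_count }, sum:, count: }` (the wire shape described in the constructor comment); the exact return shape is an assumption, not confirmed by this page:

def merge_histogram_snapshot!(target, accum)
  # Hypothetical sketch: fold another thread's accumulator into an existing
  # per-label snapshot Hash. Bucket counts, sum and count are all additive.
  other = accum.snapshot
  other[:buckets].each do |upper_bound, cumulative_count|
    target[:buckets][upper_bound] = (target[:buckets][upper_bound] || 0) + cumulative_count
  end
  target[:sum]   += other[:sum]
  target[:count] += other[:count]
  target
end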

#increment(key, by = 1) ⇒ Object

Hot path: one thread-variable lookup + one hash op. No mutex on the increment fast path; the mutex is taken only on first allocation per OS thread (very rare) and on snapshot.

Storage uses Thread#thread_variable_*, the only true thread-local storage in modern Ruby — Thread.current[] has been FIBER-local since 1.9, so under an Async::Scheduler (TLS path, h2 streams, the 1.3.0+ --async-io plain HTTP/1.1 path) every handler fiber would get its own private counters Hash that snapshot could never aggregate. Verified with hyperion-async-pg 0.4.0's bench round; before the fix the dispatch counters dropped requests under --async-io.

Cross-fiber races on the same OS thread: the `+=` is read-modify-write, but Ruby's fiber scheduler only yields at IO boundaries (Fiber.scheduler-aware blocking calls), and a Hash read/write never performs such IO — no preemption mid-increment, no torn writes. Two fibers cannot interleave a single `+=` on the same OS thread.



# File 'lib/hyperion/metrics.rb', line 125

def increment(key, by = 1)
  thread = Thread.current
  counters = thread.thread_variable_get(@thread_key)
  counters = register_thread_counters(thread) if counters.nil?
  counters[key] += by
end
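
Illustrative usage of the unlabeled counter surface (the counter names are arbitrary, not built-in Hyperion keys):

metrics = Hyperion::Metrics.new

metrics.increment(:connections_accepted)   # +1 on this thread's private Hash
metrics.increment(:bytes_read, 4_096)      # arbitrary step
metrics.increment_status(200)              # bumps :responses_200
metrics.decrement(:pool_idle_threads)      # sugar for increment(key, -1)

metrics.snapshot
# => { connections_accepted: 1, bytes_read: 4096, responses_200: 1, pool_idle_threads: -1 }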

#increment_gauge(name, label_values = EMPTY_LABELS, delta = 1) ⇒ Object

Increment a gauge by `delta` (default 1). Used for kTLS active connections, etc. — paired with `decrement_gauge` on close.



# File 'lib/hyperion/metrics.rb', line 284

def increment_gauge(name, label_values = EMPTY_LABELS, delta = 1)
  @hg_mutex.synchronize do
    @gauges_meta[name] ||= { label_keys: [].freeze }
    family = (@gauges[name] ||= {})
    key    = label_values.frozen? ? label_values : label_values.dup.freeze
    family[key] = (family[key] || 0.0) + delta.to_f
  end
end

#increment_labeled_counter(name, label_values = EMPTY_LABELS, by = 1) ⇒ Object

Labeled counter — separate from the legacy thread-local counter surface (which is unlabeled and per-thread for hot-path contention-free increments).

2.13-A — moved to a per-thread shard for the same reason as `observe_histogram`: the previous “low-rate paths” claim was wrong (`tick_worker_request` is per-Rack-request), and at -t 32 the single mutex serialised every worker thread on the request-completion tail. Per-thread shards remove the contention; `labeled_counter_snapshot` merges shards under the mutex.



# File 'lib/hyperion/metrics.rb', line 389

def increment_labeled_counter(name, label_values = EMPTY_LABELS, by = 1)
  thread = Thread.current
  shard = thread.thread_variable_get(@lc_thread_key)
  shard = register_thread_labeled_counters(thread) if shard.nil?

  # Defensive: ensure the family meta exists so `register_labeled_counter`
  # is not strictly required for hot-path increments. Pre-2.13-A the
  # mutex'd path lazily registered an unlabeled meta; we mirror that
  # under @hg_mutex so the shape stays consistent across threads.
  unless @labeled_counters_meta && @labeled_counters_meta[name]
    @hg_mutex.synchronize do
      @labeled_counters_meta ||= {}
      @labeled_counters_meta[name] ||= { label_keys: [].freeze }
    end
  end

  family = (shard[name] ||= {})
  key    = label_values.frozen? ? label_values : label_values.dup.freeze
  family[key] = (family[key] || 0) + by
end

#increment_status(code) ⇒ Object



# File 'lib/hyperion/metrics.rb', line 136

def increment_status(code)
  increment(:"responses_#{code}")
end

#labeled_counter_snapshot ⇒ Object

2.13-A — Snapshot merges per-thread shards. Pre-seeded with `@labeled_counters_meta` so registered-but-unticked families still show up in the scrape (matches pre-2.13-A behaviour where `register_labeled_counter` eagerly created the `[name] = {}` slot).



# File 'lib/hyperion/metrics.rb', line 423

def labeled_counter_snapshot
  out = {}
  @hg_mutex.synchronize do
    (@labeled_counters_meta || {}).each do |name, meta|
      out[name] = { meta: meta, series: {} }
    end
  end

  shards = @hg_thread_mutex.synchronize { @thread_labeled_counters.dup }
  shards.each do |shard|
    shard.each do |name, family|
      meta = (@labeled_counters_meta || {})[name] || { label_keys: [].freeze }
      slot = (out[name] ||= { meta: meta, series: {} })
      series = slot[:series]
      family.each do |labels, count|
        series[labels] = (series[labels] || 0) + count
      end
    end
  end
  out
end

#observe_histogram(name, value, label_values = EMPTY_LABELS) ⇒ Object

Observe `value` on a previously-registered histogram. `label_values` MUST be supplied in the same order as `label_keys` at registration.

2.13-A — Hot-path lock-free shard. Each thread keeps its own `{ name => { labels => HistogramAccumulator } }` map; observations never block on `@hg_mutex`. Snapshots merge across threads under the mutex (low-rate). Allocation footprint per observe: zero on the cached-key path; one frozen Array + one HistogramAccumulator on first observation for a given (name, label-set, thread).

Bench impact (generic Rack hello, -t 32 -c 100): contention on `@hg_mutex` was the dominant tail-latency contributor — this fires once per request via the per-route request-duration histogram, multiplied by N worker threads.



# File 'lib/hyperion/metrics.rb', line 246

def observe_histogram(name, value, label_values = EMPTY_LABELS)
  meta = @histograms_meta[name]
  return unless meta # silently skip unregistered observations

  thread = Thread.current
  shard = thread.thread_variable_get(@hg_thread_key)
  shard = register_thread_histograms(thread) if shard.nil?

  family = (shard[name] ||= {})
  accum  = family[label_values]
  unless accum
    accum = HistogramAccumulator.new(meta[:buckets])
    # Freeze the label tuple (dup first if the caller's Array is mutable)
    # so it is a stable Hash key. Subsequent observes with an equal-content
    # Array find the accumulator via `family[label_values]` by value
    # equality and skip this branch entirely, so there is no per-observe
    # dup or freeze.
    family[label_values.frozen? ? label_values : label_values.dup.freeze] = accum
  end
  accum.observe(value)
end
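
Illustrative registration plus observation (the family name, buckets and labels here are examples, not Hyperion defaults); label values are passed in the same order as `label_keys` was given at registration:

metrics.register_histogram(
  :request_duration_seconds,
  buckets:    [0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0],
  label_keys: %w[method route]
)

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
# ... handle the request ...
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

# Label values in label_keys order: method first, then route.
metrics.observe_histogram(:request_duration_seconds, elapsed, ['GET', '/users/:id'].freeze)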

#register_histogram(name, buckets:, label_keys: []) ⇒ Object

Register a histogram family. Idempotent — re-registering with the same buckets/label_keys is a no-op; mismatched re-register raises so a typo surfaces at boot rather than corrupting the scrape output.



# File 'lib/hyperion/metrics.rb', line 215

def register_histogram(name, buckets:, label_keys: [])
  @hg_mutex.synchronize do
    if (existing = @histograms_meta[name])
      unless existing[:buckets] == buckets && existing[:label_keys] == label_keys
        raise ArgumentError,
              "histogram #{name.inspect} re-registered with different shape " \
              "(was buckets=#{existing[:buckets]} labels=#{existing[:label_keys]}; " \
              "now buckets=#{buckets} labels=#{label_keys})"
      end

      return
    end
    @histograms_meta[name] = { buckets: buckets.dup.freeze, label_keys: label_keys.dup.freeze }
    @histograms[name]      = {}
  end
end

#register_labeled_counter(name, label_keys: []) ⇒ Object



# File 'lib/hyperion/metrics.rb', line 410

def register_labeled_counter(name, label_keys: [])
  @hg_mutex.synchronize do
    @labeled_counters_meta ||= {}
    @labeled_counters_meta[name] = { label_keys: label_keys.dup.freeze }
    @labeled_counters ||= {}
    @labeled_counters[name] ||= {}
  end
end

#reset! ⇒ Object

Tests can call .reset! between examples to avoid cross-spec leakage.

2.13-A — also clear per-thread histogram and labeled-counter shards. Without this, an observation made on thread A in spec X would leak into spec Y's snapshot because the shard hashes are held alive by `@thread_histograms` / `@thread_labeled_counters` for the lifetime of the Metrics instance.



# File 'lib/hyperion/metrics.rb', line 195

def reset!
  @counters_mutex.synchronize do
    @thread_counters.each(&:clear)
  end
  @hg_mutex.synchronize do
    @histograms.each_value(&:clear)
    @gauges.each_value(&:clear)
    @gauge_blocks.each_value(&:clear)
  end
  @hg_thread_mutex.synchronize do
    @thread_histograms.each(&:clear)
    @thread_labeled_counters.each(&:clear)
  end
end

#set_gauge(name, value = nil, label_values = EMPTY_LABELS, &block) ⇒ Object

Set a gauge value. `label_values` follows the same convention as `observe_histogram`. Pass a block to register a callback that's evaluated lazily at snapshot time (ThreadPool queue depth, etc.) — the callback's return value is the gauge's current reading.



# File 'lib/hyperion/metrics.rb', line 271

def set_gauge(name, value = nil, label_values = EMPTY_LABELS, &block)
  @hg_mutex.synchronize do
    @gauges_meta[name] ||= { label_keys: [].freeze }
    if block
      (@gauge_blocks[name] ||= {})[label_values.frozen? ? label_values : label_values.dup.freeze] = block
    else
      (@gauges[name] ||= {})[label_values.frozen? ? label_values : label_values.dup.freeze] = value.to_f
    end
  end
end
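
Both forms side by side (the gauge names are illustrative; the Queue stands in for whatever object exposes the depth being reported):

queue = Queue.new  # stand-in for the worker queue whose depth we want to expose

# Direct reading: a one-shot value recorded now.
metrics.set_gauge(:workers_booted, 4)

# Lazy reading: the block is stored and re-evaluated on every snapshot/scrape,
# so the gauge reports the depth as of the scrape, not as of registration.
metrics.set_gauge(:pool_queue_depth) { queue.size }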

#snapshot ⇒ Object



# File 'lib/hyperion/metrics.rb', line 178

def snapshot
  result = Hash.new(0)
  counters_snapshot = @counters_mutex.synchronize { @thread_counters.dup }
  counters_snapshot.each do |counters|
    counters.each { |k, v| result[k] += v }
  end
  result.default = nil
  result
end

#tick_worker_request(worker_id) ⇒ Object
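
Typical call site, following the `worker_id` convention documented on REQUESTS_DISPATCH_TOTAL (the PID in the snapshot below is illustrative):

metrics.tick_worker_request(Process.pid)

metrics.labeled_counter_snapshot
# => { hyperion_requests_dispatch_total: {
#        meta:   { label_keys: ["worker_id"] },
#        series: { ["12345"] => 1 } } }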



# File 'lib/hyperion/metrics.rb', line 160

def tick_worker_request(worker_id)
  label = worker_id.nil? || worker_id.to_s.empty? ? '0' : worker_id.to_s
  ensure_worker_request_family_registered!
  increment_labeled_counter(REQUESTS_DISPATCH_TOTAL, [label])
end