Module: DispatchPolicy::InflightTracker
- Extended by:
- ActiveSupport::Concern
- Defined in:
- lib/dispatch_policy/inflight_tracker.rb
Overview
Around-perform that records each job execution in dispatch_policy_inflight_jobs while it runs, so the concurrency gate can count active jobs per partition.
While the job runs we spawn a heartbeat thread that bumps ‘heartbeat_at` every `config.inflight_heartbeat_interval` seconds. Without this, jobs longer than `inflight_stale_after` (default 5 min) get their inflight row prematurely swept and the concurrency gate over-admits.
Defined Under Namespace
Classes: Heartbeat, ThreadSafeFlag
Constant Summary collapse
- HEARTBEAT_KEY =
—– heartbeat thread —–
:__dispatch_policy_heartbeat_token__
Class Method Summary collapse
-
.lookup_admitted_at(active_job_id) ⇒ Object
Reads the admitted_at column from the inflight row that the Tick pre-inserted.
- .record_adaptive_observations(policy:, gates:, partition_key:, admitted_at:, perform_start:, succeeded:) ⇒ Object
- .start_heartbeat(active_job_id) ⇒ Object
- .stop_heartbeat(heartbeat) ⇒ Object
- .track(job) ⇒ Object
Class Method Details
.lookup_admitted_at(active_job_id) ⇒ Object
Reads the admitted_at column from the inflight row that the Tick pre-inserted. Used as the start-of-queue-wait reference for the adaptive_concurrency feedback signal (queue_lag = perform_start
-
admitted_at). nil if the row vanished or the lookup fails —
the observation is then skipped.
75 76 77 78 79 80 81 82 83 84 85 86 87 |
# File 'lib/dispatch_policy/inflight_tracker.rb', line 75 def self.lookup_admitted_at(active_job_id) result = ActiveRecord::Base.connection.exec_query( "SELECT admitted_at FROM dispatch_policy_inflight_jobs WHERE active_job_id = $1 LIMIT 1", "lookup_admitted_at", [active_job_id] ) row = result.first return nil unless row ts = row["admitted_at"] ts.is_a?(Time) ? ts : Time.parse(ts.to_s) rescue StandardError nil end |
.record_adaptive_observations(policy:, gates:, partition_key:, admitted_at:, perform_start:, succeeded:) ⇒ Object
89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 |
# File 'lib/dispatch_policy/inflight_tracker.rb', line 89 def self.record_adaptive_observations(policy:, gates:, partition_key:, admitted_at:, perform_start:, succeeded:) return if gates.empty? queue_lag_ms = if admitted_at ((perform_start - admitted_at) * 1000).to_i else # No admitted_at means we can't measure queue wait. Treat as 0 # so the observation still increments sample_count and the # cap can grow if everything else is healthy. 0 end gates.each do |gate| gate.record_observation( policy_name: policy.name, partition_key: partition_key, queue_lag_ms: queue_lag_ms, succeeded: succeeded ) rescue StandardError => e DispatchPolicy.config.logger&.warn( "[dispatch_policy] adaptive observation failed for #{policy.name}/#{partition_key}: #{e.class}: #{e.}" ) end end |
.start_heartbeat(active_job_id) ⇒ Object
121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 |
# File 'lib/dispatch_policy/inflight_tracker.rb', line 121 def self.start_heartbeat(active_job_id) interval = DispatchPolicy.config.inflight_heartbeat_interval.to_f return nil if interval <= 0 stop_flag = Concurrent::AtomicBoolean.new(false) if defined?(Concurrent::AtomicBoolean) stop_flag ||= ThreadSafeFlag.new thread = Thread.new do Thread.current.name = "dispatch_policy.heartbeat:#{active_job_id}" until stop_flag.true? # Sleep in small slices so stop is responsive without polling tight. slept = 0.0 slice = [interval, 1.0].min while slept < interval && !stop_flag.true? sleep(slice) slept += slice end break if stop_flag.true? begin ActiveRecord::Base.connection_pool.with_connection do Repository.heartbeat_inflight!(active_job_id: active_job_id) end rescue StandardError => e DispatchPolicy.config.logger&.warn("[dispatch_policy] heartbeat #{active_job_id} failed: #{e.class}: #{e.}") end end end Heartbeat.new(thread, stop_flag) end |
.stop_heartbeat(heartbeat) ⇒ Object
154 155 156 157 158 159 160 161 162 163 164 |
# File 'lib/dispatch_policy/inflight_tracker.rb', line 154 def self.stop_heartbeat(heartbeat) return if heartbeat.nil? heartbeat.stop_flag.make_true # Wake the thread out of any in-progress sleep so we don't wait the full slice. heartbeat.thread.wakeup if heartbeat.thread.alive? heartbeat.thread.join(1.0) rescue StandardError # Worst case: the thread is killed by GC; the inflight row gets a stale # heartbeat_at and the sweeper will reclaim it after inflight_stale_after. end |
.track(job) ⇒ Object
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 |
# File 'lib/dispatch_policy/inflight_tracker.rb', line 24 def self.track(job) policy_name = job.class.respond_to?(:dispatch_policy_name) && job.class.dispatch_policy_name return yield unless policy_name policy = DispatchPolicy.registry.fetch(policy_name) return yield unless policy ctx = policy.build_context(job.arguments, queue_name: job.queue_name&.to_s) partition_key = policy.partition_key_for(ctx) Repository.insert_inflight!([{ policy_name: policy.name, partition_key: partition_key, active_job_id: job.job_id }]) adaptive_gates = policy.gates.select { |g| g.name == :adaptive_concurrency } admitted_at = adaptive_gates.any? ? lookup_admitted_at(job.job_id) : nil perform_start = Time.current heartbeat = start_heartbeat(job.job_id) succeeded = false begin yield succeeded = true ensure stop_heartbeat(heartbeat) record_adaptive_observations( policy: policy, gates: adaptive_gates, partition_key: partition_key, admitted_at: admitted_at, perform_start: perform_start, succeeded: succeeded ) begin Repository.delete_inflight!(active_job_id: job.job_id) rescue StandardError => e DispatchPolicy.config.logger&.warn("[dispatch_policy] failed to delete inflight row #{job.job_id}: #{e.class}: #{e.}") end end end |