sidekiq-batch-jobs
Batch tracking and completion callbacks for Sidekiq, backed by ActiveRecord (PostgreSQL).
A hand-rolled alternative to Sidekiq Pro batches. Group a set of perform_async calls into a batch, persist their state in the database, and fire a callback worker when the batch finishes — success or failure.
Installation
Add to your Gemfile:
gem "sidekiq-batch-jobs"
Then run the install generator and migrate:
bin/rails g sidekiq:batch:jobs:install
bin/rails db:migrate
The gem registers its client middleware, server middleware, and death handler automatically at boot.
To opt out (e.g. if you need full control over middleware order), set this in config/application.rb before Rails boots:
Sidekiq::Batch::Jobs.auto_install = false
Then either call Sidekiq::Batch::Jobs.install! from your own config/initializers/sidekiq.rb, or wire the three pieces by hand:
# config/initializers/sidekiq.rb
Sidekiq.configure_client do |config|
  # Outermost on the client chain so any dedupe/suppression middleware
  # registered later with `chain.add` gets the final say on whether a
  # push actually happens. We only enroll jobs that the chain agrees to push.
  config.client_middleware do |chain|
    chain.prepend SidekiqBatch::ClientMiddleware
  end
end

Sidekiq.configure_server do |config|
  # Same prepend for the server's own client chain — covers `perform_async`
  # calls made from inside a running worker.
  config.client_middleware do |chain|
    chain.prepend SidekiqBatch::ClientMiddleware
  end

  # `add` (not `prepend`) so this runs LAST in the server chain — outside
  # Sidekiq's retry middleware. That way we see the terminal disposition
  # of each attempt and only mark the row failed on the final retry.
  config.server_middleware do |chain|
    chain.add SidekiqBatch::Middleware
  end

  # Reconciles jobs that died without re-entering middleware (SIGKILL, OOM,
  # pod eviction). Without this, a killed job's batch row stays `pending`
  # forever and the batch never completes.
  config.death_handlers << ->(job, ex) { SidekiqBatch::Middleware.handle_death(job, ex) }
end
Usage
A batch is a group of Sidekiq jobs whose collective fate you care about. Once every job in the batch ends in a terminal state, a callback worker fires — exactly once. You typically use a batch when there's "fan-in" work to do after a bunch of independent jobs finish: rebuilding a derived dataset after rescoring, sending one summary email after a thousand individual notifications, marking an import as ready once all rows are processed.
The four steps
# 1. Create the batch. Use `context` to stash any state the callback will need —
# the callback only receives the batch id, not the surrounding closure, so
# anything you'd otherwise close over goes here.
batch = SidekiqBatch.create!(
  description: "Rescore leaderboard #{leaderboard.id}",
  context: {
    "leaderboard_id" => leaderboard.id,
    "triggered_by" => current_user.id,
    "reason" => "manual rescore from admin panel",
  },
)
# 2. Register what happens when the batch finishes
batch.on(:complete, RebuildLeaderboardCacheWorker)
batch.on(:failure, AlertOpsOfFailedRescoreWorker)
# 3. Enqueue jobs *inside* the batch context — they get enrolled automatically
batch.jobs do
  leaderboard.entries.find_each do |entry|
    ScoreWorker.perform_async(entry.id)
  end
end
# 4. The batch is now running. Your callback worker will be invoked when
# the last job lands in a terminal state — possibly hours later, on a
# completely different worker process.
The context column
SidekiqBatch#context is a jsonb column for arbitrary per-batch metadata. The gem doesn't read it — it's a slot for you to pass information from the code that created the batch through to the callback worker, since the callback only receives batch_id and has to rehydrate everything else from the database.
Useful things to stash there:
- Foreign keys the callback needs (leaderboard_id, import_id, tenant_id).
- Provenance for debugging or audit (triggered_by, via, request_id).
- Configuration the callback should branch on (notify_slack: true, recompute_strategy: "fast").
Keep it small and stable — it's metadata, not a payload. If you find yourself stuffing large arrays in there, that's a sign the data should live in its own table with a sidekiq_batch_id.
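If per-job data does outgrow context, the usual shape is a companion table keyed by the batch. A minimal sketch, assuming a hypothetical import_rows table (the migration class and column names are illustrative, not part of the gem):

class CreateImportRows < ActiveRecord::Migration[7.1]
  def change
    create_table :import_rows do |t|
      # Ties each heavy payload row back to its batch instead of inflating
      # the batch's jsonb context column.
      t.references :sidekiq_batch, null: false, index: true
      t.jsonb :payload, null: false, default: {}
      t.timestamps
    end
  end
end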
What batch.jobs do … end actually does
The block establishes a thread-local "enrollment context." While the block runs:
- Every perform_async / perform_bulk / set(...).perform_async call made on this thread is intercepted by the client middleware.
- For each pushed job, a SidekiqBatchJob row is written to Postgres with the job's jid, worker class, and args — before Sidekiq pushes the payload to Redis. That ordering is the whole point: when the worker later runs and the server middleware looks for a matching row, it's guaranteed to find one.
- When the block returns, the batch transitions from pending to running and the context is cleared.
- Jobs enqueued outside the block are not enrolled and don't count toward this batch's completion. Same for jobs that child workers enqueue while running — only the original thread's enqueues are captured (by design — keeps the batch's scope predictable).
You don't have to think about jid tracking, race conditions, or middleware ordering — that's the gem's job. You just write the block.
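To make the scoping concrete, here is a small sketch reusing the ScoreWorker from above (entry_ids is an illustrative local variable, not gem API):

batch.jobs do
  # Enrolled: pushed while this thread's enrollment context is open, so each
  # job gets a SidekiqBatchJob row and counts toward the batch's completion.
  ScoreWorker.perform_async(entry_ids.first)
  ScoreWorker.perform_bulk(entry_ids.map { |id| [id] })
end

# Not enrolled: the block has already closed, so this is a plain Sidekiq job
# with no effect on when the batch completes.
ScoreWorker.perform_async(entry_ids.last)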
What the callback worker receives
The callback gets one argument: the batch id. From there it can inspect the batch's full state:
class RebuildLeaderboardCacheWorker
  include Sidekiq::Worker

  def perform(batch_id)
    batch = SidekiqBatch.find(batch_id)
    leaderboard_id = batch.context.fetch("leaderboard_id")

    Rails.logger.info "Batch #{batch.description} finished: #{batch.progress}"
    # => Batch Rescore leaderboard 42 finished: {total: 1247, complete: 1247, failed: 0, pending: 0}

    LeaderboardCacheRebuilder.run(leaderboard_id)
  end
end
class AlertOpsOfFailedRescoreWorker
  include Sidekiq::Worker

  def perform(batch_id)
    batch = SidekiqBatch.find(batch_id)

    batch.failed_jobs.find_each do |bj|
      Ops.notify(
        "Rescore worker died: #{bj.worker_class} args=#{bj.args} " \
        "error=#{bj.error_class}: #{bj.error_message}",
      )
    end
  end
end
Either :complete or :failure will fire — never both, never twice. :complete fires when every enrolled job succeeded; :failure fires the moment any job ends in a failed terminal state (retries exhausted, or retry: false workers that raised).
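One pattern a :failure callback can implement, shown here as a sketch rather than built-in gem behavior, is re-enqueueing only the failed jobs from the worker_class and args stored on each row (this assumes worker_class is stored as a string, as in the examples above):

class RequeueFailedScoresWorker
  include Sidekiq::Worker

  def perform(batch_id)
    batch = SidekiqBatch.find(batch_id)

    batch.failed_jobs.find_each do |bj|
      # Re-push with the original worker class and arguments. These pushes
      # happen outside any `batch.jobs` block, so they are not enrolled and
      # cannot re-trigger this batch's callbacks.
      bj.worker_class.constantize.perform_async(*bj.args)
    end
  end
end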
Inspecting a batch from anywhere
batch.progress
# => { total: 1247, complete: 1101, failed: 3, pending: 143 }
batch.pending_jobs # ActiveRecord relation
batch.failed_jobs
batch.completed_jobs
batch.status # "pending" | "running" | "complete" | "failed"
batch.completed_at # nil while running, set on terminal transition
batch.context # the jsonb hash you stashed when creating the batch
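Because a batch is an ordinary ActiveRecord row, that state can be surfaced anywhere the app can reach the database. A minimal sketch of a hypothetical JSON status endpoint (the controller is illustrative, not part of the gem):

class BatchStatusController < ApplicationController
  def show
    batch = SidekiqBatch.find(params[:id])

    render json: {
      description: batch.description,
      status: batch.status,
      progress: batch.progress,
      completed_at: batch.completed_at,
    }
  end
end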
Development
After checking out the repo, run bin/setup to install gem dependencies. Then run the suite with:
bin/test # starts Postgres via docker compose, then runs rspec
bin/test spec/models # forwards args through to rspec
The test harness uses Combustion to boot a minimal Rails app under spec/internal/, and a Postgres container defined in docker-compose.yml. The container uses an ephemeral tmpfs for its data directory, so it's safe to docker compose down at any time.
If you'd rather skip the wrapper, the manual flow is:
docker compose up -d
bundle exec rspec
docker compose down
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/douglasgreyling/sidekiq-batch-jobs.