ChronoForge
A robust framework for building durable, distributed workflows in Ruby on Rails applications
ChronoForge handles long-running processes, manages state, and recovers from failures in your Rails applications. Built on ActiveJob, it keeps critical business processes resilient and traceable.
Workflows are plain Ruby. Ordinary if/else, loops, and early returns drive the flow. There's no declarative DSL to learn and no extra service to run, which makes ChronoForge a good fit for business processes whose shape depends on runtime state: conditional branches, iteration over data, and built-in periodic tasks (durably_repeat).
In production at achieve by Petra, an investment platform in the Petra Group, where it has executed over 3.6 million workflows and 32 million durable steps across scheduled payments, investment rollovers, and membership lifecycle management.
Why ChronoForge
Most Rails workflow tools ask you to declare your steps up front in a DSL:
step :send_welcome_email
step :remind_of_tasks, wait: 2.days
step :complete_onboarding, wait: 15.days
That reads cleanly for a fixed, linear sequence. But many business processes branch, loop, and react to data that only exists at runtime, and a declarative schema gets awkward there. ChronoForge takes the opposite approach: a workflow is just a Ruby method. Conditionals, iteration, early returns, and helper methods all work the way they normally do.
There is a real trade-off. Because the flow is ordinary code, ChronoForge can show the steps that have run (a replay/history view), but not a roadmap of steps that haven't run yet, which a declarative engine can. For workflows whose path isn't fixed in advance, that's a trade worth making; for a simple, fixed sequence ("send email, wait 2 days, send another"), a declarative DSL may read more cleanly, and that's a fine reason to reach for one.
How it compares
| | ChronoForge | GenevaDrive | AcidicJob | Temporal |
|---|---|---|---|---|
| Programming model | procedural (plain Ruby) | declarative DSL | declarative DSL | procedural (via SDK) |
| Built-in periodic tasks | ✅ durably_repeat | ❌ | ❌ | ❌ |
| Pending-step visibility | ❌ (procedural) | ✅ | ✅ | ❌ (procedural) |
| Extra infrastructure | none (DB + ActiveJob) | none | none | server required |
| License | MIT | LGPL / commercial | MIT | MIT |
Comparison reflects each project's documented features as of mid-2026, to the best of our knowledge; corrections welcome via PR.
A few deliberate choices behind that table:
- Periodic tasks are built in. durably_repeat runs a step on a schedule until a condition holds, with automatic catch-up for missed runs, so a workflow can be its own recurring job and cron-style monitor, right alongside the rest of its logic. Without built-in support, periodic behavior usually lives in a separate scheduler that you reconcile with workflow state by hand.
- No extra infrastructure. ChronoForge is a gem over your existing database and ActiveJob backend. There's no separate server or daemon to operate, unlike Temporal.
- Recovery is built into the model. Steps are append-only history, so a crashed step leaves the workflow stalled, recoverable directly with retry_later.
- MIT licensed. Permissive and dependency-policy-friendly.
Features
- Plain-Ruby control flow: Branching, loops, and iteration over runtime data, without a DSL or step registry
- Durable Execution: Automatically tracks and recovers from failures during workflow execution
- Periodic tasks built in: durably_repeat runs a step on an interval until a condition is met, with catch-up for missed runs. Acts as a recurring task and a cron-style monitor in one
- Wait States: Time-based waits and condition-based waiting (wait_until) that survive restarts
- State Management: Built-in workflow state tracking with persistent context storage
- Concurrency Control: Advanced locking mechanisms to prevent parallel execution of the same workflow
- Error Handling: Error tracking with a unified, configurable RetryPolicy (including per-error-type policies)
- Execution Logging: Detailed logging of workflow steps and errors for visibility
- Database-Backed: All workflow state is persisted to ensure durability, with no extra services to run
- ActiveJob Integration: Compatible with all ActiveJob backends, though database-backed processors (like Solid Queue) provide the most reliable experience for long-running workflows
- Retention & Cleanup: A schedulable job to prune finished workflows and the unbounded logs that periodic tasks accumulate (see Cleanup & Retention)
Dashboard
ChronoForge has a free, mountable dashboard for visibility and recovery: workflow list, step replay timeline, context inspector, periodic-task health, wait-state age, and retry/unlock actions. It ships as a separate gem, chrono_forge-dashboard, so the core stays lean.
# Gemfile
gem "chrono_forge-dashboard"
# config/routes.rb
mount ChronoForge::Dashboard::Engine, at: "/chrono_forge"
See chrono_forge-dashboard for setup, authentication, and more screenshots.
Installation
Add to your application's Gemfile:
gem 'chrono_forge'
Then execute:
$ bundle install
Or install directly:
$ gem install chrono_forge
After installation, run the generator to create the necessary database migrations:
$ rails generate chrono_forge:install
$ rails db:migrate
Upgrading
When upgrading ChronoForge in an application that was installed with an earlier version, run the upgrade generator to pick up any additive schema changes, then migrate:
$ rails generate chrono_forge:upgrade
$ rails db:migrate
The upgrade migration is idempotent (if_not_exists), so it is safe to run even
if your schema already has the index it adds (the composite [state, completed_at]
index that keeps cleanup scans efficient). Fresh installs get the index from the
install migration and do not need to run the upgrade.
Usage
Creating and Executing Workflows
ChronoForge workflows are ActiveJob classes that prepend the ChronoForge::Executor module. A workflow's perform method accepts keyword arguments only:
# Define your workflow class
class OrderProcessingWorkflow < ApplicationJob
prepend ChronoForge::Executor
def perform(order_id:, customer_id:)
# Workflow steps...
end
end
All workflows require a unique identifier when executed. This identifier is used to track and manage the workflow:
# Execute the workflow
OrderProcessingWorkflow.perform_later(
"order-123", # Unique workflow key
order_id: "order-134", # Custom kwargs
customer_id: "customer-456" # More custom kwargs
)
Basic Workflow Example
Here's a complete example of a durable order processing workflow:
class OrderProcessingWorkflow < ApplicationJob
prepend ChronoForge::Executor
def perform(order_id:)
@order_id = order_id
# Context can be used to pass and store data between executions
context.set_once "execution_id", SecureRandom.hex
# Wait until payment is confirmed
wait_until :payment_confirmed?
# Wait for potential fraud check
wait 1.minute, :fraud_check_delay
# Durably execute order processing
durably_execute :process_order
# Final steps
durably_execute :complete_order
end
private
def payment_confirmed?
PaymentService.confirmed?(@order_id, context["execution_id"])
end
def process_order
OrderProcessor.process(@order_id, context["execution_id"])
context["processed_at"] = Time.current.iso8601
end
def complete_order
OrderCompletionService.complete(@order_id, context["execution_id"])
context["completed_at"] = Time.current.iso8601
end
end
A workflow you can't flatten into a step list
The example above is linear, but most real processes aren't. Because a ChronoForge workflow is plain Ruby, branching and dynamic iteration are just... branching and iteration:
class OrderProcessingWorkflow < ApplicationJob
prepend ChronoForge::Executor
def perform(order_id:)
@order_id = order_id
wait_until :payment_confirmed?
durably_execute :validate_order
# Runtime branching: the path depends on data known only at execution time
if context["requires_compliance_check"]
durably_execute :run_compliance_review
wait_until :compliance_approved?, timeout: 48.hours
end
# Iterate over runtime data: one durable, idempotent step per item
context["line_item_ids"].each do |item_id|
context["current_item_id"] = item_id
durably_execute :fulfill_item, name: "fulfill_#{item_id}"
end
# Recurring notification: nudge the customer until they confirm delivery
durably_repeat :send_delivery_reminder, every: 3.days, till: :delivery_confirmed?
durably_execute :complete_order
end
private
def fulfill_item
FulfillmentService.fulfill(@order_id, context["current_item_id"])
end
def send_delivery_reminder
OrderMailer.delivery_reminder(@order_id).deliver_later
end
# ... other condition and step methods ...
end
Each durably_execute is checkpointed by its step name, so on resume the completed branches and items are skipped and the workflow continues where it left off. A fixed, declared list of steps can't easily express runtime branches, a loop over a runtime-sized collection, and an open-ended recurring notification.
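The skip-on-resume behavior can be pictured with a toy model. This is a minimal sketch, not ChronoForge's implementation: ToyWorkflow is a hypothetical class, and an in-memory Set stands in for the execution logs that the gem persists in the database.

```ruby
require "set"

# Toy model of checkpoint-and-replay (hypothetical, not the gem's code).
class ToyWorkflow
  attr_reader :side_effects

  def initialize(completed_steps = Set.new)
    @completed = completed_steps # stands in for persisted execution logs
    @side_effects = []
  end

  # Run the block once per step name; skip it on later replays.
  def durably_execute(name)
    return if @completed.include?(name)

    yield
    @completed << name
  end

  def perform(item_ids, crash_after: nil)
    item_ids.each do |id|
      durably_execute("fulfill_#{id}") { @side_effects << id }
      raise "simulated crash" if id == crash_after
    end
  end
end

log = Set.new
first = ToyWorkflow.new(log)
begin
  first.perform([1, 2, 3], crash_after: 2)
rescue RuntimeError
  # The worker died after item 2 was checkpointed.
end

replay = ToyWorkflow.new(log) # same log, fresh process
replay.perform([1, 2, 3])

p first.side_effects  # items handled before the crash
p replay.side_effects # items handled on resume
```

The replay walks the same loop from the top, but only the item that never reached its checkpoint produces a side effect, which is why each iteration needs its own step name.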
Core Workflow Features
Executing Workflows
ChronoForge workflows are executed through ActiveJob's standard interface with a specific parameter structure:
# Perform the workflow immediately
OrderProcessingWorkflow.perform_now(
"order-123", # Unique workflow key
order_id: "O-123", # Custom parameter
customer_id: "C-456" # Another custom parameter
)
# Or queue it for background processing
OrderProcessingWorkflow.perform_later(
"order-123-async", # Unique workflow key
order_id: "O-124",
customer_id: "C-457"
)
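Important: perform must declare keyword arguments only, not positional arguments. The unique workflow key is the one positional value, and it is passed first when the workflow is executed.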
Durable Execution
The durably_execute method runs an operation with automatic retries, and skips it on replay once it has completed:
# Basic execution
durably_execute :send_welcome_email
# With a custom retry policy
durably_execute :critical_payment_processing,
retry_policy: RetryPolicy.new(max_attempts: 5)
# With custom name for tracking multiple calls to same method
durably_execute :upload_file, name: "profile_image_upload"
# Complex example with error-prone operation
class FileProcessingWorkflow < ApplicationJob
prepend ChronoForge::Executor
def perform(file_id:)
@file_id = file_id
# This might fail due to network issues, rate limits, etc.
durably_execute :upload_to_s3, retry_policy: RetryPolicy.new(max_attempts: 5)
# Process file after successful upload
durably_execute :generate_thumbnails, retry_policy: RetryPolicy.new(max_attempts: 3)
end
private
def upload_to_s3
file = File.find(@file_id)
S3Client.upload(file.path, bucket: 'my-bucket')
Rails.logger.info "Successfully uploaded file #{@file_id} to S3"
end
def generate_thumbnails
ThumbnailService.generate(@file_id)
end
end
Key Features:
- Idempotent: Same operation won't be executed twice during replays
- Automatic Retries: Failed executions retry per a unified RetryPolicy (exponential backoff with jitter; the step default caps at 30s over 3 attempts)
- Error Tracking: All failures are logged with detailed error information
- Configurable: Pass a retry_policy: per call, or set a class-wide default with the retry_policy DSL (see Retry Policies)
Retry Policies
All retrying in ChronoForge goes through a single RetryPolicy (ChronoForge::Executor::RetryPolicy). It answers two questions: should this failure be retried? and how long until the next attempt?
RetryPolicy.new(
max_attempts: 3, # cap on total attempts; nil = no count cap (bounded elsewhere)
base: 1, # seconds; delay of the first retry
cap: 30, # seconds; ceiling for a single delay
jitter: true, # spread retries with equal jitter
retry_on: nil # nil = retry any StandardError; [Classes] = only those; [] = none
)
Backoff is exponential with equal jitter, computed once at re-enqueue time (never replayed, so it stays deterministic where it matters).
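To make the delay arithmetic concrete, here is a minimal sketch of exponential backoff with equal jitter. It is not taken from ChronoForge's source: backoff_delay is a hypothetical helper that mirrors the documented base, cap, and jitter options using the common "equal jitter" formula (half the delay fixed, half random), and the gem's exact computation may differ.

```ruby
# Illustrative only: a hypothetical helper, not ChronoForge's implementation.
def backoff_delay(attempt, base: 1, cap: 30, jitter: true, rng: Random.new)
  # attempt is 1 for the first retry, 2 for the second, and so on.
  raw = [cap, base * (2**(attempt - 1))].min.to_f
  return raw unless jitter

  # Equal jitter: keep half the delay fixed and randomize the other half.
  half = raw / 2
  half + rng.rand * half
end

# Upper bound for a 10-attempt policy (9 retries) with base 1 and cap 600:
worst_case = (1..9).sum { |n| backoff_delay(n, base: 1, cap: 600, jitter: false) }
puts worst_case / 60.0 # total minutes of waiting without jitter
```

Without jitter the nine delays are 1, 2, 4, ... 256 seconds, which add up to 511 seconds, in line with the roughly 8.5 minute window quoted for the workflow-level default.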
Resolution order:
- durably_execute, durably_repeat, workflow-level errors: per-call retry_policy: → class-level retry_policy default → built-in default.
- wait_until: per-call retry_policy: → built-in default. It deliberately does not inherit the class default, so a class-wide "retry everything" can't silently turn condition-evaluation bugs into retried errors.
Built-in defaults:
| Site | Default | Why |
|---|---|---|
| Steps (durably_execute / durably_repeat) | 3 attempts, cap 30s, retry any error | flaky calls fail fast |
| Workflow-level (uncaught errors) | 10 attempts, cap 600s, retry any error | tolerant window up to ~8.5 min (≈4 min typical w/ jitter) for transient infra errors; each retry replays the whole workflow from the top |
| wait_until condition errors | retry nothing | a raised condition is usually a bug, not transient |
Class-wide default via the retry_policy DSL:
class ChargeWorkflow < ApplicationJob
prepend ChronoForge::Executor
retry_policy max_attempts: 5, base: 2, cap: 60 # applies to steps + workflow-level
def perform
durably_execute :charge,
retry_policy: RetryPolicy.new(max_attempts: 8, retry_on: [Net::OpenTimeout])
wait_until :settled?,
retry_policy: RetryPolicy.new(retry_on: [BankApiError])
end
end
Composite policies (per-error budgets):
Pass an array of policies to handle different error types differently. On a failure, the first policy whose retry_on matches the raised error applies, and each error type gets its own attempt budget and backoff:
durably_execute :charge_card, retry_policy: [
RetryPolicy.new(retry_on: [NetworkError], max_attempts: 5), # transient: retry hard
RetryPolicy.new(retry_on: [RateLimitError], max_attempts: 10, base: 5), # back off longer
RetryPolicy.new(retry_on: [PaymentDeclinedError], max_attempts: 1), # fail fast, never retry
RetryPolicy.new(retry_on: nil) # catch-all (optional), keep last
]
- Order matters: the first matching policy wins, so list specific errors first and a catch-all (retry_on: nil) last. An error matched by no policy is not retried (fails fast).
- A subclass of a listed error routes to that policy and draws from its budget.
- Per-error counts are tracked by the policy's declared errors, so the budgets are stable even if you reorder the list.
- The class-level DSL accepts the same form as positional arguments (applies to steps and workflow-level errors):
retry_policy RetryPolicy.new(retry_on: [NetworkError], max_attempts: 5),
RetryPolicy.new(retry_on: nil, max_attempts: 2)
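The matching rules above (first match wins, subclasses route to their parent's policy, unmatched errors fail fast) can be modeled in a few lines. This is a sketch under stated assumptions: ToyPolicy, policy_for, and the error classes are hypothetical stand-ins, not ChronoForge classes.

```ruby
# Hypothetical stand-in for a retry policy; only models error matching.
ToyPolicy = Struct.new(:retry_on, :max_attempts, keyword_init: true) do
  # retry_on: nil matches any StandardError; an array matches those classes
  # and their subclasses; an empty array matches nothing.
  def matches?(error)
    return error.is_a?(StandardError) if retry_on.nil?

    retry_on.any? { |klass| error.is_a?(klass) }
  end
end

class NetworkError < StandardError; end
class DnsError < NetworkError; end
class PaymentDeclinedError < StandardError; end

# First policy whose retry_on matches the raised error, or nil if none does.
def policy_for(policies, error)
  policies.find { |policy| policy.matches?(error) }
end

policies = [
  ToyPolicy.new(retry_on: [NetworkError], max_attempts: 5),
  ToyPolicy.new(retry_on: [PaymentDeclinedError], max_attempts: 1)
]

p policy_for(policies, DnsError.new).max_attempts # subclass uses NetworkError's policy
p policy_for(policies, ArgumentError.new)         # no match, so no retry
```

Because lookup stops at the first match, a catch-all placed first would shadow every more specific policy after it, which is why it belongs last.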
Wait States
ChronoForge supports three types of wait states, each optimized for different use cases:
1. Time-based Waits (wait)
For simple delays and scheduled pauses:
# Simple delays
wait 30.minutes, "cooling_period"
wait 1.day, "daily_batch_interval"
# Complex workflow with multiple waits
def user_onboarding_flow
durably_execute :send_welcome_email
wait 1.hour, "welcome_delay"
durably_execute :send_tutorial_email
wait 2.days, "tutorial_followup"
durably_execute :send_feedback_request
end
2. Automated Condition Waits (wait_until)
For conditions that can be automatically polled at regular intervals:
# Wait for external API
wait_until :external_api_ready?,
timeout: 30.minutes,
check_interval: 1.minute
# Wait with retry on specific errors raised while evaluating the condition
wait_until :database_migration_complete?,
timeout: 2.hours,
check_interval: 30.seconds,
retry_policy: RetryPolicy.new(retry_on: [ActiveRecord::ConnectionNotEstablished, Net::OpenTimeout, Net::ReadTimeout])
# Complex condition example
def third_party_service_ready?
response = HTTParty.get("https://api.example.com/health")
response.code == 200 && response.body.include?("healthy")
end
wait_until :third_party_service_ready?,
timeout: 1.hour,
check_interval: 2.minutes,
retry_policy: RetryPolicy.new(retry_on: [Net::OpenTimeout, Net::ReadTimeout, Net::HTTPClientException])
3. Event-driven Waits (continue_if)
For conditions that depend on external events like webhooks, requiring manual workflow continuation:
# Basic usage - wait for webhook-driven state change
continue_if :payment_confirmed?
# With custom name for better tracking
continue_if :payment_confirmed?, name: "stripe_webhook"
# Wait for manual approval
continue_if :document_approved?
# Wait for external file processing
continue_if :processing_complete?
# Multiple waits with same condition but different contexts
continue_if :external_system_ready?, name: "payment_gateway"
# ... other steps ...
continue_if :external_system_ready?, name: "inventory_system"
# Complete workflow example
class PaymentWorkflow < ApplicationJob
prepend ChronoForge::Executor
def perform(order_id:)
@order_id = order_id
# Initialize payment
durably_execute :create_payment_request
# Wait for external payment confirmation (webhook-driven)
continue_if :payment_confirmed?, name: "stripe_confirmation"
# Complete order after payment
durably_execute :fulfill_order
end
private
def payment_confirmed?
PaymentService.confirmed?(@order_id)
end
end
# Later, when webhook arrives:
PaymentService.mark_confirmed(order_id)
PaymentWorkflow.perform_later("order-#{order_id}", order_id: order_id)
When to Use Each Wait Type:
| Wait Type | Use Case | Polling | Resource Usage | Response Time |
|---|---|---|---|---|
| wait | Fixed delays, rate limiting | None | Minimal | Exact timing |
| wait_until | API readiness, data processing | Automatic | Medium | Check interval |
| continue_if | Webhooks, user actions, file uploads | Manual only | Minimal | Immediate |
Key Differences:
- wait: Time-based, no condition checking, resumes automatically
- wait_until: Condition-based with automatic polling, resumes when condition becomes true or timeout
- continue_if: Condition-based without polling, requires manual workflow retry when condition might have changed
Periodic Tasks
durably_repeat runs periodic tasks inside a workflow. A task is scheduled at a regular interval until a condition is met, with automatic catch-up for missed executions and configurable error handling.
class NotificationWorkflow < ApplicationJob
prepend ChronoForge::Executor
def perform(user_id:)
@user_id = user_id
# Send reminders every 3 days until user completes onboarding
durably_repeat :send_reminder_email,
every: 3.days,
till: :user_onboarded?
# Critical payment processing every hour - fail workflow if it fails
durably_repeat :process_pending_payments,
every: 1.hour,
till: :all_payments_processed?,
on_error: :fail_workflow
end
private
def send_reminder_email(scheduled_time = nil)
# Optional parameter receives the scheduled execution time
if scheduled_time
lateness = Time.current - scheduled_time
Rails.logger.info "Reminder scheduled for #{scheduled_time}, running #{lateness.to_i}s late"
end
UserMailer.onboarding_reminder(@user_id).deliver_now
end
def user_onboarded?
User.find(@user_id).onboarded?
end
def process_pending_payments
PaymentProcessor.process_pending_for_user(@user_id)
end
def all_payments_processed?
Payment.where(user_id: @user_id, status: :pending).empty?
end
end
Key Features:
- Idempotent Execution: Each repetition gets a unique execution log, preventing duplicates during replays
- Automatic Catch-up: Missed executions due to downtime are automatically skipped using timeout-based fast-forwarding
- Custom Timing: Custom start times and precise interval scheduling
- Error Resilience: Individual execution failures don't break the periodic schedule
- Configurable Error Handling: Choose between continuing despite failures or failing the entire workflow
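One way to picture timeout-based fast-forwarding is the sketch below. It rests on an assumption that is not confirmed by this document: a scheduled slot counts as missed once it is more than the catch-up timeout overdue, and missed slots are skipped one interval at a time. next_runnable_slot is a hypothetical helper, and ChronoForge's actual rule may differ.

```ruby
# Assumption-heavy sketch (hypothetical helper, not the gem's code):
# skip every slot that is already more than `timeout` seconds overdue.
def next_runnable_slot(scheduled_at, every:, timeout:, now:)
  skipped = 0
  while now - scheduled_at > timeout
    scheduled_at += every
    skipped += 1
  end
  [scheduled_at, skipped]
end

hour = 3600
start = Time.utc(2026, 1, 1, 0, 0, 0)
now = start + (5 * hour) + 600 # workers were down for a bit over five hours

slot, skipped = next_runnable_slot(start, every: hour, timeout: hour, now: now)
puts slot    # first slot that is still within the timeout window
puts skipped # number of slots fast-forwarded past
```

Under this model a long outage does not trigger a burst of stale runs; the schedule resumes at the most recent slot that is still fresh enough to be worth executing.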
Advanced Options:
durably_repeat :generate_daily_report,
every: 1.day, # Execution interval
till: :reports_complete?, # Stop condition
start_at: Date.tomorrow.beginning_of_day, # Custom start time (optional)
retry_policy: RetryPolicy.new(max_attempts: 5), # Retry policy per execution (default: step_default)
timeout: 2.hours, # Catch-up timeout (default: 1.hour)
on_error: :fail_workflow, # Error handling (:continue or :fail_workflow)
name: "daily_reports" # Custom task name (optional)
Method Parameters:
Your periodic methods can optionally receive the scheduled execution time as their first argument:
# Without scheduled time parameter
def cleanup_files
FileCleanupService.perform
end
# With scheduled time parameter
def cleanup_files(scheduled_time)
# Use scheduled time for business logic
cleanup_date = scheduled_time.to_date
FileCleanupService.perform(date: cleanup_date)
# Log timing information
delay = Time.current - scheduled_time
Rails.logger.info "Cleanup was #{delay.to_i} seconds late"
end
Workflow Context
ChronoForge provides a persistent context that survives job restarts. The context behaves like a Hash but with additional capabilities:
# Set context values
context[:user_name] = "John Doe"
context[:status] = "processing"
# Read context values
user_name = context[:user_name]
# Using the fetch method (returns default if key doesn't exist)
status = context.fetch(:status, "pending")
# Set a value with the set method (alias for []=)
context.set(:total_amount, 99.99)
# Set a value only if the key doesn't already exist
context.set_once(:created_at, Time.current.iso8601)
# Check if a key exists
if context.key?(:user_id)
# Do something with the user ID
end
The context supports serializable Ruby objects (Hash, Array, String, Integer, Float, Boolean, and nil) and validates types automatically.
Hash and Array values are stored as JSON, which has no symbols, so symbol keys inside a stored hash come back as strings:
context[:totals] = { paid: 5, pending: 2 }
context[:totals] # => { "paid" => 5, "pending" => 2 }
context[:totals]["paid"] # => 5 (not context[:totals][:paid])
(The top-level context key itself is interchangeable: context[:totals] and context["totals"] refer to the same entry.)
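The symbol-to-string behavior is a property of JSON itself and can be reproduced with Ruby's standard library alone. The snippet below is a small illustration that does not touch ChronoForge; it only shows what a JSON round trip does to a hash with symbol keys, and how to convert the keys back if the calling code prefers symbols.

```ruby
require "json"

totals = { paid: 5, pending: 2 }

# Serialize and parse again, as any JSON-backed store effectively does.
round_tripped = JSON.parse(JSON.generate(totals))

p round_tripped["paid"] # string key finds the value
p round_tripped[:paid]  # symbol key no longer matches

# If symbol keys are needed at the call site, convert after reading.
symbolized = round_tripped.transform_keys(&:to_sym)
p symbolized[:paid]
```

Converting at the point of use keeps the stored value plain JSON while letting step methods work with whichever key style they prefer.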
Context is meant for small working state: ids, flags, timestamps, and small structures used to coordinate steps. Each value is capped at 16 KB (a ChronoForge::Executor::Context::ValidationError is raised above that). Store large payloads (documents, uploads, API responses) in their own storage and keep just a reference (an id or key) in the context.
Error Handling
ChronoForge automatically tracks errors and routes all retrying through a single RetryPolicy. Configure it per call with retry_policy:, or set a class-wide default with the retry_policy DSL:
class MyWorkflow < ApplicationJob
prepend ChronoForge::Executor
# Class-wide default for workflow-level errors and steps without an override
retry_policy max_attempts: 5, base: 2, cap: 60
def perform
# Retry only network errors, up to 5 times, for this step
durably_execute :call_external_api,
retry_policy: RetryPolicy.new(max_attempts: 5, retry_on: [NetworkError])
end
end
To make an error non-retryable, leave it out of retry_on: (an empty retry_on: [] retries nothing).
Testing
ChronoForge is designed to be easily testable using ChaoticJob, a testing framework that makes it simple to test complex job workflows:
- Add ChaoticJob to your Gemfile's test group:
group :test do
gem 'chaotic_job'
end
- Set up your test helper:
# test_helper.rb
require 'chrono_forge'
require 'minitest/autorun'
require 'chaotic_job'
Example test:
class WorkflowTest < ActiveJob::TestCase
include ChaoticJob::Helpers
def test_workflow_completion
# Enqueue the job with a unique key and custom parameters
OrderProcessingWorkflow.perform_later(
"order-test-123",
order_id: "O-123",
customer_id: "C-456"
)
# Perform all enqueued jobs
perform_all_jobs
# Assert workflow completed successfully
workflow = ChronoForge::Workflow.find_by(key: "order-test-123")
assert workflow.completed?
# Check workflow context
assert workflow.context["processed_at"].present?
assert workflow.context["completed_at"].present?
end
end
Database Schema
ChronoForge creates three main tables:
- chrono_forge_workflows: Stores workflow state and context
- chrono_forge_execution_logs: Tracks individual execution steps
- chrono_forge_error_logs: Records detailed error information
When to Use ChronoForge
ChronoForge is ideal for:
- Long-running business processes - Order processing, account registration flows
- Processes requiring durability - Financial transactions, data migrations
- Multi-step workflows - Onboarding flows, approval processes, multi-stage jobs
- State machines with time-based transitions - Document approval, subscription lifecycle
Advanced State Management
ChronoForge workflows move through a state machine. Understanding these states and transitions helps with troubleshooting and recovery.
Workflow State Diagram
stateDiagram-v2
[*] --> created: Workflow Created
created --> idle: Initial State
idle --> running: Job Started
running --> idle: Waiting
running --> completed: All Steps Completed
running --> failed: Max Retries Exhausted
running --> stalled: Unrecoverable Error
idle --> running: Resumed
stalled --> [*]: Requires Manual Intervention
failed --> [*]: Requires Manual Intervention
completed --> [*]: Workflow Succeeded
State Descriptions
Created
- Description: Initial state when a workflow record is first created
- Behavior: Transitions immediately to idle state
- Duration: Momentary
Idle
- Description: The workflow is waiting to be processed or between processing steps
- Behavior: Not locked, available to be picked up by job processor
- Duration: Can be minutes to days, depending on wait conditions
Running
- Description: The workflow is actively being processed
- Identifiers: Has locked_at and locked_by values set
- Behavior: Protected against concurrent execution
- Duration: Should be brief unless performing long operations
Completed
- Description: The workflow has successfully executed all steps
- Identifiers: Has completed_at timestamp, state = "completed"
- Behavior: Final state, no further processing
- Typical Exit Points: All processing completed successfully
Failed
- Description: The workflow has failed after exhausting retry attempts
- Identifiers: Has failure-related data in error_logs, state = "failed"
- Behavior: No automatic recovery, requires manual intervention
- Typical Exit Points: Max retries exhausted, explicit failure, non-retryable error
Stalled
- Description: The workflow encountered an unrecoverable error but wasn't explicitly failed
- Identifiers: Not completed, not running, has errors in error_logs
- Behavior: Requires manual investigation and intervention
- Typical Exit Points: ExecutionFailedError, unexpected exceptions, system failures
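The states and transitions described above can be written down as a small lookup table, which is handy when reasoning about which moves are legal. This is a sketch transcribed from the state diagram; TRANSITIONS, can_transition?, and needs_manual_intervention? are hypothetical names, not part of ChronoForge's API.

```ruby
# Transition table transcribed from the state diagram (hypothetical helper code).
TRANSITIONS = {
  created:   [:idle],
  idle:      [:running],
  running:   [:idle, :completed, :failed, :stalled],
  completed: [],
  failed:    [],
  stalled:   []
}.freeze

# States the diagram marks as requiring manual intervention.
NEEDS_MANUAL_INTERVENTION = [:failed, :stalled].freeze

def can_transition?(from, to)
  TRANSITIONS.fetch(from).include?(to)
end

def needs_manual_intervention?(state)
  NEEDS_MANUAL_INTERVENTION.include?(state)
end

p can_transition?(:running, :stalled)
p can_transition?(:completed, :running)
p needs_manual_intervention?(:failed)
```

Every path into completed, failed, or stalled goes through running, so a workflow that sits in idle is always waiting rather than finished.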
Handling Different Workflow States
Recovering Stalled/Failed Workflows
Re-execute a failed or stalled workflow directly from its record. Execution resumes via replay, so completed steps are skipped and it picks up at the step that failed:
workflow = ChronoForge::Workflow.find_by(key: "order-123")
workflow.retry_later # re-run asynchronously (the common case)
workflow.retry_now # re-run inline (console/debugging)
Only stalled or failed workflows are retryable. retryable? lets you check
first, and both methods validate up front: calling retry_later
on a non-retryable workflow raises ChronoForge::Executor::WorkflowNotRetryableError
immediately rather than enqueuing a job that would fail in the worker:
workflow.retryable? # => true/false
# Bulk re-run everything that failed:
ChronoForge::Workflow.failed.find_each(&:retry_later)
The class-level form (MyWorkflow.retry_now(key) / retry_later(key)) still
works if you have the class and key rather than the record.
Monitoring Running Workflows
Long-running workflows might indicate issues:
# Find workflows running for too long
long_running = ChronoForge::Workflow.where(state: :running)
.where('locked_at < ?', 30.minutes.ago)
long_running.each do |workflow|
# Log potential issues for investigation
Rails.logger.warn "Workflow #{workflow.key} has been running for >30 minutes"
# Optionally force unlock if you suspect deadlock
# CAUTION: Only do this if you're certain the job is stuck
# workflow.update!(locked_at: nil, locked_by: nil, state: :idle)
end
Cleanup & Retention
ChronoForge keeps every workflow and execution-log row indefinitely so that replays remain idempotent. Over time two things grow without bound:
- Terminal workflows (completed/failed) that are no longer needed.
- durably_repeat repetition logs: one row per scheduled execution. A long-lived periodic workflow never reaches a terminal state, so its repetition logs accumulate indefinitely. Past repetitions (those behind the task's current frontier) are never read again, since each resume recomputes the next execution from the coordination log, so they are safe to prune (see the safety note below).
ChronoForge::Cleanup reclaims both. It is not run automatically; schedule
it from your own scheduler so you stay in control of retention:
ChronoForge::Cleanup.run(
older_than: 90.days, # default retention for terminal workflows (+ cascades their logs)
completed_older_than: 30.days, # optional: retention for completed workflows (defaults to older_than)
failed_older_than: 180.days, # optional: keep failures longer for debugging (defaults to older_than)
prune_repetition_logs_older_than: 30.days, # opt-in: prune old durably_repeat logs from still-active workflows
batch_size: 1_000 # rows deleted per batch
)
# => { workflows: 12, execution_logs: 84, error_logs: 3, repetition_logs: 240 }
Notes:
- running, idle, and stalled workflows are never deleted.
- completed_older_than / failed_older_than let you keep failed workflows around longer than completed ones; both default to older_than.
- prune_repetition_logs_older_than is opt-in (defaults to nil); when unset, repetition logs are only removed as part of deleting their parent workflow. Pruning is deliberately conservative: it only removes terminal repetition logs that are both older than the window and scheduled strictly before the periodic task's current frontier (the coordination log's last_execution_at). Anything at or after the frontier is kept so durably_repeat's catch-up mechanism is never disrupted, so the window is purely a retention preference and is safe even for yearly schedules.
- Workflow retention is measured from when a workflow became terminal, not when it was created. A long-running workflow that only just finished is kept for the full window. Completed workflows use completed_at (immutable); failed workflows use updated_at (they have no completed_at).
- The composite [state, completed_at] index added in this version keeps these scans efficient; run chrono_forge:upgrade if you installed an earlier version.
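The workflow retention rules in these notes reduce to a small predicate. The sketch below is illustrative only: ToyWorkflowRow and prunable? are hypothetical names, durations are plain seconds rather than ActiveSupport durations, and the real cleanup runs as batched database deletes rather than in Ruby.

```ruby
# Hypothetical stand-in for a workflow record.
ToyWorkflowRow = Struct.new(:state, :completed_at, :updated_at, keyword_init: true)

DAY = 86_400 # seconds

# Retention is measured from when the workflow became terminal.
def prunable?(row, now:, completed_older_than:, failed_older_than:)
  case row.state
  when :completed then row.completed_at < now - completed_older_than
  when :failed    then row.updated_at < now - failed_older_than
  else false # running, idle, and stalled workflows are never deleted
  end
end

now = Time.utc(2026, 6, 1)
rows = [
  ToyWorkflowRow.new(state: :completed, completed_at: now - 45 * DAY, updated_at: now - 45 * DAY),
  ToyWorkflowRow.new(state: :failed, updated_at: now - 45 * DAY),
  ToyWorkflowRow.new(state: :stalled, updated_at: now - 400 * DAY)
]

verdicts = rows.map do |row|
  prunable?(row, now: now, completed_older_than: 30 * DAY, failed_older_than: 180 * DAY)
end
p verdicts
```

With a 30-day window for completed workflows and 180 days for failed ones, only the completed row is eligible here; the failed row is still inside its longer window and the stalled row is exempt regardless of age.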
A ready-made job is bundled so you can schedule it with any recurring-job
mechanism (Solid Queue recurring tasks, sidekiq-cron, GoodJob cron, the
whenever gem, ...):
ChronoForge::CleanupJob.perform_later(
older_than_days: 90,
failed_older_than_days: 180,
prune_repetition_logs_older_than_days: 30
)
The job takes plain day counts (not Duration objects) so it can be driven from
a config file. For example, with Solid Queue's recurring tasks
(config/recurring.yml):
production:
chrono_forge_cleanup:
class: ChronoForge::CleanupJob
args: { older_than_days: 90, prune_repetition_logs_older_than_days: 30 }
schedule: every day at 3am
Branches: parallel sub-workflows
branch / spawn / spawn_each / merge_branches let a workflow fan out into
child workflows that run concurrently, then join them when their results are
needed.
Model
- branch :name do ... end opens a named branch (a durable step). Inside the block, spawn and spawn_each create and immediately enqueue child workflows, so children start running as soon as the branch block is entered.
- spawn :name, WorkflowClass, **kwargs enqueues one child workflow.
- spawn_each :name, source do |item| [WorkflowClass, kwargs] end enqueues one child per item. The block returns the class and kwargs, so one branch can fan out into mixed workflow types. Sources are iterated in constant memory; ActiveRecord relations are streamed by primary key, so pass them without an explicit .order.
- automerge: true joins the branch inline at the block's close. Execution does not continue past the branch call until every child has completed. Use it for "dispatch this group and wait right here."
- merge_branches :a, :b (or the singular alias merge_branch :a) is the separate join point. Open branches without automerge, do other work while the children run, then join when you need their results. merge_branches blocks until all named branches are complete.
Worked example
class FulfillmentWorkflow < ApplicationJob
prepend ChronoForge::Executor
def perform(cycle_id:)
# automerge: the branch is joined inline, right where the block closes;
# `perform` does not continue past it until every child has completed.
branch :reconcile, automerge: true do
spawn :eu, ReconcileWorkflow, region: "EU"
spawn_each :orders, Order.pending do |order|
order.priority? ? [PriorityOrderWorkflow, { order_id: order.id }]
: [OrderWorkflow, { order_id: order.id }]
end
end
# For branches you want to run concurrently and join later, omit automerge
# and use merge_branches:
branch :invoices do
spawn_each :unpaid, Invoice.unpaid do |inv|
[InvoiceWorkflow, { invoice_id: inv.id }]
end
end
branch :shipments do
spawn_each :ready, Shipment.ready do |s|
[ShipmentWorkflow, { shipment_id: s.id }]
end
end
do_other_work # runs while :invoices and :shipments dispatch/run
merge_branches :invoices, :shipments # join both here
durably_execute :finalize
end
end
Caveats
- Every branch must be joined. A branch opened and never joined raises ChronoForge::Executor::UnmergedBranchError when the workflow tries to complete: fail-fast, with no silently orphaned children. Use either automerge: true or a matching merge_branches call.
- The parent isn't replayed while waiting. A lightweight ChronoForge::BranchMergeJob polls for child completion; the parent workflow only runs again once the branch is fully done. Polling cadence adapts to how many children remain.
- spawn_each sources must re-enumerate deterministically across replays. ActiveRecord relations are streamed by primary key (children are keyed by record id, so crash-resume is idempotent); a relation carrying an explicit .order(...) raises. For non-AR enumerables, items are keyed by position, so inserting or removing items mid-dispatch would shift keys and break idempotency.
- spawn_each AR sources must have stable membership. Dispatch streams by ascending primary key and resumes from the last key on crash-recovery, so a row that enters the relation below the cursor after it has passed (e.g. a where(state: …) scope whose rows mutate mid-dispatch) will never get a child. Point spawn_each at a set that is fixed for the branch's lifetime: a frozen id range, an append-only table, or where(id: [...]) over a snapshot.
- branch blocks cannot be lexically nested within one workflow. Opening a branch inside another branch block raises ArgumentError; spawns belong to exactly one branch. (A spawned child workflow may open its own branches, since it runs in its own executor, so cross-workflow nesting is fine.)
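The two spawn_each caveats can be reproduced with toy models in plain Ruby. This is a sketch under stated assumptions, not ChronoForge's dispatch code: `dispatch_by_position` and `dispatch_by_id` are hypothetical helpers that mimic only the keying rules described above (position keys for plain enumerables, an ascending primary-key cursor for ActiveRecord relations).

```ruby
# Toy model of position-keyed dispatch for a plain enumerable (hypothetical).
# Each item is keyed by its index; a key that already has a child is skipped.
def dispatch_by_position(items, children)
  items.each_with_index do |item, index|
    children[index] ||= item
  end
  children
end

children = {}
dispatch_by_position(%w[a b], children)       # crash after two items
dispatch_by_position(%w[a NEW b c], children) # source changed before resume
p children # {0=>"a", 1=>"b", 2=>"b", 3=>"c"}: "NEW" is skipped, "b" runs twice

# Toy model of primary-key streaming with a resume cursor (hypothetical).
# Only ids above the cursor are dispatched; the cursor moves to the last one.
def dispatch_by_id(ids, cursor)
  dispatched = ids.sort.select { |id| id > cursor }
  [dispatched, dispatched.last || cursor]
end

first_pass, cursor = dispatch_by_id([10, 20, 30], 0) # cursor is now 30
# Row 15 enters the scope after the cursor has passed it; row 40 is appended.
second_pass, cursor = dispatch_by_id([10, 15, 20, 30, 40], cursor)
p second_pass # [40]: row 15 never gets a child
```

Both failures come from the source changing between passes, which is why a snapshot or an append-only set is the safe input.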
Development
After checking out the repo, run:
$ bin/setup # Install dependencies
$ bundle exec rake test # Run the tests
$ bin/appraise # Run the full suite of appraisals
$ bin/console # Start an interactive console
The test suite uses SQLite by default and includes:
- Unit tests for core functionality
- Integration tests with ActiveJob
- Example workflow implementations
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin feature/my-new-feature)
- Create a new Pull Request
Please include tests for any new features or bug fixes.
License
This gem is available as open source under the terms of the MIT License.
API Reference
Core Workflow Methods
| Method | Purpose | Key Parameters |
|---|---|---|
| durably_execute | Execute method with retry logic | method, retry_policy: nil, name: nil |
| wait | Time-based pause | duration, name |
| wait_until | Condition-based waiting | condition, timeout: 1.hour, check_interval: 15.minutes, retry_policy: nil |
| continue_if | Manual continuation wait | condition, name: nil |
| durably_repeat | Periodic task execution | method, every:, till:, start_at: nil, retry_policy: nil, timeout: 1.hour, on_error: :continue |
Context Methods
| Method | Purpose | Example |
|---|---|---|
| context[:key] = value | Set context value | context[:user_id] = 123 |
| context[:key] | Get context value | user_id = context[:user_id] |
| context.set(key, value) | Set context value (alias) | context.set(:status, "active") |
| context.set_once(key, value) | Set only if key doesn't exist | context.set_once(:created_at, Time.current) |
| context.fetch(key, default) | Get with default value | context.fetch(:count, 0) |
| context.key?(key) | Check if key exists | context.key?(:user_id) |
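The semantics in the table can be tried out with a Hash-backed stand-in. `ToyContext` below is a hypothetical class written only for illustration; it assumes nothing about how ChronoForge persists or validates context values, and models just the behaviour each row describes.

```ruby
# Hypothetical stand-in for the workflow context, backed by a plain Hash.
class ToyContext
  def initialize
    @data = {}
  end

  def []=(key, value)
    @data[key] = value
  end

  def [](key)
    @data[key]
  end

  # Same effect as context[key] = value.
  def set(key, value)
    self[key] = value
  end

  # Stores the value only when the key is not present yet.
  def set_once(key, value)
    @data[key] = value unless @data.key?(key)
  end

  def fetch(key, default = nil)
    @data.fetch(key, default)
  end

  def key?(key)
    @data.key?(key)
  end
end

context = ToyContext.new
context[:user_id] = 123
context.set(:status, "active")
context.set_once(:status, "archived") # ignored: :status already exists
context.set_once(:attempts, 1)        # stored: :attempts was missing

puts context[:status]         # active
puts context.fetch(:count, 0) # 0
puts context.key?(:attempts)  # true
```

set_once is the natural fit for values that must stay fixed across replays, such as a first-seen timestamp.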
