Ractor::Wrapper — Design Document
This document describes how Ractor::Wrapper is implemented. It is intended
for three audiences:
- Power users who want to understand how the gem behaves well enough to make informed decisions about incorporating it into a system.
- Contributors who need to understand the internals before changing them.
- Future maintainers who want a permanent record of the architectural decisions baked into the library.
The README.md focuses on how to use the gem. In contrast, this document
focuses on how it works and why it was built that way.
Note: This document was largely reverse-engineered from the code and its reference documentation by Claude Opus 4.7, with some human edits.
1. Problem statement
A Ractor-shareable object is one that can be referenced from more than one
Ractor at once. The current Ruby rules (as of 4.0) require shareable objects
to be deeply immutable, which rules out a large fraction of Ruby objects
that Ruby programs actually need to share. Examples include Net::HTTP
sessions, database connections, file handles, cache objects, and parsers
with internal state.
The canonical Ractor answer is "don't share it; write an actor." But writing an actor from scratch means hand-crafting a message loop, a serialization convention, block handling, lifecycle, error propagation, and everything else. It also means rewriting every existing library in that style.
Ractor::Wrapper provides this plumbing once, so that any ordinary Ruby
object can be exposed to other Ractors through a shareable stub that
proxies calls back to a single controlled home. This allows both legacy
non-shareable objects to be used in a multi-Ractor environment, and new
Ractor-aware objects to be written easily.
The core design goal is to reproduce, as faithfully as reasonable, the semantics of calling the object directly, including arbitrary arguments and return values, keyword arguments, blocks, exceptions, and even re-entrant calls from blocks. The gem deliberately accepts some limitations in service of that goal, and those limitations are discussed throughout this document.
2. High-level structure
The entire library is in lib/ractor/wrapper.rb. The classes involved are:
- `Ractor::Wrapper` — the Ractor-shareable public-facing handle. Holds the `Configuration`, a `Stub`, and a `Ractor::Port` used to send messages to the server. It also provides the `call` method — the central interface for invoking methods on the wrapped object — the main caller-side message loop and block-yield handler, and wrapper lifecycle operations (`async_stop`, `join`, `recover_object`).
- `Ractor::Wrapper::Stub` — a Ractor-shareable proxy that mimics the method interface of the original object. Uses `method_missing` to forward arbitrary method calls to the wrapper.
- `Ractor::Wrapper::Configuration` — builder for wrapper options, including per-method settings.
- `Ractor::Wrapper::MethodSettings` — a frozen value object holding the copy/move/void choices for a single method.
- `Ractor::Wrapper::Server` — the backend that runs the object and services messages. In isolated mode it runs in its own Ractor; in local mode it runs as one or more threads inside the Ractor that created the wrapper.
- `Ractor::Wrapper::Server::Dispatcher` — a thread-safe work distributor used only in threaded mode. Handles routing of new calls (via a shared queue) and fiber resumes (via per-worker queues).
- Message types (all `Data.define` frozen structs): `InitMessage`, `CallMessage`, `ReturnMessage`, `ExceptionMessage`, `FiberYieldMessage`, `BlockingYieldMessage`, `FiberReturnMessage`, `FiberExceptionMessage`, `StopMessage`, `JoinMessage`, `JoinReplyMessage`, `WorkerStoppedMessage`.
Viewed from the outside, the runtime architecture looks like this:
Caller Ractor(s) Wrapper Wrapped object
┌──────────────────┐ ┌───────────────────────┐ ┌────────────┐
│ stub.some_method │───────▶│ Wrapper#call │ │ <object> │
│ │ (Stub) │ sends CallMessage │ │ │
│ │ └─────────┬─────────────┘ └──────▲─────┘
│ │ │ @port │
│ │ ┌─────────▼──────────────┐ │
│ │ │ Server main fiber / │ fibers / │
│ ◀── ReturnMsg ───│────────│ worker threads │ threads ◀─────┘
│ ◀── ExceptionMsg │ │ (dispatch, fiber mgmt) │
│ ◀── YieldMsg ────│──▶ run block ─▶ send result ─▶ │
└──────────────────┘ └────────────────────────┘
The rest of this document expands each of these boxes.
3. Configuration
Configuration is a two-stage process: keyword arguments to Wrapper.new,
optionally followed by a block that yields a mutable Configuration instance.
The block runs before the wrapper materializes its server, so it is the last
chance to adjust behavior before the wrapper is frozen for Ractor shareability.
Inside Wrapper#initialize the sequence is:
- Validate that the supplied object is not already a `Ractor::MovedObject`.
- Build a fresh `Configuration`, seeding it with constructor kwargs.
- Yield it to the user's block, if any. Block-provided settings override kwargs because they are applied later.
- Resolve `Configuration#final_method_settings` into a frozen hash.
- Create the `Stub`, freeze the wrapper, and start the server.
3.1 Wrapper-level settings
| Setting | Type | Default | Purpose |
|---|---|---|---|
| `name` | `String` | `object_id.to_s` | Identifies the wrapper in log output and as the Ractor name. |
| `use_current_ractor` | Boolean | `false` | Selects the execution mode (see §4). |
| `threads` | `Integer` | `0` | Number of worker threads. `0` means sequential. |
| `enable_logging` | Boolean | `false` | Enables internal stderr logging. |
3.2 Method-level settings — MethodSettings
MethodSettings governs how a single method communicates its data. Each
instance is frozen and carries five values:
| Field | Values | Meaning |
|---|---|---|
| `arguments` | `:copy` \| `:move` | How method arguments travel from caller to server. |
| `results` | `:copy` \| `:move` \| `:void` | How the method's return value travels back to the caller. |
| `block_arguments` | `:copy` \| `:move` | How yielded block arguments travel from server to caller. |
| `block_results` | `:copy` \| `:move` \| `:void` | How block results travel back to the server. |
| `block_environment` | `:caller` \| `:wrapped` | Which Ractor the block body runs in (see §7). |
The settings are stored in Configuration#@method_settings, keyed by method
name (Symbol) or nil for the defaults. final_method_settings produces the
resolved hash by:
- Starting with a hard-coded fallback of all `:copy` and `block_environment: :caller`.
- Overlaying the `nil` entry (defaults the user supplied or got from kwargs) on top of that fallback.
- For each named-method entry, overlaying it on top of those defaults.
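Sketched in plain Ruby, the overlay is a three-layer hash merge. This is an illustrative model, not the gem's code; the real logic lives in `Configuration#final_method_settings`:

```ruby
# Three-layer overlay: hard-coded fallback, then the nil (default) entry,
# then each named-method entry. Field names follow the MethodSettings table.
FALLBACK = {
  arguments: :copy, results: :copy,
  block_arguments: :copy, block_results: :copy,
  block_environment: :caller
}.freeze

def resolve(method_settings)
  defaults = FALLBACK.merge(method_settings[nil] || {})
  resolved = { nil => defaults }
  method_settings.each do |name, overrides|
    next if name.nil?
    resolved[name] = defaults.merge(overrides)
  end
  resolved.freeze
end

settings = resolve({ nil => { results: :void },
                     read: { results: :move } })
settings[:read][:results]    # => :move  (named entry wins)
settings[:read][:arguments]  # => :copy  (inherited from the fallback)
settings[nil][:results]      # => :void  (user-supplied default)
```

Looking up an unconfigured method simply falls back to the `nil` entry, which is why the resolved hash always contains it.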
Looking up a method via Wrapper#method_settings(name) returns the
per-method MethodSettings if present, otherwise the defaults. Because
MethodSettings is frozen and the outer hash is frozen, this value is
safely shared between the caller's Ractor and the server's Ractor — it
travels inside each CallMessage.
3.3 Why the config travels with every message
Embedding settings inside CallMessage is intentional. The server
needs to know, per call: whether to move or copy the return value,
whether to expect to run a block locally or to yield one, and whether to
void the return. Sending the settings with the message keeps the server
stateless with respect to configuration — it does not need a synchronized
copy of the settings hash, and configuration changes (were they ever
added) would not race with in-flight calls.
3.4 Copy / move / void — what they really mean
- `:copy` — the value is Marshal-style deep-cloned by Ractor when it crosses the boundary. Safe but potentially expensive for large payloads.
- `:move` — the value is transferred; the sender can no longer use it. This is sent as the `move:` keyword of `Ractor::Port#send`. Useful for large buffers and unique resources. The ownership model is subtle: if the caller passes a large string as a `:move` argument, it becomes a `Ractor::MovedObject` for the caller and must not be touched again.
- `:void` — the return value (or block result) is dropped and the recipient sees `nil`. This exists because many Ruby methods return a value "by accident": they don't intend to return anything specific, but whatever object the method's final expression evaluates to is returned anyway. If that object is large, shipping it could be both substantial and unnecessary: `:copy` might be expensive, and `:move` could be disastrous, making parts of the object's internal state unreachable. `:void` is an escape hatch that simply disables returning any value in such cases.
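The `:move` ownership hazard is visible with bare Ractors, no wrapper involved. This sketch shows the sender's reference becoming unusable the moment a move-send happens:

```ruby
# A bare Ractor that receives one message. No wrapper machinery here;
# this only demonstrates Ruby's move semantics.
r = Ractor.new { Ractor.receive }

buffer = "x" * 1024            # a mutable (non-shareable) string
r.send(buffer, move: true)     # ownership transfers at send time

# The sender's reference is now a Ractor::MovedObject; any use raises.
begin
  buffer.length
rescue Ractor::MovedError
  # expected: the object belongs to the other Ractor now
end
```

This is exactly the cost a caller accepts when marking `arguments: :move` for a method: cheaper transfer, but the passed-in object is gone from the caller's side.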
4. Execution modes
There are two orthogonal axes of execution mode, giving four combinations:
| | Sequential (`threads: 0`) | Threaded (`threads: N > 0`) |
|---|---|---|
| Isolated (`use_current_ractor: false`) | Object moved to a new Ractor; one method at a time. | Object moved to a new Ractor; N worker threads inside it. |
| Local (`use_current_ractor: true`) | Object stays in the current Ractor; one thread serves calls. | Object stays in the current Ractor; N worker threads serve calls. |
4.1 Isolated mode
Wrapper#setup_isolated_server spawns a new Ractor that immediately
calls Server.run_isolated. The object is not passed as a Ractor
constructor argument, as doing so would copy the object. Instead, once the
Ractor is running, the wrapper sends an `InitMessage` (containing the object
and the stub) with move: true, and the server's first action inside
receive_remote_object is to receive and unpack it.
Because the object now lives in the server's Ractor, the original caller
can no longer touch it directly. It can, however, retrieve the object at
the end of the wrapper's life via Wrapper#recover_object, which is
implemented as @ractor.value — the Ractor's terminal value is the
wrapped object, returned from Server#run.
4.2 Local mode (use_current_ractor: true)
Wrapper#setup_local_server does not spawn a Ractor at all. It creates
a Ractor::Port, marks the wrapper frozen (so the stub is shareable),
and starts a regular Thread that runs Server.run_local. The wrapped
object is never moved; it stays with its creator.
This is the right mode for objects that cannot be moved between Ractors. The canonical example is a SQLite3 database handle, which is bound to the Ractor that created it. It's also appropriate when you want to keep direct access to the object from the creating Ractor (say, for quick synchronous probes that avoid the wrapper's message path entirely), though such direct access is only safe while the wrapper is not concurrently running a method on the object.
Tradeoffs vs isolated mode:
- No `recover_object` (the object was never moved; `recover_object` raises).
- No isolation: a bug in the wrapped object can corrupt state in the host Ractor, since they share a Ractor.
- Slightly lower overhead per call: no cross-Ractor marshalling of arguments / return values if both caller and server happen to be in the same Ractor (but note that in local mode other Ractors can still call through the stub, and those calls pay the normal crossing cost).
4.3 Sequential vs threaded
threads: 0 (sequential) means no Dispatcher is created and calls are
executed directly by the server's main message-handling loop. One method
runs at a time.
threads: N > 0 creates a Dispatcher and spawns N worker threads.
Workers pull CallMessages off the dispatcher's shared queue and execute
them concurrently. The concurrency ceiling is N regardless of how many
callers are blocked on calls.
A sharp note from the constructor doc: the threads value should be
sized to the concurrency of independent calls, not to the re-entrancy
depth. A suspended method (waiting for a block result) does not occupy a
worker — it only occupies a fiber. The worker returns to the dispatch
loop and can service another call. This is why the threading model costs
very little even for deeply re-entrant workloads: fibers are cheap,
threads are not, and the library deliberately spends the former.
5. The Stub
Ractor::Wrapper::Stub is minimal by design:
Stub
@wrapper (frozen reference to Wrapper)
method_missing(name, ...) → @wrapper.call(name, ...)
respond_to_missing?(name, include_all) → @wrapper.call(:respond_to?, ...)
It freezes itself in initialize. Because its only instance variable is
a frozen reference to a frozen Wrapper, and the Wrapper is itself
shareable after construction, the stub is transitively shareable. That
means you can pass it freely across Ractor boundaries and every Ractor
can call methods on the wrapped object through it.
A few design choices worth calling out:
- Why `method_missing` instead of pre-generating methods? The stub must work for any wrapped object, including ones that define methods dynamically at runtime. There is no way to introspect the wrapped object's method list from another Ractor without paying a message round-trip anyway.
- `respond_to_missing?` proxies through. If you ask the stub whether the underlying object responds to `:foo`, the answer has to come from the server, because only it can see the object. So `respond_to?` is itself a round-trip call. This is slower than a direct lookup but semantically faithful.
- Return-value substitution. If the wrapped object returns `self`, the server substitutes the stub for the return value. This preserves method chaining through the stub boundary — `stub.tap { ... }` works as expected — even though `self` from the server's perspective is the bare object, not the stub. The same substitution is performed for block arguments (see §7).
- No `call` method. `Wrapper#call` is the low-level escape hatch; the stub always goes through `method_missing`. This means you cannot use `stub.call(:foo)` to bypass the proxy.
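Stripped of all Ractor machinery, the forwarding pattern is small enough to sketch. `MiniStub` and `FakeWrapper` below are illustrative stand-ins, not the gem's classes; the real `Wrapper#call` sends a `CallMessage` over a port instead of dispatching directly:

```ruby
# Minimal model of the Stub pattern: every unknown method becomes a
# call(name, ...) on the wrapper, and respond_to? round-trips too.
class MiniStub
  def initialize(wrapper)
    @wrapper = wrapper
    freeze                      # the real stub also freezes itself
  end

  def method_missing(name, *args, **kwargs, &block)
    @wrapper.call(name, *args, **kwargs, &block)
  end

  def respond_to_missing?(name, include_all)
    @wrapper.call(:respond_to?, name, include_all)
  end
end

# Stand-in "wrapper" that dispatches directly instead of over a port.
FakeWrapper = Struct.new(:object) do
  def call(name, *args, **kwargs, &block)
    object.__send__(name, *args, **kwargs, &block)
  end
end

stub = MiniStub.new(FakeWrapper.new([1, 2, 3]))
stub.sum                  # => 6, forwarded through method_missing
stub.respond_to?(:sum)    # => true, answered via respond_to_missing?
```

The real stub behaves the same way from the caller's perspective; the only difference is that each forwarded call crosses a Ractor boundary.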
6. Messaging protocol
All messages are frozen Data structs, making them shareable and immutable.
This section catalogs them by direction.
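The struct pattern itself is plain `Data.define` (Ruby 3.2+). This sketch mirrors the `CallMessage` field list from §6.1 to show why the messages are frozen and shareable; it is a model of the pattern, not the gem's actual definition:

```ruby
# Data.define instances are frozen by default, so a message whose fields
# are all shareable is itself Ractor-shareable.
CallMessage = Data.define(
  :method_name, :args, :kwargs, :block_arg,
  :transaction, :settings, :reply_port
)

msg = CallMessage.new(
  method_name: :fetch, args: [1].freeze, kwargs: {}.freeze,
  block_arg: nil, transaction: "abc123".freeze,
  settings: nil, reply_port: nil
)

msg.frozen?             # => true (Data instances are frozen)
Ractor.shareable?(msg)  # true here, because every field is shareable
```

This is why the message catalog below needs no locking: once constructed, a message can never change.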
6.1 From caller to server
| Message | Sender | Receiver | Purpose |
|---|---|---|---|
| `InitMessage(object, stub)` | `Wrapper#setup_isolated_server` | Server's `receive_remote_object` | One-shot initialization for isolated mode. Sent with `move: true`. |
| `CallMessage(method_name, args, kwargs, block_arg, transaction, settings, reply_port)` | `Wrapper#call` | Server main loop | Request a method invocation. |
| `FiberReturnMessage(value, fiber_id)` | `Wrapper#send_block_result` | Server main loop | Block result (fiber-suspend path). |
| `FiberExceptionMessage(exception, fiber_id)` | `Wrapper#send_block_exception` | Server main loop | Block exception (fiber-suspend path). |
| `StopMessage()` | `Wrapper#async_stop` | Server main loop | Request graceful shutdown. |
| `JoinMessage(reply_port)` | `Wrapper#join` (local mode only) | Server main loop | Request notification when server has fully stopped. |
6.2 From server to caller
| Message | Sender | Receiver | Purpose |
|---|---|---|---|
| `ReturnMessage(value)` | `Server#handle_method` (or main-loop refusal) | `Wrapper#call` | Normal method result. |
| `ExceptionMessage(exception)` | Server (several sites) | `Wrapper#call` | Method raised an exception; server-side refusal; or crash cleanup. |
| `FiberYieldMessage(args, kwargs, fiber_id)` | `Server#fiber_yield_block` | `Wrapper#call` / `handle_yield` | Request to run a block on the caller side (fiber-suspend path). |
| `BlockingYieldMessage(args, kwargs, reply_port)` | `Server#blocking_yield_block` | `Wrapper#call` / `handle_yield` | Same, but for the blocking-fallback path. |
| `JoinReplyMessage()` | `Server#send_join_reply` | `Wrapper#join` | Terminal notification that the server has finished cleaning up. |
6.3 Within the server
| Message | Sender | Receiver | Purpose |
|---|---|---|---|
| `WorkerStoppedMessage(worker_num)` | `Server#cleanup_worker` | Server main loop | A worker thread has terminated. Carried on the main `@port`. |
6.4 Port topology
Each CallMessage carries its own reply_port, which is a fresh
Ractor::Port created by Wrapper#call. This gives the caller a private
channel for all replies to that call: return value, exceptions, and both
variants of yield message. The reply port is closed when the call returns
(success or exception) via the ensure block.
The server has a single main @port that it receives on. It multiplexes
everything: new calls, stop/join requests, fiber resumes, and worker
death notifications. This is why every message that the server needs to
dispatch internally carries enough information to route itself (e.g.
FiberReturnMessage includes the fiber_id).
6.5 Observability using transaction
The transaction field on CallMessage is a 16-character base-36 random
string created by Wrapper#make_transaction. It exists to correlate log lines
across caller and server for a single call; it is not used for dispatch.
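An illustrative one-liner with the same shape (not necessarily the gem's exact code):

```ruby
# A random 16-character base-36 token for correlating log lines.
# Illustrative equivalent of Wrapper#make_transaction.
def make_transaction
  rand(36**16).to_s(36).rjust(16, "0")
end

make_transaction  # e.g. "3k9q0mzv1c8d2xwf"
```

Because the token is only used for log correlation, collisions are harmless; two calls sharing a token would merely interleave their log lines.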
7. Blocks, re-entrancy, and fiber magic
This is the most subtle part of the library. The core problem: a caller
may pass a block (stub.each { |x| stub.process(x) }). Where does that
block's body run?
7.1 The two block environments
block_environment: :caller (default) — the block runs in the caller's
Ractor, with full access to the caller's lexical scope. The server has
to ask the caller to run each invocation, wait for the result, and then
resume the method.
block_environment: :wrapped — the block runs in the server Ractor, in
the wrapped object's context. No inter-Ractor communication is needed per
block call. The block is captured as a Ractor.shareable_proc, which
means it can only reference shareable state. Closures over caller-side
mutable variables will fail at shareability-check time.
The tradeoff is: :caller is the common case because most blocks do
close over state (accumulators, config, etc.), but paying a round-trip
per invocation of a block called in a tight loop (think each over a
large collection) can be very expensive. :wrapped is the escape hatch
when you really want the block body to live alongside the method and the
block is self-contained. The README's Enumerator-over-SQLite example is
one such case.
7.2 How the block arg is represented
Wrapper#make_block_arg looks at the block_environment setting and
constructs one of three things:
- `nil` — no block was given.
- `:send_block_message` (a sentinel symbol) — `:caller` mode. The server must construct a local proc that forwards each invocation back across the wire.
- A `Ractor.shareable_proc` — `:wrapped` mode. The shareable proc travels directly inside the `CallMessage` and is invoked in-Ractor.
On the server side, Server#make_block translates this into the actual
proc passed to the wrapped method:
- `nil` → no block is passed; `__send__(name, *args, **kwargs, &nil)` is equivalent to calling without a block.
- A shareable proc → used directly.
- `:send_block_message` → a proc is constructed that, when invoked, performs the round-trip yield dance described in §7.3.
7.3 Caller-side block invocation — the fiber-suspend path
When the wrapped object invokes a :caller-environment block, the proc
created by make_block runs in the server. That proc needs to:
- Ship the arguments over to the caller.
- Wait for the caller to produce a result (or exception).
- Return that result (or raise that exception) to the wrapped method so execution continues.
Naively, step 2 is a blocking wait. But the server cannot just block, because other callers (or the same caller, via re-entrancy) might have messages waiting, and we do not want to deadlock. Moreover, the server's concurrency model is "handle one message at a time in the main loop," which cannot be respected if methods can arbitrarily block it.
The solution is to run method bodies inside Fibers. When a method needs to
yield to a caller-side block, its fiber calls Fiber.yield. Control returns to
the main loop, which processes further messages. When a matching
FiberReturnMessage / FiberExceptionMessage arrives, the main loop looks up
the suspended fiber by fiber_id and resumes it with the reply message. The
fiber picks up where it left off and continues the method.
Here is the full choreography, in sequence-diagram form, for one block
invocation (assume sequential mode, :caller block):
Caller Ractor Wrapper owner / Server Wrapped object
Wrapper#call
reply_port = Port.new
send(CallMessage) ────────▶ main_loop
dispatch_call
start_method_fiber ──▶ Fiber F
handle_method
object.m(&block) ─▶ method runs
yield arg
block.call(arg) ◀──┘
(our synthetic proc)
fiber_yield_block
send(FiberYieldMessage
fiber_id=F.id) ────▶ reply_port
loop: receive Fiber.yield ←─ suspends F
FiberYieldMessage ◀────── reply_port
handle_yield
run block in caller
result = ...
send(FiberReturnMessage) ──▶ server @port
main_loop
dispatch_fiber_resume
resume_method_fiber(msg)
F.resume(msg) ──▶ fiber_yield_block returns value
method continues, returns result
handle_method sends ReturnMessage ▶ reply_port
loop: receive
ReturnMessage ◀──────────── reply_port
return value
Critical invariants:
- Fibers cannot migrate between threads. A fiber can only be resumed from the thread that last resumed it. In sequential mode this is trivially satisfied, since the main loop is the only place that runs fibers. In threaded mode, §8 explains how the `Dispatcher` preserves this invariant.
- The main loop never blocks inside a method. It only blocks on `@port.receive`. All method work is delegated to a fiber (sequential) or a worker thread (threaded).
- Fiber ids are just `object_id`s. They are unique while the fiber is alive, which is long enough for routing. Once a fiber completes it is removed from the `@pending_fibers` / per-worker `pending` hash, so stale `fiber_id`s cannot collide with fresh fibers.
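The suspend/resume mechanic can be modeled with plain Fibers and no Ractors at all. This sketch mimics the registration-by-`object_id` and resume-with-result steps; the hash and the "messages" are illustrative stand-ins for the server's `@pending_fibers` and ports:

```ruby
# Minimal model of the fiber-suspend path: a "method" runs in a fiber,
# yields a request when it needs the caller-side block, and is resumed
# later with the block's result.
pending = {}

fiber = Fiber.new do
  request = { yield_args: [21] }   # the method wants to yield 21 to a block
  result = Fiber.yield(request)    # suspend; control returns to the loop
  result * 2                       # the method continues with the result
end

request = fiber.resume             # start the "method"; get the yield request
pending[fiber.object_id] = fiber   # register by fiber_id, as the server does

# ...the main loop is free to service other messages here...

block_result = request[:yield_args].first + 1  # "the caller ran the block"
resumed = pending.delete(fiber.object_id)      # look up by fiber_id
final = resumed.resume(block_result)           # method finishes: 22 * 2
```

The real server's `dispatch_fiber_resume` is this lookup-and-resume step, with `FiberReturnMessage.fiber_id` playing the role of the hash key.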
7.4 Caller-side block invocation — the blocking-fallback path
The fiber-suspend path depends on one thing: the block-invoking proc being
called from the very same fiber that handle_method started in. That is true
for straightforward method bodies. It is not true in two cases:
- The wrapped method invokes the block from a nested fiber. The classic example is an `Enumerator`, whose `each` runs the user's block in a generator fiber, not the outer fiber.
- The wrapped method invokes the block from a spawned thread.
Calling Fiber.yield from either context does something different from what we
need: it either yields the wrong fiber, or raises a FiberError. To stay
functional in these cases, Server#make_block captures the expected fiber at
construction time and checks at call time:
if Fiber.current.equal?(expected_fiber)
fiber_yield_block(...) # the fast path
else
blocking_yield_block(...) # the fallback
end
The blocking fallback:
- Creates a fresh temporary `reply_port`.
- Sends a `BlockingYieldMessage` carrying that port.
- Calls `reply_port.receive` — a real, thread-level block.
- The caller's `handle_yield` sends the reply directly to the temporary port.
This path does block the invoking thread (or spawned thread / nested fiber). That is its defining limitation. Two consequences follow:
- In sequential mode with a nested-fiber block invocation, the server main loop is blocked. No other messages can be processed while the block runs in the caller. If the block tries to re-enter the wrapper, it will deadlock. The re-entering call goes to the server's port, but the server cannot handle it. This is the limitation the README's caveats warn about.
- In threaded mode, only one worker is blocked. Other workers continue to service other calls. But the blocked worker still cannot service anything else, so a long-running nested-fiber block still reduces effective concurrency by one.
The hybrid design (fast path where possible, fallback where necessary) is a deliberate trade-off: correctness in the common case, plus continued functionality in the exotic case, at the cost of a deadlock hazard that users have to be aware of when their block is re-entrant and invoked from a nested fiber or thread.
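The nested-fiber case is easy to observe with a bare Enumerator: external iteration via `Enumerator#next` runs the generator block in its own fiber, which is exactly the situation the `Fiber.current` check detects:

```ruby
# Demonstrates the fiber mismatch the fallback path exists for:
# the generator block runs in the Enumerator's internal fiber, so
# Fiber.current inside it is not the fiber that drove the iteration.
outer = Fiber.current
same_fiber = nil

enum = Enumerator.new do |y|
  same_fiber = Fiber.current.equal?(outer)
  y << 1
end

enum.next     # forces the generator block to run (in its own fiber)
same_fiber    # => false: a Fiber.yield there would suspend the wrong fiber
```

In the wrapper, an equivalent mismatch means `fiber_yield_block` cannot be used, so `make_block` falls back to `blocking_yield_block`.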
7.5 self-substitution for blocks
When a :caller block is invoked, it may receive the wrapped object
itself as an argument (think each_with_object(self) { ... }). Before
shipping the block arguments over the port, make_block's synthetic
proc replaces any argument that is equal?(@object) with @stub. This
keeps the caller from ever seeing the bare object and accidentally
performing direct operations on it from the wrong Ractor.
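The substitution is an identity scan over the argument list. This helper is illustrative, not the gem's code; the key detail is `equal?` (object identity), not `==`:

```ruby
# Replace any block argument that is the wrapped object itself with the
# stub, so the caller never sees the bare object.
def substitute_self(args, object, stub)
  args.map { |arg| arg.equal?(object) ? stub : arg }
end

obj  = Object.new
stub = Object.new
out = substitute_self([1, obj, "x"], obj, stub)
out[1].equal?(stub)   # => true; other arguments pass through untouched
```

Using identity rather than equality matters: an argument that merely `==` the object (say, a copy) is safe to ship and should not be swapped.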
8. Worker thread dispatch — the Dispatcher
Threaded mode introduces the Dispatcher class, which solves two
problems at once: work distribution and fiber affinity.
8.1 Why not a single shared queue?
The obvious design would be: one thread-safe queue, all workers pull.
That works for new calls. It does not work for fiber resumes. If
worker A started fiber F, then F suspended, a FiberReturnMessage for F
arrives, and worker B dequeues it, then B cannot resume F, because Ruby
requires fibers to be resumed from their last resuming thread.
8.2 Queue layout
The Dispatcher holds:
- A shared queue (`@shared_queue`) for new `CallMessage`s. Any worker may dequeue.
- Per-worker queues (`@worker_queues`, indexed by `worker_num`) for fiber resumes. Only worker N dequeues from `@worker_queues[N]`.
- A fiber→worker map (`@fiber_to_worker`) so the main loop can route incoming `FiberReturnMessage` / `FiberExceptionMessage` to the correct per-worker queue.
- Flags: `@closed` and `@crashed`, driving graceful vs abortive shutdowns.
- A single `@mutex` + `@cond` pair guarding all of the above.
Producers call @cond.broadcast rather than @cond.signal so a worker
waiting on its per-worker queue is not starved by shared-queue activity.
8.3 dequeue priority
Each worker thread calls @dispatcher.dequeue(worker_num, accept_calls:)
in a loop. Inside the mutex, dequeue returns the first of:
1. An item from its own per-worker queue (a fiber resume). Always considered first, even after close — in-flight fibers must complete.
2. `TERMINATE` if `@crashed` is set and the per-worker queue is empty.
3. An item from the shared queue, but only if `accept_calls` is `true` and the dispatcher is not closed.
4. A one-shot `CLOSED` sentinel if `@closed` and this worker has not yet been told. This wakes the worker so it can transition into a "drain pending and exit" state.
5. Otherwise, `@cond.wait`.
The `accept_calls` flag is the worker's way of saying "I've started stopping; don't hand me new work." Once `CLOSED` has been delivered, the worker sets `stopping = true` and passes `accept_calls: false` on every subsequent dequeue.
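The priority order can be modeled with two queue sets behind one lock. This simplified sketch returns `:empty` instead of waiting on a condition variable, and omits the crash/`TERMINATE` branch; names are illustrative, not the gem's:

```ruby
# Simplified model of the Dispatcher's dequeue priority for one worker.
class MiniDispatcher
  CLOSED = :closed

  def initialize
    @mutex = Mutex.new
    @shared = []                                  # new calls, any worker
    @per_worker = Hash.new { |h, k| h[k] = [] }   # fiber resumes, per worker
    @closed = false
    @closed_delivered = {}
  end

  def enqueue_call(msg)
    @mutex.synchronize { @shared << msg }
  end

  def enqueue_resume(worker_num, msg)
    @mutex.synchronize { @per_worker[worker_num] << msg }
  end

  def close
    @mutex.synchronize { @closed = true }
  end

  def dequeue(worker_num, accept_calls: true)
    @mutex.synchronize do
      # 1. Own per-worker queue first: in-flight fibers must complete.
      return @per_worker[worker_num].shift unless @per_worker[worker_num].empty?
      # 2. Shared queue, only while accepting calls and not closed.
      return @shared.shift if accept_calls && !@closed && !@shared.empty?
      # 3. One-shot CLOSED sentinel after close.
      if @closed && !@closed_delivered[worker_num]
        @closed_delivered[worker_num] = true
        return CLOSED
      end
      :empty                                      # real code: @cond.wait
    end
  end
end

d = MiniDispatcher.new
d.enqueue_call(:call1)
d.enqueue_resume(0, :resume1)
d.dequeue(0)   # => :resume1 — the per-worker queue always wins
d.dequeue(0)   # => :call1
```

After `close`, the same worker's next dequeue yields the one-shot `CLOSED` sentinel, and subsequent dequeues fall through to the wait branch.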
8.4 Fiber lifecycle in threaded mode
When a worker picks up a CallMessage:
1. `start_worker_fiber` creates a fiber whose body is `handle_method`.
2. The fiber's `object_id` is the `fiber_id`. The worker registers it with `@dispatcher.register_fiber(fiber_id, worker_num)` and stores it in a local `pending` hash.
3. The worker resumes the fiber. If the fiber completes synchronously (no block yield), the worker removes it from `pending` and calls `@dispatcher.unregister_fiber`.
4. If the fiber suspends (via `Fiber.yield` in `fiber_yield_block`), it remains in `pending` and is left registered. The worker goes back to the dispatch loop.
5. Eventually a `FiberReturnMessage` / `FiberExceptionMessage` arrives at the server's main port. The main loop calls `@dispatcher.enqueue_fiber_resume`, which looks up `@fiber_to_worker[fiber_id]` and pushes onto the right per-worker queue.
6. The owning worker dequeues it, resumes the fiber, and either completes it or suspends it again.
If a fiber-resume arrives for a fiber_id that is no longer registered,
enqueue_fiber_resume returns false and the main loop logs
"Discarding orphan fiber resume." This can happen if the worker that
owned the fiber crashed, since cleanup_worker unregisters all pending
fibers before the server observes WorkerStoppedMessage.
8.5 Why the main loop still routes fiber resumes
An alternative would be for workers to receive their fiber resumes
directly (e.g., each worker owning a port). The current design keeps
everything flowing through @port so there is exactly one place the
server receives messages. That simplifies:
- Shutdown: draining one port drains everything.
- Logging: one locus of message observation.
- Caller-side symmetry: callers only need to know one port.
The cost is an extra hop (main loop → dispatcher → worker), but this is cheap because the main loop does no work beyond enqueue.
9. Server lifecycle
Server#run is the top of the state machine:
def run
receive_remote_object if @isolated
start_workers if @threads_requested
main_loop
stop_workers if @threads_requested
cleanup
@object
rescue Exception => e
@crash_exception = e
@object
ensure
crash_cleanup if @crash_exception
end
Expressed as phases:
┌────────────────────┐
│ init (isolated) │ receive_remote_object
└─────────┬──────────┘
▼
┌────────────────────┐
│ start workers │ only if threads > 0
└─────────┬──────────┘
▼
┌────────────────────┐ accepts CallMessage, FiberReturn/Exception,
│ RUNNING │ StopMessage, JoinMessage, WorkerStoppedMessage.
│ (main_loop) │ Exits on: StopMessage, or unexpected worker death.
└─────────┬──────────┘
▼
┌────────────────────┐ sequential: drain_pending_fibers (inline);
│ STOPPING │ threaded: stop_workers (close dispatcher,
│ │ wait for WorkerStoppedMessage from each).
└─────────┬──────────┘ Refuses new calls with StoppedError.
▼
┌────────────────────┐ Close @port, drain remaining messages,
│ CLEANUP │ respond to outstanding join requests.
└─────────┬──────────┘
▼
(return @object — terminal value for the Ractor in isolated mode)
──────────────────────────── crash path ──────────────────────────────
Any uncaught exception from the above jumps to:
┌────────────────────┐ crash_cleanup:
│ CRASH CLEANUP │ • abort_pending_fibers (sequential)
│ (ensure block) │ • drain_dispatcher_after_crash (threaded)
└────────────────────┘ • drain_inbox_after_crash
• join_workers_after_crash (threaded)
• respond to join requests
9.1 Running phase — main_loop
Server#main_loop reads from @port.receive and dispatches on message type:
- `CallMessage` → `dispatch_call`, which in sequential mode starts a new fiber inline (`start_method_fiber`), or in threaded mode pushes onto the dispatcher's shared queue (`@dispatcher.enqueue_call`).
- `FiberReturnMessage` / `FiberExceptionMessage` → `dispatch_fiber_resume`, which resumes the fiber inline (sequential) or enqueues onto the correct per-worker queue (threaded).
- `JoinMessage` → added to `@join_requests`; the reply is sent when the server finishes.
- `StopMessage` → initiates graceful shutdown. In sequential mode, the main loop first calls `drain_pending_fibers` to let any suspended methods complete.
- `WorkerStoppedMessage` → an unexpected worker death during the running phase. Treated as a fatal signal: break out of the loop and let the downstream cleanup code take over.
9.2 Stopping phase — sequential vs threaded
In sequential mode, the stopping logic is just drain_pending_fibers:
continue to receive messages on @port, refuse any new CallMessage,
forward FiberReturnMessage/FiberExceptionMessage to their fibers,
queue any late JoinMessages, and exit once @pending_fibers is empty.
In threaded mode, stop_workers is more involved:
1. Call `@dispatcher.close`, which flips `@closed` and drains (returns) any `CallMessage`s that were queued but never picked up. Those messages are refused with `StoppedError`. This is important: without this step, callers whose messages arrived after stop but before a worker could dequeue them would hang forever.
2. Each worker, on its next `dequeue`, receives the one-shot `CLOSED` sentinel. It sets `stopping = true` and stops accepting new calls.
3. The main loop continues to receive messages, but now it must:
   - Refuse any `CallMessage` that still arrives.
   - Forward `FiberReturnMessage` / `FiberExceptionMessage` through the dispatcher so that workers can finish their suspended methods.
   - Acknowledge `WorkerStoppedMessage` and decrement `@active_workers`.
   - Queue late `JoinMessage`s.
4. Loop until all workers have reported stopped.
9.3 Cleanup phase
cleanup closes @port and then drains anything left in it. Callers
whose messages arrive after port close get Ractor::ClosedError on
send — nothing the server can do for them. But any CallMessage that
was already in the port before close is still refused, and any late
JoinMessage is answered immediately.
Finally, all queued join requests get a JoinReplyMessage.
9.4 Why main_loop breaks out on unexpected worker death
The WorkerStoppedMessage in the running-phase branch of main_loop covers
the case where a worker thread dies without going through the graceful stop
path, typically because it raised an exception we did not catch. When this
happens the server declares the situation unsafe: the invariant "every pending
fiber has a living worker to resume it" is broken, and rather than try to fix
it in place (by e.g. restarting the worker), the server shuts down. This is
opinionated: the author chose reliability of shutdown semantics over continued
availability of other workers.
9.5 Join
Wrapper#join has two implementations:
- Isolated mode: `@ractor.join` — relies on the underlying `Ractor#join` to block until the server Ractor terminates.
- Local mode: there is no Ractor to join, so the wrapper sends a `JoinMessage` with a fresh reply port and waits for `JoinReplyMessage`. The server adds the reply port to its `@join_requests` list and replies in cleanup or crash cleanup.
The docstring on Wrapper#join notes an important deviation from
Thread#join / Ractor#join: a crashed wrapper does not propagate
its exception out of join. The reasoning is that wrapper crashes are
typically internal bugs (in the server's dispatch code, not in the
wrapped object's methods), and we already deliver CrashedError to any
pending caller; re-raising in join would just produce a duplicate
error at an awkward point.
9.6 Recovering the object
In isolated mode, @ractor.value returns the wrapped object after the
Ractor has terminated. Server#run is written to always return @object
from both the success and rescue paths, so even a crashed server
surrenders the object (modulo Ruby's own post-mortem rules). This is
intentional: the object is yours, you may want to clean it up yourself,
and the wrapper should not hold it hostage.
In local mode, recover_object raises. The object never moved, so there is
nothing to recover.
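The "always surrender the object" shape can be illustrated in miniature (a sketch of the control flow only, not the actual `Server#run`):

```ruby
# Sketch: both the success path and the rescue path fall through to the
# same final expression, so the method's value is the wrapped object
# even after a crash. (Simplified; not the gem's actual code.)
def run_sketch(object, crash: false)
  begin
    raise "simulated server crash" if crash
    # ... main loop would run here ...
  rescue Exception
    # ... crash_cleanup would run here ...
  end
  object  # surrendered on both paths
end

p run_sketch(:wrapped_object)               # => :wrapped_object
p run_sketch(:wrapped_object, crash: true)  # => :wrapped_object
```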
10. Graceful stop and crash cleanup in detail
Graceful stop is covered in §9. This section focuses on what happens when something goes wrong.
10.1 What constitutes a crash
Any uncaught exception inside Server#run ends up in the rescue clause.
This can originate from:
- A bug in the server's dispatch code itself.
- An unexpected `Ractor::ClosedError` when the port is already closed (though most sites catch this explicitly).
- An exception during fiber management.
Notably, exceptions raised by the wrapped object's methods are not crashes. They are caught by `handle_method`'s `rescue ::Exception` clause and converted into an `ExceptionMessage` sent to the caller. The server itself stays alive.
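The shape of that rescue reduces to the following sketch (hypothetical message structs; the real `handle_method` has more moving parts):

```ruby
# Hypothetical, simplified stand-ins for the real message classes.
ReturnMessage    = Struct.new(:value)
ExceptionMessage = Struct.new(:error)

# A method exception becomes a reply message; only exceptions raised
# outside this rescue would count as server crashes.
def handle_method_sketch(object, name, args, replies)
  replies << ReturnMessage.new(object.__send__(name, *args))
rescue Exception => e
  replies << ExceptionMessage.new(e)
end

replies = []
handle_method_sketch([10, 20], :fetch, [1], replies)  # succeeds
handle_method_sketch([], :fetch, [0], replies)        # Array#fetch raises IndexError
p replies.map(&:class)  # => [ReturnMessage, ExceptionMessage]
```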
A worker thread crash is detected via WorkerStoppedMessage (its normal stop
notification) arriving during the running phase, or via the worker's own
ensure block catching its exception (see worker_loop's crash_exception
path).
10.2 crash_cleanup
The goal of crash_cleanup is to deliver a CrashedError to every caller who
would otherwise hang, and to unblock any join waiters. It does as much as
possible, swallowing further errors, because by this point the server is
definitely going away and best-effort is the only realistic policy.
Steps:
- Threaded mode: `drain_dispatcher_after_crash` calls `@dispatcher.crash_close`. This (a) sets `@crashed`, so future `dequeue` calls return `TERMINATE` on empty per-worker queues, causing workers to exit instead of waiting forever; and (b) returns the shared queue's undispatched messages so the server can send `CrashedError` to each.
- Sequential mode: `abort_pending_fibers` calls `fiber.raise(error)` on each suspended fiber. The exception emerges from the fiber's `Fiber.yield` call; `handle_method`'s `rescue ::Exception` catches it and sends an `ExceptionMessage(CrashedError)` to the fiber's reply port. So the caller observes `CrashedError`, the same error class they would get in threaded mode.
- `drain_inbox_after_crash` closes `@port` and drains anything left, sending `CrashedError` to any `CallMessage` senders and `JoinReplyMessage` to any `JoinMessage` senders.
- Threaded mode: `join_workers_after_crash` waits for all workers to finish. Workers' own `cleanup_worker` runs in their ensure blocks: they abort their pending fibers (delivering `CrashedError` to each caller), unregister the fibers, and make a best effort to send `WorkerStoppedMessage` back to the main loop.
- Any remaining `@join_requests` are answered.
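The `@crashed` / `TERMINATE` idea reduces to this pattern (a toy model, not the gem's dispatcher):

```ruby
# Toy model: once crash_close has been called, an empty dequeue returns
# a TERMINATE sentinel instead of blocking, so each worker wakes up and
# exits rather than waiting on a queue that will never refill.
TERMINATE = :terminate

class MiniWorkerQueue
  def initialize
    @items = Queue.new
    @crashed = false
  end

  def crash_close
    @crashed = true
  end

  def dequeue
    return TERMINATE if @crashed && @items.empty?
    @items.pop
  end
end

queue = MiniWorkerQueue.new
queue.crash_close
p queue.dequeue  # => :terminate
```

In the real dispatcher the flag change also has to wake workers already blocked inside `dequeue`, which is why the crash path broadcasts as well; this sketch omits that signaling.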
10.3 Why fiber.raise for sequential cleanup?
It might be simpler to iterate over @pending_fibers and send CrashedError
directly to each fiber's reply port. The reason fiber.raise is preferred is
that it runs the fiber's rescue and ensure blocks, allowing the method (and any
wrapper code around it) to clean up, e.g., releasing locks, closing file
handles opened inside the method. This better respects the wrapped object's
invariants at the cost of being slightly slower and more fallible.
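Plain Ruby shows why: `Fiber#raise` re-enters the fiber at its `Fiber.yield`, so `rescue` and `ensure` blocks inside the fiber get to run:

```ruby
fiber = Fiber.new do
  Fiber.yield :suspended
rescue => e
  puts "rescued: #{e.message}"   # the method's rescue sees the error
ensure
  puts "ensure ran"              # cleanup (locks, handles) happens here
end

fiber.resume            # fiber parks at Fiber.yield
fiber.raise("crashed")  # raises inside the fiber at the yield point
# prints "rescued: crashed" then "ensure ran"
```

Sending a message to the reply port directly would unblock the caller but skip both of those blocks, leaving the wrapped object's internal state half-torn-down.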
10.4 The best-effort nature of cleanup
crash_cleanup wraps almost everything in rescue ::Exception. This is
deliberate: we are already handling a crash and the priority is to get
through the cleanup steps without aborting partway. A lost CrashedError
delivery is regrettable but tolerable; a stuck wrapper is not.
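The pattern is the classic best-effort guard (a sketch of the idea, not the gem's exact helper):

```ruby
# Each cleanup step swallows its own failure so one bad step cannot
# abort the rest of the teardown.
def best_effort
  yield
rescue Exception
  nil  # a lost CrashedError delivery is tolerable; a stuck wrapper is not
end

best_effort { raise "notifying one caller failed" }  # swallowed
best_effort { puts "later cleanup steps still run" }
# prints "later cleanup steps still run"
```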
11. Design trade-offs and known limitations
This section collects the trade-offs already mentioned throughout, plus a few others, in one place for the benefit of readers deciding whether the library fits their use case.
11.1 Re-entrancy from nested fibers or spawned threads can deadlock
The fiber-suspend path is only available when the block invocation
happens on the same fiber that started the method. Nested fibers
(most visibly inside Enumerator) and spawned threads fall back to
the blocking path, which does not release the server to service
further messages. If such a block tries to re-enter the wrapper, the
re-entering call arrives at @port but nothing can pick it up — the
server is blocked inside the very call that sent it. This deadlocks.
Mitigations:
- Configure the method with `block_environment: :wrapped` if the block is self-contained.
- Avoid re-entering the wrapper from blocks called from within Enumerator generators or user-spawned threads.
- In threaded mode the blast radius is one worker, not the whole server. With enough workers this is survivable.
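The fiber-identity check that gates the suspend path fails under Enumerator because `Enumerator#next` runs the generator block on its own internal fiber. Plain Ruby demonstrates this (wrapper-independent illustration):

```ruby
outer = Fiber.new do
  # The generator block runs on the Enumerator's internal fiber,
  # not on `outer`, so a Fiber.current identity check fails there.
  enum = Enumerator.new { |y| y << Fiber.current }
  enum.next == Fiber.current
end
p outer.resume  # => false
```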
11.2 Blocks configured as :caller cannot outlive the method call
The synthetic proc generated by `make_block` relies on the caller still being in its `Wrapper#call` reply loop. If the wrapped object saves the block (as a callback, say) and invokes it later, the caller is long gone, and neither the fiber-yield path nor the blocking-yield path has anyone to reply to. The library does not currently detect this at save time; the failure manifests at invocation time, when the message goes nowhere.
If you need to register a callback, prefer `block_environment: :wrapped` so the block travels as a `Ractor.shareable_proc` and is invoked in place.
11.3 Exceptions lose their backtrace
As of Ruby 4.0, exceptions transferred between Ractors are always copied (not moved) and the backtrace is cleared. This is a Ruby bug (tracked at bugs.ruby-lang.org issue 21818) that the library cannot work around, and it applies both to exceptions raised by the wrapped method and to exceptions raised by caller-side blocks.
11.4 Non-shareable, non-movable types cannot cross the boundary
Ractor's own rules apply. Threads, procs (non-shareable), backtraces,
and a few other types cannot be passed as arguments or returned as
values. :move can help with some cases (large strings, arrays of
mutable values), but some types cannot be moved at all.
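These are plain Ractor rules, easy to check directly with `Ractor.shareable?` (a wrapper-independent illustration):

```ruby
p Ractor.shareable?(:sym)            # => true  (immutable value)
p Ractor.shareable?("text".freeze)   # => true  (deeply frozen)
buffer = +"mutable"
p Ractor.shareable?(proc { buffer }) # => false (captures mutable state)
p Ractor.shareable?(Thread.current)  # => false (threads never cross)
```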
11.5 Worker count is a hard ceiling on parallelism
The threads setting is the maximum number of concurrent method bodies.
The library does not grow or shrink the pool. Sizing it right requires
knowing both the workload's natural concurrency and, if you are using
blocking-fallback paths, the expected number of simultaneously-blocked
workers.
Suspended fibers (the common re-entrancy case) do not occupy a worker, so re-entrancy depth does not need to be part of the calculation.
11.6 No method-level timeouts
A misbehaving wrapped method that never returns will block its worker (or the server itself, in sequential mode) indefinitely. There is no built-in timeout. Callers can avoid their own indefinite wait by implementing timeouts around their stub calls, but the server-side work will still occupy the thread.
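A caller-side deadline can be layered on with the standard `timeout` library. `SlowStub#slow_query` below is a made-up stand-in for a stub call, with `sleep` standing in for waiting on the reply port; whether `Timeout` can interrupt a particular blocking receive depends on the primitive, and the server-side worker remains occupied after the caller gives up:

```ruby
require "timeout"

# Hypothetical stand-in for a wrapper stub with a slow wrapped method.
SlowStub = Struct.new(:delay) do
  def slow_query
    sleep delay   # stands in for waiting on the wrapper's reply
    :row
  end
end

def call_with_deadline(stub, seconds)
  Timeout.timeout(seconds) { stub.slow_query }
rescue Timeout::Error
  :gave_up
end

p call_with_deadline(SlowStub.new(0), 1.0)   # => :row
p call_with_deadline(SlowStub.new(5), 0.2)   # => :gave_up
```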
11.7 Experimental status
The library is self-described as experimental, and the README repeats this warning prominently. This is true both of the library and of Ractors in general in Ruby 4.0. Expect behavior to evolve as Ruby's Ractor implementation matures; internals may change in lock-step.
12. Putting it together — a worked walkthrough
To make all of the above concrete, here is the full story for a single call to `stub.find_by_id(42)` against a SQLite3 wrapper configured with `use_current_ractor: true, threads: 2`, from a caller in a different Ractor. Assume the `:caller` block environment, and assume no block is passed at first.
1. The caller Ractor invokes `stub.find_by_id(42)`. `Stub#method_missing` forwards to `Wrapper#call(:find_by_id, 42)`.
2. `Wrapper#call` prepares a `CallMessage`. It creates a fresh `Ractor::Port` as `reply_port`, generates a `transaction` id, fetches the per-method `MethodSettings`, computes `block_arg = nil` (no block was given), and sends the `CallMessage` on `@port`.
3. The server's main loop (running in a Thread in the host Ractor) receives the message. It calls `dispatch_call`, which, since threads were requested, calls `@dispatcher.enqueue_call(message)`. The dispatcher pushes onto its shared queue and broadcasts.
4. Some worker (say worker 0) dequeues `[:call, message]`. It calls `start_worker_fiber`, which creates a fiber around `handle_method(message, worker_num: 0)`, registers the fiber with the dispatcher, and resumes it.
5. Inside the fiber, `handle_method` calls `@object.__send__(:find_by_id, 42)`. No block is involved, so `make_block` returns `nil`. The DB object does its work and returns a row.
6. `handle_method` sends a `ReturnMessage(row)` to `message.reply_port`. The fiber completes. The worker removes it from its local `pending`, unregisters it from the dispatcher, and loops back to `@dispatcher.dequeue`.
7. The caller's `Wrapper#call` receives the reply. Its loop matches `ReturnMessage` and returns `row`. The `ensure` block closes the `reply_port`. `Stub#method_missing` returns `row` to the caller.
Now insert a block: `stub.find_by_id(42) { |r| transform(r) }`.
1. Between steps 5 and 6, the wrapped method invokes the block. Because `block_environment: :caller` is the default, `make_block` has wrapped it in a proc that:
   - Substitutes the stub for any argument equal to `@object`.
   - Checks that `Fiber.current` equals the fiber that `handle_method` started in. It does, so it:
   - Sends a `FiberYieldMessage(args: [row], kwargs: {}, fiber_id: F)` to `reply_port`.
   - Calls `Fiber.yield`.
2. Worker 0 is now free and returns to the dispatch loop (the fiber is suspended but registered).
3. The caller's `call` loop receives the `FiberYieldMessage` and runs `handle_yield`, which calls the block, captures `transform(row)`, and sends a `FiberReturnMessage(transformed, F)` to the server's main `@port`.
4. The main loop receives the `FiberReturnMessage` and calls `dispatch_fiber_resume`, which goes through the dispatcher: `@dispatcher.enqueue_fiber_resume(message)` looks up `@fiber_to_worker[F]` (which is worker 0), pushes onto worker 0's per-worker queue, and broadcasts.
5. Worker 0's `dequeue` finds the per-worker queue non-empty and returns `[:resume, message]`. `resume_worker_fiber` resumes fiber F with the message. Inside `fiber_yield_block`, `Fiber.yield` returns the `FiberReturnMessage`; the block-proc unwraps `.value` and returns `transformed` to the wrapped method.
6. The wrapped method completes and returns; `handle_method` sends a `ReturnMessage` back to `reply_port`. The fiber ends, the worker cleans it up and goes back to the dispatch loop.
7. The caller's `call` loop finally receives the `ReturnMessage` and returns.
This is the full dance for one call with one block invocation. Every step can
be logged, and the transaction id ties them together in the output.
13. Summary
Ractor::Wrapper achieves shared access to a non-shareable object by:
- Running the object in a controlled server (either a dedicated Ractor or a set of threads in the creating Ractor).
- Exposing a frozen, shareable `Stub` that forwards calls via a message-passing protocol with per-call reply ports.
- Using fibers to run method bodies so that they can suspend cleanly when caller-side blocks need to execute, without blocking the server's main message loop.
- Using a custom `Dispatcher` in threaded mode that routes new calls through a shared queue but fiber resumes through per-worker queues, preserving Ruby's fiber-to-thread affinity.
- Falling back to a blocking path when a block is invoked from a nested fiber or spawned thread, trading re-entrancy for continued functionality.
- Modeling configuration as a frozen value object that travels with each call, keeping the server stateless with respect to settings.
- Providing a carefully staged lifecycle (running → stopping → cleanup) with a separate crash-cleanup path that makes a best-effort attempt to unblock every pending caller and join waiter when something goes wrong.
The net effect is a library that tries to make using a non-shareable object from multiple Ractors look as close as possible to calling it directly, while being honest about the edges where the abstraction leaks.