Architecture

        tokio (Rust threads)                                      Ruby
    ┌──────────────────────────┐
    │ accept loop (hyper)      │    bounded MPMC     ┌─ worker: Ractor × threads ─┐
    │ per request:             │ ──── queue ───────► │ loop {                     │
    │   parse → RequestCtx     │                     │   env = take_one           │ ← blocks with the
    │   queue full → 503       │ ◄─── response ───── │   status,h,b = app.(env)   │   per-ractor lock
    │ TLS (rustls)             │ ◄─── body chunks ── │   respond / stream         │   RELEASED
    └──────────────────────────┘                     └────────────────────────────┘

All network I/O lives in Rust on a tokio multi-threaded runtime; hyper
parses HTTP/1.1 and handles keep-alive; rustls terminates TLS. Ruby never
touches a socket. Each request becomes a Rust-side RequestCtx pushed to a
bounded flume MPMC queue; Ruby workers pull from it.
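
The hand-off into the bounded queue can be sketched with the standard library alone. This is an illustration under stated assumptions, not Kino's code: `std::sync::mpsc::sync_channel` stands in for the bounded flume queue, the `RequestCtx` fields and the `admit` and `request_queue` helpers are hypothetical, and any wait before the 503 is left out, so a full queue is rejected at once.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical stand-in for the Rust-side request context.
#[derive(Debug)]
pub struct RequestCtx {
    pub method: String,
    pub path: String,
}

/// Build a bounded queue with room for `capacity` pending requests.
pub fn request_queue(capacity: usize) -> (SyncSender<RequestCtx>, Receiver<RequestCtx>) {
    sync_channel(capacity)
}

/// Try to queue a parsed request for the workers.
/// On failure, returns the HTTP status the front-end should send itself.
pub fn admit(queue: &SyncSender<RequestCtx>, ctx: RequestCtx) -> Result<(), u16> {
    match queue.try_send(ctx) {
        Ok(()) => Ok(()),
        // Queue full: shed load instead of buffering without bound.
        Err(TrySendError::Full(_)) => Err(503),
        // Receiver dropped (no workers left, e.g. during shutdown).
        Err(TrySendError::Disconnected(_)) => Err(503),
    }
}
```

With a capacity of two, the third `admit` call returns `Err(503)` until a worker takes a request off the queue.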
Topology
Puma-style two-level: workers × threads.
- `:ractor` mode: `workers` Ractors, each running `threads` Ruby Threads over the same worker loop. Parallel across ractors (each has its own VM lock); concurrent within one only for I/O-bound handlers.
- `:threaded` mode: the same total capacity as plain Threads on the main ractor. Runs any Rack app; the GVL serializes CPU work.
- Identical machinery either way: the flume queue is MPMC, a "worker slot" is per-thread, and the worker loop (`lib/kino/worker.rb`) is shared verbatim.
- Experimental `lanes true` replaces the one shared queue with a small private queue per worker slot (awake-preferring dispatch, work stealing); see benchmarks.
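
The "workers × threads slots over one shared queue" shape can be modelled in plain Rust threads. This is only an analogy for the Ruby-side topology: `run_slots` is a hypothetical helper, and because std's channel is single-consumer, a `Mutex` around the receiver stands in for flume's MPMC queue.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

/// Run `workers * threads` slots over one shared queue and return how many
/// jobs each slot handled. Every slot executes the same loop.
pub fn run_slots(workers: usize, threads: usize, jobs: Vec<u32>) -> Vec<usize> {
    assert!(workers * threads > 0, "need at least one worker slot");
    let (tx, rx) = sync_channel::<u32>(16);
    // A Mutex makes the single-consumer receiver shareable between slots.
    let rx: Arc<Mutex<Receiver<u32>>> = Arc::new(Mutex::new(rx));
    let slots: Vec<_> = (0..workers * threads)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut handled = 0usize;
                loop {
                    // The lock is held only while taking, not while handling.
                    let job = rx.lock().unwrap().recv();
                    match job {
                        Ok(_job) => handled += 1,
                        Err(_) => return handled, // queue closed: slot exits
                    }
                }
            })
        })
        .collect();
    for job in jobs {
        tx.send(job).unwrap();
    }
    drop(tx); // close the queue so idle slots see the disconnect
    slots.into_iter().map(|s| s.join().unwrap()).collect()
}
```

`run_slots(2, 3, jobs)` starts six slots; how the jobs are split between them varies from run to run, but the per-slot counts always add up to the number of jobs.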
The Rust ↔ Ruby boundary
- No native (TypedData) handle crosses a ractor boundary. Worker ractors receive plain integers (server id, worker ids) plus the Ractor-shareable app; native state lives in a global Rust-side registry keyed by those ids. The per-request handle (`Kino::Native::Request`, a TypedData object) is created inside the worker ractor by the take calls (`take_one`/`take_batch`), so its ownership is correct by construction.
- Blocking discipline: every blocking native call goes through `rb_thread_call_without_gvl` (rb-sys; magnus doesn't wrap it) so a blocked worker holds no VM lock. Waits poll an atomic interrupt flag between bounded `recv_timeout` ticks; the unblock function (UBF) just sets the flag. `flume::Selector` lost wakeups under sustained load (workers went permanently deaf to a non-empty queue after ~100k requests) and is not used anywhere.
- Fast path: when a request is already queued, `take_one` takes it with `try_recv` while still holding the GVL, so the release/reacquire pair (two scheduler round-trips) is skipped entirely. Under load this is the common case.
- Fused crossing: the common complete-body response rides `respond_and_take_one`: answer the previous request and take the next in one FFI call, ~one crossing per request once the loop is warm. The env Hash carries the request handle under `env["kino.request"]`, so no per-request pair array exists either.
- Env construction: one FFI call builds the full CGI side of the Rack env as a real Hash. Static keys, common methods/protocols and 44 common `HTTP_*` header names come from a frozen (and therefore Ractor-shareable) string cache built once at init on the main ractor. Frozen keys also skip the dup that `Hash#[]=` performs on unfrozen string keys. Only `rack.input` is lazy/streaming.
- Response path: the Rack headers Hash is passed through as-is and iterated on the Rust side (`RHash#foreach`); header bytes are borrowed in place from rooted Ruby strings (safe: GVL held, hyper copies immediately). Single-chunk bodies skip the join copy.
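
The fast path and the interruptible wait can be sketched without any Ruby involved. The function below is a simplified model, not the real `take_one`: it uses std's channel in place of flume, `TakeError` and the signature are hypothetical, and the GVL release around the slow path is only marked by a comment.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError, TryRecvError};
use std::time::Duration;

/// Why a take returned without a request.
#[derive(Debug, PartialEq)]
pub enum TakeError {
    Interrupted,  // the unblock function set the flag
    Disconnected, // the queue was closed (shutdown)
}

/// Take one item. `interrupt` is the flag an unblock function would set.
pub fn take_one<T>(
    queue: &Receiver<T>,
    interrupt: &AtomicBool,
    tick: Duration,
) -> Result<T, TakeError> {
    // Fast path: something is already queued, so no blocking wait is needed.
    match queue.try_recv() {
        Ok(item) => return Ok(item),
        Err(TryRecvError::Disconnected) => return Err(TakeError::Disconnected),
        Err(TryRecvError::Empty) => {}
    }
    // Slow path (the real server would release the GVL around this loop):
    // block in bounded ticks and check the flag between them.
    loop {
        if interrupt.load(Ordering::Acquire) {
            return Err(TakeError::Interrupted);
        }
        match queue.recv_timeout(tick) {
            Ok(item) => return Ok(item),
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => return Err(TakeError::Disconnected),
        }
    }
}
```

The tick bounds how long an interrupt can go unnoticed: a worker blocked on an empty queue returns `Interrupted` at most one tick after the flag is set.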
Backpressure, in both directions
- Bounded request queue between tokio and Ruby. When it stays full past `queue_timeout`, the client gets a 503 right away instead of waiting any longer.
- Request bodies stream through a bounded(8) channel: hyper is only polled as fast as Ruby consumes (inbound backpressure costs nothing extra). Bodyless requests (most GETs) spawn no forwarder task at all.
- Response bodies stream through a bounded(8) channel the other way: a slow client makes `write_chunk` block, with the GVL released.
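
How a bounded(8) channel produces backpressure can be shown with a producer thread and std's `sync_channel`. This is a sketch under assumptions: the channel type Kino actually uses is not shown here, and `stream_body`, `read_all` and `BODY_CHANNEL_CAPACITY` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

/// Capacity mirrors the bounded(8) body channels described above.
pub const BODY_CHANNEL_CAPACITY: usize = 8;

/// Spawn a producer that forwards `chunks` into a bounded channel.
/// `send` blocks whenever 8 chunks are already waiting, so the producer
/// can never run more than the channel capacity ahead of the consumer.
pub fn stream_body(chunks: Vec<Vec<u8>>) -> Receiver<Vec<u8>> {
    let (tx, rx): (SyncSender<Vec<u8>>, Receiver<Vec<u8>>) =
        sync_channel(BODY_CHANNEL_CAPACITY);
    thread::spawn(move || {
        for chunk in chunks {
            // Blocks while the channel is full; stops if the consumer hung up.
            if tx.send(chunk).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which closes the channel for the consumer.
    });
    rx
}

/// Consume the stream and return the total number of body bytes.
pub fn read_all(body: Receiver<Vec<u8>>) -> usize {
    body.iter().map(|chunk| chunk.len()).sum()
}
```

The producer's pace is set entirely by the consumer: once eight chunks are queued, the next `send` waits until one is read.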
Failure handling
Three parties can answer a client, coordinated by an atomic
first-claimant-wins flag on the per-request Responder:
- The app, via the worker loop (normal path; `StandardError` is rescued in Ruby and becomes a clean 500).
- The supervisor: each worker ractor has a supervisor thread blocked in `Ractor#value`. A hard crash (any `Exception`) wakes it; it immediately 500s the crashed ractor's in-flight requests via a `Weak<Responder>` side table (not when GC eventually notices) and respawns the ractor with fresh slots.
- A `Drop` guard on `RequestCtx` as the universal backstop (GC of an abandoned handle, teardown races). The Drop path never touches the Ruby API, so it is safe from any thread.
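
A first-claimant-wins flag is a single compare-and-swap. The sketch below shows the three claimants against a simplified `Responder`; the field layout, the `sent` slot (which stands in for actually writing to the client) and `fail_in_flight` are assumptions made for illustration, not Kino's real types.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex, Weak};

/// Simplified per-request responder: whoever claims it first answers.
pub struct Responder {
    claimed: AtomicBool,
    sent: Mutex<Option<u16>>, // status that was actually delivered
}

impl Responder {
    pub fn new() -> Arc<Self> {
        Arc::new(Responder { claimed: AtomicBool::new(false), sent: Mutex::new(None) })
    }

    /// Send `status` if nobody has answered yet. Returns whether we won.
    pub fn respond(&self, status: u16) -> bool {
        // Atomically flip false -> true; exactly one caller succeeds.
        let won = self
            .claimed
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok();
        if won {
            *self.sent.lock().unwrap() = Some(status);
        }
        won
    }

    pub fn sent(&self) -> Option<u16> {
        *self.sent.lock().unwrap()
    }
}

/// Supervisor path: 500 every still-unanswered request of a crashed worker.
/// Dead `Weak` entries (requests already gone) are skipped.
pub fn fail_in_flight(table: &[Weak<Responder>]) -> usize {
    table.iter().filter_map(Weak::upgrade).filter(|r| r.respond(500)).count()
}

/// Backstop: a request context that answers 500 on drop if nobody else did.
pub struct RequestCtx {
    pub responder: Arc<Responder>,
}

impl Drop for RequestCtx {
    fn drop(&mut self) {
        // No-op when the app or the supervisor already claimed the response.
        self.responder.respond(500);
    }
}
```

Because every path goes through the same `respond`, a late claimant can neither double-send nor overwrite the status that was already delivered.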
With request_timeout configured, the tokio front-end can additionally
answer with a 504 on its own when the response head misses the deadline;
the worker keeps running, and its late response goes nowhere harmlessly:
the front-end has stopped listening (the oneshot receiver is dropped),
and the worker's claim makes the Drop backstop a no-op.
Client aborts are handled the same way in reverse: hyper drops the request
future, and a Rust Drop guard keeps the in-flight counter honest (a
plain decrement after an .await would never run).
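
The in-flight guard is the usual RAII pattern. A minimal sketch, with hypothetical names (`InFlightGuard`, `handle`); since it runs without an async runtime, an early return stands in for hyper dropping the request future, which triggers the same `Drop`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Counts a request as in flight for exactly as long as the guard lives.
pub struct InFlightGuard {
    counter: Arc<AtomicUsize>,
}

impl InFlightGuard {
    pub fn enter(counter: &Arc<AtomicUsize>) -> Self {
        counter.fetch_add(1, Ordering::AcqRel);
        InFlightGuard { counter: Arc::clone(counter) }
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        // Runs however the owner goes away: normal completion, early
        // return, panic unwinding, or a future dropped before it finishes.
        self.counter.fetch_sub(1, Ordering::AcqRel);
    }
}

/// Stand-in for a request handler; `client_aborts` models an abandoned request.
pub fn handle(counter: &Arc<AtomicUsize>, client_aborts: bool) -> Option<u16> {
    let _guard = InFlightGuard::enter(counter);
    if client_aborts {
        return None; // the decrement still happens via Drop
    }
    Some(200)
}
```

The counter returns to its previous value on both paths, which is what the drain phase of a shutdown depends on.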
Graceful shutdown
stop_accepting → drain until queue + in-flight reach zero or the
deadline passes → close_queue (idle workers see Disconnected and exit) →
join workers → past deadline: abort remaining clients (a 500, or a
connection abort mid-stream), interrupt blocked workers, reap
stragglers → tear down the tokio runtime. Idempotent;
a second INT/TERM force-exits.
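
The drain step reduces to polling two counters against a deadline. The sketch below assumes both are plain atomics and uses hypothetical names (`Drain`, `drain`); the real sequence may well wait on something other than a sleep loop.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Outcome of the drain phase.
#[derive(Debug, PartialEq)]
pub enum Drain {
    Idle,           // queue and in-flight both reached zero
    DeadlinePassed, // work remained when the deadline hit
}

/// Wait until no request is queued or in flight, or until `deadline`.
pub fn drain(
    queued: &AtomicUsize,
    in_flight: &AtomicUsize,
    deadline: Instant,
    poll: Duration,
) -> Drain {
    loop {
        if queued.load(Ordering::Acquire) == 0 && in_flight.load(Ordering::Acquire) == 0 {
            return Drain::Idle;
        }
        let now = Instant::now();
        if now >= deadline {
            return Drain::DeadlinePassed;
        }
        // Sleep one poll interval, but never past the deadline.
        thread::sleep(poll.min(deadline - now));
    }
}
```

`Drain::Idle` leads to the clean path (close the queue, join workers); `Drain::DeadlinePassed` is where aborting clients and interrupting blocked workers would begin.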
Timer waits: Kino.sleep
MRI's sleep parks the thread on the VM timer, whose wakeups inside
non-main ractors are coarse (how coarse is environment-dependent; see
benchmarks).
Kino.sleep releases the GVL and waits on the OS clock directly, chunked
at the interrupt tick so Thread#kill and shutdown stay responsive.
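
The chunking itself is straightforward. A minimal sketch of a sleep that stays interruptible, assuming an atomic flag as the interrupt signal; `chunked_sleep` is a hypothetical name, and the GVL handling that surrounds the real wait is omitted.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Sleep for `total`, waking every `tick` to check `interrupt`.
/// Returns `true` if the full duration elapsed, `false` if interrupted.
pub fn chunked_sleep(total: Duration, tick: Duration, interrupt: &AtomicBool) -> bool {
    let deadline = Instant::now() + total;
    loop {
        if interrupt.load(Ordering::Acquire) {
            return false;
        }
        let now = Instant::now();
        if now >= deadline {
            return true;
        }
        // Never sleep past the deadline, and never longer than one tick.
        thread::sleep(tick.min(deadline - now));
    }
}
```

Measuring against a fixed deadline rather than counting ticks keeps per-chunk overshoot from accumulating over a long sleep.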
Why tokio (researched June 2026)
- tokio + hyper: the bottleneck is the Ruby dispatch boundary, not raw I/O throughput; what matters is HTTP correctness, keep-alive, TLS, and h2-later—hyper's territory. Cross-platform out of the box.
- monoio: thread-per-core io_uring looks great in echo-server benchmarks, but hyper only works through its poll-io compat layer (forfeiting io_uring on the hot path), and the share-nothing advantage is spent the moment requests fan into an MPMC queue toward Ruby.
- compio: completion-based, cross-platform, production-proven—but no first-class HTTP server story yet, and completion-model owned-buffer semantics would leak into the request lifecycle design.
- ntex: the strongest alternative. Unlike monoio/compio it has a first-class HTTP/1.1 + HTTP/2 server stack (TechEmpower top tier) plus an io_uring runtime ("neon") on Linux today. Rejected as the default for now: its thread-per-core, `Rc`-based `!Send` worker model is exactly what our Send-ctx-into-MPMC dispatch opts out of; its own request/response/body types would force a conversion seam through `Responder` and the streaming path; neon is Linux-only (ntex-on-tokio elsewhere forfeits the io_uring win and just trades hyper's battle-tested h1 for a less-deployed one); and the realistic gain is confined to syscall-bound /plaintext-class traffic, whereas the Ruby boundary, not the front-end, is where Kino's time goes. Worth a contained feature-flag spike if the Linux plaintext ceiling ever matters competitively.
- io_uring path: tokio ships in-tree io_uring as an unstable feature (file ops as of 1.52; network expected to follow). `server.rs` isolates the runtime, so adopting it later is a contained change, and would deliver most of ntex/neon's win without the type seam.
Versioning of risky dependencies
magnus is used for everything except the GVL-release primitives and the
rb_ext_ractor_safe flag, which go straight to rb-sys (magnus wraps
neither). magnus's lazy TypedData class cache is force-resolved at init
on the main ractor, so no worker ractor ever races its first resolution;
the only symbols the crate creates are made during server_start, also
on the main ractor.