tcp_user_timeout
Kernel-enforced socket deadlines on Linux via TCP_USER_TIMEOUT. Sockets opened inside a with_timeout block are forcibly closed by the kernel if transmitted data goes unacknowledged longer than the deadline. Pre-existing sockets — DB pools, persistent HTTP pools created at app boot — are never re-bound.
TcpUserTimeout.with_timeout(30) do
  Net::HTTP.get(URI("https://upstream.example/slow"))
end
If the upstream stops ACKing, the kernel closes the connection at ~30s and the next read or write raises Errno::ETIMEDOUT / IO::TimeoutError. The thread is freed even if it was parked in a syscall — the case Thread#kill and Timeout.timeout cannot handle.
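At the socket level the deadline is an integer number of milliseconds set with setsockopt, as described in tcp(7). The sketch below shows only that one call on a loopback connection. It illustrates the kernel option, not the gem's actual hook: the helper name `apply_user_timeout` is hypothetical, and the sketch assumes Ruby exposes `Socket::TCP_USER_TIMEOUT` on platforms that support the option.

```ruby
require "socket"

# Hypothetical helper: sets TCP_USER_TIMEOUT on one socket and returns the
# value the kernel reports back (milliseconds), or nil where the option is
# unavailable or rejected.
def apply_user_timeout(sock, seconds)
  return nil unless Socket.const_defined?(:TCP_USER_TIMEOUT)

  opt = Socket::TCP_USER_TIMEOUT
  sock.setsockopt(Socket::IPPROTO_TCP, opt, (seconds * 1000).to_i)
  sock.getsockopt(Socket::IPPROTO_TCP, opt).int
rescue SystemCallError
  nil
end

server = TCPServer.new("127.0.0.1", 0)             # ephemeral loopback port
client = TCPSocket.new("127.0.0.1", server.addr[1])
reported_ms = apply_user_timeout(client, 30)
puts "TCP_USER_TIMEOUT: #{reported_ms.inspect} ms"
client.close
server.close
```

On Linux this should report the deadline in milliseconds; elsewhere the helper returns nil, which mirrors the silent no-op behavior described under Platform support.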
What this covers
- Write-side wedges. Client is sending, server stops reading, OS receive buffer fills, writes go unacked → kernel kills.
- Network partitions. Peer unreachable → unacked retransmits → kernel kills.
What this does NOT cover
- Read-side wedges where the peer is responsive at the kernel level. If the peer's userspace is stuck but its kernel auto-ACKs your packets, TCP_USER_TIMEOUT does not fire. Use application-level timeouts for these (Net::HTTP#read_timeout, IO#timeout=, SDK request timeouts).
- Pre-existing sockets. Hooks fire on socket creation. Connections in the DB pool or a persistent HTTP pool that were opened at boot are not re-bound. This is by design: production pools that should outlive any single request keep their original behavior.
- FFI / libcurl-based clients. curb and anything else wrapping libcurl bypass Ruby's socket layer entirely.
- DNS. getaddrinfo is not covered. Mitigate via resolv.conf.
- Connect phase. Use Net::HTTP#open_timeout, libpq connect_timeout, etc.
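For the read-side and connect-phase gaps listed above, the application-level timeouts on Net::HTTP can be set explicitly. This is a minimal sketch using only the standard library; the timeout values are illustrative assumptions, and the host is the placeholder used earlier in this README.

```ruby
require "net/http"
require "uri"

uri = URI("https://upstream.example/slow")

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = (uri.scheme == "https")
http.open_timeout  = 5    # seconds to wait for the connection to open
http.read_timeout  = 10   # seconds to wait for each read
http.write_timeout = 10   # seconds to wait for each write (Ruby 2.6+)

# Nothing is connected yet; a socket is only opened by http.start or by
# issuing a request. In an app, that call would sit inside a
# TcpUserTimeout.with_timeout block so both layers apply.
puts "open=#{http.open_timeout}s read=#{http.read_timeout}s write=#{http.write_timeout}s"
```

The two layers are complementary: these timeouts catch a peer whose kernel keeps ACKing while its application is stuck, and the kernel deadline catches unacknowledged writes.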
Platform support
- Linux: enforced by the kernel.
- macOS / BSD / Windows: silent no-op. There is no direct equivalent of TCP_USER_TIMEOUT; setsockopt raises Errno::ENOPROTOOPT and the gem rescues it. Dev work and tests on macOS run unaffected; production should be Linux.
Installation
gem "tcp_user_timeout"
Usage
Direct
TcpUserTimeout.with_timeout(30) do
  Net::HTTP.get(URI("..."))
end
Nests; restores outer scope on exception. Storage uses Fiber[] (Ruby 3.2+ inheritable fiber storage), so child fibers and threads spawned inside the block see the same deadline.
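The nesting and restore behavior described above can be sketched with Fiber[] directly. The module below is a hypothetical stand-in written for illustration (the name `ScopedDeadline` and its storage key are assumptions, not the gem's internals) and requires Ruby 3.2 or later.

```ruby
# Hypothetical stand-in for scoped deadline storage; not the gem's code.
module ScopedDeadline
  KEY = :scoped_deadline_seconds   # Fiber[] keys must be symbols

  def self.current
    Fiber[KEY]
  end

  # Sets the deadline for the duration of the block and restores the
  # previous value afterwards, including when the block raises.
  def self.with_timeout(seconds)
    previous = Fiber[KEY]
    Fiber[KEY] = seconds
    yield
  ensure
    Fiber[KEY] = previous
  end
end

seen = []
ScopedDeadline.with_timeout(30) do
  seen << ScopedDeadline.current                                   # outer scope
  ScopedDeadline.with_timeout(5) { seen << ScopedDeadline.current } # inner scope
  seen << ScopedDeadline.current                                   # outer restored
  seen << Fiber.new { ScopedDeadline.current }.resume              # child fiber inherits
end
seen << ScopedDeadline.current                                     # outside any block

p seen
```

A child fiber receives a copy of the storage at creation time, so a deadline set by the parent after the child exists is not visible to that child.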
Rack middleware
Bound every newly-opened TCP socket during a request to slightly less than the web server's worker timeout (Puma's worker_timeout, Heroku's 30s router cap, etc.):
# config/application.rb
require "tcp_user_timeout/rack"
config.middleware.use TcpUserTimeout::Rack::Middleware, seconds: 25
Per-request override via callable:
config.middleware.use TcpUserTimeout::Rack::Middleware,
  seconds: ->(env) { env["HTTP_X_REQUEST_TIMEOUT_S"]&.to_f || 25 }
Pre-existing pooled connections (the AR pool, persistent HTTP pools created at boot) are not touched — the middleware only affects sockets opened during the request.
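Because the per-request override above reads a client-supplied header, it may be worth validating the value before using it: String#to_f turns non-numeric input into 0.0, and a large value would exceed the worker timeout. The callable below is one possible sketch. The constant names and bounds are assumptions chosen for illustration, not part of the gem.

```ruby
# Hypothetical bounds for this sketch; keep MAX below the worker timeout.
DEFAULT_SECONDS = 25.0
MIN_SECONDS     = 1.0
MAX_SECONDS     = 25.0

# Parses the header, falls back to DEFAULT_SECONDS when it is missing or
# not numeric, and clamps the result to MIN_SECONDS..MAX_SECONDS.
REQUEST_TIMEOUT = lambda do |env|
  raw = env["HTTP_X_REQUEST_TIMEOUT_S"]
  requested = raw && Float(raw, exception: false)
  next DEFAULT_SECONDS if requested.nil?

  requested.clamp(MIN_SECONDS, MAX_SECONDS)
end

# Intended use (not executed here):
#   config.middleware.use TcpUserTimeout::Rack::Middleware, seconds: REQUEST_TIMEOUT

p REQUEST_TIMEOUT.call({ "HTTP_X_REQUEST_TIMEOUT_S" => "10" })
p REQUEST_TIMEOUT.call({ "HTTP_X_REQUEST_TIMEOUT_S" => "abc" })
p REQUEST_TIMEOUT.call({})
```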
ActiveJob
require "tcp_user_timeout/active_job"
class FetchUpstreamJob < ApplicationJob
  include TcpUserTimeout::ActiveJob
  self.max_execution_time = 30.seconds

  def perform(url)
    Net::HTTP.get(URI(url))
  end
end
max_execution_time becomes a real upper bound on outbound TCP work. The actual TCP deadline is set slightly below — 5s headroom at production scales (≥10s), 90% of max below that — so the kernel fires before any outer guard.
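The headroom rule can be written out as arithmetic. The function below is a literal reading of the sentence above, offered as a sketch: the name `tcp_deadline_for` is hypothetical, and the gem's exact rounding and boundary handling are not specified here.

```ruby
# Hypothetical helper: derives the TCP deadline (seconds) from a job's
# max_execution_time, following the rule as stated in this README.
def tcp_deadline_for(max_execution_time)
  max = max_execution_time.to_f
  if max >= 10
    max - 5      # 5s headroom at production scales
  else
    max * 0.9    # 90% of max for short limits
  end
end

p tcp_deadline_for(30)   # 30s limit, 5s headroom
p tcp_deadline_for(5)    # 5s limit, 90% rule
```

With a 30-second limit the kernel deadline lands at 25 seconds, leaving the remaining 5 seconds for the job to handle the error before any outer guard fires.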
Global default
Set a ceiling that applies when no with_timeout block is in effect:
TcpUserTimeout.global_default_seconds = 600 # 10 min safety net
Default is nil (no global ceiling).
Development
bundle install
bundle exec rake test # macOS-friendly; Linux-only tests skip
docker build -f Dockerfile.test -t tcp-user-timeout-test:linux .
docker volume create tcp-user-timeout-bundle
docker run --rm -v $PWD:/app -v tcp-user-timeout-bundle:/bundle \
  tcp-user-timeout-test:linux bundle install
docker run --rm -v $PWD:/app -v tcp-user-timeout-bundle:/bundle \
  tcp-user-timeout-test:linux bundle exec rake test
References
- tcp(7): TCP_USER_TIMEOUT
- Cloudflare: When TCP sockets refuse to die
- gRPC proposal A18-tcp-user-timeout
License
MIT.