Class: Evilution::ProcessSupervisor

Inherits:
Object
  • Object
Defined in:
lib/evilution/process_supervisor.rb

Overview

Single owner of the process-lifecycle invariant: every pid spawned here is group-isolated, tracked in a signal-safe registry, group-signalled through a TERM/KILL ladder, and reaped – with its fds closed and sandbox dir removed.

EV-9f3b / EV-5rrh, Track A step 1. Generalizes the lock-free COW WorkerRegistry (EV-jwao) and absorbs ProcessCleanup.safe_kill/safe_wait semantics. Pure unit: no call sites are migrated here – Isolation::Fork (inner path) and WorkQueue::Worker (outer path) are routed through it in later steps (EV-3aw3, EV-dg69, EV-7a91).

Shape: instances own the lifecycle of the children they spawn, but every handle is also recorded in ONE process-global registry so the Runner signal trap can call .signal_all across every fork site through a single owner.

Signal-safety: under MRI a trap handler runs on the main thread between VM instructions, so it must not acquire a Mutex (the main thread may hold it -> deadlock). register/unregister swap @registry for a freshly built frozen array via a single atomic reference assignment (copy-on-write). The trap reads the current reference once and iterates that complete, immutable snapshot – no torn reads, no lock.
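The copy-on-write swap described above can be sketched in isolation. This is a minimal illustration, not the class's actual source: the CowRegistry module and the symbols stored in it are hypothetical stand-ins for the real registry and its Handle objects.

```ruby
# Hypothetical stand-in for the class-level registry described above.
module CowRegistry
  @registry = [].freeze

  class << self
    # Readers get the current frozen array; they never see a half-built one.
    attr_reader :registry

    # Build a new frozen array, then publish it with one reference assignment.
    def register(item)
      @registry = (@registry + [item]).freeze
    end

    def unregister(item)
      @registry = @registry.reject { |existing| existing == item }.freeze
    end
  end
end

CowRegistry.register(:a)
snapshot = CowRegistry.registry # what a trap handler would capture once
CowRegistry.register(:b)
CowRegistry.unregister(:a)

# Later writes replace the reference; the captured snapshot is untouched.
p snapshot             # prints [:a]
p CowRegistry.registry # prints [:b]
```

Because writers never mutate an array in place, a reader that grabbed the reference before a write keeps iterating a complete, immutable list without taking a lock.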

Defined Under Namespace

Classes: Handle

Constant Summary

GRACE_PERIOD =
2

Class Attribute Summary

Class Method Summary

Instance Method Summary

Class Attribute Details

.registry ⇒ Object (readonly)

Frozen snapshot. Safe to read from a signal handler.



# File 'lib/evilution/process_supervisor.rb', line 39

def registry
  @registry
end

Class Method Details

.kill_and_reap_all ⇒ Object

Trap-safe teardown of every registered child: SIGKILL each process group (sweeping grandchildren) and the bare leader pid, then reap the leaders so they cannot zombie, and clear the registry. Reads the COW snapshot once – no Mutex, safe from a signal handler.

EV-7a91: a process about to die on a fatal signal must not leave the children it OWNS behind. The Runner’s group-kill reaches only the worker groups; the inner per-mutation children left those groups (setpgid, EV-2sh8) and live in the worker’s own registry, so only the worker – their parent – can kill AND reap them before it dies. Without the reap they survive as zombies until some ancestor exits and init collects them, and that exit never happens when evilution runs embedded in a long-lived host process.



# File 'lib/evilution/process_supervisor.rb', line 80

def kill_and_reap_all
  snapshot = @registry
  snapshot.each do |handle|
    kill_tolerant("KILL", -handle.pgid)
    kill_tolerant("KILL", handle.pid)
  end
  # Reap only after every group has been signalled, so a slow-to-die child
  # never delays killing the others' subtrees.
  snapshot.each { |handle| reap_tolerant(handle.pid) } # rubocop:disable Style/CombinableLoops
  @registry = (@registry - snapshot).freeze
end
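The bodies of kill_tolerant and reap_tolerant are not shown on this page. The following is a rough sketch of what such helpers might look like, assuming they simply swallow Errno::ESRCH and Errno::ECHILD respectively; it runs the same two-pass kill-then-reap sequence on two throwaway children.

```ruby
# Assumed implementations; the real helpers are not shown in this document.
def kill_tolerant(sig, pid)
  Process.kill(sig, pid)
rescue Errno::ESRCH
  nil # process or group already gone
end

def reap_tolerant(pid)
  Process.wait(pid)
rescue Errno::ECHILD
  nil # already reaped, or not our child
end

# Two children that try to become their own group leaders, then sleep forever.
pids = Array.new(2) do
  fork do
    Process.setpgid(0, 0)
    sleep
  end
end

# Pass 1: signal every group, then the bare pid. If a child has not yet
# called setpgid, the group signal is a tolerated ESRCH and the bare-pid
# signal still reaches it.
pids.each do |pid|
  kill_tolerant("KILL", -pid)
  kill_tolerant("KILL", pid)
end

# Pass 2: reap only after everything has been signalled.
pids.each { |pid| reap_tolerant(pid) }

# A second reap is tolerated and reports nothing left to collect.
second_pass = pids.map { |pid| reap_tolerant(pid) }
p second_pass # prints [nil, nil]
```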

.register(handle) ⇒ Object



# File 'lib/evilution/process_supervisor.rb', line 41

def register(handle)
  @registry = (@registry + [handle]).freeze
end

.reset_for_child! ⇒ Object

Drop every inherited entry so a freshly forked child starts owning nothing. A child inherits a COW copy of this registry, but the handles in it belong to the PARENT (e.g. sibling workers); if the child later signalled or reaped them – via signal_all / kill_and_reap_all in its own signal handler – it would tear down processes it never spawned. The child re-registers only what it spawns itself.



# File 'lib/evilution/process_supervisor.rb', line 64

def reset_for_child!
  @registry = [].freeze
end
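The inheritance problem this method addresses can be observed directly. In the sketch below, DemoRegistry is a hypothetical stand-in for the class-level registry; the child reports over a pipe how many entries it inherited and how many remain after the reset, while the parent's own registry is unaffected.

```ruby
# Hypothetical stand-in for the process-global registry.
module DemoRegistry
  @registry = [].freeze

  class << self
    attr_reader :registry

    def register(item)
      @registry = (@registry + [item]).freeze
    end

    def reset_for_child!
      @registry = [].freeze
    end
  end
end

DemoRegistry.register(:sibling_worker)

reader, writer = IO.pipe
pid = fork do
  reader.close
  inherited = DemoRegistry.registry.size # copy inherited from the parent
  DemoRegistry.reset_for_child!
  writer.puts("#{inherited},#{DemoRegistry.registry.size}")
  writer.close
  exit!(0)
end
writer.close
child_report = reader.read.strip
reader.close
Process.wait(pid)

puts child_report               # prints 1,0
puts DemoRegistry.registry.size # prints 1
```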

.signal_all(sig) ⇒ Object



# File 'lib/evilution/process_supervisor.rb', line 49

def signal_all(sig)
  @registry.each do |handle|
    Process.kill(sig, -handle.pgid)
  rescue Errno::ESRCH
    # Group already gone (leader + subtree reaped) -- nothing to signal.
    nil
  end
end

.unregister(handle) ⇒ Object



# File 'lib/evilution/process_supervisor.rb', line 45

def unregister(handle)
  @registry = @registry.reject { |existing| existing.pid == handle.pid }.freeze
end

Instance Method Details

#reap(handle) ⇒ Object

Reap the leader (ECHILD-tolerant if already reaped), then unconditionally release the resources the handle owns: close parent-side fds, remove the sandbox dir, and drop the handle from the registry.



# File 'lib/evilution/process_supervisor.rb', line 160

def reap(handle)
  safe_wait(handle.pid)
ensure
  release(handle)
end

#reap_nonblock(handle) ⇒ Object

Non-blocking reap for callers that poll a child’s liveness as part of a read protocol (e.g. Isolation::Fork’s marshal-pipe loop). Returns false while the child is still running – the handle stays registered so a signal trap can still reach it. Once the child has exited (or was already reaped), it releases the handle in the same step it reaps, so the process-global registry never holds a stale, already-reaped pgid.



# File 'lib/evilution/process_supervisor.rb', line 172

def reap_nonblock(handle)
  return false unless nonblocking_wait(handle.pid)

  release(handle)
  true
end
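The nonblocking_wait helper is not shown on this page. One plausible shape, assuming it wraps Process.wait with Process::WNOHANG and treats Errno::ECHILD as "already reaped", is sketched below together with a child whose exit is controlled through a pipe.

```ruby
# Assumed implementation: true once the child has exited or was already
# reaped, false while it is still running.
def nonblocking_wait(pid)
  !Process.wait(pid, Process::WNOHANG).nil?
rescue Errno::ECHILD
  true
end

reader, writer = IO.pipe
pid = fork do
  writer.close
  reader.read # blocks until the parent closes its write end
  exit!(0)
end
reader.close

still_running = nonblocking_wait(pid)  # child is blocked on the pipe
writer.close                           # EOF lets the child exit
sleep(0.01) until nonblocking_wait(pid) # poll until the reap succeeds
already_reaped = nonblocking_wait(pid) # ECHILD path

p still_running  # prints false
p already_reaped # prints true
```

With WNOHANG, Process.wait returns nil instead of blocking while the child is alive, which is what lets a caller interleave liveness checks with pipe reads.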

#signal_group(sig, handle) ⇒ Object

Signal the child’s whole process group (-pgid) to sweep any grandchildren, then the bare pid as a fallback for the case where setpgid failed (no group exists, so the group signal is a harmless Errno::ESRCH).



# File 'lib/evilution/process_supervisor.rb', line 141

def signal_group(sig, handle)
  safe_kill(sig, -handle.pgid)
  safe_kill(sig, handle.pid)
end

#spawn(sandbox_dir: nil, fds: [], isolate_in_child: true) ⇒ Object

Fork a child that becomes its own process-group leader and runs the block, returning a Handle. By default the child calls setpgid(0, 0) before yielding so any grandchildren it forks join its group and can be swept by a group signal; the parent repeats setpgid(pid, pid) to close the race where it signals before the child has isolated itself. The handle is registered BEFORE the parent-side setpgid so the trap can never observe a child that is already a group leader yet missing from the registry (EV-jwao race).

isolate_in_child: false suppresses the child-side setpgid for long-lived workers (the outer path): the child must NOT become its own group leader until the parent has registered it, otherwise a trap firing between fork and register would see a leader it cannot signal. With only the parent-side, post-register setpgid, the child stays in the parent group (reachable by the terminal signal directly) until the registry already lists it.



# File 'lib/evilution/process_supervisor.rb', line 121

def spawn(sandbox_dir: nil, fds: [], isolate_in_child: true)
  pid = ::Process.fork do
    self.class.reset_for_child!
    isolate_self if isolate_in_child
    yield
  end

  # Track the sandbox first thing after fork: if the parent takes a fatal
  # signal before isolate_child returns, Runner's trap (TempDirTracker
  # .cleanup_all) can still see and remove it, narrowing the leak window.
  Evilution::TempDirTracker.register(sandbox_dir) if sandbox_dir
  handle = Handle.new(pid: pid, pgid: pid, fds: fds, sandbox_dir: sandbox_dir)
  self.class.register(handle)
  isolate_child(pid)
  handle
end
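The double setpgid described above can be tried outside the class. This is a simplified sketch under stated assumptions: spawn_isolated is a hypothetical name, there is no registry or sandbox tracking, and the rescued errors on the parent side are a guess at what an isolate_child helper might tolerate.

```ruby
# Hypothetical helper: fork a child that leads its own process group.
def spawn_isolated
  pid = fork do
    Process.setpgid(0, 0) # child side: become group leader before yielding
    yield
  end
  begin
    # Parent side: repeat the call so the group exists even if the parent
    # runs before the child has isolated itself.
    Process.setpgid(pid, pid)
  rescue Errno::EACCES, Errno::ESRCH, Errno::EPERM
    nil # assumed tolerable: child already exec'd, exited, or cannot be moved
  end
  pid
end

pid = spawn_isolated { sleep }
child_pgid = Process.getpgid(pid)
parent_pgid = Process.getpgrp

# The child now leads a group distinct from the parent's.
p child_pgid == pid         # prints true
p child_pgid == parent_pgid # prints false

# A group signal reaches the leader and anything it forked.
Process.kill("KILL", -child_pgid)
Process.wait(pid)
```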

#terminate(handle, grace: GRACE_PERIOD) ⇒ Object

Bounded TERM -> grace -> KILL ladder, then reap. Always ends with the child reaped and its resources released, whichever rung it dies on.



# File 'lib/evilution/process_supervisor.rb', line 148

def terminate(handle, grace: GRACE_PERIOD)
  signal_group("TERM", handle)
  unless exited?(handle.pid)
    sleep(grace)
    signal_group("KILL", handle) unless exited?(handle.pid)
  end
  reap(handle)
end
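A standalone variation on the TERM -> grace -> KILL ladder is sketched below. It is not the method above: terminate_with_ladder is a hypothetical name, it signals a bare pid rather than a group, and it polls with a deadline instead of sleeping for the whole grace period.

```ruby
# Hypothetical helper: TERM, wait up to `grace` seconds, then KILL and reap.
def terminate_with_ladder(pid, grace: 2, poll: 0.05)
  Process.kill("TERM", pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace
  until Process.wait(pid, Process::WNOHANG)
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      Process.kill("KILL", pid)
      Process.wait(pid) # KILL cannot be ignored, so this returns
      return :killed
    end
    sleep(poll)
  end
  :terminated
rescue Errno::ESRCH, Errno::ECHILD
  :already_gone
end

# A child that dies on TERM.
polite = fork { sleep }

# A child that ignores TERM; the pipe tells us once the trap is installed.
ready_r, ready_w = IO.pipe
stubborn = fork do
  Signal.trap("TERM", "IGNORE")
  ready_w.write("x")
  ready_w.close
  sleep
end
ready_w.close
ready_r.read(1)
ready_r.close

polite_result = terminate_with_ladder(polite, grace: 2)
stubborn_result = terminate_with_ladder(stubborn, grace: 0.2)

p polite_result   # prints :terminated
p stubborn_result # prints :killed
```

Polling lets a cooperative child return as soon as it exits, while the deadline still bounds how long an uncooperative one can delay teardown.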