Class: Pikuri::Code::Bash::Sandbox::Bubblewrap

Inherits:

Object

Defined in: lib/pikuri/code/bash/sandbox.rb

Overview

Bubblewrap (bwrap(1)) sandbox: composes a bwrap argv from the supplied Workspace plus a curated OS-runtime baseline, so the bash subprocess sees only the project + toolchain + ephemeral temp + the few /etc files needed for TLS, DNS, timezone, and name resolution.

What’s bound, and why

SYSTEM_ROOTS — /lib, /lib64, /bin, /sbin (often symlinks to /usr on modern distros). Not in Workspace#readable (the LLM has no business grepping /sbin/), but the subprocess needs them executable for the dynamic linker + standard utilities. /usr and /opt are not listed here because they already come in via Workspace#readable (added by Pikuri::Code::ToolchainPaths.readable).
ETC_BASELINE — /etc/ssl, /etc/ca-certificates, /etc/pki, /etc/resolv.conf, /etc/nsswitch.conf, /etc/localtime, /etc/hosts. Allowlist (not the whole /etc!) of the files bash subprocesses commonly need: TLS handshake, DNS, timezone, hostname resolution. Nothing sensitive (no shadow, no ssh_config, no NetworkManager state).
/tmp — when Workspace::Filesystem#temp is set, bound to the workspace temp dir (so the LLM’s reflexive /tmp writes land in a persistent dir that survives between bash calls). When no workspace temp is wired in, falls back to --tmpfs /tmp (per-call ephemeral). The host’s /tmp is never exposed. /proc (synthetic, sees only the sandbox’s own processes due to --unshare-pid) and /dev (synthetic, null/zero/random/tty only) round out the synthetic mounts.
workspace.readable → --ro-bind each path at the same path in the sandbox, EXCEPT paths that also appear in ephemeral_overlay: (see below).
workspace.writable → --bind (read+write) each path. The workspace temp’s host path (under ~/.cache/pikuri, not under /tmp) is bound at its host path too — so the same dir is reachable via both /tmp (LLM reflex) and the host path (advertised by the system prompt, used consistently by the file tools off the host filesystem).
ephemeral_overlay — per-user dependency caches the toolchain mutates (~/.gradle/caches, ~/.m2/repository, ~/.cargo/registry, …). Each path is mounted as a bubblewrap overlay: the host’s real dir is the lower (read-through), and a per-session upper + workdir under <workspace.internal_temp>/overlay-<slug>/ absorb writes. Result: gradle/maven/cargo see a fully read-write view of their cache, the host’s real cache is untouched, and on process exit the umbrella (and with it every upper layer) is removed by the workspace’s Finalizers registration. Within one pikuri-code session writes survive across bash calls (warm cache after the first build); across sessions they don’t (so a session that gets prompt-injected into poisoning the in-sandbox view of gradle’s cache cannot propagate the damage to the host’s normal gradle invocations or to a future pikuri-code session). Note: the overlay paths are deliberately narrow subdirs (e.g. ~/.gradle/caches, not ~/.gradle) so gradle.properties / init.d / .credentials never reach the sandbox at all; see ToolchainPaths for the credential / persistence exclusion rationale.
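
The plain-bind part of the rules above can be sketched as follows. This is a minimal illustration, not pikuri's actual code: the helper name bind_args and its keyword arguments are hypothetical, and overlay paths are only skipped here because they are mounted separately.

```ruby
# Hypothetical helper (an assumption, not the library's implementation):
# builds the bind-mount portion of a bwrap argv from plain path lists.
def bind_args(readable:, writable:, overlay: [], temp: nil)
  overlaid = overlay.map(&:to_s)
  args = []

  readable.map(&:to_s).each do |path|
    # Paths listed in the overlay set are mounted as overlays elsewhere,
    # so they must not also receive a read-only bind.
    next if overlaid.include?(path)

    args.push('--ro-bind', path, path)
  end

  # Writable roots are bound read+write at the same path.
  writable.map(&:to_s).each { |path| args.push('--bind', path, path) }

  # /tmp is either the persistent workspace temp or a per-call tmpfs;
  # the host's own /tmp is never bound.
  if temp
    args.push('--bind', temp.to_s, '/tmp')
  else
    args.push('--tmpfs', '/tmp')
  end

  args
end
```

Calling it with a readable list that includes an overlay path yields no --ro-bind for that path, which is the EXCEPT clause described for workspace.readable.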

Concurrency contract

Each Bubblewrap instance must own its upper/workdir paths exclusively — overlayfs returns EBUSY when two live mounts share an upper or workdir. The bundled wiring guarantees this:

One Workspace::Filesystem mints one umbrella (Workspace::Filesystem#internal_temp).
One umbrella feeds one Bubblewrap, which derives its per-path overlay-<slug>/ subdirs from that umbrella.
Pikuri::Code::Bash runs bash -c synchronously (Subprocess#wait), and sub-agents block their parent’s loop while running (the agent tool from pikuri-subagents runs its child’s loop synchronously in its execute closure), so two bwrap invocations spawned by the same pikuri process never overlap in time.

Two concurrent pikuri-code processes are independent: each mints its own umbrella and gets its own overlay-<slug>/ tree, while the host’s real cache (the shared lower layer) is read-only and, per the kernel docs, may be shared across overlay mounts without restriction. A downstream host that builds something fan-out-y (e.g. N parallel shell tasks reusing one Bubblewrap) would collide on its own; pikuri itself doesn’t.
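
To make the exclusivity argument concrete, here is a rough sketch of how per-path overlay state could be derived from an umbrella directory. The slug scheme and the upper / work directory names are assumptions (the source only specifies overlay-<slug>/), and the --overlay-src / --overlay flags are the ones documented in bwrap(1) for releases with overlay support.

```ruby
require 'pathname'

# Hypothetical: derive the overlay state dirs for one lower path.
# Assumed slug scheme: runs of non-alphanumerics collapse to a dash.
# Distinct paths can in principle collide under this scheme, so a real
# implementation would need something more robust.
def overlay_state(umbrella, lower)
  slug = lower.to_s.gsub(/[^A-Za-z0-9]+/, '-').delete_prefix('-')
  base = Pathname.new(umbrella) / "overlay-#{slug}"
  { lower: Pathname.new(lower), upper: base / 'upper', workdir: base / 'work' }
end

# bwrap arguments for one overlay: the host dir is the read-through lower
# layer, writes land in upper, and the mount appears at the lower's path.
def overlay_args(state)
  ['--overlay-src', state[:lower].to_s,
   '--overlay', state[:upper].to_s, state[:workdir].to_s, state[:lower].to_s]
end
```

Two different umbrellas produce disjoint upper and workdir paths for the same lower directory, which is why two concurrent processes do not trip the EBUSY condition.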

What the overlay does NOT defend

Bubblewrap as a whole is blast-radius containment for the bash subprocess, not a malware-resistant boundary. Prompt injection that reaches the LLM can still:

Modify project source under project_root (the LLM legitimately needs Write access there — overlay isn’t an option without breaking the agent).
Inject a malicious dependency in the project’s build.gradle.kts/pom.xml/package.json, which the next build will execute.
Exfiltrate over the network — --share-net is intentional so git pull / mvn / gem install / curl work.

The overlay specifically prevents cross-project contamination via shared $HOME caches. Users who need adversarial isolation run pikuri-code inside a container / devcontainer; the container is the outer boundary, the bwrap sandbox is the inner one. See CLAUDE.md “Scope decisions” / “Workspace seam” and the matching note on Filesystem::AllowAll.

Isolation

--unshare-all --share-net: PID, mount, IPC, user, and UTS namespaces are unshared (the sandbox can’t see host processes, can’t mount on the host, can’t ptrace, …); the network namespace is kept shared because the agent’s bash routinely needs git pull, mvn, gem install, curl, etc.
--die-with-parent --new-session: the subprocess dies with pikuri and runs in its own session (no terminal control bleed).
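
As a rough picture of how these flags sit in the final command line, the sketch below assembles an argv in the order binary, isolation flags, synthetic mounts, bind mounts, command. The constant and method names are hypothetical, and the real ordering inside #wrap is not shown in this page.

```ruby
# Assumed names for illustration only; the flags themselves are the ones
# described above.
SKETCH_ISOLATION_FLAGS = %w[
  --unshare-all --share-net
  --die-with-parent --new-session
].freeze

# Synthetic /proc and /dev mounts provided by bwrap itself.
SKETCH_SYNTHETIC_MOUNTS = %w[--proc /proc --dev /dev].freeze

# mounts: already-built bind/overlay arguments; argv: the command to run.
def sandboxed_argv(mounts, argv)
  ['bwrap', *SKETCH_ISOLATION_FLAGS, *SKETCH_SYNTHETIC_MOUNTS, *mounts, *argv]
end
```

The result is a flat array of strings, so it can be passed to a spawn call without going through a shell.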

Failures that surface at construction

The constructor probes the workspace shape, then bwrap with a no-op invocation. Four cases raise loudly:

Workspace lists / as writable (typically Workspace::Filesystem::AllowAll) — Bubblewrap exists for filesystem containment, which is structurally meaningless when the whole filesystem is the workspace. The host should pass NONE instead.
Workspace has temp but alias_tmp_to_temp is off: an inconsistent setup. This sandbox would bind workspace.temp at /tmp inside the subprocess (so the LLM’s reflexive /tmp writes persist), but file tools running on the host would still reject /tmp/foo as outside the workspace. The LLM would write via bash and then fail to read via the file tools; fail at construction instead of letting that trap fire mid-conversation.
bwrap not on PATH → Errno::ENOENT wrapped as RuntimeError.
Kernel lacks user-namespace support (some hardened distros) → bwrap exits non-zero, surfaced as RuntimeError.

In every case the binary should fail at boot, not on the first bash tool call, which matches the “errors are loud” convention. The host opts out of sandboxing via --no-sandbox / --yolo.
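
The last two cases (missing binary, failing probe) can be sketched as below. This is an illustration under assumptions: the helper name is hypothetical, and the actual no-op arguments pikuri passes to bwrap are not shown in this page, so they are left as a parameter.

```ruby
require 'open3'

# Hypothetical probe: runs the binary once and converts both failure modes
# into RuntimeError so the caller fails at construction time.
def probe_binary!(binary, *probe_args)
  output, status = Open3.capture2e(binary, *probe_args)
  return true if status.success?

  # Non-zero exit, e.g. a kernel without user-namespace support.
  raise "#{binary} probe failed (exit #{status.exitstatus}): #{output.strip}"
rescue Errno::ENOENT
  # Spawning failed because the binary is not on PATH.
  raise "#{binary} not found on PATH"
end
```

Both branches raise a plain RuntimeError, so a caller does not need to distinguish "not installed" from "installed but unusable" to abort startup.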

Constant Summary

BWRAP_BINARY =

'bwrap'

SYSTEM_ROOTS =

System-root dirs the subprocess needs that aren’t in Workspace#readable. Each is --ro-bind’d if it exists on the host; missing entries are skipped silently (older or unusual layouts).

%w[/lib /lib64 /bin /sbin].freeze

ETC_BASELINE =

/etc file allowlist for the subprocess. Each is --ro-bind’d if it exists on the host. Nothing else from /etc is exposed: no shadow, no passwd beyond what /etc/hosts touches, no SSH config, no NetworkManager state.

%w[
  /etc/ssl
  /etc/ca-certificates
  /etc/pki
  /etc/resolv.conf
  /etc/nsswitch.conf
  /etc/localtime
  /etc/hosts
].freeze

DENIED_CONTAINER_SOCKETS =

Container / VM control sockets that, if reachable from inside the sandbox, give the bash subprocess a one-step path to root-equivalent host access. The Docker daemon cheerfully honors docker run --privileged -v /:/host, so exposing /var/run/docker.sock to a sandboxed agent effectively undoes the sandbox. Same story for containerd, CRI-O, podman (rootful), buildkit, libvirt, LXD.

The pikuri default workspace doesn’t expose /var or /run at all (none of SYSTEM_ROOTS, ETC_BASELINE, or ToolchainPaths.readable touches them), so these sockets are unreachable by default. #reject_container_socket_exposure! guards the configuration surface — a downstream binary adding the docker socket to workspace.writable “so the agent can run docker build” would unknowingly hand the LLM the keys, and we’d rather fail loud at construction.

Rootless variants under $XDG_RUNTIME_DIR / /run/user/$UID/ are computed at class-load time. The list is not exhaustive; it covers the engines most likely to be installed on a Linux dev box. A downstream host with an unusual setup can subclass and extend.

begin
  xdg_runtime = ENV['XDG_RUNTIME_DIR'] || "/run/user/#{Process.uid}"
  paths = %w[
    /var/run/docker.sock
    /run/docker.sock
    /var/run/containerd/containerd.sock
    /run/containerd/containerd.sock
    /var/run/crio/crio.sock
    /run/crio/crio.sock
    /run/podman/podman.sock
    /var/run/podman/podman.sock
    /run/buildkit/buildkitd.sock
    /var/run/buildkit/buildkitd.sock
    /var/run/libvirt/libvirt-sock
    /run/libvirt/libvirt-sock
    /var/lib/lxd/unix.socket
    /var/snap/lxd/common/lxd/unix.socket
  ]
  paths.concat([
    "#{xdg_runtime}/docker.sock",
    "#{xdg_runtime}/podman/podman.sock"
  ])
  paths.map { |p| Pathname.new(p) }.uniq.freeze
end
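
The "equals or is an ancestor of" test that the constructor applies against this list can be sketched as follows. The method names are hypothetical and the comparison is purely lexical (no symlink resolution), which is an assumption about how such a guard might work rather than a description of #reject_container_socket_exposure!.

```ruby
require 'pathname'

# True when workspace_path is the socket itself or one of its ancestors,
# i.e. binding workspace_path would make the socket reachable.
def exposes_socket?(workspace_path, socket)
  candidate = Pathname.new(workspace_path).cleanpath
  Pathname.new(socket).cleanpath.ascend.any? { |ancestor| ancestor == candidate }
end

# Returns the denied sockets that any of the workspace paths would expose.
def exposed_sockets(workspace_paths, denied)
  denied.select do |socket|
    workspace_paths.any? { |path| exposes_socket?(path, socket) }
  end
end
```

A non-empty result from exposed_sockets would be the condition under which a guard of this kind raises at construction.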

Instance Method Summary

#initialize(workspace:, ephemeral_overlay: []) ⇒ Bubblewrap constructor

A new instance of Bubblewrap.
#wrap(argv) ⇒ Array<String>

bwrap + isolation flags + bind-mounts + argv, ready to hand to Subprocess.spawn.

Constructor Details

#initialize(workspace:, ephemeral_overlay: []) ⇒ `Bubblewrap`

Returns a new instance of Bubblewrap.

Parameters:

workspace (Pikuri::Workspace::Filesystem) —

the source of per-host readable/writable roots, the chdir target for the subprocess, and the parent of the per-session overlay state (Workspace::Filesystem#internal_temp).
ephemeral_overlay (Array<String, Pathname>) (defaults to: []) —

paths (must each be a member of workspace.readable) to mount as bubblewrap overlays instead of read-only binds. Typically wired with Pikuri::Code::ToolchainPaths.ephemeral_overlay. Empty by default — pure read-only baseline. See the class header for the rationale.

Raises:

(RuntimeError) —

if the workspace lists / as writable (Bubblewrap is for filesystem containment, which is moot when the entire filesystem is the workspace — typically Workspace::Filesystem::AllowAll; the host should pass NONE instead).
(RuntimeError) —

if the workspace has temp set but alias_tmp_to_temp unset — see the class header.
(RuntimeError) —

if any ephemeral_overlay path is not also a member of workspace.readable (so the LLM’s host-side file tools and the sandbox view stay consistent on which paths are visible).
(RuntimeError) —

if any workspace path equals or is an ancestor of a known container/VM control socket (/var/run/docker.sock, containerd.sock, podman.sock, …); see DENIED_CONTAINER_SOCKETS.
(RuntimeError) —

if bwrap isn’t on PATH or fails its probe (typically: kernel without user-namespace support).

# File 'lib/pikuri/code/bash/sandbox.rb', line 299

def initialize(workspace:, ephemeral_overlay: [])
  @workspace = workspace
  @ephemeral_overlay = ephemeral_overlay.map { |p| Pathname.new(p).realpath }.uniq
  reject_unbounded_workspace!
  reject_unaliased_temp!
  reject_overlay_outside_readable!
  reject_container_socket_exposure!
  check_bwrap!
end

Instance Method Details

#wrap(argv) ⇒ `Array<String>`

Returns bwrap + isolation flags + bind-mounts + argv, ready to hand to Subprocess.spawn.