Class: Pikuri::Code::GitClone

Inherits:
Tool
  • Object
Defined in:
lib/pikuri/code/git_clone.rb

Overview

The git_clone tool — shallow-clone a public git repository into the workspace. Instantiating Code::GitClone.new(workspace: ws) produces a tool whose Tool#to_ruby_llm_tool wiring is identical to any bundled tool’s; execute closes over the workspace and a lazily-minted Bash::Sandbox::Bubblewrap.

Why this exists

The bundled researcher persona can web_search / web_scrape / fetch, which is great for “look up one fact” but inefficient when the task is “dig through opencode’s source for how it does X.” The pattern *N pages of HTML scraping* is much worse than *one shallow clone + grep*. This tool plus GIT_REPO_RESEARCHER (the persona that wires it together with workspace-scoped read/grep/glob) is the answer.

Threat model

Git clone is not “just reading files.” Cloning from a hostile upstream has a history of RCEs:

  • CVE-2024-32002: submodule + symlink + case-insensitive FS escape → RCE.

  • CVE-2022-39253: --local clone reading arbitrary host files via symlinks.

  • CVE-2017-1000117: ssh:// URL argument injection (ssh://-oProxyCommand=…) → arbitrary command execution.

  • .gitattributes filter drivers, .git/config core.fsmonitor / core.sshCommand: code paths that run during clone / checkout.

Mitigations baked in here:

  1. **HTTPS/HTTP only.** VALID_SCHEMES is %w[https http]; ssh://, git://, file://, ext::, and anything else are refused at the tool layer before git sees the string.

  2. **No submodule recursion.** --no-recurse-submodules kills the CVE-2024-32002 class.

  3. **Shallow clone.** --depth 1 skips history (fewer ref parsing edge cases, faster, smaller).

  4. **Bubblewrap-sandboxed subprocess.** The git binary runs inside Bash::Sandbox::Bubblewrap bound to the persona’s fresh temp workspace — no host ~/.ssh, no ~/.gitconfig, no other projects’ source, no container sockets. A clone-RCE blast radius is the persona’s throwaway workspace.
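Mitigations 1 to 3 can be sketched in a few lines of plain Ruby. This is a minimal illustration, not the code in lib/pikuri/code/git_clone.rb: the helper names clone_url_error and clone_argv are hypothetical, and the `--` separator before the URL is an extra precaution assumed here rather than something this page confirms.

```ruby
require 'uri'

# Mirrors the VALID_SCHEMES constant documented on this page.
VALID_SCHEMES = %w[https http].freeze

# Hypothetical helper: returns nil when the URL is acceptable, or an
# "Error: ..." string (the convention described under Output) when it is not.
def clone_url_error(url)
  uri = URI.parse(url)
  unless VALID_SCHEMES.include?(uri.scheme&.downcase)
    return "Error: scheme #{uri.scheme.inspect} is not allowed (use https or http)"
  end
  return 'Error: URL has no host' if uri.host.nil? || uri.host.empty?

  nil
rescue URI::InvalidURIError => e
  "Error: malformed URL (#{e.message})"
end

# Hypothetical helper: the argv for the clone, using the flags named in the
# mitigations list. The "--" (an assumption) ends option parsing before the URL.
def clone_argv(url, target_dir)
  ['git', 'clone', '--depth', '1', '--no-recurse-submodules', '--', url, target_dir]
end
```

Passing an argv array (rather than one interpolated command string) to the process launcher keeps the URL out of any shell, which complements the scheme check.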

The Bubblewrap instance is minted lazily on first execute, not at construction. The boot-time GitClone wired by bin/pikuri-code never runs (it lives in the sub-agent-only pool) and is replaced by a fresh-workspace copy of the tool via #with_workspace the moment a git_repo_researcher session starts. Eager construction would pay the bwrap probe cost on every coding-agent boot for no reason.
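The lazy-mint behaviour can be illustrated with stand-in classes. FakeSandbox and LazyCloneTool below are hypothetical and only model the pattern (a memoized private accessor plus a rebinding method that drops the sandbox); they are not the real Bash::Sandbox::Bubblewrap or GitClone.

```ruby
# Stand-in for an expensive-to-build sandbox. It counts constructions so the
# laziness is observable. Not the real Bash::Sandbox::Bubblewrap.
class FakeSandbox
  class << self
    attr_accessor :built
  end
  self.built = 0

  attr_reader :workspace

  def initialize(workspace)
    self.class.built += 1
    @workspace = workspace
  end
end

# Stand-in for the tool: holds a workspace and an optional sandbox override.
class LazyCloneTool
  def initialize(workspace:, sandbox: nil)
    @workspace = workspace
    @sandbox = sandbox # nil means "mint on first use"
  end

  # Rebinding builds a new tool and deliberately does not pass a sandbox on.
  def with_workspace(workspace)
    self.class.new(workspace: workspace)
  end

  # Returns the workspace the sandbox was bound to, to make the binding visible.
  def execute
    sandbox.workspace
  end

  private

  # Memoized: the sandbox is constructed at most once per tool instance.
  def sandbox
    @sandbox ||= FakeSandbox.new(@workspace)
  end
end

boot_tool = LazyCloneTool.new(workspace: '/tmp/boot')     # no sandbox built yet
session_tool = boot_tool.with_workspace('/tmp/session-1') # still none
session_tool.execute                                      # first mint happens here
```

In this model the boot-time instance never pays the construction cost, and the session instance's sandbox is bound to the session workspace rather than the boot one.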

Output

On success: a one-line ack with the relative path inside the workspace. The persona then uses read / grep / glob to explore the clone.

On failure: “Error: …” in the usual pikuri convention. Possible causes: refused URL scheme, malformed URI, network failure, target dir already exists, git non-zero exit.

Constant Summary

VALID_SCHEMES =

URL schemes accepted. https first (TLS) and http as a fallback for the rare public mirror. All other schemes are refused — see the threat-model header.

%w[https http].freeze
TIMEOUT_SECONDS =

Hard cap on the subprocess timeout (seconds). Real-world shallow clones of medium repos finish in seconds; this is the ceiling for a slow network or a large repo, after which we SIGTERM.

120
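A hard timeout with SIGTERM can be sketched with Ruby's standard library. This is an illustration under assumptions: run_with_timeout is a hypothetical helper, and the real tool launches git through its sandbox rather than calling Process.spawn directly.

```ruby
require 'timeout'

# Mirrors the TIMEOUT_SECONDS constant documented on this page.
TIMEOUT_SECONDS = 120

# Hypothetical helper: runs argv without a shell and returns
# [Process::Status or nil, timed_out?].
def run_with_timeout(argv, timeout: TIMEOUT_SECONDS)
  pid = Process.spawn(*argv, out: File::NULL, err: File::NULL)
  status = Timeout.timeout(timeout) { Process.wait2(pid).last }
  [status, false]
rescue Timeout::Error
  # Deadline passed: ask the child to terminate, then reap it so no zombie
  # process is left behind.
  Process.kill('TERM', pid)
  Process.wait(pid)
  [nil, true]
end
```

A caller would map a timed-out result, or a non-zero exit status, to an “Error: …” string.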
DESCRIPTION =

Returns:

  • (String)
<<~DESC
  Shallow-clone a public git repository into your workspace.

  Usage:
  - URL must be `https://` (preferred) or `http://`. Any other scheme (`ssh://`, `git://`, `file://`) is refused.
  - Always cloned with `--depth 1 --no-recurse-submodules`; you get the current tip, no history, no submodules.
  - Target directory name is derived from the URL's last segment (without `.git`). If that directory already exists, the call fails — pick a different URL or work with what you cloned.
  - On success returns the relative path to the cloned repo; use `read`, `grep`, `glob` to navigate it.
  - Clones run inside a sandbox bound to your workspace — host files, SSH keys, and `~/.gitconfig` are NOT visible to the cloned repo's hooks/filters.
DESC
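The target-directory rule in the description (last URL segment, minus `.git`, failing if the directory exists) might look roughly like the following. The helper names target_dir_name and clone_target are hypothetical, and the exact error wording is an assumption.

```ruby
require 'uri'

# Hypothetical helper: directory name from the URL's last path segment,
# with a trailing ".git" stripped. Returns nil when no usable name exists.
def target_dir_name(url)
  path = URI.parse(url).path.to_s
  name = File.basename(path.chomp('/'), '.git')
  # Reject empty names and path-traversal segments.
  return nil if ['', '/', '.', '..'].include?(name)

  name
end

# Hypothetical helper: returns the relative target path, or an "Error: ..."
# string when the name cannot be derived or the directory already exists.
def clone_target(workspace_root, url)
  name = target_dir_name(url)
  return 'Error: cannot derive a directory name from the URL' if name.nil?
  return "Error: #{name} already exists in the workspace" if File.exist?(File.join(workspace_root, name))

  name
end
```

For example, both "https://github.com/anomalyco/opencode" and "https://github.com/anomalyco/opencode.git" map to the directory name opencode.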

Instance Method Summary

Constructor Details

#initialize(workspace:, sandbox: nil) ⇒ GitClone

Parameters:

  • workspace (Pikuri::Workspace::Filesystem)

    captured for the clone target root and the sandbox bind set.

  • sandbox (Bash::Sandbox, nil) (defaults to: nil)

    optional sandbox override (defaults to a lazily-minted Bash::Sandbox::Bubblewrap bound to workspace). Pass Bash::Sandbox::NONE in tests that don’t have bwrap on PATH; production wiring leaves it nil so the Bubblewrap mint happens at the right moment (after #with_workspace replaces the workspace).



# File 'lib/pikuri/code/git_clone.rb', line 104

def initialize(workspace:, sandbox: nil)
  @workspace = workspace
  @sandbox = sandbox
  super(
    name: 'git_clone',
    description: DESCRIPTION,
    parameters: Pikuri::Tool::Parameters.build { |p|
      p.required_string :url,
                        'HTTPS (or HTTP) git URL to clone, e.g. ' \
                        '"https://github.com/anomalyco/opencode" or ' \
                        '"https://github.com/anomalyco/opencode.git". ' \
                        'Other schemes are refused.'
    },
    execute: ->(url:) { execute_clone(url: url) }
  )
end

Instance Method Details

#with_workspace(workspace) ⇒ GitClone

Produce a new Pikuri::Code::GitClone bound to workspace. The sandbox is NOT carried over — the new instance lazily mints a fresh Bubblewrap from the new workspace, since a sandbox’s bind set depends on the workspace it constrains. See class header.

Parameters:

  • workspace (Pikuri::Workspace::Filesystem)

Returns:

  • (GitClone)



# File 'lib/pikuri/code/git_clone.rb', line 128

def with_workspace(workspace)
  self.class.new(workspace: workspace)
end