Class: Octo::Tools::Terminal
Defined in:
- lib/octo/tools/terminal.rb
- lib/octo/tools/terminal/output_cleaner.rb
- lib/octo/tools/terminal/session_manager.rb
- lib/octo/tools/terminal/persistent_session.rb
Overview
Unified terminal tool: the SINGLE entry point for running shell commands. Replaces the former `shell` and `safe_shell` tools.
AI-facing contract
Five call shapes, all on one tool:
1) Run a command, wait for it:
terminal(command: "ls -la")
→ { exit_code: 0, output: "..." }
2) Run a command that is expected to keep running (dev servers,
watchers, REPLs meant to stay open):
terminal(command: "rails s", background: true)
– collects ~2s of startup output, then:
– if it crashed in those 2s → { exit_code: N, output: "..." }
– if still alive → { session_id: 7, state: "background",
output: "Puma starting..." }
3) A previous call returned a session_id because the command
blocked on input (sudo password, REPL, etc.). Answer it:
terminal(session_id: 3, input: "mypass\n")
4) Poll a running session for new output without sending anything:
terminal(session_id: 7, input: "")
5) Kill a stuck / no-longer-wanted session:
terminal(session_id: 7, kill: true)
Response handshake
- Response has `exit_code` → command finished.
- Response has `session_id` → command is still running;
look at `state`: "waiting" means blocked on input,
"background" means intentionally long-running.
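The handshake above can be read as a small decision on which keys the response carries. The following is a minimal sketch of that reading; `classify_response` is a hypothetical helper introduced here for illustration and is not part of Octo::Tools::Terminal.

```ruby
# Hypothetical helper (not part of the tool): classify a terminal response
# by the keys it carries, following the handshake described above.
def classify_response(response)
  if response.key?(:exit_code)
    :finished                       # command ran to completion
  elsif response.key?(:session_id)
    # A missing state is treated as "waiting" in this sketch.
    response[:state] == "background" ? :background : :waiting_for_input
  else
    :unknown
  end
end

p classify_response({ exit_code: 0, output: "..." })
p classify_response({ session_id: 7, state: "background", output: "Puma starting..." })
p classify_response({ session_id: 3, state: "waiting", output: "Password:" })
```

Checking `exit_code` first means a finished command is never mistaken for a live session, even if a response happened to carry both keys.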
Safety
Every new `command` is routed through Octo::Tools::Security before being handed to the shell. This:
- Blocks sudo / pkill octo / eval / curl|bash / etc.
- Rewrites `curl ... | bash` into "download & review".
- Protects Gemfile / .env / .ssh / etc. from writes.
`rm` is additionally intercepted at runtime by a shell function installed in each PTY session (see SAFE_RM_PATH): it moves files into the per-project trash at $OCTO_TRASH_DIR instead of deleting them. See trash_manager for list/restore.
`input` is NOT subject to these rules (it is a reply to an already-running program, not a fresh command).
Defined Under Namespace
Modules: OutputCleaner
Classes: PersistentSessionPool, SessionManager, SpawnFailed
Constant Summary
- MAX_LLM_OUTPUT_CHARS =
Hard ceiling on the raw `output:` string we send back to the LLM. 4000 chars ≈ 1000 tokens, which matches the value the legacy safe_shell tool used and which was empirically tuned to keep tool-call turns cheap. When real output exceeds this we SPILL the full cleaned text to a dedicated overflow file and only return the first portion; see OVERFLOW_PREVIEW_CHARS / spill_overflow_file below.
4_000
- OVERFLOW_PREVIEW_CHARS =
When output overflows, the preview we keep in-context is slightly shorter than the hard ceiling so the “full output at: /tmp/…” notice + path still fits under MAX_LLM_OUTPUT_CHARS.
3_800
- MAX_LINE_CHARS =
Per-line cap applied at write-time (inside the cleaning pipeline). Prevents a single minified JSON / CSS / JS blob from eating the entire 4 KB budget in one go. 500 chars is long enough to preserve real error messages (including stack frames) but short enough to survive dozens of lines inside 4 KB.
500
- DEFAULT_TIMEOUT =
Max seconds we keep a single tool call blocked inside the shell. Raised from 15s → 60s so long-running installs/builds (bundle install, gem install, npm install, docker build, rails new, …) produce far fewer LLM round-trips: each poll replays the full context, so every avoided poll saves ~all the tokens of one turn.
60
- DEFAULT_IDLE_MS =
How long output must be quiet before we assume the foreground command is waiting for user input and return control to the LLM. Raised from 500ms → 3000ms → 10_000ms: real shell prompts (sudo, REPL, [Y/n] confirmations) stay quiet forever, so 10s still feels instant for them; long builds / test runs frequently have multi-second gaps between phases (compilation ↔ linking, spec file transitions), and anything below 10s split those into multiple polls. Each poll replays the whole LLM context, which is expensive.
10_000
- BACKGROUND_COLLECT_SECONDS =
Background commands collect this many seconds of startup output so the agent can see crashes / readiness before getting the session_id.
2
- BACKGROUND_TASK_MAX_DURATION =
Default ceiling for a fire-and-forget background task (fire_and_forget). Tasks running longer than this are treated as stuck and the watcher returns a timeout result. Callers can override via metadata. 2 hours covers large CI suites (full rspec, big docker build, slow `npm install` on a cold cache) but still bounds resource usage.
7_200
- IDLE_MAX_DURATION =
2 min — abandoned pagers/REPLs
120
- DISABLED_IDLE_MS =
Sentinel: when passed as idle_ms, disables idle early-return.
10_000_000
- SLOW_COMMAND_PATTERNS =
Commands that we know take a long time and produce bursty output (quiet gaps between test files, compile phases, download batches, etc.). When the command line STARTS WITH or CONTAINS any of these tokens, we auto-extend the timeout to SLOW_COMMAND_TIMEOUT and disable idle-return entirely — otherwise the LLM ends up polling the same long-running job 5-10x, replaying full context each time. Taken verbatim from the legacy shell.rb list.
[
  # Ruby
  "bundle install", "bundle update", "bundle exec rspec", "rspec", "rake test", "rails test",
  # Node ecosystem: covers npm / yarn / pnpm test/dev/build/install variants
  "npm install", "npm ci", "npm test", "npm run build", "npm run test", "npm run dev",
  "yarn install", "yarn build", "yarn test", "yarn dev",
  "pnpm install", "pnpm build", "pnpm test", "pnpm dev",
  # Python
  "pytest", "pip install", "pip3 install", "python -m pip install", "python -m pytest", "python setup.py",
  # Go / Rust
  "cargo build", "cargo test", "cargo install", "cargo bench",
  "go build", "go test", "go install", "go mod tidy",
  # JVM (Maven / Gradle)
  "mvn test", "mvn package", "mvn install",
  "gradle build", "gradle test", "gradle assemble", "gradle bootRun",
  # .NET / Elixir / PHP / Swift
  "dotnet build", "dotnet test", "dotnet restore",
  "mix test", "mix deps.get",
  "composer install", "composer update",
  "xcodebuild", "swift test",
  # C / C++ / Make-family
  "make", "make test", "make install", "make build", "make all", "cmake --build", "cmake -B",
  # Containers / Infra
  "docker build", "docker-compose build",
  "terraform plan", "terraform apply",
  "helm install", "helm upgrade", "kubectl apply",
  "ansible-playbook", "vagrant up"
].freeze
- SLOW_COMMAND_TIMEOUT =
Timeout granted to commands matched by SLOW_COMMAND_PATTERNS. 180s matches the legacy safe_shell “hard_timeout” for slow commands.
180
- QUICK_COMMAND_PATTERNS =
Patterns that are obviously quick — using fire_and_forget on these is almost certainly a mistake and wastes tokens. The harness rejects such calls at runtime with a clear error so the LLM falls back to foreground mode.
[ /\A\s*ls\b/, /\A\s*cd\s/, /\A\s*pwd\b/, /\A\s*cat\s/, /\A\s*echo\b/, /\A\s*head\b/, /\A\s*tail\b/, /\A\s*wc\b/, /\A\s*which\b/, /\A\s*whoami\b/, /\A\s*date\b/, /\A\s*uname\b/, /\A\s*env\b/, /\A\s*clear\b/, /\A\s*history\b/, /\A\s*ps\b/, /\A\s*mkdir\b/, /\A\s*touch\b/, /\A\s*rm\b/, /\A\s*mv\b/, /\A\s*cp\b/ ].freeze
- SAFE_RM_PATH =
Absolute path to the safe-rm shell snippet shipped with the gem. Sourced by every interactive PTY session to install a `rm` shell function that moves files to $OCTO_TRASH_DIR instead of deleting them.
Why source-from-file instead of writing the function body into the PTY directly?
Writing a multi-line function definition into `zsh -l -i` is unreliable: ZLE (Zsh Line Editor) treats multi-line input as interactive editing and garbles the body. Loading from a file via a single `source` line avoids ZLE entirely.
Why a shell function (instead of a Ruby-side text rewrite)?
A function defers parsing to the shell itself, so heredocs, multi-line commands, globs, and variable expansion are all handled correctly. The previous Ruby rewriter mis-parsed any command containing a heredoc body with "rm" in it.
Coverage:
Intercepts: direct `rm …` in the interactive shell (incl. multi-line, heredoc, glob, env-var expansion).
Bypassed by: `command rm`, `/bin/rm`, `xargs rm`, `find -exec rm`, child scripts. Same coverage as the old rewriter.
File.("terminal/safe_rm.sh", __dir__).freeze
- OVERFLOW_DIR_NAME =
Overflow directory: shared across sessions (and persists after Octo exits) so the LLM can re-read the full output in later turns. Lives under /tmp so it is naturally swept by the OS, and we also best-effort prune files older than OVERFLOW_MAX_AGE_SEC on each write so long-running servers don't accumulate garbage.
"octo-terminal-overflow"
- OVERFLOW_MAX_AGE_SEC =
7 days
7 * 24 * 60 * 60
- DISPLAY_COMMAND_MAX_CHARS =
Max visible length of a command inside the tool-call summary line. Keeps the “terminal(…)” summary on a single UI row even when the underlying command spans multiple lines (heredocs, multi-line ruby -e blocks, etc.). The full command is still executed — only the display is shortened.
80
- DISPLAY_TAIL_LINES =
Number of trailing lines of output to include in the human-readable display string (the result text that shows up in CLI / WebUI bubbles under each tool call). Keep small so multi-poll loops stay readable.
6
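The interplay of MAX_LINE_CHARS, MAX_LLM_OUTPUT_CHARS and OVERFLOW_PREVIEW_CHARS can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual cleaning pipeline: `cap_lines` and `budget_output` are hypothetical helpers, the notice wording and the example file name are invented, and the sketch takes the overflow path as an argument instead of writing the file. Only the three numeric values and the `output_truncated` / `full_output_file` keys come from the page above.

```ruby
# Values copied from the constant summary above.
MAX_LLM_OUTPUT_CHARS   = 4_000
OVERFLOW_PREVIEW_CHARS = 3_800
MAX_LINE_CHARS         = 500

# Hypothetical helper: cap every line so one minified blob cannot
# consume the whole budget.
def cap_lines(text, max = MAX_LINE_CHARS)
  text.each_line(chomp: true).map { |line|
    line.length > max ? "#{line[0, max]}..." : line
  }.join("\n")
end

# Hypothetical helper: return the text unchanged when it fits, otherwise a
# shorter preview plus a notice pointing at the (assumed) overflow file.
def budget_output(text, overflow_path)
  cleaned = cap_lines(text)
  if cleaned.length <= MAX_LLM_OUTPUT_CHARS
    return { output: cleaned, output_truncated: false }
  end

  notice = "\n[full output at: #{overflow_path}]" # assumed wording
  {
    output: cleaned[0, OVERFLOW_PREVIEW_CHARS] + notice,
    output_truncated: true,
    full_output_file: overflow_path
  }
end

noisy  = (["x" * 600] * 100).join("\n")
result = budget_output(noisy, "/tmp/octo-terminal-overflow/example.log")
puts result[:output].length
```

The 200-character gap between the preview size and the hard ceiling is what leaves room for the notice and path, provided the path stays reasonably short.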
Instance Attribute Summary
- #agent_session_id ⇒ Object (readonly)
agent_session_id is injected by the Agent that owns this tool instance.
Class Method Summary
- .command_safe_for_auto_execution?(command) ⇒ Boolean
Alias used by ToolExecutor to decide whether :confirm_safes mode should auto-execute without asking the user.
- .run_sync(command, timeout: 120, cwd: nil, env: nil) ⇒ Array(String, Integer|nil)
Internal Ruby API: synchronous capture.
Instance Method Summary
- #cd_in_session(session, cwd) ⇒ Object
Called by the pool to move the live shell to `cwd`.
- #execute(command: nil, handle_id: nil, input: nil, async: false, cwd: nil, env: nil, kill: nil, idle_ms: nil, working_dir: nil, max_duration: nil, **_ignored) ⇒ Object
Public entrypoint: dispatches on parameter shape.
- #format_call(args) ⇒ Object
- #format_result(result) ⇒ Object
- #format_result_for_ui(result) ⇒ Object
- #initialize(agent_session_id: nil) ⇒ Terminal (constructor)
A new instance of Terminal.
- #reset_env_in_session(session, unset_keys:, set_env:) ⇒ Object
Called by the pool to reset env between calls.
- #source_rc_in_session(session, rc_files) ⇒ Object
Called by the pool when rc files (e.g. ~/.zshrc) have changed since this session was spawned.
- #spawn_persistent_session ⇒ Object
Public-ish: called by PersistentSessionPool to build a new long-lived shell.
Methods inherited from Base
#category, #description, #name, #parameters, #to_function_definition
Constructor Details
#initialize(agent_session_id: nil) ⇒ Terminal
Returns a new instance of Terminal.
# File 'lib/octo/tools/terminal.rb', line 107
def initialize(agent_session_id: nil)
  super()
  @agent_session_id = agent_session_id
end
Instance Attribute Details
#agent_session_id ⇒ Object (readonly)
agent_session_id is injected by the Agent that owns this tool instance. It is NOT exposed in tool_parameters — AI agents cannot set it.
# File 'lib/octo/tools/terminal.rb', line 105
def agent_session_id
  @agent_session_id
end
Class Method Details
.command_safe_for_auto_execution?(command) ⇒ Boolean
Alias used by ToolExecutor to decide whether :confirm_safes mode should auto-execute without asking the user.
# File 'lib/octo/tools/terminal.rb', line 332
def self.command_safe_for_auto_execution?(command)
  Octo::Tools::Security.command_safe_for_auto_execution?(command)
end
.run_sync(command, timeout: 120, cwd: nil, env: nil) ⇒ Array(String, Integer|nil)
Internal Ruby API — synchronous capture
Run a shell command and BLOCK until it terminates, returning [output, exit_code]. Drop-in replacement for Open3.capture2e that goes through the same PTY + login-shell + Security pipeline used by the AI-facing tool (so rbenv/mise shims and gem mirrors work).
Why this exists separately from #execute:
`execute` may return early with a :handle_id the moment output
goes idle for DEFAULT_IDLE_MS (10s); this is intentional for AI
agents (they can inspect progress, inject input, decide to kill).
Ruby callers like the HTTP server's upgrade flow only care about
"did it finish, with what output, what exit code" — they need
synchronous semantics. Previously each caller re-implemented the
poll loop (and 0.9.36's run_shell forgot to, causing the upgrade
failure bug).
NOT exposed in tool_parameters — AI agents cannot invoke this.
# File 'lib/octo/tools/terminal.rb', line 366
def self.run_sync(command, timeout: 120, cwd: nil, env: nil)
  terminal = new
  result = terminal.execute(
    command: command,
    timeout: timeout,
    cwd: cwd,
    env: env,
  )
  output = result[:output].to_s

  # Hard deadline in wall-clock terms: a genuinely stuck command
  # must terminate. Each individual poll still carries `timeout`.
  deadline = Time.now + timeout.to_i + 60
  while result[:exit_code].nil? && result[:handle_id] && Time.now < deadline
    result = terminal.execute(
      handle_id: result[:handle_id],
      input: "",
      timeout: timeout,
    )
    output += result[:output].to_s
  end

  # Deadline exceeded: best-effort cleanup so the session doesn't leak.
  if result[:exit_code].nil? && result[:handle_id]
    begin
      terminal.execute(handle_id: result[:handle_id], kill: true)
    rescue StandardError
      # swallow: cleanup is best-effort
    end
  end

  [output, result[:exit_code]]
end
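The poll-until-exit pattern that run_sync implements can be tried in isolation against a stub. In the sketch below, `FakeTerminal` and `run_until_exit` are hypothetical names introduced for illustration; the stub simply reports a handle until its canned output chunks run out, which is an assumption about how a multi-poll command might look, not a description of the real PTY behaviour.

```ruby
# Hypothetical stand-in for the terminal tool: each call hands back one
# canned chunk; the last chunk is delivered together with an exit code.
class FakeTerminal
  def initialize(chunks)
    @chunks = chunks.dup
  end

  def execute(command: nil, handle_id: nil, input: nil, kill: nil)
    return { killed: true } if kill

    chunk = @chunks.shift.to_s
    if @chunks.empty?
      { exit_code: 0, output: chunk }
    else
      { handle_id: "h1", state: "waiting", output: chunk }
    end
  end
end

# Same loop shape as run_sync: start, poll with empty input until an
# exit code appears or the deadline passes, then clean up if needed.
def run_until_exit(terminal, command, deadline_seconds: 5)
  result   = terminal.execute(command: command)
  output   = result[:output].to_s
  deadline = Time.now + deadline_seconds

  while result[:exit_code].nil? && result[:handle_id] && Time.now < deadline
    result = terminal.execute(handle_id: result[:handle_id], input: "")
    output += result[:output].to_s
  end

  if result[:exit_code].nil? && result[:handle_id]
    terminal.execute(handle_id: result[:handle_id], kill: true)
  end

  [output, result[:exit_code]]
end

output, exit_code = run_until_exit(FakeTerminal.new(["a", "b", "c"]), "make")
p [output, exit_code]
```

Accumulating output across polls matters because each poll only returns what is new since the previous call.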
Instance Method Details
#cd_in_session(session, cwd) ⇒ Object
Called by the pool to move the live shell to `cwd`.
# File 'lib/octo/tools/terminal.rb', line 1487
def cd_in_session(session, cwd)
  run_inline(session, "cd #{shell_escape_value(cwd)}")
end
#execute(command: nil, handle_id: nil, input: nil, async: false, cwd: nil, env: nil, kill: nil, idle_ms: nil, working_dir: nil, max_duration: nil, **_ignored) ⇒ Object
Public entrypoint — dispatches on parameter shape
# File 'lib/octo/tools/terminal.rb', line 275
def execute(command: nil, handle_id: nil, input: nil, async: false, cwd: nil, env: nil, kill: nil, idle_ms: nil, working_dir: nil, max_duration: nil, **_ignored)
  # Auto-tune: for well-known long-running commands (rspec, bundle
  # install, cargo build, etc.), we stretch the budget AND disable
  # idle-return. This collapses what would otherwise be 5-10
  # "is it still running?" LLM round-trips into a single synchronous
  # call. Async runs and handle operations are NOT auto-tuned;
  # async already returns quickly by design.
  timeout = nil
  if command && !async && !handle_id && slow_command?(command)
    timeout ||= SLOW_COMMAND_TIMEOUT
    idle_ms ||= DISABLED_IDLE_MS
  end
  timeout = (timeout || DEFAULT_TIMEOUT).to_i
  idle_ms = (idle_ms || DEFAULT_IDLE_MS).to_i
  cwd ||= working_dir

  # Operations on an existing handle (query / send input / kill).
  if handle_id
    handle_id = handle_id.to_s
    if kill
      return do_kill_handle(handle_id)
    elsif input.nil?
      return do_query_handle(handle_id)
    else
      return do_continue_handle(handle_id, input.to_s, timeout: timeout, idle_ms: idle_ms)
    end
  end

  # Start a new command.
  if command && !command.to_s.strip.empty?
    # Runtime guard: reject async for obviously quick commands so the
    # LLM doesn't waste tokens on an "I started it" turn for `ls`.
    if async && quick_command?(command.to_s)
      return {
        error: "async:true is for long-running tasks (builds, tests, installs, dev servers). " \
               "This command looks quick — drop async:true and use plain sync mode.",
        hint: "Commands like ls, cat, pwd, echo should not use async:true.",
        command: command.to_s
      }
    end

    return do_start(command.to_s, cwd: cwd, env: env, timeout: timeout, idle_ms: idle_ms,
                    async: async ? true : false,
                    max_duration: max_duration ? max_duration.to_i : nil)
  end

  {
    error: "terminal: must provide either `command`, or `handle_id` (alone to query, with input: to write, with kill:true to terminate)."
  }
rescue SecurityError => e
  { error: "[Security] #{e.}", security_blocked: true }
rescue StandardError => e
  { error: "terminal failed: #{e.class}: #{e.}", backtrace: e.backtrace.first(5) }
end
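The dispatch rules in #execute reduce to a small decision table over the argument shape. The sketch below mirrors that order of checks with a hypothetical `plan_call` function that returns a symbol instead of running anything. The pattern lists are short excerpts of SLOW_COMMAND_PATTERNS and QUICK_COMMAND_PATTERNS, and the substring test in `slow_command?` is an assumption based on the "STARTS WITH or CONTAINS" wording above; the real private helper is not shown on this page.

```ruby
# Excerpts of the documented pattern lists, shortened for the sketch.
SLOW_PATTERNS  = ["bundle install", "rspec", "npm install", "cargo build"].freeze
QUICK_PATTERNS = [/\A\s*ls\b/, /\A\s*pwd\b/, /\A\s*cat\s/, /\A\s*echo\b/].freeze

# Assumed matching rule: plain substring containment.
def slow_command?(command)
  SLOW_PATTERNS.any? { |token| command.include?(token) }
end

def quick_command?(command)
  QUICK_PATTERNS.any? { |pattern| pattern.match?(command) }
end

# Hypothetical planner: same order of checks as #execute, but it only
# names the branch that would be taken.
def plan_call(command: nil, handle_id: nil, input: nil, async: false, kill: nil)
  if handle_id
    return :kill if kill
    return input.nil? ? :query : :continue
  end

  return :missing_arguments if command.nil? || command.strip.empty?
  return :reject_async if async && quick_command?(command)
  return :start_async if async

  slow_command?(command) ? :start_slow : :start
end

p plan_call(command: "ls -la", async: true)
p plan_call(command: "bundle exec rspec")
p plan_call(handle_id: "h1", input: "")
```

Handle operations are checked first, so a call that carries both `handle_id` and `command` is treated as a handle operation and the command is not started.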
#format_call(args) ⇒ Object
# File 'lib/octo/tools/terminal.rb', line 1698
def format_call(args)
  cmd = args[:command] || args["command"]
  handle = args[:handle_id] || args["handle_id"]
  inp = args[:input] || args["input"]
  kill = args[:kill] || args["kill"]
  async = args[:async] || args["async"]

  if handle && kill
    "terminal(cancel handle)"
  elsif handle && !inp.nil?
    if inp.to_s.empty?
      "terminal(check handle)"
    else
      preview = inp.to_s.strip
      preview = preview.length > 30 ? "#{preview[0, 30]}..." : preview
      "terminal(send #{preview.inspect})"
    end
  elsif handle
    "terminal(query handle)"
  elsif cmd
    display_cmd = compact_command_for_display(cmd)
    if async
      "terminal(#{display_cmd}, async)"
    else
      "terminal(#{display_cmd})"
    end
  else
    "terminal(?)"
  end
end
#format_result(result) ⇒ Object
# File 'lib/octo/tools/terminal.rb', line 1745
def format_result(result)
  return "[Blocked] #{result[:error]}" if result.is_a?(Hash) && result[:security_blocked]
  return "error: #{result[:error]}" if result.is_a?(Hash) && result[:error]
  return "stopped" if result.is_a?(Hash) && result[:killed]
  return "done" unless result.is_a?(Hash)

  # Async task accepted; harness will notify on completion.
  if result[:accepted]
    return "async task started"
  end

  prefix = result[:security_rewrite] ? "[Safe] " : ""
  tail = display_tail(result[:output])
  status = if result[:handle_id] # still running / waiting for input
             state = result[:state] || "waiting"
             "… #{state}"
           elsif result.key?(:exit_code)
             ec = result[:exit_code]
             ec.to_i.zero? ? "✓ exit=0" : "✗ exit=#{ec}"
           else
             "done"
           end
  status = "#{prefix}#{status}" unless prefix.empty?

  # When output overflowed, surface the file path in the UI too
  # (not just in the LLM-facing `output`). Keeps the dev aware that
  # the full log is recoverable.
  if result[:full_output_file]
    status = "#{status} [full: #{result[:full_output_file]}]"
  end

  tail.empty? ? status : "#{tail}\n#{status}"
end
#format_result_for_ui(result) ⇒ Object
# File 'lib/octo/tools/terminal.rb', line 1784
def format_result_for_ui(result)
  return nil unless result.is_a?(Hash)
  return { type: "terminal", status: "error", error: result[:error] } if result[:error]
  return { type: "terminal", status: "killed" } if result[:killed]
  return { type: "terminal", status: "async", handle_id: result[:handle_id] } if result[:accepted]

  cmd = result[:original_command] || result[:rewritten_command] || ""
  ec = result[:exit_code]
  output = result[:output].to_s

  {
    type: "terminal",
    command: cmd,
    exit_code: ec,
    output_preview: output.slice(0, 800),
    output_truncated: result[:output_truncated] || false,
    full_output_file: result[:full_output_file],
    status: ec.nil? ? "running" : (ec.zero? ? "success" : "failed")
  }
end
#reset_env_in_session(session, unset_keys:, set_env:) ⇒ Object
Called by the pool to reset env between calls. First unsets any keys we exported last time, then exports the new ones.
# File 'lib/octo/tools/terminal.rb', line 1478
def reset_env_in_session(session, unset_keys:, set_env:)
  parts = []
  unset_keys.each { |k| parts << "unset #{shell_escape_var(k)}" }
  set_env.each { |k, v| parts << "export #{shell_escape_var(k)}=#{shell_escape_value(v)}" }
  return if parts.empty?
  run_inline(session, parts.join("; "))
end
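To see what kind of command line this produces, the unset-then-export composition can be sketched with the standard library. The private helpers `shell_escape_var` and `shell_escape_value` are not shown on this page, so the sketch swaps in `Shellwords.escape` for values and a simple identifier check for names; both are assumptions, and `env_reset_line` is a hypothetical name.

```ruby
require "shellwords"

# Assumed rule for a valid shell variable name.
VALID_VAR_NAME = /\A[A-Za-z_][A-Za-z0-9_]*\z/

# Hypothetical helper: build the single line that first unsets the old
# keys and then exports the new ones. Returns nil when there is nothing to do.
def env_reset_line(unset_keys:, set_env:)
  parts = []

  unset_keys.each do |key|
    raise ArgumentError, "bad variable name: #{key.inspect}" unless key.to_s.match?(VALID_VAR_NAME)
    parts << "unset #{key}"
  end

  set_env.each do |key, value|
    raise ArgumentError, "bad variable name: #{key.inspect}" unless key.to_s.match?(VALID_VAR_NAME)
    parts << "export #{key}=#{Shellwords.escape(value.to_s)}"
  end

  parts.empty? ? nil : parts.join("; ")
end

puts env_reset_line(unset_keys: ["OLD_TOKEN"], set_env: { "GREETING" => "hi there" })
```

Unsetting before exporting means a key that appears in both lists ends up with the new value, not removed.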
#source_rc_in_session(session, rc_files) ⇒ Object
Called by the pool when rc files (e.g. ~/.zshrc) have changed since this session was spawned. Sources them in shell-startup order so later files can see env set by earlier ones.
Notes:
- Errors inside each `source` are NOT silenced (dropping stderr
previously masked failures like a broken `mise activate` that
would leave PATH without node/ruby/etc.). They land in the PTY
log where a developer can inspect them if a command mysteriously
fails to find a tool.
- `|| true` keeps the compound line's exit code at 0 so our
marker reader treats the re-source as "succeeded" regardless
of per-file hiccups — we don't want a flaky rc to disable the
whole persistent shell.
# File 'lib/octo/tools/terminal.rb', line 1467
def source_rc_in_session(session, rc_files)
  return if rc_files.empty?
  cmd = rc_files.map { |f|
    escaped = f.gsub('"', '\"')
    "source \"#{escaped}\" || true"
  }.join("; ")
  run_inline(session, cmd, timeout: 15)
end
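The compound line built here has one `source ... || true` clause per rc file, joined in order. A minimal sketch of that composition follows; `rc_source_line` is a hypothetical name, and it uses `Shellwords.escape` in place of the method's own double-quote escaping, which is a deliberate swap for the sake of a self-contained example and not the method's actual quoting.

```ruby
require "shellwords"

# Hypothetical helper: one `source <file> || true` clause per rc file,
# joined with "; " so later files see env set by earlier ones.
def rc_source_line(rc_files)
  return nil if rc_files.empty?

  rc_files.map { |file| "source #{Shellwords.escape(file)} || true" }.join("; ")
end

puts rc_source_line(["/home/dev/.zprofile", "/home/dev/.zshrc"])
```

Because each clause ends in `|| true`, a failing file does not stop the following files from being sourced, and the line as a whole still exits with 0.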
#spawn_persistent_session ⇒ Object
Public-ish: called by PersistentSessionPool to build a new long-lived shell. Uses the user’s SHELL with login+interactive flags so that all rc hooks (nvm, rbenv, brew shellenv, mise, conda, etc.) are loaded.
# File 'lib/octo/tools/terminal.rb', line 1134
def spawn_persistent_session
  shell, shell_name = user_shell
  args = persistent_shell_args(shell, shell_name)
  session = spawn_shell(args: args, shell_name: shell_name, command: "<persistent>", cwd: nil, env: {})
  raise SpawnFailed, session[:error] if session.is_a?(Hash)
  session
end