Class: Pikuri::Tool::Grep

Inherits:
Pikuri::Tool
Defined in:
lib/pikuri/tool/grep.rb

Overview

The grep tool — content search across the workspace via ripgrep. Instantiating Tool::Grep.new(workspace: ws) produces a tool whose #to_ruby_llm_tool wiring is identical to any bundled tool’s. Same shape as Read (workspace captured by the execute closure, no confirmer — search is read-only).

ripgrep dependency

Hard dependency: Grep.check_binaries! runs in initialize and raises if rg isn’t on PATH. This mirrors Bash’s posture for bash/timeout. We don’t ship a Ruby fallback: replicating rg’s Rust-regex dialect, glob handling, and .gitignore parsing in Ruby is a dead end. The failure message includes an install hint.
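
The fail-loud check described above could look roughly like the following. This is a minimal sketch, not the actual body of Grep.check_binaries! (which is not shown on this page); the helper find_executable, the path_env keyword, and the error wording are all assumptions made for illustration.

```ruby
# Hypothetical sketch of a PATH check; the real Grep.check_binaries! may differ.

# Returns the full path of the first executable called +name+ on the given
# PATH string, or nil when none is found.
def find_executable(name, path_env: ENV.fetch('PATH', ''))
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?

    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Raises RuntimeError when any required binary is missing, so the failure
# surfaces at construction time rather than on the first tool call.
def check_binaries!(names = %w[rg], path_env: ENV.fetch('PATH', ''))
  missing = names.reject { |n| find_executable(n, path_env: path_env) }
  return if missing.empty?

  raise "Missing required binaries: #{missing.join(', ')}. " \
        'Install ripgrep and make sure `rg` is on PATH.'
end
```

Taking the PATH string as a keyword argument keeps the check testable without touching the process environment.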

Argv

rg --line-number --color=never --no-heading --with-filename \
   --hidden --max-columns=2000 --max-columns-preview \
   --sort=path \
   [-i] [--glob <g>] [--files-with-matches|--count-matches] \
   -- <pattern> <relative-path-or-dot>
  • --no-heading + --with-filename → flat path:line:content rows regardless of whether the search target is a directory or a single file (rg defaults to suppressing the filename for single-file searches — we force it on for output consistency).

  • --hidden → search dotfiles (still respects .gitignore).

  • --max-columns=2000 --max-columns-preview → rg itself truncates lines longer than MAX_LINE_LENGTH bytes and appends a preview marker, sparing us per-line truncation.

  • --sort=path → deterministic output (single-threaded; fine for typical repos under ~10k files). Makes specs assertable and gives the model a stable order to scan.

  • Subprocess runs with chdir: workspace.cwd and is always given an explicit path argument. Subprocess.spawn uses popen2e which gives the child a piped (non-tty) stdin; rg’s default heuristic on no-path-arg-with-piped-stdin is to search stdin (which we then close — yielding zero matches). Passing the path argument explicitly bypasses the heuristic. Output paths come back as ./... when the path is .; the leading ./ is stripped post-rg so the model sees clean workspace-relative paths.
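
The argv layout and the ./ cleanup described above can be sketched as follows. The real build_argv is not shown on this page, so this is a reconstruction from the documented flags only; MODE_FLAGS and strip_dot_slash are hypothetical names introduced for the example.

```ruby
# Hypothetical reconstruction of the documented argv; not the actual source.
MAX_LINE_LENGTH = 2000

# Extra flag(s) per output_mode. 'content' needs none.
MODE_FLAGS = {
  'content' => [],
  'files_with_matches' => ['--files-with-matches'],
  'count' => ['--count-matches']
}.freeze

def build_argv(pattern:, path:, glob: nil, case_insensitive: false, output_mode: 'content')
  argv = %W[rg --line-number --color=never --no-heading --with-filename
            --hidden --max-columns=#{MAX_LINE_LENGTH} --max-columns-preview
            --sort=path]
  argv << '-i' if case_insensitive
  argv.push('--glob', glob) if glob
  argv.concat(MODE_FLAGS.fetch(output_mode))
  # `--` ends option parsing, so a pattern starting with `-` is not read as
  # a flag. The path is always passed so rg never falls back to stdin.
  argv.push('--', pattern, path)
end

# Drops the leading "./" rg prints on every row when the search target is ".".
def strip_dot_slash(output)
  output.each_line.map { |line| line.delete_prefix('./') }.join
end
```

Building the command as an array rather than a shell string means the pattern and glob never pass through shell quoting.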

Output modes

  • content (default) → path:line:content rows.

  • files_with_matches → just file paths, one per line.

  • count → path:count per file.

Use files_with_matches to scope a broad search cheaply before paying tokens for content.

Truncation

Total output is head-truncated to MAX_BYTES (head-only — grep tails usually carry less signal than the first matches; opposite bias from Bash). Cut at the last line boundary, with a marker reporting omitted bytes and the original total so the model knows how much it missed.
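
Head-only truncation at a line boundary can be sketched like this. The marker wording and the head_truncate name are assumptions for illustration; the page only states that the marker reports the omitted bytes and the original total.

```ruby
# Hypothetical sketch of head-only truncation; the real marker text may differ.
MAX_BYTES = 50 * 1024

def head_truncate(output, max_bytes: MAX_BYTES)
  total = output.bytesize
  return output if total <= max_bytes

  # Search a binary copy so the offset is a byte offset. "\n" is never part
  # of a multi-byte UTF-8 character, so cutting right after it is safe.
  cut = output.b.rindex("\n", max_bytes - 1)
  kept = cut ? output.byteslice(0, cut + 1) : ''
  omitted = total - kept.bytesize
  "#{kept}[truncated: #{omitted} bytes omitted of #{total} total]\n"
end
```

Cutting at the last newline inside the budget keeps every surviving row complete, at the cost of returning slightly less than the cap.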

Exit codes

  • 0 → matches; format with footer.

  • 1 → no matches; return “No matches for pattern ‘…’”.

  • 2 → rg error (bad regex, missing path); return “Error: ripgrep: …”.

Refusals

All returned as “Error: …” observations:

  • Empty pattern → fast reject.

  • Unknown output_mode → enum error listing valid values.

  • Path outside the workspace → caught from Workspace::Error.

  • Nonexistent path → “Error: path not found: <path>”.

Constant Summary

MAX_BYTES =

Returns hard byte cap on combined rg output. Same value as Read::MAX_BYTES so the two file-touching tools share a budget shape.

Returns:

  • (Integer)

50 * 1024
MAX_BYTES_LABEL =

Returns human-readable form of MAX_BYTES for the truncation marker.

Returns:

  • (String)

"#{MAX_BYTES / 1024} KB"
MAX_LINE_LENGTH =

Returns per-line cap passed to rg’s --max-columns. Long lines are truncated server-side with a preview marker.

Returns:

  • (Integer)

2000
OUTPUT_MODES =

Returns valid output_mode values.

Returns:

  • (Array<String>)

%w[content files_with_matches count].freeze
DEFAULT_OUTPUT_MODE =

Returns default output_mode.

Returns:

  • (String)

'content'
DESCRIPTION =

Description shown to the LLM. opencode-shape (summary + Usage: bullets). Per-parameter constraints live in parameter descriptions.

Returns:

  • (String)
<<~DESC
  Search file contents for a regex pattern across the workspace.

  Usage:
  - Wraps `ripgrep` — regex syntax is rg's Rust-regex dialect (mostly PCRE-compatible; no lookbehind).
  - Default search root is the workspace root; pass `path` to narrow to a file or subdirectory.
  - Respects `.gitignore` — for unfiltered search use bash `rg --no-ignore <pattern>`.
  - Use `glob` to filter by filename, e.g. `"*.rb"` or `"src/**/*.{ts,tsx}"`.
  - `output_mode` controls verbosity: `content` (default, file:line:text), `files_with_matches` (paths only), `count` (matches per file).
  - Use `files_with_matches` first to scope a broad search, then `content` (or `read`) to investigate — saves tokens.
  - Output is truncated to #{MAX_BYTES_LABEL}; refine the pattern or narrow `path` if the response ends in a truncation marker.
  - Long lines are truncated to #{MAX_LINE_LENGTH} chars with a preview marker; use `read` to see full lines.
DESC

Constants inherited from Pikuri::Tool

CALCULATOR, FETCH, WEB_SCRAPE, WEB_SEARCH

Instance Attribute Summary

Attributes inherited from Pikuri::Tool

#description, #execute, #name, #parameters

Class Method Summary

Instance Method Summary

Methods inherited from Pikuri::Tool

#run, #to_ruby_llm_tool

Constructor Details

#initialize(workspace:) ⇒ Grep

Parameters:

  • workspace (Tool::Workspace)

    captured for path resolution and as chdir for rg. All path arguments route through workspace.resolve_for_read.

Raises:

  • (RuntimeError)

    if rg isn’t on PATH; fail-loud at construction rather than the first tool call.



# File 'lib/pikuri/tool/grep.rb', line 124

def initialize(workspace:)
  Grep.send(:check_binaries!)
  super(
    name: 'grep',
    description: DESCRIPTION,
    parameters: Parameters.build { |p|
      p.required_string :pattern,
                        'Regex pattern to search for (rg Rust-regex ' \
                        'dialect), e.g. "def\s+\w+" or "TODO".'
      p.optional_string :path,
                        'File or directory to search. Relative paths ' \
                        'resolve against the workspace root. Defaults ' \
                        'to the workspace root, e.g. "lib/" or "lib/foo.rb".'
      p.optional_string :glob,
                        'Filename glob to filter files, e.g. "*.rb" ' \
                        'or "src/**/*.{ts,tsx}".'
      p.optional_boolean :case_insensitive,
                         'Match case-insensitively. Defaults to false, e.g. true.'
      p.optional_string :output_mode,
                        "One of #{OUTPUT_MODES.join(', ')}. Defaults to " \
                        "#{DEFAULT_OUTPUT_MODE}, e.g. \"files_with_matches\"."
    },
    execute: lambda { |pattern:, path: nil, glob: nil,
                      case_insensitive: false, output_mode: DEFAULT_OUTPUT_MODE|
      Grep.search(workspace: workspace, pattern: pattern, path: path,
                  glob: glob, case_insensitive: case_insensitive,
                  output_mode: output_mode)
    }
  )
end

Class Method Details

.search(workspace:, pattern:, path:, glob:, case_insensitive:, output_mode:) ⇒ String

Validate inputs, resolve the path against the workspace, spawn rg, and render the observation. Returns either the formatted results, a “no matches” string, or “Error: …”.

Parameters:

  • workspace (Tool::Workspace)
  • pattern (String)
  • path (String, nil)
  • glob (String, nil)
  • case_insensitive (Boolean)
  • output_mode (String)

Returns:

  • (String)


# File 'lib/pikuri/tool/grep.rb', line 166

def self.search(workspace:, pattern:, path:, glob:, case_insensitive:, output_mode:)
  return 'Error: empty pattern.' if pattern.empty?
  unless OUTPUT_MODES.include?(output_mode)
    return "Error: output_mode must be one of #{OUTPUT_MODES.join(', ')}, " \
           "got #{output_mode.inspect}."
  end

  search_target = '.'
  if path
    resolved = workspace.resolve_for_read(path)
    return "Error: path not found: #{path}" unless resolved.exist?

    rel = resolved.relative_path_from(workspace.cwd).to_s
    search_target = rel
  end

  argv = build_argv(pattern: pattern, glob: glob,
                    case_insensitive: case_insensitive,
                    output_mode: output_mode, path: search_target)

  result = Pikuri::Subprocess.spawn(*argv, chdir: workspace.cwd.to_s).wait
  exit_code = result.status.exitstatus

  case exit_code
  when 0
    format_output(result.output, output_mode: output_mode,
                  pattern: pattern, path: path)
  when 1
    no_match_message(pattern: pattern, path: path)
  else
    stderr = result.output.strip
    stderr = "exited #{exit_code}" if stderr.empty?
    "Error: ripgrep: #{stderr}"
  end
rescue Tool::Workspace::Error => e
  "Error: #{e.message}"
end