Class: Pikuri::Tool::Read
- Inherits: Pikuri::Tool
  - Object
  - Pikuri::Tool
  - Pikuri::Tool::Read
- Defined in: lib/pikuri/tool/read.rb
Overview
The read tool, expressed as a Pikuri::Tool subclass: instantiating Tool::Read.new(workspace: ws) produces a tool whose #to_ruby_llm_tool wiring is identical to any bundled tool’s, so ruby_llm sees nothing special about it. It has the same shape as SubAgent: the workspace is captured by the execute closure at construction time.
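The construction-time capture can be illustrated without ruby_llm. This is a minimal sketch, assuming a hypothetical MiniTool base that only stores the four attributes this page lists (description, execute, name, parameters); the hypothetical MiniRead mirrors the constructor shape shown under Constructor Details, but its execute body just echoes its inputs instead of reading a file.

```ruby
# Hypothetical stand-in for Pikuri::Tool: it only stores the four
# attributes listed on this page.
class MiniTool
  attr_reader :name, :description, :parameters, :execute

  def initialize(name:, description:, parameters:, execute:)
    @name = name
    @description = description
    @parameters = parameters
    @execute = execute
  end
end

# Hypothetical subclass with the same constructor shape as Read: the
# workspace keyword is never stored in an ivar; the lambda closes over it.
class MiniRead < MiniTool
  DEFAULT_LIMIT = 2000

  def initialize(workspace:)
    super(
      name: 'read',
      description: 'Read a file from the workspace.',
      parameters: %i[path offset limit],
      execute: ->(path:, offset: 1, limit: DEFAULT_LIMIT) {
        # Echo instead of reading, so the captured workspace is visible.
        "#{workspace}/#{path} offset=#{offset} limit=#{limit}"
      }
    )
  end
end

tool = MiniRead.new(workspace: '/tmp/ws')
puts tool.execute.call(path: 'lib/foo.rb') # /tmp/ws/lib/foo.rb offset=1 limit=2000
```

Because each instance's lambda closes over its own workspace argument, two tools built with different workspaces never share state, and the workspace never appears among the parameters the LLM is shown.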
Output format
cat-n: each line is rendered as “%6d\t%s” (the line number right-aligned in a six-column field, a tab, then the content). Chosen for breadth of training-data exposure: cat -n output shows up across virtually every Unix tutorial and Stack Overflow answer, so even small local models recognize the shape. opencode’s shorter “<n>: <content>” format saves a few thousand tokens per 2K-line file but trades away model familiarity; pi omits line numbers entirely (cheapest in tokens, but the model loses the ability to cite ranges or pick Edit boundaries precisely).
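A short sketch of that rendering using Kernel#format; cat_n is a hypothetical helper name, not part of the documented API.

```ruby
# Hypothetical helper: render lines in cat -n style, "%6d\t%s".
# first_lineno lets a slice that starts at offset N keep file-relative numbers.
def cat_n(lines, first_lineno = 1)
  lines.each_with_index
       .map { |line, i| format("%6d\t%s", first_lineno + i, line) }
       .join("\n")
end

puts cat_n(["def hello", "  puts 'hi'", "end"], 41)
```

Since %6d is a minimum width rather than a maximum, line numbers past 999,999 simply widen the field instead of being cut off.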
Truncation rules
Two independent limits; whichever fires first wins:
- *Line limit*: DEFAULT_LIMIT lines (overridable via limit).
- *Byte cap*: MAX_BYTES bytes of input content; not exposed as a parameter. Bypassable in practice by paging via offset.
Additionally, individual lines longer than MAX_LINE_LENGTH chars are truncated with LINE_TRUNCATION_MARKER appended; the model is told to reach for grep to find content inside such files.
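The two limits and the per-line cap fit naturally into one loop. This is a minimal sketch, not the real format_slice (whose source is not shown on this page): the hypothetical slice_lines works on an in-memory array of lines, counts each line's bytes plus one for the joining newline against MAX_BYTES, and ends with a continuation marker modeled on the “Use offset=N to continue” wording in DESCRIPTION. The exact marker text, whether the byte count uses the raw or truncated line, and the offset-past-EOF refusal (omitted here) are assumptions or simplifications.

```ruby
DEFAULT_LIMIT = 2000
MAX_LINE_LENGTH = 2000
LINE_TRUNCATION_MARKER = "... (line truncated to #{MAX_LINE_LENGTH} chars)"
MAX_BYTES = 50 * 1024

# Hypothetical stand-in for format_slice: takes lines without trailing
# newlines instead of a file. offset is 1-indexed.
def slice_lines(lines, offset: 1, limit: DEFAULT_LIMIT)
  out = []
  bytes = 0
  next_offset = nil

  lines.each_with_index do |raw, idx|
    lineno = idx + 1
    next if lineno < offset

    # Line limit: stop once `limit` lines have been emitted.
    if out.size >= limit
      next_offset = lineno
      break
    end

    # Per-line cap: keep the first MAX_LINE_LENGTH chars and flag the cut.
    line = raw.length > MAX_LINE_LENGTH ? raw[0, MAX_LINE_LENGTH] + LINE_TRUNCATION_MARKER : raw

    # Byte cap: line bytes plus one for the joining newline. The "%6d\t"
    # prefix is not counted, so the rendered output runs slightly larger.
    cost = line.bytesize + 1
    if bytes + cost > MAX_BYTES
      next_offset = lineno
      break
    end

    bytes += cost
    out << format("%6d\t%s", lineno, line)
  end

  # Assumed marker wording, modeled on the DESCRIPTION text.
  out << '' << "Use offset=#{next_offset} to continue" if next_offset
  out.join("\n")
end

lines = (1..5).map { |i| "line #{i}" }
puts slice_lines(lines, offset: 2, limit: 2)
```

Whichever limit trips first sets next_offset, so the model pages past either cap the same way: by calling again with that offset.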
Refusals
- Path outside the workspace → Workspace::Error is caught and returned as “Error: …”.
- File not found, EACCES → “Error: …”.
- Path is a directory → “Error: … use the glob tool”, keeping directory listing as the glob tool’s responsibility (Step 9).
- Binary content → sniffed from the first BINARY_SAMPLE_BYTES of the file: any NUL byte, or non-printable bytes (control chars outside \t \n \v \f \r) making up more than BINARY_NONPRINTABLE_THRESHOLD of the sample, triggers refusal. This catches images, PDFs, archives, and compiled artifacts without an extension list to maintain.
- Offset past EOF → “Error: offset N is beyond end of file (M lines total)”.
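.read calls a read_sample helper whose source is not shown on this page. A minimal sketch, assuming it simply reads up to BINARY_SAMPLE_BYTES from the start of the file in binary mode (File.open also accepts the Pathname that .read passes in):

```ruby
require 'tempfile'

BINARY_SAMPLE_BYTES = 4096

# Hypothetical read_sample: read at most BINARY_SAMPLE_BYTES from the start
# of the file in binary mode. IO#read(n) returns nil at EOF (empty file),
# so fall back to an empty binary string.
def read_sample(path)
  File.open(path, 'rb') { |f| f.read(BINARY_SAMPLE_BYTES) } || ''.b
end

Tempfile.create('sample') do |f|
  f.write('x' * 10_000)
  f.flush
  puts read_sample(f.path).bytesize # 4096
end
```

Sampling a fixed prefix keeps the sniff cheap regardless of file size; the tradeoff is that binary content appearing only after the first 4 KB is not detected.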
Constant Summary
- DEFAULT_LIMIT = 2000
  Returns default value of the limit parameter (number of lines to read per call).
- MAX_LINE_LENGTH = 2000
  Returns per-line character cap; longer lines are truncated with LINE_TRUNCATION_MARKER.
- LINE_TRUNCATION_MARKER = "... (line truncated to #{MAX_LINE_LENGTH} chars)"
  Returns suffix appended to lines truncated by MAX_LINE_LENGTH.
- MAX_BYTES = 50 * 1024
  Returns hard byte cap on input content collected per call. Counted on the line bytes (plus one for the joining newline); the rendered output is slightly larger due to the per-line “%6d\t” prefix.
- MAX_BYTES_LABEL = "#{MAX_BYTES / 1024} KB"
  Returns human-readable form of MAX_BYTES for the continuation marker.
- BINARY_SAMPLE_BYTES = 4096
  Returns number of bytes sampled from the start of the file for binary-content detection.
- BINARY_NONPRINTABLE_THRESHOLD = 0.30
  Returns fraction of the sample that may be non-printable before the file is classified as binary. Matches opencode’s 30%.
- DESCRIPTION =
  Description shown to the LLM. Follows the opencode shape (summary + Usage: bullets) prescribed by the project’s tool-description convention. Per-parameter constraints (defaults, format) live in the parameter descriptions, not here.
  <<~DESC
    Read a file from the workspace and return its contents with line numbers.
    Usage:
    - Output is line-numbered in `cat -n` style so subsequent edits can reference exact line numbers.
    - Use `offset` and `limit` to page through large files; when the response ends in `Use offset=N to continue`, call again with that offset.
    - Lines longer than #{MAX_LINE_LENGTH} chars are truncated with a marker — use `grep` for content inside such files.
    - Binary files (images, PDFs, archives, compiled artifacts) are refused; this tool reads text only.
    - Directories are refused — use the `glob` tool to list files.
    - If unsure of the path, use `glob` first to look up filenames.
    - Avoid tiny repeated slices — if you need more context, read a larger window.
  DESC
Constants inherited from Pikuri::Tool
CALCULATOR, FETCH, WEB_SCRAPE, WEB_SEARCH
Instance Attribute Summary
Attributes inherited from Pikuri::Tool
#description, #execute, #name, #parameters
Class Method Summary
- .binary?(bytes) ⇒ Boolean
  Heuristic binary classifier matching opencode’s: any NUL byte forces true; otherwise count control bytes outside \t \n \v \f \r and ratio them against the sample size.
- .read(workspace:, path:, offset:, limit:) ⇒ String
  Resolve path against workspace, refuse directories / binaries / missing files, and return either the cat-n-formatted slice or an “Error: …” observation.
Instance Method Summary
- #initialize(workspace:) ⇒ Read (constructor)
Methods inherited from Pikuri::Tool
Constructor Details
#initialize(workspace:) ⇒ Read
# File 'lib/pikuri/tool/read.rb', line 103

def initialize(workspace:)
  super(
    name: 'read',
    description: DESCRIPTION,
    parameters: Parameters.build { |p|
      p.required_string :path,
                        'Path to the file to read. Relative paths ' \
                        'resolve against the workspace root, e.g. ' \
                        '"lib/foo.rb" or "/abs/path/to/file.txt".'
      p.optional_integer :offset,
                         'Line number to start reading from (1-indexed). ' \
                         "Defaults to 1, e.g. 200."
      p.optional_integer :limit,
                         'Maximum number of lines to read. Defaults to ' \
                         "#{DEFAULT_LIMIT}, e.g. 500."
    },
    execute: ->(path:, offset: 1, limit: DEFAULT_LIMIT) {
      Read.read(workspace: workspace, path: path, offset: offset, limit: limit)
    }
  )
end
Class Method Details
.binary?(bytes) ⇒ Boolean
Heuristic binary classifier matching opencode’s: any NUL byte forces true; otherwise count the control bytes outside \t \n \v \f \r (0x01-0x08 and 0x0E-0x1F) and ratio them against the sample size. Bytes of 0x7F and above, which include every byte of a multi-byte UTF-8 sequence such as the continuation bytes 0x80-0xBF, are never counted, so UTF-8 text reads fine.
Public because Edit reuses it to refuse binary targets: if Edit accepted a binary file that the model has no way to have read, it could corrupt bytes the model never inspected. Same sniff, same threshold, one definition.
# File 'lib/pikuri/tool/read.rb', line 177

def self.binary?(bytes)
  return false if bytes.empty?

  non_printable = 0
  bytes.each_byte do |b|
    return true if b.zero?

    non_printable += 1 if b < 9 || (b > 13 && b < 32)
  end
  non_printable.to_f / bytes.bytesize > BINARY_NONPRINTABLE_THRESHOLD
end
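The behavior described above can be checked on a few byte strings. The method body below is copied from the listing as a top-level method so the example runs standalone; the sample strings are illustrative, not taken from the project's tests.

```ruby
BINARY_NONPRINTABLE_THRESHOLD = 0.30

# Copy of Read.binary? from the listing above, as a top-level method.
def binary?(bytes)
  return false if bytes.empty?

  non_printable = 0
  bytes.each_byte do |b|
    return true if b.zero?                            # any NUL byte => binary

    non_printable += 1 if b < 9 || (b > 13 && b < 32) # control chars except \t \n \v \f \r
  end
  non_printable.to_f / bytes.bytesize > BINARY_NONPRINTABLE_THRESHOLD
end

samples = {
  'plain ASCII'    => "hello\tworld\n",
  'UTF-8 text'     => "héllo wörld\n",       # bytes >= 0x80 are never counted
  'NUL byte'       => "abc\x00def",
  'mostly control' => "\x01\x02\x03abc",     # 3 of 6 bytes = 50% > 30%
  'ANSI escapes'   => "\e[1mbold\e[0m text"  # 2 of 17 bytes, about 12%
}

samples.each { |label, s| puts "#{label}: #{binary?(s)}" }
# plain ASCII: false
# UTF-8 text: false
# NUL byte: true
# mostly control: true
# ANSI escapes: false
```

The ratio test is what lets terminal logs with occasional escape sequences through while still rejecting files dominated by control bytes.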
.read(workspace:, path:, offset:, limit:) ⇒ String
Resolve path against workspace, refuse directories / binaries / missing files, and return either the cat-n-formatted slice or an “Error: …” observation.
# File 'lib/pikuri/tool/read.rb', line 134

def self.read(workspace:, path:, offset:, limit:)
  return "Error: offset must be >= 1, got #{offset}" if offset < 1
  return "Error: limit must be >= 1, got #{limit}" if limit < 1

  resolved = workspace.resolve_for_read(path)
  return "Error: file not found: #{path}" unless resolved.exist?
  return "Error: #{path} is a directory; use the glob tool to list files." if resolved.directory?

  sample = read_sample(resolved)
  return "Error: cannot read binary file: #{path}" if binary?(sample)

  format_slice(path: path, resolved: resolved, offset: offset, limit: limit)
rescue Tool::Workspace::Error => e
  "Error: #{e.message}"
rescue Errno::EACCES => e
  "Error: cannot read #{path}: #{e.message}"
end
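workspace.resolve_for_read is not documented on this page. To show how an out-of-workspace path becomes an “Error: …” observation, here is a minimal sketch with a hypothetical MiniWorkspace, assuming the containment rule is simply "the expanded path must be the root or lie beneath it". The real Workspace may differ, for example in how it treats symlinks, which this sketch does not resolve. It returns a Pathname because .read calls exist? and directory? on the result; the expected outputs assume POSIX paths.

```ruby
require 'pathname'

# Hypothetical stand-in for the workspace: just resolve_for_read and an
# Error class, which is all Read.read touches.
class MiniWorkspace
  class Error < StandardError; end

  def initialize(root)
    @root = File.expand_path(root)
  end

  # Relative paths resolve against the root; absolute paths are used as-is.
  # The result must be the root itself or a path beneath it. Symlinks are
  # not resolved here.
  def resolve_for_read(path)
    expanded = File.expand_path(path, @root)
    prefix = @root.chomp('/') + '/'
    unless expanded == @root || expanded.start_with?(prefix)
      raise Error, "path is outside the workspace: #{path}"
    end

    Pathname.new(expanded)
  end
end

# Mirrors the rescue clause in Read.read: the exception becomes an observation.
def observe(workspace, path)
  workspace.resolve_for_read(path).to_s
rescue MiniWorkspace::Error => e
  "Error: #{e.message}"
end

ws = MiniWorkspace.new('/srv/ws')
puts observe(ws, 'lib/foo.rb')    # /srv/ws/lib/foo.rb
puts observe(ws, '../etc/passwd') # Error: path is outside the workspace: ../etc/passwd
puts observe(ws, '/srv/ws-old/x') # Error: path is outside the workspace: /srv/ws-old/x
```

Comparing against the root plus a trailing separator, rather than a bare string prefix, is what rejects sibling directories like /srv/ws-old.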