Class: Pikuri::Workspace::Read
- Inherits: Tool
  - Object
    - Tool
      - Pikuri::Workspace::Read
- Defined in: lib/pikuri/workspace/read.rb
Overview
The read tool, expressed as a Tool subclass: instantiating Read.new(workspace: ws) produces a tool whose Tool#to_ruby_llm_tool wiring is identical to any bundled tool’s, so ruby_llm sees nothing special about it. Same shape as the agent tool from pikuri-subagents — workspace is captured by the execute closure at construction.
Output format
cat-n: each line is rendered as “%6d\t%s” (line number right-aligned in a six-column field, tab, content). Chosen for breadth of training-data exposure: cat -n output shows up across virtually every Unix tutorial and Stack Overflow answer, so even small local models recognize the shape. opencode’s shorter “<n>: <content>” format saves a few thousand tokens per 2K-line file but trades away model familiarity; pi omits line numbers entirely (cheapest tokens, but the model loses the ability to cite ranges or pick Edit boundaries precisely).
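As a minimal sketch of the format string above (render_cat_n is a hypothetical helper for illustration, not part of Pikuri), the rendering amounts to one Kernel#format call per line:

```ruby
# Hypothetical helper, for illustration only: renders lines in the
# cat -n shape described above ("%6d\t%s").
def render_cat_n(lines, start: 1)
  lines.each_with_index.map do |line, i|
    # Right-align the line number in six columns, then a tab, then content.
    format("%6d\t%s", start + i, line)
  end.join("\n")
end

puts render_cat_n(['def foo', '  42', 'end'], start: 10)
```

The start: keyword mirrors how a paged read would number lines from the requested offset rather than from 1.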
Truncation rules
The line/byte windowing is delegated to FileType.read_as_text_paged, which returns an Extractor::Page that this tool renders; the same windower backs VectorDb::Tools::Read. Two independent limits apply; whichever fires first wins:
- *Line limit*: DEFAULT_LIMIT lines (overridable via limit).
- *Byte cap*: MAX_BYTES bytes of input content; not exposed as a parameter. Bypassable in practice by paging via offset.
Additionally, individual lines longer than MAX_LINE_LENGTH chars are truncated with LINE_TRUNCATION_MARKER appended; the model is told to reach for grep to find content inside such files. (These constants alias the PAGE_* ones on Extractor — one source of truth, shared with VectorDb::Tools::Read.)
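The interaction of the two limits and the per-line cap can be sketched as follows. This is an illustration under stated assumptions, not FileType.read_as_text_paged itself: the helper name, the constants, the marker text, and the choice to truncate a line before counting its bytes are all hypothetical.

```ruby
# Hypothetical values; the real ones live on Pikuri::Extractor and are
# not shown on this page.
SKETCH_MAX_LINE_LENGTH = 20
SKETCH_MARKER = '... (line truncated)'

# Hypothetical windower: collects lines from `offset` (1-indexed) until
# either the line limit or the byte cap fires, whichever comes first.
def sketch_window(lines, offset:, limit:, max_bytes:)
  out = []
  bytes = 0
  stopped_by = nil
  lines.drop(offset - 1).each do |line|
    if out.size >= limit
      stopped_by = :limit
      break
    end
    # Assumption: over-long lines are truncated before their bytes are counted.
    line = line[0, SKETCH_MAX_LINE_LENGTH] + SKETCH_MARKER if line.length > SKETCH_MAX_LINE_LENGTH
    cost = line.bytesize + 1 # plus one for the joining newline
    if bytes + cost > max_bytes
      stopped_by = :bytes
      break
    end
    out << line
    bytes += cost
  end
  # next_offset is nil when the file was exhausted before any limit fired.
  { lines: out, stopped_by: stopped_by, next_offset: stopped_by ? offset + out.size : nil }
end

page = sketch_window(%w[a b c d], offset: 2, limit: 2, max_bytes: 100)
```

A caller that receives a non-nil next_offset would pass it back as offset on the next call, which is how the byte cap stays bypassable by paging.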
PDF (and other extracted formats)
Which formats read as text is the Extractor registry’s business, not this tool’s: with pikuri-pdf’s extractor registered, PDFs are claimed by their %PDF- magic prefix ahead of the binary refusal and extracted with one synthetic “--- Page N ---” header line per page (see Pikuri::Extractors::PDF); a gem plugging another extractor into the registry extends this tool for free. Extraction is lazy where the format allows (extract_lines): reading the first window of a 500-page PDF parses only the pages the window needs. Formats without a lazy line shape (HTML) are extracted in full and then windowed. Line numbers in PDF output are for citation back to the user only; PDFs are not editable through Edit.
PDFs with no extractable text (scanned images, empty documents) come back with an LLM-actionable hint string rather than an empty observation. Encrypted / malformed / XFA-form PDFs surface as “Error: …” — same convention as other tool errors the model can react to. No OCR.
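The lazy extraction described above can be illustrated with a plain Enumerator. This is only a sketch: sketch_pdf_lines and sketch_read_window are hypothetical names, the "pages" are ordinary strings standing in for parsed PDF pages, and the header text follows the `--- Page N ---` form quoted in DESCRIPTION.

```ruby
# Hypothetical stand-in for a lazy line extractor: a page is "parsed"
# only when the enumerator actually reaches it. parsed_log records which
# pages were touched so the laziness is observable.
def sketch_pdf_lines(pages, parsed_log)
  Enumerator.new do |y|
    pages.each_with_index do |page_text, i|
      parsed_log << (i + 1)          # this page is now being parsed
      y << "--- Page #{i + 1} ---"   # one synthetic header line per page
      page_text.each_line(chomp: true) { |line| y << line }
    end
  end
end

# Hypothetical windower: skips to `offset` (1-indexed) and takes `limit`
# lines without pulling anything past the end of the window.
def sketch_read_window(lines_enum, offset:, limit:)
  lines_enum.lazy.drop(offset - 1).first(limit)
end

parsed = []
pages = ["alpha\nbeta", 'gamma', 'delta']
first_window = sketch_read_window(sketch_pdf_lines(pages, parsed), offset: 1, limit: 3)
```

Here first_window holds the header and two content lines of page 1, and parsed contains only page 1, because the enumerator stops as soon as the window is full.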
Image attachments
PNG / JPEG / GIF / WebP files are detected by magic bytes (see image_mime) and routed to Read.format_image ahead of the binary sniff. Instead of a String observation, the tool returns a RubyLLM::Content carrying a short metadata note (“Read image: path (bytes, mime)”) plus the file as an attachment; the per-provider Media formatter turns that into the right image content block inside the tool_result. The text half is what the model cites back (“the image at lib/foo.png shows…”); the image half is what the model actually looks at.
offset / limit are ignored on the image path — there’s no line-paging concept. The hard size cap is MAX_IMAGE_BYTES; files above that come back as “Error: image too large…” rather than being silently encoded into a payload the upstream API would reject. No auto-resize: pikuri doesn’t pull in an image-processing dep for one tool’s ergonomics. See IDEAS.md for the deferred auto-resize discussion.
Vision capability of the underlying model is not checked here. Sending an image to a non-vision model produces a provider error the LLM can react to on the next turn; coupling Read to model capability metadata (notoriously incomplete for local servers) buys less than the friction it adds.
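Magic-byte detection for the four image formats can be sketched as below. sketch_image_mime is a hypothetical helper and is not Pikuri's image_mime, whose implementation is not shown on this page; the signatures themselves are the standard file-format prefixes.

```ruby
# Hypothetical sniffing helper, for illustration only. `head` is the
# first few bytes of the file; 12 bytes are enough for all four formats.
def sketch_image_mime(head)
  head = head.b # compare as raw bytes regardless of source encoding
  return 'image/png'  if head.start_with?("\x89PNG\r\n\x1A\n".b)
  return 'image/jpeg' if head.start_with?("\xFF\xD8\xFF".b)
  return 'image/gif'  if head.start_with?('GIF87a'.b, 'GIF89a'.b)
  # WebP is a RIFF container: "RIFF", four size bytes, then "WEBP".
  return 'image/webp' if head.start_with?('RIFF'.b) && head[8, 4] == 'WEBP'.b

  nil # not one of the recognised image formats
end

sketch_image_mime("GIF89a\x01\x00".b)
```

Sniffing content rather than trusting the extension means a mislabelled file (a PNG named notes.txt, say) still takes the image path.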
Refusals
- Path outside the workspace → caught from Filesystem::Error, returned as “Error: …”.
- File not found, EACCES → “Error: …”.
- Path is a directory → “Error: … use the glob tool”, keeping directory listing as the glob tool’s responsibility (Step 9).
- Image larger than MAX_IMAGE_BYTES → “Error: image too large…”, leaving the model to pick a different file or ask the user to resize.
- Binary content → nothing in the Extractor registry claims it (Extractor::Passthrough declines on the FileType.binary? heuristic: any NUL byte or a sample dense in control characters). Catches archives and compiled artifacts without an extension list to maintain. Registered extractors (pikuri-pdf’s PDF, pikuri-extractors’ office formats) claim their bytes ahead of that refusal; images are intercepted here via FileType.detect_mime before extraction is attempted (see above).
- Offset past EOF → “Error: offset N is beyond end of file (M lines total)”.
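The binary heuristic named in the list above (any NUL byte, or a sample dense in control characters) might look roughly like this. The helper name and the 0.3 density threshold are assumptions made for the sketch; the actual thresholds of FileType.binary? are not documented on this page.

```ruby
# Hypothetical approximation of a content-based binary check.
# `sample` is a leading chunk of the file; `control_ratio` is an
# assumed threshold, not Pikuri's real value.
def sketch_binary?(sample, control_ratio: 0.3)
  bytes = sample.b.bytes
  return false if bytes.empty?       # an empty file is treated as text
  return true if bytes.include?(0)   # any NUL byte marks the content binary

  # Count control characters, excluding tab, line feed and carriage return.
  control = bytes.count { |b| b < 32 && ![9, 10, 13].include?(b) }
  control.to_f / bytes.size > control_ratio
end

sketch_binary?("plain text\nwith a tab\there")
```

Because the check looks only at content, it needs no list of archive or object-file extensions to keep up to date.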
Constant Summary
- DEFAULT_LIMIT =
  Returns default value of the limit parameter (number of lines to read per call).
  Pikuri::Extractor::PAGE_DEFAULT_LIMIT
- MAX_LINE_LENGTH =
  Returns per-line character cap; longer lines are truncated with LINE_TRUNCATION_MARKER.
  Pikuri::Extractor::PAGE_MAX_LINE_LENGTH
- LINE_TRUNCATION_MARKER =
  Returns suffix appended to lines truncated by MAX_LINE_LENGTH.
  Pikuri::Extractor::PAGE_LINE_TRUNCATION_MARKER
- MAX_BYTES =
  Returns hard byte cap on input content collected per call. Counted on the line bytes (plus one for the joining newline); the rendered output is slightly larger due to the per-line “%6d\t” prefix.
  Pikuri::Extractor::PAGE_MAX_BYTES
- MAX_BYTES_LABEL =
  Returns human-readable form of MAX_BYTES for the continuation marker.
  "#{MAX_BYTES / 1024} KB"
- MAX_IMAGE_BYTES =
  Returns hard size cap on inline-attached images. Matches Anthropic’s per-image limit; same order of magnitude on OpenAI / Gemini. Above this we refuse rather than encode a payload the provider would reject.
  5 * 1024 * 1024
- MAX_IMAGE_BYTES_LABEL =
  Returns human-readable form of MAX_IMAGE_BYTES for refusal messages.
  "#{MAX_IMAGE_BYTES / (1024 * 1024)} MB"
- DESCRIPTION =
  Description shown to the LLM. Follows the opencode-shape (summary + Usage: bullets) prescribed by the project’s tool-description convention. Per-parameter constraints (defaults, format) live in the parameter descriptions, not here.
      <<~DESC
        Read a file from the workspace and return its contents with line numbers.

        Usage:
        - Output is line-numbered in `cat -n` style so subsequent edits can reference exact line numbers.
        - Use `offset` and `limit` to page through large files; when the response ends in `Use offset=N to continue`, call again with that offset.
        - Lines longer than #{MAX_LINE_LENGTH} chars are truncated with a marker — use `grep` for content inside such files.
        - PDFs are text-extracted page-by-page with `--- Page N ---` markers in the output. Cite pages back to the user from those markers. PDFs cannot be modified with `edit`.
        - PNG / JPEG / GIF / WebP files are attached as images you can see directly, alongside a short text note with the path and size. Requires a vision-capable model; on a text-only model the provider will reject the call. Images cannot be modified with `edit`.
        - Other binary files (archives, compiled artifacts) are refused; this tool reads text otherwise.
        - Directories are refused — use the `glob` tool to list files.
        - If unsure of the path, use `glob` first to look up filenames.
        - Avoid tiny repeated slices — if you need more context, read a larger window.
      DESC
Class Method Summary
- .read(workspace:, path:, offset:, limit:) ⇒ String, RubyLLM::Content
  Resolve path against workspace, refuse directories / binaries / missing files, and return either the cat-n-formatted slice or an “Error: …” observation.
Instance Method Summary
- #initialize(workspace:) ⇒ Read constructor
- #with_workspace(workspace) ⇒ Read
  Produce a new Read bound to workspace.
Constructor Details
#initialize(workspace:) ⇒ Read
    # File 'lib/pikuri/workspace/read.rb', line 172

    def initialize(workspace:)
      super(
        name: 'read',
        description: DESCRIPTION,
        parameters: Parameters.build { |p|
          p.required_string :path,
                            'Path to the file to read. Relative paths ' \
                            'resolve against the workspace root, e.g. ' \
                            '"lib/foo.rb" or "/abs/path/to/file.txt".'
          p.optional_integer :offset,
                             'Line number to start reading from (1-indexed). ' \
                             "Defaults to 1, e.g. 200."
          p.optional_integer :limit,
                             'Maximum number of lines to read. Defaults to ' \
                             "#{DEFAULT_LIMIT}, e.g. 500."
        },
        execute: ->(path:, offset: 1, limit: DEFAULT_LIMIT) {
          Read.read(workspace: workspace, path: path, offset: offset, limit: limit)
        }
      )
    end
Class Method Details
.read(workspace:, path:, offset:, limit:) ⇒ String, RubyLLM::Content
Resolve path against workspace, refuse directories / binaries / missing files, and return either the cat-n-formatted slice or an “Error: …” observation.
    # File 'lib/pikuri/workspace/read.rb', line 218

    def self.read(workspace:, path:, offset:, limit:)
      return "Error: offset must be >= 1, got #{offset}" if offset < 1
      return "Error: limit must be >= 1, got #{limit}" if limit < 1

      resolved = workspace.resolve_for_read(path)
      return "Error: file not found: #{path}" unless resolved.exist?
      return "Error: #{path} is a directory; use the glob tool to list files." if resolved.directory?

      mime = Pikuri::FileType.detect_mime(resolved)
      return format_image(path: path, resolved: resolved, mime: mime) if mime&.start_with?('image/')

      page = Pikuri::FileType.read_as_text_paged(
        resolved,
        offset: offset, limit: limit,
        max_bytes: MAX_BYTES, max_line_length: MAX_LINE_LENGTH
      )
      render_page(page)
    rescue Filesystem::Error => e
      "Error: #{e.message}"
    rescue Errno::EACCES => e
      "Error: cannot read #{path}: #{e.message}"
    rescue ArgumentError
      # Nothing in the Extractor registry claimed the content —
      # read_as_text_paged's binary refusal (directories and images
      # were already handled above).
      "Error: cannot read binary file: #{path}"
    rescue RuntimeError => e
      # Extraction failure (malformed / unsupported PDF, ...)
      # surfaced by read_as_text_paged.
      "Error: #{e.message}"
    end
Instance Method Details
#with_workspace(workspace) ⇒ Read
Produce a new Pikuri::Workspace::Read bound to workspace. Used by SubAgent::SubAgentTool when a persona supplies a workspace_factory: — the parent’s instance is rebuilt for the sub-agent’s fresh workspace so paths resolve against the right root.
    # File 'lib/pikuri/workspace/read.rb', line 202

    def with_workspace(workspace)
      self.class.new(workspace: workspace)
    end
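The rebinding pattern can be shown in isolation with stand-in classes. SketchWorkspace and SketchReadTool are hypothetical and exist only for this illustration; they are not Pikuri's Workspace or Tool.

```ruby
# Hypothetical stand-ins, for illustration only.
SketchWorkspace = Struct.new(:root)

class SketchReadTool
  attr_reader :workspace

  def initialize(workspace:)
    @workspace = workspace
  end

  # Build a fresh tool of the same class bound to another workspace.
  # The receiver is left untouched, so the parent keeps its own root.
  def with_workspace(workspace)
    self.class.new(workspace: workspace)
  end
end

parent = SketchReadTool.new(workspace: SketchWorkspace.new('/parent'))
child  = parent.with_workspace(SketchWorkspace.new('/child'))
```

Returning a new instance rather than mutating the receiver keeps the parent agent's tool and the sub-agent's tool independent, and going through self.class.new means a subclass would rebuild as itself.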