Class: Pikuri::Workspace::Read

Inherits:
Tool
  • Object
Defined in:
lib/pikuri/workspace/read.rb

Overview

The read tool, expressed as a Tool subclass: instantiating Read.new(workspace: ws) produces a tool whose Tool#to_ruby_llm_tool wiring is identical to any bundled tool’s, so ruby_llm sees nothing special about it. Same shape as the agent tool from pikuri-subagents — workspace is captured by the execute closure at construction.

Output format

cat-n: each line is rendered as “%6d\t%s” (line number right-aligned in a six-column field, a tab, then the content). Chosen for breadth of training-data exposure: cat -n output shows up across virtually every Unix tutorial and Stack Overflow answer, so even small local models recognize the shape. opencode’s shorter “<n>: <content>” format saves a few thousand tokens per 2K-line file but trades away model familiarity; pi omits line numbers entirely (cheapest in tokens, but the model loses the ability to cite ranges or pick Edit boundaries precisely).
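A minimal sketch of that rendering, assuming the slice has already been split into lines without trailing newlines. render_cat_n is a hypothetical helper name, not part of pikuri's API.

```ruby
# Hypothetical helper: render lines in `cat -n` style, starting at a given
# 1-indexed line number (e.g. the requested offset).
def render_cat_n(lines, first_line_number: 1)
  lines.each_with_index.map do |line, i|
    # %6d right-aligns the number in six columns; \t separates it from content.
    format("%6d\t%s", first_line_number + i, line)
  end.join("\n")
end

render_cat_n(["a", "b"], first_line_number: 9) # => "     9\ta\n    10\tb"
```

The prefix costs seven bytes per line (six columns plus the tab) as long as line numbers fit in six digits, which is why the rendered output runs slightly larger than the content counted against MAX_BYTES.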

Truncation rules

Two independent limits, whichever fires first wins:

  • *Line limit* — DEFAULT_LIMIT lines (overridable via limit).

  • *Byte cap* — MAX_BYTES bytes of input content; not exposed as a parameter. Bypassable in practice by paging via offset.

Additionally, individual lines longer than MAX_LINE_LENGTH chars are truncated with LINE_TRUNCATION_MARKER appended; the model is told to reach for grep to find content inside such files.
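The two limits and the per-line cap can be sketched as one pass over the file's lines. This is a minimal sketch, not pikuri's format_slice: take_window is a hypothetical helper, the constant values are copied from the Constant Summary, and two details are assumptions (the byte cap is charged on the already-truncated line, and offset 1 is accepted on an empty file). The offset-past-EOF message follows the wording listed under Refusals.

```ruby
# Values copied from the Constant Summary so the sketch runs standalone.
MAX_LINE_LENGTH = 2000
LINE_TRUNCATION_MARKER = "... (line truncated to #{MAX_LINE_LENGTH} chars)"
MAX_BYTES = 50 * 1024

# Hypothetical helper. Returns [window, next_offset], where next_offset is nil
# once the end of the file is reached, or an "Error: ..." string when the
# offset points past EOF.
def take_window(lines, offset:, limit:, max_bytes: MAX_BYTES)
  # Assumption: offset 1 is always valid, even for an empty file.
  if offset > 1 && offset > lines.length
    return "Error: offset #{offset} is beyond end of file (#{lines.length} lines total)"
  end

  window = []
  bytes = 0
  lines.drop(offset - 1).each do |line|
    break if window.length >= limit # line limit fired first

    text = line.length > MAX_LINE_LENGTH ? line[0, MAX_LINE_LENGTH] + LINE_TRUNCATION_MARKER : line
    cost = text.bytesize + 1 # +1 for the joining newline
    break if bytes + cost > max_bytes # byte cap fired first

    window << text
    bytes += cost
  end

  next_offset = offset + window.length
  [window, next_offset <= lines.length ? next_offset : nil]
end
```

A caller would render the window in cat-n form and, when next_offset is non-nil, append the `Use offset=N to continue` hint that DESCRIPTION tells the model to look for; the exact text surrounding that hint is not shown on this page.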

PDF extraction

PDFs are detected by their %PDF- magic prefix in the sample bytes and routed to Read.format_pdf instead of the binary-refusal path. The extractor walks pages lazily via pdf-reader, emitting one synthetic “--- Page N ---” header line per page followed by that page’s text. The offset / limit / MAX_BYTES contract is identical to the text path: extraction stops as soon as the line or byte cap is hit, so reading the first window of a 500-page PDF parses only the few pages needed. Line numbers in PDF output are for citation back to the user only; PDFs are not editable through Edit.

PDFs with no extractable text (scanned images, empty documents) come back with an LLM-actionable hint string rather than an empty observation. Encrypted / malformed / XFA-form PDFs surface as “Error: cannot extract PDF text: …” — same convention as other tool errors the model can react to. No OCR.
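The lazy page walk can be sketched with Ruby's Enumerator::Lazy. pdf_window is a hypothetical helper that takes any Enumerable of per-page strings, so the sketch runs without the pdf-reader gem; the comment shows how such a source might be built from pdf-reader's `pages` / `text` calls. The byte cap and the empty-extraction hint are left out for brevity.

```ruby
# Hypothetical helper: page_texts is any Enumerable of per-page strings.
# With the pdf-reader gem the source could be
#   PDF::Reader.new(path).pages.lazy.map(&:text)
# which defers each page's text extraction until that page is reached.
def pdf_window(page_texts, offset:, limit:)
  lines = page_texts.lazy.with_index(1).flat_map do |text, page_number|
    # One synthetic header line per page, then that page's text lines.
    ["--- Page #{page_number} ---", *text.split("\n")]
  end
  # drop/first on a lazy enumerator pull only as many pages as the window needs.
  lines.drop(offset - 1).first(limit)
end
```

An empty page still yields its header line, so page numbering stays aligned with the document even when a page has no extractable text.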

Image attachments

PNG / JPEG / GIF / WebP files are detected by magic bytes (see image_mime) and routed to Read.format_image ahead of the binary sniff. Instead of a String observation, the tool returns a RubyLLM::Content carrying a short metadata note (“Read image: path (bytes, mime)”) plus the file as an attachment; the per-provider Media formatter turns that into the right image content block inside the tool_result. The text half is what the model cites back (“the image at lib/foo.png shows…”); the image half is what the model actually looks at.

offset / limit are ignored on the image path — there’s no line-paging concept. The hard size cap is MAX_IMAGE_BYTES; files above that come back as “Error: image too large…” rather than being silently encoded into a payload the upstream API would reject. No auto-resize: pikuri doesn’t pull in an image-processing dep for one tool’s ergonomics. See IDEAS.md for the deferred auto-resize discussion.
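The magic-byte routing and the size cap can be sketched as follows. sniff_mime is a hypothetical stand-in (the real FileType.detect_mime is not shown on this page) that matches the standard published signatures for each format; within_image_cap? is likewise illustrative.

```ruby
MAX_IMAGE_BYTES = 5 * 1024 * 1024

# Hypothetical stand-in for FileType.detect_mime: match well-known file
# signatures at the start of a byte sample.
def sniff_mime(sample)
  bytes = sample.b # compare raw bytes (ASCII-8BIT), never as UTF-8 text
  return "application/pdf" if bytes.start_with?("%PDF-")
  return "image/png" if bytes.start_with?("\x89PNG\r\n\x1A\n".b)
  return "image/jpeg" if bytes.start_with?("\xFF\xD8\xFF".b)
  return "image/gif" if bytes.start_with?("GIF87a", "GIF89a")
  # WebP is a RIFF container whose form type at byte offset 8 is "WEBP".
  return "image/webp" if bytes.start_with?("RIFF") && bytes[8, 4] == "WEBP"

  nil
end

# Refuse rather than attach anything above the provider-side limit.
def within_image_cap?(size_in_bytes)
  size_in_bytes <= MAX_IMAGE_BYTES
end
```

Sniffing content rather than trusting the extension means a mislabeled file (say, a PNG saved as .dat) still takes the image path.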

Vision capability of the underlying model is not checked here. Sending an image to a non-vision model produces a provider error the LLM can react to on the next turn; coupling Read to model capability metadata (notoriously incomplete for local servers) buys less than the friction it adds.

Refusals

  • Path outside the workspace → caught from Filesystem::Error, returned as “Error: …”.

  • File not found, EACCES → “Error: …”.

  • Path is a directory → “Error: … use the glob tool”, keeping directory listing as the glob tool’s responsibility (Step 9).

  • Image larger than MAX_IMAGE_BYTES → “Error: image too large…”, leaving the model to pick a different file or ask the user to resize.

  • Binary content → FileType.binary? on the sample; any NUL byte or a sample dense in control characters triggers refusal. Catches archives and compiled artifacts without an extension list to maintain. PDFs and supported images are intercepted by their respective magic-byte checks via FileType.detect_mime before the binary sniff — see above.

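  • Offset past EOF → “Error: offset N is beyond end of file (M lines total)”.

  • offset or limit below 1 → “Error: offset must be >= 1, got N” / “Error: limit must be >= 1, got N”, checked before the path is resolved.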
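A sketch of a NUL-or-control-density sniff, and of why it has to run after the magic-byte checks. binary_sample? is hypothetical (FileType.binary? itself is not shown), and both the 30% threshold and the set of control bytes treated as ordinary text are assumptions.

```ruby
# Hypothetical threshold; the source only says "dense in control characters".
CONTROL_DENSITY_THRESHOLD = 0.3
# Control bytes that ordinary text legitimately contains: tab, LF, FF, CR.
TEXT_CONTROLS = [0x09, 0x0A, 0x0C, 0x0D].freeze

# Sketch of a binary sniff over a byte sample (not pikuri's FileType.binary?).
def binary_sample?(sample)
  bytes = sample.b.bytes
  return false if bytes.empty?
  return true if bytes.include?(0) # any NUL byte means binary

  control = bytes.count { |b| b < 0x20 && !TEXT_CONTROLS.include?(b) }
  control.fdiv(bytes.size) > CONTROL_DENSITY_THRESHOLD
end

# A PNG header contains NUL bytes, so the sniff alone would refuse it; this is
# why the magic-byte checks for PDFs and images must run first.
png_header = "\x89PNG\r\n\x1A\n\x00\x00\x00\rIHDR".b
binary_sample?(png_header) # => true
```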

Constant Summary

DEFAULT_LIMIT = 2000

Returns default value of the limit parameter (number of lines to read per call).

Returns:

  • (Integer)

MAX_LINE_LENGTH = 2000

Returns per-line character cap; longer lines are truncated with LINE_TRUNCATION_MARKER.

Returns:

  • (Integer)

LINE_TRUNCATION_MARKER = "... (line truncated to #{MAX_LINE_LENGTH} chars)"

Returns suffix appended to lines truncated by MAX_LINE_LENGTH.

Returns:

  • (String)

MAX_BYTES = 50 * 1024

Returns hard byte cap on input content collected per call. Counted on the line bytes (plus one for the joining newline); the rendered output is slightly larger due to the per-line “%6d\t” prefix.

Returns:

  • (Integer)

MAX_BYTES_LABEL = "#{MAX_BYTES / 1024} KB"

Returns human-readable form of MAX_BYTES for the continuation marker.

Returns:

  • (String)

MAX_IMAGE_BYTES = 5 * 1024 * 1024

Returns hard size cap on inline-attached images. Matches Anthropic’s per-image limit; same order of magnitude on OpenAI / Gemini. Above this we refuse rather than encode a payload the provider would reject.

Returns:

  • (Integer)

MAX_IMAGE_BYTES_LABEL = "#{MAX_IMAGE_BYTES / (1024 * 1024)} MB"

Returns human-readable form of MAX_IMAGE_BYTES for refusal messages.

Returns:

  • (String)

DESCRIPTION

Description shown to the LLM. Follows the opencode shape (summary + Usage: bullets) prescribed by the project’s tool-description convention. Per-parameter constraints (defaults, format) live in the parameter descriptions, not here.

Returns:

  • (String)

DESCRIPTION = <<~DESC
  Read a file from the workspace and return its contents with line numbers.

  Usage:
  - Output is line-numbered in `cat -n` style so subsequent edits can reference exact line numbers.
  - Use `offset` and `limit` to page through large files; when the response ends in `Use offset=N to continue`, call again with that offset.
  - Lines longer than #{MAX_LINE_LENGTH} chars are truncated with a marker — use `grep` for content inside such files.
  - PDFs are text-extracted page-by-page with `--- Page N ---` markers in the output. Cite pages back to the user from those markers. PDFs cannot be modified with `edit`.
  - PNG / JPEG / GIF / WebP files are attached as images you can see directly, alongside a short text note with the path and size. Requires a vision-capable model; on a text-only model the provider will reject the call. Images cannot be modified with `edit`.
  - Other binary files (archives, compiled artifacts) are refused; this tool reads text otherwise.
  - Directories are refused — use the `glob` tool to list files.
  - If unsure of the path, use `glob` first to look up filenames.
  - Avoid tiny repeated slices — if you need more context, read a larger window.
DESC


Constructor Details

#initialize(workspace:) ⇒ Read

Parameters:

  • workspace (Filesystem)

    captured for path resolution; all reads route through workspace.resolve_for_read.



# File 'lib/pikuri/workspace/read.rb', line 158

def initialize(workspace:)
  super(
    name: 'read',
    description: DESCRIPTION,
    parameters: Parameters.build { |p|
      p.required_string :path,
                        'Path to the file to read. Relative paths ' \
                        'resolve against the workspace root, e.g. ' \
                        '"lib/foo.rb" or "/abs/path/to/file.txt".'
      p.optional_integer :offset,
                         'Line number to start reading from (1-indexed). ' \
                         "Defaults to 1, e.g. 200."
      p.optional_integer :limit,
                         'Maximum number of lines to read. Defaults to ' \
                         "#{DEFAULT_LIMIT}, e.g. 500."
    },
    execute: ->(path:, offset: 1, limit: DEFAULT_LIMIT) {
      Read.read(workspace: workspace, path: path, offset: offset, limit: limit)
    }
  )
end
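The closure capture described in the Overview, in a self-contained miniature. MiniTool and MiniRead are hypothetical stand-ins (pikuri's Tool and Parameters are not reproduced): workspace is a local of initialize that the execute lambda closes over, and the lambda's keyword defaults supply offset and limit when the caller omits them.

```ruby
# Hypothetical minimal stand-in for Tool: a name plus an execute lambda.
class MiniTool
  attr_reader :name

  def initialize(name:, execute:)
    @name = name
    @execute = execute
  end

  # Forward keyword arguments to the lambda, as a tool runner would.
  def call(**args)
    @execute.call(**args)
  end
end

DEFAULT_LIMIT = 2000

class MiniRead < MiniTool
  def initialize(workspace:)
    super(
      name: "read",
      # The lambda closes over the local `workspace`; callers never pass it.
      execute: ->(path:, offset: 1, limit: DEFAULT_LIMIT) {
        "#{workspace}/#{path} offset=#{offset} limit=#{limit}"
      }
    )
  end
end

tool = MiniRead.new(workspace: "/srv/project")
tool.call(path: "lib/foo.rb") # => "/srv/project/lib/foo.rb offset=1 limit=2000"
```

Because the handler is a lambda, it is strict about its keywords: an unexpected keyword such as bogus: raises ArgumentError instead of being silently ignored.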

Class Method Details

.read(workspace:, path:, offset:, limit:) ⇒ String, RubyLLM::Content

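Resolve path against workspace, refuse invalid offset / limit values, missing files, directories, and binaries, and return the cat-n-formatted slice (text files), the extracted page text (PDFs), an image attachment (supported images), or an “Error: …” observation.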

Parameters:

  • workspace (Filesystem)
  • path (String)

    raw path as supplied by the LLM

  • offset (Integer)

    1-indexed line number to start at

  • limit (Integer)

    maximum lines to return

Returns:

  • (String, RubyLLM::Content)

    tool observation. Text and PDF paths return a String; supported images return a RubyLLM::Content with the file attached so the model receives the image inline. See “Image attachments” above.



# File 'lib/pikuri/workspace/read.rb', line 204

def self.read(workspace:, path:, offset:, limit:)
  return "Error: offset must be >= 1, got #{offset}" if offset < 1
  return "Error: limit must be >= 1, got #{limit}"   if limit < 1

  resolved = workspace.resolve_for_read(path)
  return "Error: file not found: #{path}" unless resolved.exist?
  return "Error: #{path} is a directory; use the glob tool to list files." if resolved.directory?

  mime = Pikuri::FileType.detect_mime(resolved)

  return format_pdf(path: path, resolved: resolved, offset: offset, limit: limit) if mime == 'application/pdf'
  return format_image(path: path, resolved: resolved, mime: mime) if mime&.start_with?('image/')
  return "Error: cannot read binary file: #{path}" if Pikuri::FileType.binary?(resolved)

  format_slice(path: path, resolved: resolved, offset: offset, limit: limit)
rescue Filesystem::Error => e
  "Error: #{e.message}"
rescue Errno::EACCES => e
  "Error: cannot read #{path}: #{e.message}"
end

Instance Method Details

#with_workspace(workspace) ⇒ Read

Produce a new Pikuri::Workspace::Read bound to workspace. Used by SubAgent::SubAgentTool: when a persona supplies a workspace_factory:, the parent’s instance is rebuilt for the sub-agent’s fresh workspace so that paths resolve against the right root.

Parameters:

  • workspace (Filesystem)

Returns:

  • (Read)



# File 'lib/pikuri/workspace/read.rb', line 188

def with_workspace(workspace)
  self.class.new(workspace: workspace)
end
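The rebuild-instead-of-mutate pattern in miniature; BoundReader, TracingReader, and the paths are hypothetical. Because with_workspace calls self.class.new, a subclass gets an instance of its own class back, and the parent's instance stays bound to its original root.

```ruby
# Hypothetical miniature of a tool bound to one workspace root.
class BoundReader
  attr_reader :workspace

  def initialize(workspace:)
    @workspace = workspace
  end

  # Build a fresh instance for the new root instead of mutating self, so the
  # parent's tool keeps resolving paths against the parent's workspace.
  def with_workspace(workspace)
    self.class.new(workspace: workspace)
  end
end

# self.class.new preserves the receiver's class, so subclasses round-trip.
class TracingReader < BoundReader; end

parent = TracingReader.new(workspace: "/repo")
child = parent.with_workspace("/tmp/subagent-1")
child.class      # => TracingReader
parent.workspace # => "/repo"
```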