Class: Pikuri::Tool::Read

Inherits:
Pikuri::Tool
Defined in:
lib/pikuri/tool/read.rb

Overview

The read tool, expressed as a Pikuri::Tool subclass: instantiating Tool::Read.new(workspace: ws) produces a tool whose #to_ruby_llm_tool wiring is identical to any bundled tool’s, so ruby_llm sees nothing special about it. Same shape as SubAgent — workspace is captured by the execute closure at construction.

Output format

cat-n: each line is rendered as “%6d\t%s” (six-column right-aligned line number, tab, content). Chosen for breadth of training-data exposure: cat -n output shows up across virtually every Unix tutorial and Stack Overflow answer, so even small local models recognize the shape. opencode’s shorter “<n>: <content>” format saves a few thousand tokens per 2K-line file but trades away model familiarity; pi omits line numbers entirely (cheapest in tokens, but the model loses the ability to cite ranges or pick Edit boundaries precisely).
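As a minimal sketch of the format described above, the snippet below renders a few lines with Kernel#format. The helper name render_cat_n and its start: keyword are hypothetical and exist only for illustration; the real rendering lives inside Tool::Read and may differ in detail.

```ruby
# Hypothetical helper (not part of Pikuri): render lines in cat -n style,
# numbering from `start` so a paged slice keeps its real line numbers.
def render_cat_n(lines, start: 1)
  lines.each_with_index.map { |line, i|
    # %6d right-aligns the number in six columns; \t separates it from content.
    format("%6d\t%s", start + i, line)
  }.join("\n")
end

puts render_cat_n(["def hello", "  puts 'hi'", 'end'], start: 41)
```

Each rendered line starts with the number right-aligned in six columns, so line 41 appears as four spaces, `41`, a tab, and then the content.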

Truncation rules

Two independent limits, whichever fires first wins:

  • *Line limit* — DEFAULT_LIMIT lines (overridable via limit).

  • *Byte cap* — MAX_BYTES bytes of input content; not exposed as a parameter. Bypassable in practice by paging via offset.

Additionally, individual lines longer than MAX_LINE_LENGTH chars are truncated with LINE_TRUNCATION_MARKER appended; the model is told to reach for grep to find content inside such files.
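The interaction of the three limits can be sketched roughly as follows. This is an illustration under assumptions, not the library's code: the module name TruncationSketch, the take helper, and its [lines, reason] return shape are hypothetical, and the sketch assumes the byte count is taken after per-line truncation, which the documentation does not state.

```ruby
module TruncationSketch
  # Values copied from the constants documented on this page.
  DEFAULT_LIMIT = 2000
  MAX_LINE_LENGTH = 2000
  MAX_BYTES = 50 * 1024
  LINE_TRUNCATION_MARKER = "... (line truncated to #{MAX_LINE_LENGTH} chars)"

  # Hypothetical helper: returns [kept_lines, reason], where reason is
  # :limit, :bytes, or nil when the input ran out before either limit fired.
  def self.take(lines, offset: 1, limit: DEFAULT_LIMIT)
    kept = []
    bytes = 0
    lines.drop(offset - 1).each do |line|
      # Line limit: stop once `limit` lines are collected and more remain.
      return [kept, :limit] if kept.size >= limit

      # Per-line cap: cut overlong lines and append the marker.
      line = line[0, MAX_LINE_LENGTH] + LINE_TRUNCATION_MARKER if line.length > MAX_LINE_LENGTH

      # Byte cap: line bytes plus one for the joining newline.
      cost = line.bytesize + 1
      return [kept, :bytes] if bytes + cost > MAX_BYTES

      kept << line
      bytes += cost
    end
    [kept, nil]
  end
end

lines = Array.new(10) { |i| "line #{i + 1}" }
kept, reason = TruncationSketch.take(lines, offset: 3, limit: 4)
puts "#{kept.first}..#{kept.last} stopped by #{reason.inspect}"
```

Because the byte cap is counted per call, a caller that pages with offset starts each call with a fresh budget, which is why the cap is bypassable in practice.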

Refusals

  • Path outside the workspace → caught from Workspace::Error, returned as “Error: …”.

  • File not found, EACCES → “Error: …”.

  • Path is a directory → “Error: … use the glob tool”, keeping directory listing as the glob tool’s responsibility (Step 9).

  • Binary content → sniffed from the first BINARY_SAMPLE_BYTES of the file: any NUL byte, or a share of non-printable bytes (control chars outside \t \n \v \f \r) above BINARY_NONPRINTABLE_THRESHOLD, triggers refusal. Catches images, PDFs, archives, and compiled artifacts without an extension list to maintain.

  • Offset past EOF → “Error: offset N is beyond end of file (M lines total)”.

Constant Summary

DEFAULT_LIMIT =

Returns default value of the limit parameter (number of lines to read per call).

Returns:

  • (Integer)

2000
MAX_LINE_LENGTH =

Returns per-line character cap; longer lines are truncated with LINE_TRUNCATION_MARKER.

2000
LINE_TRUNCATION_MARKER =

Returns suffix appended to lines truncated by MAX_LINE_LENGTH.

"... (line truncated to #{MAX_LINE_LENGTH} chars)"
MAX_BYTES =

Returns hard byte cap on input content collected per call. Counted on the line bytes (plus one for the joining newline); the rendered output is slightly larger due to the per-line “%6d\t” prefix.

Returns:

  • (Integer)

50 * 1024
MAX_BYTES_LABEL =

Returns human-readable form of MAX_BYTES for the continuation marker.

Returns:

  • (String)

"#{MAX_BYTES / 1024} KB"
BINARY_SAMPLE_BYTES =

Returns number of bytes sampled from the start of the file for binary-content detection.

Returns:

  • (Integer)

4096
BINARY_NONPRINTABLE_THRESHOLD =

Returns fraction of the sample that may be non-printable before the file is classified as binary. Matches opencode’s 30%.

Returns:

  • (Float)

0.30
DESCRIPTION =

Description shown to the LLM. Follows the opencode-shape (summary + Usage: bullets) prescribed by the project’s tool-description convention. Per-parameter constraints (defaults, format) live in the parameter descriptions, not here.

Returns:

  • (String)
<<~DESC
  Read a file from the workspace and return its contents with line numbers.

  Usage:
  - Output is line-numbered in `cat -n` style so subsequent edits can reference exact line numbers.
  - Use `offset` and `limit` to page through large files; when the response ends in `Use offset=N to continue`, call again with that offset.
  - Lines longer than #{MAX_LINE_LENGTH} chars are truncated with a marker — use `grep` for content inside such files.
  - Binary files (images, PDFs, archives, compiled artifacts) are refused; this tool reads text only.
  - Directories are refused — use the `glob` tool to list files.
  - If unsure of the path, use `glob` first to look up filenames.
  - Avoid tiny repeated slices — if you need more context, read a larger window.
DESC

Constants inherited from Pikuri::Tool

CALCULATOR, FETCH, WEB_SCRAPE, WEB_SEARCH

Instance Attribute Summary

Attributes inherited from Pikuri::Tool

#description, #execute, #name, #parameters

Class Method Summary

Instance Method Summary

Methods inherited from Pikuri::Tool

#run, #to_ruby_llm_tool

Constructor Details

#initialize(workspace:) ⇒ Read

Parameters:

  • workspace (Tool::Workspace)

    captured for path resolution; all reads route through workspace.resolve_for_read.



# File 'lib/pikuri/tool/read.rb', line 103

def initialize(workspace:)
  super(
    name: 'read',
    description: DESCRIPTION,
    parameters: Parameters.build { |p|
      p.required_string :path,
                        'Path to the file to read. Relative paths ' \
                        'resolve against the workspace root, e.g. ' \
                        '"lib/foo.rb" or "/abs/path/to/file.txt".'
      p.optional_integer :offset,
                         'Line number to start reading from (1-indexed). ' \
                         "Defaults to 1, e.g. 200."
      p.optional_integer :limit,
                         'Maximum number of lines to read. Defaults to ' \
                         "#{DEFAULT_LIMIT}, e.g. 500."
    },
    execute: ->(path:, offset: 1, limit: DEFAULT_LIMIT) {
      Read.read(workspace: workspace, path: path, offset: offset, limit: limit)
    }
  )
end
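The closure capture used by the constructor can be illustrated in isolation. The sketch below uses a hypothetical MiniTool base class as a stand-in for Pikuri::Tool (whose real constructor and #run signature are not shown here) and returns a descriptive string instead of reading a file.

```ruby
# Hypothetical stand-in for Pikuri::Tool: stores a name and an execute lambda.
class MiniTool
  attr_reader :name

  def initialize(name:, execute:)
    @name = name
    @execute = execute
  end

  # Assumed calling convention: keyword arguments forwarded to the lambda.
  def run(**args)
    @execute.call(**args)
  end
end

class MiniRead < MiniTool
  DEFAULT_LIMIT = 2000

  def initialize(workspace:)
    super(
      name: 'read',
      # The lambda closes over `workspace`; no instance variable is needed.
      execute: ->(path:, offset: 1, limit: DEFAULT_LIMIT) {
        "workspace=#{workspace} path=#{path} offset=#{offset} limit=#{limit}"
      }
    )
  end
end

a = MiniRead.new(workspace: '/srv/project-a')
b = MiniRead.new(workspace: '/srv/project-b')
puts a.run(path: 'lib/foo.rb')
puts b.run(path: 'lib/foo.rb', offset: 200, limit: 500)
```

Each instance carries its own workspace inside the lambda, so two tools built from different workspaces stay independent while exposing the same name and parameters.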

Class Method Details

.binary?(bytes) ⇒ Boolean

Heuristic binary classifier matching opencode’s: any NUL byte forces true; otherwise count control bytes below 32 other than \t \n \v \f \r (bytes 9..13) and compare their share of the sample against BINARY_NONPRINTABLE_THRESHOLD. Bytes above 127, such as UTF-8 continuation bytes (0x80-0xBF), sit outside the counted ranges and pass through unflagged, letting UTF-8 text read fine.

Public because Edit reuses it to refuse binary targets: if Edit accepted a binary file, which the model has no way to have read, it could corrupt bytes the model never inspected. Same sniff, same threshold, one definition.

Parameters:

  • bytes (String)

    sample bytes

Returns:

  • (Boolean)


# File 'lib/pikuri/tool/read.rb', line 177

def self.binary?(bytes)
  return false if bytes.empty?

  non_printable = 0
  bytes.each_byte do |b|
    return true if b.zero?

    non_printable += 1 if b < 9 || (b > 13 && b < 32)
  end
  non_printable.to_f / bytes.bytesize > BINARY_NONPRINTABLE_THRESHOLD
end
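To see how the heuristic behaves on a few inputs, the snippet below copies the logic above into a standalone module so it runs without Pikuri loaded. The module name BinarySniffSketch and the sample strings are illustrative only.

```ruby
module BinarySniffSketch
  THRESHOLD = 0.30 # mirrors BINARY_NONPRINTABLE_THRESHOLD

  # Same logic as the method above, copied so the example is self-contained.
  def self.binary?(bytes)
    return false if bytes.empty?

    non_printable = 0
    bytes.each_byte do |b|
      return true if b.zero?

      # Count control bytes other than \t \n \v \f \r (9..13).
      non_printable += 1 if b < 9 || (b > 13 && b < 32)
    end
    non_printable.to_f / bytes.bytesize > THRESHOLD
  end
end

samples = {
  'plain text'      => "hello, world\n",
  'UTF-8 text'      => "café 日本語\n",
  'NUL byte'        => "PNG\x00data",
  'mostly control'  => "\x01\x02\x03\x04abc", # 4 of 7 bytes are control chars
  'one stray byte'  => "\x01abcdefghi",       # 1 of 10 bytes, below threshold
  'empty'           => ''
}

samples.each do |label, bytes|
  puts "#{label}: #{BinarySniffSketch.binary?(bytes)}"
end
```

Only the NUL-byte and mostly-control samples are classified as binary; the UTF-8 sample passes because none of its bytes fall below 32 apart from the trailing newline.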

.read(workspace:, path:, offset:, limit:) ⇒ String

Resolve path against workspace, refuse directories / binaries / missing files, and return either the cat-n-formatted slice or an “Error: …” observation.

Parameters:

  • workspace (Tool::Workspace)
  • path (String)

    raw path as supplied by the LLM

  • offset (Integer)

    1-indexed line number to start at

  • limit (Integer)

    maximum lines to return

Returns:

  • (String)

    tool observation



# File 'lib/pikuri/tool/read.rb', line 134

def self.read(workspace:, path:, offset:, limit:)
  return "Error: offset must be >= 1, got #{offset}" if offset < 1
  return "Error: limit must be >= 1, got #{limit}"   if limit < 1

  resolved = workspace.resolve_for_read(path)
  return "Error: file not found: #{path}" unless resolved.exist?
  return "Error: #{path} is a directory; use the glob tool to list files." if resolved.directory?

  sample = read_sample(resolved)
  return "Error: cannot read binary file: #{path}" if binary?(sample)

  format_slice(path: path, resolved: resolved, offset: offset, limit: limit)
rescue Tool::Workspace::Error => e
  "Error: #{e.message}"
rescue Errno::EACCES => e
  "Error: cannot read #{path}: #{e.message}"
end
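The guard-clause order and the errors-as-strings convention of the method above can be tried out with a simplified stand-in. Everything in the sketch below is hypothetical scaffolding: StubWorkspace and WorkspaceError imitate Tool::Workspace and Tool::Workspace::Error, whose real behavior is not shown on this page, and the slice formatting skips the binary sniff and the truncation limits. The offset-past-EOF message follows the wording listed under Refusals.

```ruby
require 'tmpdir'
require 'pathname'

module ReadSketch
  class WorkspaceError < StandardError; end

  # Hypothetical stand-in for Tool::Workspace: confines paths to one root.
  class StubWorkspace
    def initialize(root)
      @root = Pathname.new(root).realpath
    end

    def resolve_for_read(path)
      resolved = @root.join(path).expand_path
      inside = resolved == @root || resolved.to_s.start_with?("#{@root}/")
      raise WorkspaceError, "path escapes workspace: #{path}" unless inside

      resolved
    end
  end

  # Simplified read: every failure comes back as an "Error: ..." string.
  def self.read(workspace:, path:, offset: 1)
    return "Error: offset must be >= 1, got #{offset}" if offset < 1

    resolved = workspace.resolve_for_read(path)
    return "Error: file not found: #{path}" unless resolved.exist?
    return "Error: #{path} is a directory; use the glob tool to list files." if resolved.directory?

    lines = File.readlines(resolved.to_s, chomp: true)
    if offset > lines.size
      return "Error: offset #{offset} is beyond end of file (#{lines.size} lines total)"
    end

    lines.drop(offset - 1).each_with_index
         .map { |line, i| format("%6d\t%s", offset + i, line) }
         .join("\n")
  rescue WorkspaceError => e
    "Error: #{e.message}"
  rescue Errno::EACCES => e
    "Error: cannot read #{path}: #{e.message}"
  end
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'notes.txt'), "alpha\nbeta\n")
  ws = ReadSketch::StubWorkspace.new(dir)
  puts ReadSketch.read(workspace: ws, path: 'notes.txt')
  puts ReadSketch.read(workspace: ws, path: 'missing.txt')
  puts ReadSketch.read(workspace: ws, path: '../outside.txt')
  puts ReadSketch.read(workspace: ws, path: 'notes.txt', offset: 9)
end
```

Returning the failure as an ordinary observation string, rather than raising, lets the model see what went wrong and retry with a corrected path or offset.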