Class: Pikuri::Tool::Read

Inherits: Pikuri::Tool

Inheritance: Object → Pikuri::Tool → Pikuri::Tool::Read

Defined in: lib/pikuri/tool/read.rb

Overview

The read tool, expressed as a Pikuri::Tool subclass: instantiating Tool::Read.new(workspace: ws) produces a tool whose #to_ruby_llm_tool wiring is identical to any bundled tool’s, so ruby_llm sees nothing special about it. Same shape as SubAgent — workspace is captured by the execute closure at construction.

Output format

cat-n: each line is rendered as “%6d\t%s” (line number right-aligned in six columns, tab, content). Chosen for breadth of training-data exposure: cat -n output shows up across virtually every Unix tutorial and Stack Overflow answer, so even small local models recognize the shape. opencode’s shorter “<n>: <content>” format saves a few thousand tokens per 2K-line file but trades away model familiarity; pi omits line numbers entirely (cheapest in tokens, but the model loses the ability to cite ranges or pick Edit boundaries precisely).
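The format described above can be sketched in a few lines. This is a minimal illustration, not the code in lib/pikuri/tool/read.rb; `render_cat_n` is a hypothetical helper name.

```ruby
# Render lines in cat -n style: line number right-aligned in six columns,
# a tab, then the line content. `render_cat_n` is a hypothetical helper,
# not part of Pikuri.
def render_cat_n(lines, start: 1)
  lines.each_with_index.map { |line, i| format("%6d\t%s", start + i, line) }.join("\n")
end

puts render_cat_n(["def hello", "  puts 'hi'", "end"], start: 41)
```

Passing `start` lets a paged read keep the file's real line numbers, so the model can cite them in later edits.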

Truncation rules

Two independent limits, whichever fires first wins:

*Line limit* — DEFAULT_LIMIT lines (overridable via limit).
*Byte cap* — MAX_BYTES bytes of input content; not exposed as a parameter. Bypassable in practice by paging via offset.

Additionally, individual lines longer than MAX_LINE_LENGTH chars are truncated with LINE_TRUNCATION_MARKER appended; the model is told to reach for grep to find content inside such files.
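The interaction of the two limits and the per-line cap can be sketched as follows. All names and values here are assumptions for illustration: the `SKETCH_*` constants are deliberately tiny, and the order in which the real tool truncates lines and counts bytes is not stated on this page.

```ruby
# Hypothetical values, chosen small so the behavior is easy to see.
# The real constants live in lib/pikuri/tool/read.rb.
SKETCH_MAX_LINE_LENGTH = 20
SKETCH_MAX_BYTES = 64
SKETCH_MARKER = "... (line truncated to #{SKETCH_MAX_LINE_LENGTH} chars)"

# Collect lines starting at `offset` (1-indexed) until the line limit or
# the byte cap fires. Returns [lines, reason], reason being :limit, :bytes,
# or :eof. Assumption: lines are truncated before their bytes are counted.
def collect_slice(lines, offset:, limit:)
  collected = []
  bytes = 0
  lines.drop(offset - 1).each do |line|
    return [collected, :limit] if collected.size >= limit

    line = line[0, SKETCH_MAX_LINE_LENGTH] + SKETCH_MARKER if line.length > SKETCH_MAX_LINE_LENGTH
    cost = line.bytesize + 1 # plus one for the joining newline
    return [collected, :bytes] if bytes + cost > SKETCH_MAX_BYTES

    collected << line
    bytes += cost
  end
  [collected, :eof]
end

p collect_slice(%w[a b c d], offset: 2, limit: 2)
p collect_slice(["x" * 10] * 10, offset: 1, limit: 100)
```

The first call stops on the line limit and the second on the byte cap, which is the case where a caller would continue by paging with a larger offset.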

Refusals

Path outside the workspace → caught from Workspace::Error, returned as “Error: …”.
File not found, EACCES → “Error: …”.
Path is a directory → “Error: … use the glob tool”, keeping directory listing as the glob tool’s responsibility (Step 9).
Binary content → sniffed from the first BINARY_SAMPLE_BYTES of the file: any NUL byte, or a share of non-printable bytes (control chars other than \t \n \v \f \r) above BINARY_NONPRINTABLE_THRESHOLD, triggers refusal. Catches images, PDFs, archives, and compiled artifacts without an extension list to maintain.
Offset past EOF → “Error: offset N is beyond end of file (M lines total)”.

Constant Summary

DEFAULT_LIMIT = Default value of the limit parameter (number of lines to read per call). Returns: (Integer)

MAX_LINE_LENGTH = Per-line character cap; longer lines are truncated with LINE_TRUNCATION_MARKER. Returns: (Integer)

LINE_TRUNCATION_MARKER = Suffix appended to lines truncated by MAX_LINE_LENGTH. Returns: (String)

"... (line truncated to #{MAX_LINE_LENGTH} chars)"

MAX_BYTES = Hard byte cap on input content collected per call. Counted on the line bytes (plus one for the joining newline); the rendered output is slightly larger due to the per-line “%6d\t” prefix. Returns: (Integer)

50 * 1024

MAX_BYTES_LABEL = Human-readable form of MAX_BYTES for the continuation marker. Returns: (String)

"#{MAX_BYTES / 1024} KB"

BINARY_SAMPLE_BYTES = Number of bytes sampled from the start of the file for binary-content detection. Returns: (Integer)

BINARY_NONPRINTABLE_THRESHOLD = Fraction of the sample that may be non-printable before the file is classified as binary. Matches opencode’s 30%. Returns: (Float)

0.30

DESCRIPTION = Description shown to the LLM. Follows the opencode-shape (summary + Usage: bullets) prescribed by the project’s tool-description convention. Per-parameter constraints (defaults, format) live in the parameter descriptions, not here. Returns: (String)

<<~DESC
  Read a file from the workspace and return its contents with line numbers.

  Usage:
  - Output is line-numbered in `cat -n` style so subsequent edits can reference exact line numbers.
  - Use `offset` and `limit` to page through large files; when the response ends in `Use offset=N to continue`, call again with that offset.
  - Lines longer than #{MAX_LINE_LENGTH} chars are truncated with a marker — use `grep` for content inside such files.
  - Binary files (images, PDFs, archives, compiled artifacts) are refused; this tool reads text only.
  - Directories are refused — use the `glob` tool to list files.
  - If unsure of the path, use `glob` first to look up filenames.
  - Avoid tiny repeated slices — if you need more context, read a larger window.
DESC

Constants inherited from Pikuri::Tool

CALCULATOR, FETCH, WEB_SCRAPE, WEB_SEARCH

Instance Attribute Summary

Attributes inherited from Pikuri::Tool

#description, #execute, #name, #parameters

Class Method Summary

.binary?(bytes) ⇒ Boolean

Heuristic binary classifier matching opencode’s: any NUL byte forces true; otherwise count the non-printable bytes (control chars other than \t \n \v \f \r) and compare their share of the sample against BINARY_NONPRINTABLE_THRESHOLD.
.read(workspace:, path:, offset:, limit:) ⇒ String

Resolve path against workspace, refuse directories / binaries / missing files, and return either the cat-n-formatted slice or an “Error: …” observation.

Instance Method Summary

#initialize(workspace:) ⇒ Read constructor

Methods inherited from Pikuri::Tool

#run, #to_ruby_llm_tool

Constructor Details

#initialize(workspace:) ⇒ `Read`

Parameters:

workspace (Tool::Workspace): captured for path resolution; all reads route through workspace.resolve_for_read.

# File 'lib/pikuri/tool/read.rb', line 103

def initialize(workspace:)
  super(
    name: 'read',
    description: DESCRIPTION,
    parameters: Parameters.build { |p|
      p.required_string :path,
                        'Path to the file to read. Relative paths ' \
                        'resolve against the workspace root, e.g. ' \
                        '"lib/foo.rb" or "/abs/path/to/file.txt".'
      p.optional_integer :offset,
                         'Line number to start reading from (1-indexed). ' \
                         "Defaults to 1, e.g. 200."
      p.optional_integer :limit,
                         'Maximum number of lines to read. Defaults to ' \
                         "#{DEFAULT_LIMIT}, e.g. 500."
    },
    execute: ->(path:, offset: 1, limit: DEFAULT_LIMIT) {
      Read.read(workspace: workspace, path: path, offset: offset, limit: limit)
    }
  )
end
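The closure capture used by the constructor can be shown in isolation. This is a sketch with hypothetical stand-ins: `SketchWorkspace` and `build_execute` are not Pikuri names, and the default limit of 2000 is an illustrative value rather than DEFAULT_LIMIT.

```ruby
# Hypothetical stand-in for Tool::Workspace; only `root` is modeled here.
SketchWorkspace = Struct.new(:root)

# Build an execute-style lambda that closes over `workspace`. Keyword
# defaults on the lambda play the role of the optional tool parameters.
def build_execute(workspace)
  ->(path:, offset: 1, limit: 2000) do
    { root: workspace.root, path: path, offset: offset, limit: limit }
  end
end

execute = build_execute(SketchWorkspace.new("/srv/project"))
p execute.call(path: "lib/foo.rb")
p execute.call(path: "lib/foo.rb", offset: 200, limit: 500)
```

Because the workspace is bound when the lambda is created, callers only ever supply the per-call arguments.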

Class Method Details

.binary?(bytes) ⇒ `Boolean`

Heuristic binary classifier matching opencode’s: any NUL byte forces true; otherwise count the non-printable bytes (control chars other than \t \n \v \f \r, that is, bytes below 9 or from 14 through 31) and compare their share of the sample against BINARY_NONPRINTABLE_THRESHOLD. Bytes above 127, including UTF-8 continuation bytes (0x80-0xBF), fall outside the counted ranges and pass through unflagged, so UTF-8 text reads fine. As written, the check also leaves DEL (0x7F) uncounted.

Public because Edit reuses it to refuse binary targets: if Edit accepted a binary file, which the model cannot have read through this tool, it could corrupt bytes the model never inspected. Same sniff, same threshold, one definition.

Parameters:

bytes (String): sample bytes

Returns:

(Boolean)

# File 'lib/pikuri/tool/read.rb', line 177

def self.binary?(bytes)
  return false if bytes.empty?

  non_printable = 0
  bytes.each_byte do |b|
    return true if b.zero?

    non_printable += 1 if b < 9 || (b > 13 && b < 32)
  end
  non_printable.to_f / bytes.bytesize > BINARY_NONPRINTABLE_THRESHOLD
end
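To see how the heuristic classifies a few inputs, here is a standalone copy of the method above so it can run without Pikuri loaded. `binary_sample?` and `SKETCH_THRESHOLD` are hypothetical names; the 0.30 value is taken from the constant summary.

```ruby
SKETCH_THRESHOLD = 0.30 # mirrors BINARY_NONPRINTABLE_THRESHOLD

# Standalone copy of the classifier: NUL forces true, otherwise compare the
# share of control bytes (below 9, or 14..31) against the threshold.
def binary_sample?(bytes)
  return false if bytes.empty?

  non_printable = 0
  bytes.each_byte do |b|
    return true if b.zero?

    non_printable += 1 if b < 9 || (b > 13 && b < 32)
  end
  non_printable.to_f / bytes.bytesize > SKETCH_THRESHOLD
end

p binary_sample?("plain text\n")        # no control bytes besides \n
p binary_sample?("caf\u00e9 na\u00efve") # bytes above 127 are not counted
p binary_sample?("PNG\x00data")          # contains a NUL byte
p binary_sample?("\x01\x02\x03abc")      # 3 of 6 bytes are control bytes
p binary_sample?("\x01abcdefghi")        # 1 of 10 bytes is a control byte
```

The first, second, and last samples come out as text; the NUL sample and the one with half its bytes in the control range come out as binary.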

.read(workspace:, path:, offset:, limit:) ⇒ `String`

Resolve path against workspace, refuse directories / binaries / missing files, and return either the cat-n-formatted slice or an “Error: …” observation.

Parameters:

workspace (Tool::Workspace)
path (String): raw path as supplied by the LLM
offset (Integer): 1-indexed line number to start at
limit (Integer): maximum lines to return

Returns:

(String): tool observation

# File 'lib/pikuri/tool/read.rb', line 134

def self.read(workspace:, path:, offset:, limit:)
  return "Error: offset must be >= 1, got #{offset}" if offset < 1
  return "Error: limit must be >= 1, got #{limit}"   if limit < 1

  resolved = workspace.resolve_for_read(path)
  return "Error: file not found: #{path}" unless resolved.exist?
  return "Error: #{path} is a directory; use the glob tool to list files." if resolved.directory?

  sample = read_sample(resolved)
  return "Error: cannot read binary file: #{path}" if binary?(sample)

  format_slice(path: path, resolved: resolved, offset: offset, limit: limit)
rescue Tool::Workspace::Error => e
  "Error: #{e.message}"
rescue Errno::EACCES => e
  "Error: cannot read #{path}: #{e.message}"
end
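A caller paging through a large file would follow the `Use offset=N to continue` hint until it disappears. The sketch below uses a hypothetical in-memory stand-in, `fake_read`, instead of the real tool; the blank line before the hint and the condition under which the real tool emits it are assumptions.

```ruby
# Hypothetical stand-in for the read tool, operating on an array of lines.
# It mimics the cat -n rendering, the offset-past-EOF error, and a
# continuation hint when lines remain after the slice.
def fake_read(lines, offset:, limit:)
  if offset > lines.size
    return "Error: offset #{offset} is beyond end of file (#{lines.size} lines total)"
  end

  slice = lines[offset - 1, limit]
  body = slice.each_with_index.map { |l, i| format("%6d\t%s", offset + i, l) }.join("\n")
  next_offset = offset + slice.size
  next_offset <= lines.size ? "#{body}\n\nUse offset=#{next_offset} to continue" : body
end

# Keep calling with the offset from the hint until no hint is returned.
def read_all(lines, limit:)
  offset = 1
  pages = []
  loop do
    out = fake_read(lines, offset: offset, limit: limit)
    pages << out
    hint = out.match(/Use offset=(\d+) to continue\z/)
    break unless hint

    offset = hint[1].to_i
  end
  pages
end

puts read_all(%w[alpha beta gamma delta epsilon], limit: 2)
```

Parsing the hint rather than computing the next offset keeps the caller correct when the byte cap, not the line limit, ends a page early.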
