Class: Rubino::Tools::ReadTracker

Inherits: Object

Defined in: lib/rubino/tools/read_tracker.rb

Overview

Single source of truth for per-path read/write state in a session: for each path it records the mtime and content hash last seen. Edit / MultiEdit / Write consult it before writing so the model can’t edit a file it never opened (and would then be editing from training-time priors), and ReadTool consults it to skip re-emitting bytes already in context.

WHY hash AND mtime (not mtime alone): the agent’s OWN write bumps mtime, and so does a no-op `touch`, a CRLF normalisation, or a linter that rewrites the file to byte-identical content. mtime alone false-collides on all of those and trips the stale-read guard against the agent itself (r5 B2). We therefore record the content hash too, and #fresh? treats it as authoritative: a path is “fresh” when the on-disk content still hashes to what we last saw, whatever the mtime says, so a touch / CRLF / linter rewrite to the same bytes does not force a re-read.
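The difference between the two signals can be seen in a small standalone sketch. It does not use ReadTracker itself; `change_signals` is a hypothetical helper, and SHA-256 is an assumption, since the tracker's actual hash function is not shown on this page.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: report which change signals fire for a file,
# given the mtime and content hash recorded at the last read.
def change_signals(path, seen_mtime, seen_hash)
  {
    mtime_changed: File.mtime(path) != seen_mtime,
    hash_changed:  Digest::SHA256.file(path).hexdigest != seen_hash
  }
end

file = Tempfile.new("read_tracker_demo")
file.write("puts 'hello'\n")
file.close

seen_mtime = File.mtime(file.path)
seen_hash  = Digest::SHA256.file(file.path).hexdigest

# Simulate a no-op touch: move mtime forward, leave the bytes alone.
later = seen_mtime + 60
File.utime(later, later, file.path)

# The mtime signal fires, the hash signal does not.
p change_signals(file.path, seen_mtime, seen_hash)
file.unlink
```

A guard built on the mtime signal alone would reject the next edit here even though nothing in the file changed.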

REFRESH-ON-OWN-WRITE (r5 B2): a successful write/edit records the NEW content+mtime here via #note_write, so the agent’s own writes are authoritative and the very next edit to the same file passes the gate instead of “changed on disk since last read”.

DEDUP + RECOVERY (r5 B3): the duplicate-read nudge must SKIP WORK but NEVER serve stale bytes. #duplicate_read? returns true only when the same window was read AND the file still hashes to what that read saw AND a short TTL has not elapsed AND no edit-failure recovery is pending for the path. A failed edit calls #note_edit_failure(path); the next read of that path always serves fresh content (the dedup is suppressed once).

Lifecycle: one instance PER SESSION (see .for_session), shared by every turn’s ToolExecutor in this process. Resume in a NEW process does NOT carry the tracker — the model must re-read after a resume before editing.

Constant Summary

DEDUP_TTL_SECONDS

How long a duplicate-read nudge stays valid, in seconds. Past this the model may legitimately want the bytes back in context (long turn, summarised away), so we serve the content again rather than nudge.

Instance Method Summary

#drill_in?(path, offset, limit) ⇒ Boolean

True when a TARGETED read window [offset, offset+limit-1] of path overlaps any range we previously elided in a skeleton of that path, i.e. the model is drilling into a body the skeleton hid.

#duplicate_read?(path, offset, limit, content_hash = nil) ⇒ Boolean

Records a read of an exact (path, offset, limit) window and reports whether this is a duplicate the model can reuse instead of re-reading.

#fresh?(path) ⇒ Boolean

True when the file on disk still matches what we last saw.

#initialize ⇒ ReadTracker (constructor)

A new instance of ReadTracker.

#mtime_at_read(path) ⇒ Object

#note_edit_failure(path) ⇒ Object

Flags that the last edit/multi_edit to path FAILED, so the model’s next read of it bypasses dedup and gets fresh disk content for recovery (r5 B3).

#note_skeleton(path, ranges) ⇒ Object

Records that path was sent as a skeleton with these elided ranges (each [first_line, line_count]).

#note_write(path, new_content, mtime = nil) ⇒ Object

Records the agent’s OWN successful write/edit: the new content is now authoritative, so the next edit must NOT trip the stale-read guard (r5 B2).

#register(path, mtime, content_hash = nil) ⇒ Object

Records a successful read: stash mtime + content hash so a later edit can confirm the file is unchanged, and a later read of the same window can be deduped.

#seen?(path) ⇒ Boolean

Constructor Details

#initialize ⇒ `ReadTracker`

Returns a new instance of ReadTracker.

# File 'lib/rubino/tools/read_tracker.rb', line 59

def initialize
  # path => { mtime:, hash: } — the last state we KNOW for this path,
  # whether from a read or from the agent's own write.
  @state = {}
  # [path, offset, limit] => { hash:, at: } — windows already served, so
  # an identical re-read of unchanged bytes is a duplicate.
  @windows = {}
  # paths whose last edit failed: the next read bypasses dedup so a
  # recovery re-read always returns fresh content.
  @recover = {}
  # COMPRESSION drill-in tracking (tool_output_compression). For each path
  # we sent as a skeleton: the elided [first_line, line_count] ranges, so a
  # later TARGETED read landing inside one is a "drill-in" — the signal
  # that the skeleton hid a body the model then needed.
  @skeletons = {}
  @mutex = Mutex.new
end

Class Method Details

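.for_session(session_id) ⇒ `Object`

Returns the tracker shared by every caller that passes the same session_id in this process, creating it on first use. A nil or empty session_id gets a new, unregistered instance on every call.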

# File 'lib/rubino/tools/read_tracker.rb', line 47

def for_session(session_id)
  key = session_id.to_s
  return new if key.empty?

  @registry_mutex.synchronize { @registry[key] ||= new }
end
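The per-session lifecycle can be illustrated with a standalone class that copies the two class methods shown in this section. `SessionRegistryDemo` is a hypothetical name, and the way the real class initialises `@registry` and `@registry_mutex` is not shown on this page, so that part is an assumption.

```ruby
# Hypothetical stand-in for ReadTracker's class-level registry.
class SessionRegistryDemo
  # Assumed initialisation: class-level instance variables.
  @registry = {}
  @registry_mutex = Mutex.new

  class << self
    def for_session(session_id)
      key = session_id.to_s
      return new if key.empty? # no session id: private, unshared instance

      @registry_mutex.synchronize { @registry[key] ||= new }
    end

    def reset!
      @registry_mutex.synchronize { @registry = {} }
    end
  end
end

a = SessionRegistryDemo.for_session("s1")
b = SessionRegistryDemo.for_session("s1")
puts a.equal?(b) # same session id, same object

c = SessionRegistryDemo.for_session(nil)
d = SessionRegistryDemo.for_session(nil)
puts c.equal?(d) # blank id, a new object each time
```

Because the key is `session_id.to_s`, a Symbol and a String with the same text resolve to the same instance. The registry lives in process memory only, which is why a resume in a new process starts with an empty tracker.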

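.reset! ⇒ `Object`

Replaces the session registry with an empty one, so the next .for_session call for any id builds a new instance.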

# File 'lib/rubino/tools/read_tracker.rb', line 54

def reset!
  @registry_mutex.synchronize { @registry = {} }
end

Instance Method Details

#drill_in?(path, offset, limit) ⇒ `Boolean`

True when a TARGETED read window [offset, offset+limit-1] of path overlaps any range we previously elided in a skeleton of that path — i.e. the model is drilling into a body the skeleton hid. Read-only.

Returns:

(Boolean)

# File 'lib/rubino/tools/read_tracker.rb', line 131

def drill_in?(path, offset, limit)
  key = canonical(path)
  return false unless key

  win_start = offset.to_i
  win_end   = win_start + limit.to_i - 1
  @mutex.synchronize do
    ranges = @skeletons[key]
    next false unless ranges

    ranges.any? do |first, count|
      first <= win_end && (first + count - 1) >= win_start
    end
  end
end
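The overlap test is an inclusive interval intersection. The sketch below lifts it out of the class so it can be tried on plain arrays; `overlaps_elided?` is a hypothetical name, and the ranges are made-up values in the [first_line, line_count] form that #note_skeleton records.

```ruby
# Same comparison as drill_in?, without the tracker state or the mutex.
# ranges: array of [first_line, line_count] pairs.
def overlaps_elided?(ranges, offset, limit)
  win_start = offset.to_i
  win_end   = win_start + limit.to_i - 1
  ranges.any? do |first, count|
    # Two inclusive intervals overlap when each starts before the other ends.
    first <= win_end && (first + count - 1) >= win_start
  end
end

elided = [[10, 20], [50, 5]] # lines 10..29 and 50..54 were hidden

puts overlaps_elided?(elided, 25, 10) # window 25..34 reaches into 10..29
puts overlaps_elided?(elided, 30, 20) # window 30..49 sits between the two ranges
puts overlaps_elided?(elided, 54, 1)  # window 54..54 hits the last elided line
```

The three calls print true, false, and true. Both ends are inclusive, so a single-line read of the last elided line still counts as a drill-in.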

#duplicate_read?(path, offset, limit, content_hash = nil) ⇒ `Boolean`

Records a read of an exact (path, offset, limit) window and reports whether this is a duplicate the model can reuse instead of re-reading. It is a duplicate ONLY when: the same window was served before, the file still hashes to what that window saw, the TTL hasn’t elapsed, AND no edit-failure recovery is pending for the path. Otherwise it records the fresh window and returns false (serve the content).

Returns:

(Boolean)

# File 'lib/rubino/tools/read_tracker.rb', line 192

def duplicate_read?(path, offset, limit, content_hash = nil)
  key = canonical(path)
  return false unless key

  digest = content_hash || hash_of(key)
  sig = [key, offset.to_i, limit.to_i]

  @mutex.synchronize do
    # A pending recovery (prior edit failed) always serves fresh content
    # once, then clears.
    if @recover.delete(key)
      @windows[sig] = { hash: digest, at: monotonic }
      next false
    end

    prior = @windows[sig]
    if prior && prior[:hash] == digest && (monotonic - prior[:at]) <= DEDUP_TTL_SECONDS
      true
    else
      @windows[sig] = { hash: digest, at: monotonic }
      false
    end
  end
end
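The four conditions are easier to follow with a controllable clock. The following is a minimal sketch, not the real class: `WindowDedup` is a hypothetical name, the TTL of 60 is an assumed value (the real DEDUP_TTL_SECONDS is not shown on this page), the clock is injected for the demo, and the mutex and path canonicalisation are left out.

```ruby
class WindowDedup
  TTL_SECONDS = 60 # assumed value, for illustration only

  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock   = clock
    @windows = {} # [path, offset, limit] => { hash:, at: }
    @recover = {} # paths whose last edit failed
  end

  def note_edit_failure(path)
    @recover[path] = true
  end

  def duplicate_read?(path, offset, limit, digest)
    sig = [path, offset.to_i, limit.to_i]
    now = @clock.call

    # Pending recovery: serve fresh content once, then clear the flag.
    if @recover.delete(path)
      @windows[sig] = { hash: digest, at: now }
      return false
    end

    prior = @windows[sig]
    if prior && prior[:hash] == digest && (now - prior[:at]) <= TTL_SECONDS
      true
    else
      @windows[sig] = { hash: digest, at: now }
      false
    end
  end
end

now = 0.0
dedup = WindowDedup.new(clock: -> { now })

puts dedup.duplicate_read?("a.rb", 1, 50, "h1") # first read of the window
puts dedup.duplicate_read?("a.rb", 1, 50, "h1") # same window, same hash
dedup.note_edit_failure("a.rb")
puts dedup.duplicate_read?("a.rb", 1, 50, "h1") # recovery read
now = 61.0
puts dedup.duplicate_read?("a.rb", 1, 50, "h1") # TTL has elapsed
```

This prints false, true, false, false. As in the method above, a duplicate hit does not refresh the stored timestamp, so the TTL runs from the moment the window was last served in full rather than from the last nudge.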

#fresh?(path) ⇒ `Boolean`

True when the file on disk still matches what we last saw. The content hash is AUTHORITATIVE for change-detection: we never trust mtime alone to declare freshness, because on a coarse-mtime filesystem (Docker/linuxkit VM, some network mounts, two rapid consecutive writes) an external content change can land WITHOUT the mtime advancing — trusting mtime <= stored there would let an edit proceed on stale bytes and clobber the external change. So mtime is at most a hint: a NEWER mtime means recheck; an equal/older mtime still falls through to a hash comparison. The hash arm also lets a no-op touch / CRLF / linter rewrite to identical bytes pass without forcing a re-read (r5 B2). Returns false when we never saw the file, or it genuinely changed on disk.

Returns:

(Boolean)

# File 'lib/rubino/tools/read_tracker.rb', line 165

def fresh?(path)
  key = canonical(path)
  return false unless key

  @mutex.synchronize do
    state = @state[key]
    next false unless state

    # Content hash is authoritative: equal/older mtime does NOT prove
    # freshness on a coarse-mtime FS, so always confirm via the hash.
    state[:hash] && state[:hash] == hash_of(key)
  end
end
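The coarse-mtime hazard described above can be reproduced by hand. This sketch does not use ReadTracker; `Snapshot`, `take_snapshot`, `fresh_by_mtime?` and `fresh_by_hash?` are hypothetical names, SHA-256 is an assumed hash, and `File.utime` stands in for a filesystem whose mtime did not advance.

```ruby
require "digest"
require "tempfile"

# Hypothetical stand-in for the per-path state the tracker stores.
Snapshot = Struct.new(:mtime, :digest)

def take_snapshot(path)
  Snapshot.new(File.mtime(path), Digest::SHA256.file(path).hexdigest)
end

# The check that is not safe on its own: an unchanged mtime proves nothing.
def fresh_by_mtime?(path, snap)
  File.mtime(path) <= snap.mtime
end

# The authoritative check: compare the current bytes to the recorded digest.
def fresh_by_hash?(path, snap)
  Digest::SHA256.file(path).hexdigest == snap.digest
end

file = Tempfile.new("fresh_demo")
file.write("version 1\n")
file.close
snap = take_snapshot(file.path)

# Simulate an external edit whose mtime did not advance: the bytes change,
# then the mtime is put back to the recorded value.
File.binwrite(file.path, "version 2\n")
File.utime(snap.mtime, snap.mtime, file.path)

puts "mtime says fresh: #{fresh_by_mtime?(file.path, snap)}"
puts "hash says fresh:  #{fresh_by_hash?(file.path, snap)}"
file.unlink
```

The mtime check reports the file as fresh while the hash check does not, which is the case where an mtime-only gate would let an edit overwrite the external change.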

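#mtime_at_read(path) ⇒ `Object`

Returns the mtime recorded for path by the last #register or #note_write, or nil when the path has no recorded state.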

# File 'lib/rubino/tools/read_tracker.rb', line 179

def mtime_at_read(path)
  key = canonical(path)
  return nil unless key

  @mutex.synchronize { @state[key]&.fetch(:mtime, nil) }
end

#note_edit_failure(path) ⇒ `Object`

Flags that the last edit/multi_edit to path FAILED, so the model’s next read of it bypasses dedup and gets fresh disk content for recovery (r5 B3). One-shot: consumed by the next duplicate_read? check.

# File 'lib/rubino/tools/read_tracker.rb', line 109

def note_edit_failure(path)
  key = canonical(path)
  return unless key

  @mutex.synchronize { @recover[key] = true }
end

#note_skeleton(path, ranges) ⇒ `Object`

Records that path was sent as a skeleton with these elided ranges (each [first_line, line_count]). Replaces any prior record for the path (a re-read re-skeletons from scratch).

# File 'lib/rubino/tools/read_tracker.rb', line 119

def note_skeleton(path, ranges)
  key = canonical(path)
  return unless key

  @mutex.synchronize do
    @skeletons[key] = ranges
  end
end

#note_write(path, new_content, mtime = nil) ⇒ `Object`

Records the agent’s OWN successful write/edit: the new content is now authoritative, so the next edit must NOT trip the stale-read guard (r5 B2). Pass the bytes just written so we hash exactly those and don’t re-read the file (which could race a concurrent writer).

# File 'lib/rubino/tools/read_tracker.rb', line 93

def note_write(path, new_content, mtime = nil)
  key = canonical(path)
  return unless key

  @mutex.synchronize do
    @state[key] = { mtime: mtime || file_mtime(key), hash: hash_bytes(new_content) }
    # An applied write is the freshest possible content — clear any
    # pending recovery flag and stale window records for this path.
    @recover.delete(key)
    @windows.reject! { |(wpath, _o, _l), _v| wpath == key }
  end
end
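Two details of this method can be tried in isolation: the window cleanup, which destructures the [path, offset, limit] keys, and the claim that hashing the written bytes is equivalent to hashing the file. `drop_windows_for!` is a hypothetical helper, the paths are made up, and SHA-256 is an assumption because `hash_bytes` is not shown on this page.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper mirroring the cleanup in note_write: drop every
# [path, offset, limit] entry for one path and keep the rest.
def drop_windows_for!(windows, key)
  windows.reject! { |(wpath, _offset, _limit), _v| wpath == key }
  windows # reject! returns nil when nothing was removed, so return the hash
end

windows = {
  ["/app/a.rb", 1, 50]  => { hash: "h1", at: 0.0 },
  ["/app/a.rb", 51, 50] => { hash: "h1", at: 1.0 },
  ["/app/b.rb", 1, 50]  => { hash: "h2", at: 2.0 }
}
drop_windows_for!(windows, "/app/a.rb")
p windows.keys # only the /app/b.rb window is left

# Hashing the bytes just written matches hashing the file on disk,
# so the file does not have to be read back after a write.
new_content = "puts 'patched'\n"
file = Tempfile.new("note_write_demo")
file.close
File.binwrite(file.path, new_content)

in_memory = Digest::SHA256.hexdigest(new_content)
on_disk   = Digest::SHA256.file(file.path).hexdigest
puts in_memory == on_disk
file.unlink
```

The equality only holds while no other writer touches the file between the write and the read-back, which is the race that hashing the in-memory bytes avoids.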

#register(path, mtime, content_hash = nil) ⇒ `Object`

Records a successful read: stash mtime + content hash so a later edit can confirm the file is unchanged, and a later read of the same window can be deduped.

# File 'lib/rubino/tools/read_tracker.rb', line 80

def register(path, mtime, content_hash = nil)
  key = canonical(path)
  return unless key

  @mutex.synchronize do
    @state[key] = { mtime: mtime, hash: content_hash || hash_of(key) }
  end
end

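#seen?(path) ⇒ `Boolean`

True when any state is recorded for path, whether from a read (#register) or from the agent’s own write (#note_write). It does not check whether the file is still unchanged on disk; #fresh? does that.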

Returns:

(Boolean)

# File 'lib/rubino/tools/read_tracker.rb', line 147

def seen?(path)
  key = canonical(path)
  return false unless key

  @mutex.synchronize { @state.key?(key) }
end
