Class: Rubino::Tools::ReadTracker
- Inherits:
- Object
  - Object
    - Rubino::Tools::ReadTracker
- Defined in:
- lib/rubino/tools/read_tracker.rb
Overview
Single source of truth for per-path read/write state in a session, keyed on mtime. Edit / MultiEdit / Write consult it before writing so the model can’t edit a file it never opened (and would then be editing from training-time priors), and ReadTool consults it to skip re-emitting bytes already in context.
WHY hash AND mtime (not mtime alone): the agent’s OWN write bumps mtime, and so does a no-op `touch`, a CRLF normalisation, or a linter that rewrites the file to byte-identical content. mtime alone false-collides on all of those and trips the stale-read guard against the agent itself (r5 B2). We therefore record the content hash too: a path is “fresh” when the on-disk content still hashes to what we last saw, whatever its mtime says (see #fresh?), so a touch / CRLF / linter rewrite to the same bytes does not force a re-read.
REFRESH-ON-OWN-WRITE (r5 B2): a successful write/edit records the NEW content+mtime here via #note_write, so the agent’s own writes are authoritative and the very next edit to the same file passes the gate instead of “changed on disk since last read”.
DEDUP + RECOVERY (r5 B3): the duplicate-read nudge must SKIP WORK but NEVER serve stale bytes. #duplicate_read? returns true only when the same window was read AND the file still hashes to what that read saw AND a short TTL has not elapsed AND no edit-failure recovery is pending for the path. A failed edit calls #note_edit_failure(path); the next read of that path always serves fresh content (the dedup is suppressed once).
Lifecycle: one instance PER SESSION (see .for_session), shared by every turn’s ToolExecutor in this process. Resume in a NEW process does NOT carry the tracker — the model must re-read after a resume before editing.
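The hash-versus-mtime argument above can be checked with a small standalone script. This is a sketch, not the tracker's code: `content_hash` and `compare_signals` are hypothetical names, and SHA-256 is an assumption, since the source does not say which digest ReadTracker uses.

```ruby
require "digest"
require "fileutils"
require "tempfile"

# Hypothetical helper. SHA-256 is an assumption; the source does not name the digest.
def content_hash(path)
  Digest::SHA256.file(path).hexdigest
end

# Reports which signals fire for a no-op touch and for a real edit.
def compare_signals
  Tempfile.create("read_tracker_demo") do |file|
    file.write("puts 'hello'\n")
    file.flush
    path = file.path
    seen_mtime = File.mtime(path)
    seen_hash = content_hash(path)

    # No-op touch: push mtime forward explicitly so the demo does not
    # depend on the filesystem's timestamp granularity.
    FileUtils.touch(path, mtime: seen_mtime + 60)
    touch = { mtime_changed: File.mtime(path) != seen_mtime,
              hash_changed: content_hash(path) != seen_hash }

    # Real edit: the bytes differ, so the hash differs.
    File.write(path, "puts 'goodbye'\n")
    edit = { hash_changed: content_hash(path) != seen_hash }

    { touch: touch, edit: edit }
  end
end

p compare_signals
```

The touch changes mtime while the hash stays the same, which is the case an mtime-only guard would misreport as a stale read. The edit changes the hash.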
Constant Summary
- DEDUP_TTL_SECONDS = 120
  How long a duplicate-read nudge stays valid. Past this, the model may legitimately want the bytes back in context (long turn, summarised away), so we serve the content again rather than nudge.
Class Method Summary
- .for_session(session_id) ⇒ Object
- .reset! ⇒ Object
Instance Method Summary
- #drill_in?(path, offset, limit) ⇒ Boolean
  True when a TARGETED read window [offset, offset+limit-1] of path overlaps any range we previously elided in a skeleton of that path, i.e. the model is drilling into a body the skeleton hid.
- #duplicate_read?(path, offset, limit, content_hash = nil) ⇒ Boolean
  Records a read of an exact (path, offset, limit) window and reports whether this is a duplicate the model can reuse instead of re-reading.
- #fresh?(path) ⇒ Boolean
  True when the file on disk still matches what we last saw.
- #initialize ⇒ ReadTracker (constructor)
  A new instance of ReadTracker.
- #mtime_at_read(path) ⇒ Object
- #note_edit_failure(path) ⇒ Object
  Flags that the last edit/multi_edit to path FAILED, so the model’s next read of it bypasses dedup and gets fresh disk content for recovery (r5 B3).
- #note_skeleton(path, ranges) ⇒ Object
  Records that path was sent as a skeleton with these elided ranges (each [first_line, line_count]).
- #note_write(path, new_content, mtime = nil) ⇒ Object
  Records the agent’s OWN successful write/edit: the new content is now authoritative, so the next edit must NOT trip the stale-read guard (r5 B2).
- #register(path, mtime, content_hash = nil) ⇒ Object
  Records a successful read: stash mtime + content hash so a later edit can confirm the file is unchanged, and a later read of the same window can be deduped.
- #seen?(path) ⇒ Boolean
Constructor Details
#initialize ⇒ ReadTracker
Returns a new instance of ReadTracker.
    # File 'lib/rubino/tools/read_tracker.rb', line 59

    def initialize
      # path => { mtime:, hash: } — the last state we KNOW for this path,
      # whether from a read or from the agent's own write.
      @state = {}
      # [path, offset, limit] => { hash:, at: } — windows already served, so
      # an identical re-read of unchanged bytes is a duplicate.
      @windows = {}
      # paths whose last edit failed: the next read bypasses dedup so a
      # recovery re-read always returns fresh content.
      @recover = {}
      # COMPRESSION drill-in tracking (tool_output_compression). For each path
      # we sent as a skeleton: the elided [first_line, line_count] ranges, so a
      # later TARGETED read landing inside one is a "drill-in" — the signal
      # that the skeleton hid a body the model then needed.
      @skeletons = {}
      @mutex = Mutex.new
    end
Class Method Details
.for_session(session_id) ⇒ Object
    # File 'lib/rubino/tools/read_tracker.rb', line 47

    def for_session(session_id)
      key = session_id.to_s
      return new if key.empty?

      @registry_mutex.synchronize { @registry[key] ||= new }
    end
.reset! ⇒ Object
    # File 'lib/rubino/tools/read_tracker.rb', line 54

    def reset!
      @registry_mutex.synchronize { @registry = {} }
    end
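The two class methods above form a per-session registry. The sketch below reproduces that pattern in a hypothetical stand-in class, `SessionScoped`. How the real class initialises `@registry` and `@registry_mutex` is not shown in the source, so the class-level instance variables here are an assumption.

```ruby
# Hypothetical stand-in for ReadTracker, showing only the registry pattern.
class SessionScoped
  # Assumption: class-level state set up once at load time.
  @registry = {}
  @registry_mutex = Mutex.new

  class << self
    def for_session(session_id)
      key = session_id.to_s
      # No usable id: hand back a throwaway instance that is never shared.
      return new if key.empty?

      # ||= inside the lock makes "look up or create" atomic across threads.
      @registry_mutex.synchronize { @registry[key] ||= new }
    end

    def reset!
      @registry_mutex.synchronize { @registry = {} }
    end
  end
end

same = SessionScoped.for_session("abc").equal?(SessionScoped.for_session(:abc))
anon = SessionScoped.for_session("").equal?(SessionScoped.for_session(nil))
puts "same session shares an instance: #{same}"  # true: both ids stringify to "abc"
puts "blank ids share an instance: #{anon}"      # false: each call builds a new object
```

Because the key is `session_id.to_s`, a symbol and a string with the same text resolve to one tracker. A blank or nil id never enters the registry, so nothing is shared across those calls.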
Instance Method Details
#drill_in?(path, offset, limit) ⇒ Boolean
True when a TARGETED read window [offset, offset+limit-1] of path overlaps any range we previously elided in a skeleton of that path — i.e. the model is drilling into a body the skeleton hid. Read-only.
    # File 'lib/rubino/tools/read_tracker.rb', line 131

    def drill_in?(path, offset, limit)
      key = canonical(path)
      return false unless key

      win_start = offset.to_i
      win_end = win_start + limit.to_i - 1
      @mutex.synchronize do
        ranges = @skeletons[key]
        next false unless ranges

        ranges.any? do |first, count|
          first <= win_end && (first + count - 1) >= win_start
        end
      end
    end
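The overlap test is the usual inclusive-interval check: two ranges intersect when each one starts at or before the other's end. The sketch below lifts that predicate onto plain data so it can be tried without a tracker. `overlaps_elided?` and the sample ranges are hypothetical.

```ruby
# Same inclusive-interval comparison as drill_in?, without the path lookup
# or the mutex. ranges is a list of [first_line, line_count] pairs.
def overlaps_elided?(ranges, offset, limit)
  win_start = offset.to_i
  win_end = win_start + limit.to_i - 1
  ranges.any? do |first, count|
    first <= win_end && (first + count - 1) >= win_start
  end
end

elided = [[10, 20], [50, 5]] # lines 10..29 and 50..54 were elided

puts overlaps_elided?(elided, 25, 10) # window 25..34 touches 10..29 => true
puts overlaps_elided?(elided, 30, 20) # window 30..49 sits between both => false
puts overlaps_elided?(elided, 54, 1)  # window 54..54 is the last elided line => true
```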
#duplicate_read?(path, offset, limit, content_hash = nil) ⇒ Boolean
Records a read of an exact (path, offset, limit) window and reports whether this is a duplicate the model can reuse instead of re-reading. It is a duplicate ONLY when: the same window was served before, the file still hashes to what that window saw, the TTL hasn’t elapsed, AND no edit-failure recovery is pending for the path. Otherwise it records the fresh window and returns false (serve the content).
    # File 'lib/rubino/tools/read_tracker.rb', line 192

    def duplicate_read?(path, offset, limit, content_hash = nil)
      key = canonical(path)
      return false unless key

      digest = content_hash || hash_of(key)
      sig = [key, offset.to_i, limit.to_i]
      @mutex.synchronize do
        # A pending recovery (prior edit failed) always serves fresh content
        # once, then clears.
        if @recover.delete(key)
          @windows[sig] = { hash: digest, at: monotonic }
          next false
        end

        prior = @windows[sig]
        if prior && prior[:hash] == digest &&
           (monotonic - prior[:at]) <= DEDUP_TTL_SECONDS
          true
        else
          @windows[sig] = { hash: digest, at: monotonic }
          false
        end
      end
    end
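The four duplicate conditions are easier to follow in a miniature with an injectable clock. `WindowDedup` below is a hypothetical, single-threaded sketch: it drops path canonicalisation and the mutex, takes the digest as an argument, and assumes the private `monotonic` helper (not shown in the source) reads a monotonic clock.

```ruby
# Hypothetical miniature of the dedup decision. The clock is injected so the
# TTL can be exercised without sleeping.
class WindowDedup
  TTL_SECONDS = 120 # mirrors DEDUP_TTL_SECONDS

  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @windows = {} # [path, offset, limit] => { hash:, at: }
    @recover = {} # paths whose last edit failed
  end

  def note_edit_failure(path)
    @recover[path] = true
  end

  def duplicate?(path, offset, limit, digest)
    sig = [path, offset, limit]
    now = @clock.call
    # One-shot recovery: consume the flag, record the window, serve content.
    if @recover.delete(path)
      @windows[sig] = { hash: digest, at: now }
      return false
    end

    prior = @windows[sig]
    return true if prior && prior[:hash] == digest && (now - prior[:at]) <= TTL_SECONDS

    @windows[sig] = { hash: digest, at: now }
    false
  end
end

now = 0.0
dedup = WindowDedup.new(clock: -> { now })
puts dedup.duplicate?("a.rb", 1, 50, "h1") # first read of the window => false
puts dedup.duplicate?("a.rb", 1, 50, "h1") # same window, same bytes => true
dedup.note_edit_failure("a.rb")
puts dedup.duplicate?("a.rb", 1, 50, "h1") # recovery pending => false
now = 121.0
puts dedup.duplicate?("a.rb", 1, 50, "h1") # TTL elapsed since last record => false
```

As in the method above, a duplicate hit does not refresh the stored timestamp, so the TTL runs from the last time the window was actually served.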
#fresh?(path) ⇒ Boolean
True when the file on disk still matches what we last saw. The content hash is AUTHORITATIVE for change-detection: we never trust mtime alone to declare freshness, because on a coarse-mtime filesystem (Docker/linuxkit VM, some network mounts, two rapid consecutive writes) an external content change can land WITHOUT the mtime advancing — trusting mtime <= stored there would let an edit proceed on stale bytes and clobber the external change. So mtime is at most a hint: a NEWER mtime means recheck; an equal/older mtime still falls through to a hash comparison. The hash arm also lets a no-op touch / CRLF / linter rewrite to identical bytes pass without forcing a re-read (r5 B2). Returns false when we never saw the file, or it genuinely changed on disk.
    # File 'lib/rubino/tools/read_tracker.rb', line 165

    def fresh?(path)
      key = canonical(path)
      return false unless key

      @mutex.synchronize do
        state = @state[key]
        next false unless state

        # Content hash is authoritative: equal/older mtime does NOT prove
        # freshness on a coarse-mtime FS, so always confirm via the hash.
        state[:hash] && state[:hash] == hash_of(key)
      end
    end
#mtime_at_read(path) ⇒ Object
    # File 'lib/rubino/tools/read_tracker.rb', line 179

    def mtime_at_read(path)
      key = canonical(path)
      return nil unless key

      @mutex.synchronize { @state[key]&.fetch(:mtime, nil) }
    end
#note_edit_failure(path) ⇒ Object
Flags that the last edit/multi_edit to path FAILED, so the model’s next read of it bypasses dedup and gets fresh disk content for recovery (r5 B3). One-shot: consumed by the next duplicate_read? check.
    # File 'lib/rubino/tools/read_tracker.rb', line 109

    def note_edit_failure(path)
      key = canonical(path)
      return unless key

      @mutex.synchronize { @recover[key] = true }
    end
#note_skeleton(path, ranges) ⇒ Object
Records that path was sent as a skeleton with these elided ranges (each [first_line, line_count]). Replaces any prior record for the path (a re-read re-skeletons from scratch).
    # File 'lib/rubino/tools/read_tracker.rb', line 119

    def note_skeleton(path, ranges)
      key = canonical(path)
      return unless key

      @mutex.synchronize do
        @skeletons[key] = ranges
      end
    end
#note_write(path, new_content, mtime = nil) ⇒ Object
Records the agent’s OWN successful write/edit: the new content is now authoritative, so the next edit must NOT trip the stale-read guard (r5 B2). Pass the bytes just written so we hash exactly those and don’t re-read the file (which could race a concurrent writer).
    # File 'lib/rubino/tools/read_tracker.rb', line 93

    def note_write(path, new_content, mtime = nil)
      key = canonical(path)
      return unless key

      @mutex.synchronize do
        @state[key] = { mtime: mtime || file_mtime(key), hash: hash_bytes(new_content) }
        # An applied write is the freshest possible content — clear any
        # pending recovery flag and stale window records for this path.
        @recover.delete(key)
        @windows.reject! { |(wpath, _o, _l), _v| wpath == key }
      end
    end
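The window cleanup relies on the window keys being `[path, offset, limit]` arrays, which the block destructures in its parameter list. A small sketch on plain data, with a hypothetical helper name and made-up paths, shows the effect.

```ruby
# Drops every served-window record for one path, leaving other paths alone.
# Mutates and returns the hash that was passed in.
def drop_windows_for(windows, path)
  # Each entry yields (key, value); the key array is destructured in place.
  windows.reject! { |(wpath, _offset, _limit), _record| wpath == path }
  windows
end

windows = {
  ["/app/a.rb", 1, 50]  => { hash: "h1", at: 0.0 },
  ["/app/a.rb", 51, 50] => { hash: "h1", at: 1.0 },
  ["/app/b.rb", 1, 50]  => { hash: "h9", at: 2.0 }
}

drop_windows_for(windows, "/app/a.rb")
p windows.keys # only the /app/b.rb window remains
```

`Hash#reject!` returns nil when nothing was removed, so the helper returns the hash explicitly. `note_write` ignores the return value, so the nil case does not matter there.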
#register(path, mtime, content_hash = nil) ⇒ Object
Records a successful read: stash mtime + content hash so a later edit can confirm the file is unchanged, and a later read of the same window can be deduped.
    # File 'lib/rubino/tools/read_tracker.rb', line 80

    def register(path, mtime, content_hash = nil)
      key = canonical(path)
      return unless key

      @mutex.synchronize do
        @state[key] = { mtime: mtime, hash: content_hash || hash_of(key) }
      end
    end
#seen?(path) ⇒ Boolean
    # File 'lib/rubino/tools/read_tracker.rb', line 147

    def seen?(path)
      key = canonical(path)
      return false unless key

      @mutex.synchronize { @state.key?(key) }
    end