Class: Rubino::Tools::ReadTracker
- Inherits: Object
  - Object
    - Rubino::Tools::ReadTracker
- Defined in: lib/rubino/tools/read_tracker.rb
Overview
Single source of truth for per-path read/write state in a session, keyed on mtime. Edit / MultiEdit / Write consult it before writing so the model can’t edit a file it never opened (and would then be editing from training-time priors), and ReadTool consults it to skip re-emitting bytes already in context.
WHY hash AND mtime (not mtime alone): the agent’s OWN write bumps mtime, and so does a no-op `touch`, a CRLF normalisation, or a linter that rewrites the file to byte-identical content. mtime alone reports a false change on all of those and trips the stale-read guard against the agent itself (r5 B2). We therefore record the content hash too, and #fresh? treats that hash as authoritative: a path is “fresh” when the on-disk content still hashes to what we last saw, so a touch / CRLF / linter rewrite to the same bytes does not force a re-read.
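The difference between the two signals can be seen in a small standalone sketch. It does not use ReadTracker itself; the `snapshot` and `touch_demo` helpers are hypothetical, and SHA-256 is an assumption, since the excerpt does not name the hash function.

```ruby
require "digest"
require "tmpdir"

# Hypothetical helper (not from the source): capture mtime plus a content
# digest for a path. SHA-256 is an assumption.
def snapshot(path)
  { mtime: File.mtime(path), hash: Digest::SHA256.file(path).hexdigest }
end

# Returns [before, after] snapshots taken around a no-op touch.
def touch_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, "example.txt")
    File.write(path, "hello\n")
    before = snapshot(path)

    # Simulate `touch`: move mtime forward without changing any bytes.
    later = before[:mtime] + 60
    File.utime(later, later, path)

    [before, snapshot(path)]
  end
end

before, after = touch_demo
puts "mtime changed: #{before[:mtime] != after[:mtime]}"
puts "hash changed:  #{before[:hash] != after[:hash]}"
```

An mtime-only guard would treat this file as modified even though its content is byte-identical, which is the false positive the hash is recorded to avoid.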
REFRESH-ON-OWN-WRITE (r5 B2): a successful write/edit records the NEW content+mtime here via #note_write, so the agent’s own writes are authoritative and the very next edit to the same file passes the gate instead of “changed on disk since last read”.
DEDUP + RECOVERY (r5 B3): the duplicate-read nudge must SKIP WORK but NEVER serve stale bytes. #duplicate_read? returns true only when the same window was read AND the file still hashes to what that read saw AND a short TTL has not elapsed AND no edit-failure recovery is pending for the path. A failed edit calls #note_edit_failure(path); the next read of that path always serves fresh content (the dedup is suppressed once).
Lifecycle: one instance PER SESSION (see .for_session), shared by every turn’s ToolExecutor in this process. Resume in a NEW process does NOT carry the tracker — the model must re-read after a resume before editing.
Constant Summary
- DEDUP_TTL_SECONDS = 120
How long a duplicate-read nudge stays valid. Past this the model may legitimately want the bytes back in context (long turn, summarised away), so we serve the content again rather than nudge.
Class Method Summary
- .for_session(session_id) ⇒ Object
- .reset! ⇒ Object
Instance Method Summary
-
#duplicate_read?(path, offset, limit, content_hash = nil) ⇒ Boolean
Records a read of an exact (path, offset, limit) window and reports whether this is a duplicate the model can reuse instead of re-reading.
-
#fresh?(path) ⇒ Boolean
True when the file on disk still matches what we last saw.
-
#initialize ⇒ ReadTracker
constructor
A new instance of ReadTracker.
- #mtime_at_read(path) ⇒ Object
-
#note_edit_failure(path) ⇒ Object
Flags that the last edit/multi_edit to path FAILED, so the model’s next read of it bypasses dedup and gets fresh disk content for recovery (r5 B3).
-
#note_write(path, new_content, mtime = nil) ⇒ Object
Records the agent’s OWN successful write/edit: the new content is now authoritative, so the next edit must NOT trip the stale-read guard (r5 B2).
-
#register(path, mtime, content_hash = nil) ⇒ Object
Records a successful read: stash mtime + content hash so a later edit can confirm the file is unchanged, and a later read of the same window can be deduped.
- #seen?(path) ⇒ Boolean
Constructor Details
#initialize ⇒ ReadTracker
Returns a new instance of ReadTracker.
# File 'lib/rubino/tools/read_tracker.rb', line 59

def initialize
  # path => { mtime:, hash: } — the last state we KNOW for this path,
  # whether from a read or from the agent's own write.
  @state = {}
  # [path, offset, limit] => { hash:, at: } — windows already served, so
  # an identical re-read of unchanged bytes is a duplicate.
  @windows = {}
  # paths whose last edit failed: the next read bypasses dedup so a
  # recovery re-read always returns fresh content.
  @recover = {}
  @mutex = Mutex.new
end
Class Method Details
.for_session(session_id) ⇒ Object
# File 'lib/rubino/tools/read_tracker.rb', line 47

def for_session(session_id)
  key = session_id.to_s
  return new if key.empty?

  @registry_mutex.synchronize { @registry[key] ||= new }
end
.reset! ⇒ Object
# File 'lib/rubino/tools/read_tracker.rb', line 54

def reset!
  @registry_mutex.synchronize { @registry = {} }
end
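The per-session registry behind .for_session and .reset! can be illustrated with a stand-in class. `SketchTracker` is hypothetical, and the load-time initialisation of `@registry` and `@registry_mutex` is an assumption, because the excerpt does not show where those are set up.

```ruby
# Hypothetical stand-in for the tracker; only the registry behaviour matters.
class SketchTracker
  # Assumption: class-level state is initialised once when the class loads.
  @registry = {}
  @registry_mutex = Mutex.new

  class << self
    def for_session(session_id)
      key = session_id.to_s
      return new if key.empty? # no id: a throwaway instance, never shared

      @registry_mutex.synchronize { @registry[key] ||= new }
    end

    def reset!
      @registry_mutex.synchronize { @registry = {} }
    end
  end
end

a = SketchTracker.for_session("s1")
b = SketchTracker.for_session(:s1) # to_s gives symbol and string the same key
puts a.equal?(b)                                                           # true
puts SketchTracker.for_session(nil).equal?(SketchTracker.for_session(nil)) # false
SketchTracker.reset!
puts a.equal?(SketchTracker.for_session("s1"))                             # false
```

Because an empty or nil id returns an unregistered instance, callers without a session id get no sharing across turns; whether that matches how the real callers use the tracker is not stated in the excerpt.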
Instance Method Details
#duplicate_read?(path, offset, limit, content_hash = nil) ⇒ Boolean
Records a read of an exact (path, offset, limit) window and reports whether this is a duplicate the model can reuse instead of re-reading. It is a duplicate ONLY when: the same window was served before, the file still hashes to what that window saw, the TTL hasn’t elapsed, AND no edit-failure recovery is pending for the path. Otherwise it records the fresh window and returns false (serve the content).
# File 'lib/rubino/tools/read_tracker.rb', line 156

def duplicate_read?(path, offset, limit, content_hash = nil)
  key = canonical(path)
  return false unless key

  digest = content_hash || hash_of(key)
  sig = [key, offset.to_i, limit.to_i]

  @mutex.synchronize do
    # A pending recovery (prior edit failed) always serves fresh content
    # once, then clears.
    if @recover.delete(key)
      @windows[sig] = { hash: digest, at: monotonic }
      next false
    end

    prior = @windows[sig]
    if prior && prior[:hash] == digest && (monotonic - prior[:at]) <= DEDUP_TTL_SECONDS
      true
    else
      @windows[sig] = { hash: digest, at: monotonic }
      false
    end
  end
end
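The four dedup conditions can be exercised in isolation with a simplified sketch. `DedupSketch` is hypothetical: it skips path canonicalisation and locking, takes the digest from the caller, and accepts an injectable clock so the TTL can be tested without sleeping. The monotonic-clock default is an assumption about what the source's `monotonic` helper does.

```ruby
# Hypothetical miniature of the window-dedup rule; not thread-safe.
class DedupSketch
  TTL_SECONDS = 120 # mirrors DEDUP_TTL_SECONDS

  # clock: any callable returning seconds as a number.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @windows = {} # [path, offset, limit] => { hash:, at: }
    @recover = {} # path => true while a recovery read is owed
  end

  def note_edit_failure(path)
    @recover[path] = true
  end

  def duplicate_read?(path, offset, limit, digest)
    sig = [path, offset.to_i, limit.to_i]
    now = @clock.call
    pending = @recover.delete(path) # one-shot: cleared by this read
    prior = @windows[sig]

    if !pending && prior && prior[:hash] == digest && (now - prior[:at]) <= TTL_SECONDS
      true
    else
      @windows[sig] = { hash: digest, at: now }
      false
    end
  end
end

now = 0.0
tracker = DedupSketch.new(clock: -> { now })
p tracker.duplicate_read?("a.rb", 0, 50, "h1") # false: first read of the window
p tracker.duplicate_read?("a.rb", 0, 50, "h1") # true: same window, same hash
p tracker.duplicate_read?("a.rb", 0, 50, "h2") # false: content hash changed
tracker.note_edit_failure("a.rb")
p tracker.duplicate_read?("a.rb", 0, 50, "h2") # false: recovery read is owed
now += 121
p tracker.duplicate_read?("a.rb", 0, 50, "h2") # false: TTL has elapsed
```

As in the listing above, a duplicate hit does not refresh the stored timestamp, so the TTL is measured from the last read that actually served content.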
#fresh?(path) ⇒ Boolean
True when the file on disk still matches what we last saw. The content hash is AUTHORITATIVE for change-detection: we never trust mtime alone to declare freshness, because on a coarse-mtime filesystem (Docker/linuxkit VM, some network mounts, two rapid consecutive writes) an external content change can land WITHOUT the mtime advancing — trusting mtime <= stored there would let an edit proceed on stale bytes and clobber the external change. So mtime is at most a hint: a NEWER mtime means recheck; an equal/older mtime still falls through to a hash comparison. The hash arm also lets a no-op touch / CRLF / linter rewrite to identical bytes pass without forcing a re-read (r5 B2). Returns false when we never saw the file, or it genuinely changed on disk.
# File 'lib/rubino/tools/read_tracker.rb', line 129

def fresh?(path)
  key = canonical(path)
  return false unless key

  @mutex.synchronize do
    state = @state[key]
    next false unless state

    # Content hash is authoritative: equal/older mtime does NOT prove
    # freshness on a coarse-mtime FS, so always confirm via the hash.
    state[:hash] && state[:hash] == hash_of(key)
  end
end
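The coarse-mtime hazard described for #fresh? can be reproduced by pinning a file's mtime back after changing its bytes. This is a standalone sketch: the helper names are hypothetical, SHA-256 is an assumption, and `File.utime` is used only to simulate a filesystem whose timestamp did not advance.

```ruby
require "digest"
require "tmpdir"

# Hypothetical rule 1: trust mtime alone.
def mtime_only_fresh?(path, seen)
  File.mtime(path) <= seen[:mtime]
end

# Hypothetical rule 2: compare content hashes, as fresh? does.
def hash_fresh?(path, seen)
  Digest::SHA256.file(path).hexdigest == seen[:hash]
end

def coarse_mtime_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, "config.yml")
    File.write(path, "a: 1\n")
    seen = { mtime: File.mtime(path), hash: Digest::SHA256.file(path).hexdigest }

    # An external writer changes the bytes; pin mtime back to what we saw
    # to mimic a timestamp that failed to advance.
    File.write(path, "a: 2\n")
    File.utime(seen[:mtime], seen[:mtime], path)

    { mtime_only: mtime_only_fresh?(path, seen), hash: hash_fresh?(path, seen) }
  end
end

result = coarse_mtime_demo
puts "mtime-only rule says fresh: #{result[:mtime_only]}"
puts "hash rule says fresh:       #{result[:hash]}"
```

The mtime-only rule accepts the file as unchanged here, while the hash rule rejects it, which is why the listing above never returns true on mtime alone.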
#mtime_at_read(path) ⇒ Object
# File 'lib/rubino/tools/read_tracker.rb', line 143

def mtime_at_read(path)
  key = canonical(path)
  return nil unless key

  @mutex.synchronize { @state[key]&.fetch(:mtime, nil) }
end
#note_edit_failure(path) ⇒ Object
Flags that the last edit/multi_edit to path FAILED, so the model’s next read of it bypasses dedup and gets fresh disk content for recovery (r5 B3). One-shot: consumed by the next duplicate_read? check.
# File 'lib/rubino/tools/read_tracker.rb', line 104

def note_edit_failure(path)
  key = canonical(path)
  return unless key

  @mutex.synchronize { @recover[key] = true }
end
#note_write(path, new_content, mtime = nil) ⇒ Object
Records the agent’s OWN successful write/edit: the new content is now authoritative, so the next edit must NOT trip the stale-read guard (r5 B2). Pass the bytes just written so we hash exactly those and don’t re-read the file (which could race a concurrent writer).
# File 'lib/rubino/tools/read_tracker.rb', line 88

def note_write(path, new_content, mtime = nil)
  key = canonical(path)
  return unless key

  @mutex.synchronize do
    @state[key] = { mtime: mtime || file_mtime(key), hash: hash_bytes(new_content) }
    # An applied write is the freshest possible content — clear any
    # pending recovery flag and stale window records for this path.
    @recover.delete(key)
    @windows.reject! { |(wpath, _o, _l), _v| wpath == key }
  end
end
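A reduced sketch of the refresh-on-own-write flow shows why recording the written bytes lets the next edit pass the gate. `WriteSketch` and `own_write_demo` are hypothetical, keep only the hash half of the state, and assume SHA-256; mtime, locking, and window cleanup are left out.

```ruby
require "digest"
require "tmpdir"

# Hypothetical miniature tracker: a path => digest map and a hash check.
class WriteSketch
  def initialize
    @state = {}
  end

  # Record what a read saw on disk.
  def register(path)
    @state[path] = Digest::SHA256.file(path).hexdigest
  end

  # Hash the bytes just written instead of re-reading the file.
  def note_write(path, new_content)
    @state[path] = Digest::SHA256.hexdigest(new_content)
  end

  def fresh?(path)
    @state.key?(path) && @state[path] == Digest::SHA256.file(path).hexdigest
  end
end

# Returns [fresh before note_write, fresh after note_write].
def own_write_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, "notes.txt")
    File.binwrite(path, "v1\n")
    tracker = WriteSketch.new
    tracker.register(path)

    File.binwrite(path, "v2\n")       # the agent's own edit lands on disk
    before = tracker.fresh?(path)     # the stale-read guard would trip here
    tracker.note_write(path, "v2\n")  # record exactly what was written
    [before, tracker.fresh?(path)]
  end
end

p own_write_demo # => [false, true]
```

Hashing the in-memory bytes rather than the file avoids attributing a concurrent writer's content to the agent, which is the race the method description mentions.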
#register(path, mtime, content_hash = nil) ⇒ Object
Records a successful read: stash mtime + content hash so a later edit can confirm the file is unchanged, and a later read of the same window can be deduped.
# File 'lib/rubino/tools/read_tracker.rb', line 75

def register(path, mtime, content_hash = nil)
  key = canonical(path)
  return unless key

  @mutex.synchronize do
    @state[key] = { mtime: mtime, hash: content_hash || hash_of(key) }
  end
end
#seen?(path) ⇒ Boolean
# File 'lib/rubino/tools/read_tracker.rb', line 111

def seen?(path)
  key = canonical(path)
  return false unless key

  @mutex.synchronize { @state.key?(key) }
end