Class: Woods::Unblocked::SyncManifest
- Inherits: Object
  - Object
    - Woods::Unblocked::SyncManifest
- Defined in: lib/woods/unblocked/sync_manifest.rb
Overview
Tracks what was last pushed to an Unblocked collection so a sync can skip unchanged documents, re-push changed ones, and delete orphans.
The manifest is the local source of truth for change detection: each entry records the content hash of the document we last pushed for a URI plus the remote document_id (needed for deletes). Persisted as JSON alongside the extraction output and restored across CI runs via the CI provider’s cache. A missing or corrupt file degrades to “everything is new” — a correct (if expensive) full sync that rebuilds the manifest.
Modeled on the embedding indexer’s checkpoint (load JSON → compare per-key hash → save JSON).
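The skip / re-push / delete cycle described above can be sketched as a single sync pass. This is only an illustration of how the documented methods might fit together: `sync_documents` and the `client` object (with `push` returning a remote document_id, and `delete`) are hypothetical names, and SHA-256 is an assumed hash function, since this page does not say which digest is used.

```ruby
require 'digest'

# Sketch of one sync pass over `documents` (a Hash of uri => body).
# `client` is a hypothetical object responding to `push(uri, body)` (returning
# the remote document_id) and `delete(document_id)`; neither is documented here.
def sync_documents(manifest:, client:, documents:)
  stats = { pushed: 0, skipped: 0, deleted: 0 }

  documents.each do |uri, body|
    # Assumption: the content hash is a SHA-256 hex digest of the body.
    hash = Digest::SHA256.hexdigest(body)
    if manifest.unchanged?(uri, hash)
      stats[:skipped] += 1
      next
    end

    document_id = client.push(uri, body)
    manifest.record(uri: uri, hash: hash, document_id: document_id)
    stats[:pushed] += 1
  end

  # Orphans: URIs recorded earlier that the current run no longer produces.
  manifest.stale_uris(documents.keys).each do |uri|
    document_id = manifest.document_id_for(uri)
    client.delete(document_id) if document_id
    manifest.forget(uri)
    stats[:deleted] += 1
  end

  manifest.save
  stats
end
```

With an empty manifest every document counts as changed, which matches the "everything is new" fallback described in the overview.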
Constant Summary
- VERSION = 1
Instance Method Summary
- #document_id_for(uri) ⇒ String?
  Stored remote document_id, if known.
- #empty? ⇒ Boolean
  True when no documents are recorded.
- #forget(uri) ⇒ Object
  Drop a URI from the manifest (after a successful remote delete).
- #initialize(path:, collection_id:) ⇒ SyncManifest (constructor)
  A new instance of SyncManifest.
- #record(uri:, hash:, document_id:) ⇒ Object
  Record (or update) what we pushed for a URI.
- #save ⇒ Object
  Persist the manifest atomically (temp file + rename) so an interrupted write never leaves a torn file in the CI cache.
- #size ⇒ Integer
  Number of recorded documents.
- #stale_uris(current_uris) ⇒ Array<String>
  URIs we have a record of that are absent from the current run’s set.
- #unchanged?(uri, hash) ⇒ Boolean
  True when the recorded hash matches (safe to skip).
Constructor Details
#initialize(path:, collection_id:) ⇒ SyncManifest
Returns a new instance of SyncManifest.
    # File 'lib/woods/unblocked/sync_manifest.rb', line 34

    def initialize(path:, collection_id:)
      @path = path
      @collection_id = collection_id
      @documents = load
    end
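The constructor delegates to a `load` method that is not shown on this page. The sketch below is one way the "missing or corrupt file degrades to everything is new" behavior from the overview could be implemented. `load_documents` is a hypothetical free-standing helper, and the version and collection_id checks are assumptions rather than documented behavior.

```ruby
require 'json'

# Hypothetical loader: returns the recorded documents hash, or {} when the
# file is missing, unreadable, or corrupt. Assumption: a manifest written for
# a different version or collection is also treated as empty.
def load_documents(path, collection_id, version: 1)
  return {} unless File.exist?(path)

  data = JSON.parse(File.read(path))
  return {} unless data.is_a?(Hash)
  return {} unless data['version'] == version
  return {} unless data['collection_id'] == collection_id

  documents = data['documents']
  documents.is_a?(Hash) ? documents : {}
rescue JSON::ParserError, SystemCallError
  # Corrupt JSON or an I/O error: fall back to a full sync.
  {}
end
```

Returning an empty hash in every failure case keeps the caller simple: the next sync pushes everything and then rewrites a valid manifest.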
Instance Method Details
#document_id_for(uri) ⇒ String?
Returns the stored remote document_id, if known.
    # File 'lib/woods/unblocked/sync_manifest.rb', line 64

    def document_id_for(uri)
      @documents.dig(uri, 'document_id')
    end
#empty? ⇒ Boolean
Returns true when no documents are recorded.
    # File 'lib/woods/unblocked/sync_manifest.rb', line 41

    def empty?
      @documents.empty?
    end
#forget(uri) ⇒ Object
Drop a URI from the manifest (after a successful remote delete).
    # File 'lib/woods/unblocked/sync_manifest.rb', line 85

    def forget(uri)
      @documents.delete(uri)
    end
#record(uri:, hash:, document_id:) ⇒ Object
Record (or update) what we pushed for a URI.
    # File 'lib/woods/unblocked/sync_manifest.rb', line 58

    def record(uri:, hash:, document_id:)
      @documents[uri] = { 'hash' => hash, 'document_id' => document_id }
    end
#save ⇒ Object
Persist the manifest atomically (temp file + rename) so an interrupted write never leaves a torn file in the CI cache.
    # File 'lib/woods/unblocked/sync_manifest.rb', line 91

    def save
      FileUtils.mkdir_p(File.dirname(@path))
      payload = JSON.generate(
        'version' => VERSION,
        'collection_id' => @collection_id,
        'documents' => @documents
      )
      tmp = "#{@path}.tmp"
      File.write(tmp, payload)
      File.rename(tmp, @path)
    end
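To make the on-disk format concrete, the following sketch applies the same temp-file-then-rename pattern in a free-standing helper and prints the resulting file. `write_manifest` is a hypothetical name, and the URI, hash, and IDs are made-up illustrative values.

```ruby
require 'json'
require 'fileutils'
require 'tmpdir'

# Same write pattern as #save, outside the class (hypothetical helper).
def write_manifest(path, collection_id, documents)
  FileUtils.mkdir_p(File.dirname(path))
  payload = JSON.generate(
    'version' => 1,
    'collection_id' => collection_id,
    'documents' => documents
  )
  tmp = "#{path}.tmp"
  File.write(tmp, payload)
  # rename replaces the target in one step when both paths are on the
  # same filesystem, so readers see either the old file or the new one.
  File.rename(tmp, path)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'cache', 'manifest.json')
  documents = {
    'docs/user.md' => { 'hash' => 'abc123', 'document_id' => 'doc_1' } # illustrative values
  }
  write_manifest(path, 'col_example', documents)
  puts JSON.pretty_generate(JSON.parse(File.read(path)))
end
```

The printed file has three top-level keys: `version`, `collection_id`, and `documents`, where `documents` maps each URI to its `hash` and `document_id`.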
#size ⇒ Integer
Returns the number of recorded documents.
    # File 'lib/woods/unblocked/sync_manifest.rb', line 78

    def size
      @documents.size
    end
#stale_uris(current_uris) ⇒ Array<String>
URIs we have a record of that are absent from the current run’s set.
    # File 'lib/woods/unblocked/sync_manifest.rb', line 72

    def stale_uris(current_uris)
      present = current_uris.to_a
      @documents.keys - present
    end
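Because the method calls `to_a` on its argument, the current run's URIs can be passed as an Array, a Set, or any other object that responds to `to_a`. A small stand-alone illustration follows; `stale_uris_in` and the sample data are hypothetical stand-ins for the manifest's internal state.

```ruby
require 'set'

# Stand-in for #stale_uris operating on a plain documents hash.
def stale_uris_in(documents, current_uris)
  documents.keys - current_uris.to_a
end

documents = {
  'docs/a.md' => { 'hash' => 'h1', 'document_id' => 'doc_1' },
  'docs/b.md' => { 'hash' => 'h2', 'document_id' => 'doc_2' }
}

# docs/b.md was recorded earlier but is absent from this run; docs/c.md is new.
p stale_uris_in(documents, Set['docs/a.md', 'docs/c.md']) # => ["docs/b.md"]
```

New URIs never appear in the result: the difference only removes recorded keys, so anything unrecorded is handled by the push path instead.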
#unchanged?(uri, hash) ⇒ Boolean
Returns true when the recorded hash matches (safe to skip).
    # File 'lib/woods/unblocked/sync_manifest.rb', line 48

    def unchanged?(uri, hash)
      entry = @documents[uri]
      !entry.nil? && entry['hash'] == hash
    end
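The three cases this check distinguishes can be shown against a plain hash of the same shape that #record stores. `unchanged_in?` and the sample values are hypothetical and only mirror the logic above.

```ruby
# Stand-in for #unchanged? operating on a plain documents hash.
def unchanged_in?(documents, uri, hash)
  entry = documents[uri]
  !entry.nil? && entry['hash'] == hash
end

documents = { 'docs/user.md' => { 'hash' => 'abc123', 'document_id' => 'doc_1' } }

p unchanged_in?(documents, 'docs/user.md', 'abc123') # => true  (recorded, same hash)
p unchanged_in?(documents, 'docs/user.md', 'def456') # => false (recorded, content changed)
p unchanged_in?(documents, 'docs/new.md', 'abc123')  # => false (never recorded)
```

An unrecorded URI is never reported as unchanged, so new documents always take the push path.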