Class: Woods::Unblocked::SyncManifest

Inherits:
Object
Defined in:
lib/woods/unblocked/sync_manifest.rb

Overview

Tracks what was last pushed to an Unblocked collection so a sync can skip unchanged documents, re-push changed ones, and delete orphans.

The manifest is the local source of truth for change detection: each entry records the content hash of the document last pushed for a URI, plus the remote document_id (needed for deletes). It is persisted as JSON alongside the extraction output and restored across CI runs via the CI provider’s cache. A missing or corrupt file degrades to “everything is new”: a correct (if expensive) full sync that rebuilds the manifest.

Modeled on the embedding indexer’s checkpoint (load JSON → compare per-key hash → save JSON).

Examples:

manifest = SyncManifest.new(path: "tmp/woods/unblocked_sync_manifest.json",
                            collection_id: "col-uuid")
manifest.unchanged?(uri, hash)  # => false on first run
manifest.record(uri:, hash:, document_id:)
manifest.save
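
The skip / re-push / delete-orphans flow described in the overview can be sketched as one sync pass. This is a minimal illustration, not the library's own sync code: `ManifestSketch` is an in-memory stand-in that copies the method bodies documented on this page, `FakeClient` and `sync_pass` are hypothetical names, and SHA-256 is an assumed choice of content hash (the source does not say which hash is used).

```ruby
require "digest"

# In-memory stand-in mirroring the methods documented on this page,
# so the sketch runs without the real class. Persistence is omitted.
class ManifestSketch
  def initialize
    @documents = {}
  end

  def unchanged?(uri, hash)
    entry = @documents[uri]
    !entry.nil? && entry["hash"] == hash
  end

  def record(uri:, hash:, document_id:)
    @documents[uri] = { "hash" => hash, "document_id" => document_id }
  end

  def document_id_for(uri)
    @documents.dig(uri, "document_id")
  end

  def stale_uris(current_uris)
    @documents.keys - current_uris.to_a
  end

  def forget(uri)
    @documents.delete(uri)
  end
end

# Hypothetical remote client: records calls instead of talking to an API.
class FakeClient
  attr_reader :pushed, :deleted

  def initialize
    @pushed = []
    @deleted = []
    @next_id = 0
  end

  def push(uri, _body)
    @pushed << uri
    "doc-#{@next_id += 1}"
  end

  def delete(document_id)
    @deleted << document_id
  end
end

# One pass: skip unchanged documents, push new or changed ones,
# then delete and forget anything recorded but no longer present.
def sync_pass(manifest, client, docs)
  docs.each do |uri, body|
    hash = Digest::SHA256.hexdigest(body) # assumed hash function
    next if manifest.unchanged?(uri, hash)

    document_id = client.push(uri, body)
    manifest.record(uri: uri, hash: hash, document_id: document_id)
  end

  manifest.stale_uris(docs.keys).each do |uri|
    document_id = manifest.document_id_for(uri)
    client.delete(document_id) if document_id
    manifest.forget(uri)
  end
end

manifest = ManifestSketch.new
client = FakeClient.new
sync_pass(manifest, client, { "a.md" => "one", "b.md" => "two" })
sync_pass(manifest, client, { "a.md" => "one changed" })
```

In the second pass only `a.md` is pushed again, and `b.md` is deleted remotely using the document_id stored during the first pass.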

Constant Summary

VERSION = 1

Constructor Details

#initialize(path:, collection_id:) ⇒ SyncManifest

Returns a new instance of SyncManifest.

Parameters:

  • path (String)

    JSON file path for the manifest

  • collection_id (String)

    Target collection UUID. A stored manifest written for a different collection is discarded (a guard against cache-key reuse).



# File 'lib/woods/unblocked/sync_manifest.rb', line 34

def initialize(path:, collection_id:)
  @path = path
  @collection_id = collection_id
  @documents = load
end
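
The constructor calls a private `load` whose source is not shown on this page. Based only on the behaviour described above (a missing or corrupt file yields an empty manifest, and a manifest for another collection is discarded), it might look roughly like the sketch below. `load_documents` and `MANIFEST_VERSION` are hypothetical names, and rejecting a mismatched `version` is an assumption rather than documented behaviour.

```ruby
require "json"
require "tmpdir"

MANIFEST_VERSION = 1 # mirrors VERSION on this page

# Hypothetical reconstruction of the loading rules; not the library's code.
# Any unusable file degrades to an empty Hash ("everything is new").
def load_documents(path, collection_id)
  return {} unless File.exist?(path)

  data = JSON.parse(File.read(path))
  return {} unless data.is_a?(Hash)
  return {} unless data["version"] == MANIFEST_VERSION     # assumption
  return {} unless data["collection_id"] == collection_id  # cache-key reuse guard

  documents = data["documents"]
  documents.is_a?(Hash) ? documents : {}
rescue JSON::ParserError, SystemCallError
  {}
end

same, other, corrupt, missing = Dir.mktmpdir do |dir|
  path = File.join(dir, "manifest.json")
  File.write(path, JSON.generate(
    "version" => 1,
    "collection_id" => "col-a",
    "documents" => { "a.md" => { "hash" => "h1", "document_id" => "doc-1" } }
  ))
  matching = load_documents(path, "col-a")
  mismatched = load_documents(path, "col-b")

  File.write(path, "{not json")
  broken = load_documents(path, "col-a")

  absent = load_documents(File.join(dir, "nope.json"), "col-a")
  [matching, mismatched, broken, absent]
end
```

Only the matching collection gets its entries back; the other three cases fall through to a full sync.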

Instance Method Details

#document_id_for(uri) ⇒ String?

Returns the stored remote document_id, if known.

Parameters:

  • uri (String)

    Document URI

Returns:

  • (String, nil)

    Stored remote document_id, if known



# File 'lib/woods/unblocked/sync_manifest.rb', line 64

def document_id_for(uri)
  @documents.dig(uri, 'document_id')
end

#empty? ⇒ Boolean

Returns true when no documents are recorded.

Returns:

  • (Boolean)

    true when no documents are recorded



# File 'lib/woods/unblocked/sync_manifest.rb', line 41

def empty?
  @documents.empty?
end

#forget(uri) ⇒ Object

Drop a URI from the manifest (after a successful remote delete).

Parameters:

  • uri (String)

    Document URI



# File 'lib/woods/unblocked/sync_manifest.rb', line 85

def forget(uri)
  @documents.delete(uri)
end

#record(uri:, hash:, document_id:) ⇒ Object

Record (or update) what we pushed for a URI.

Parameters:

  • uri (String)

    Document URI

  • hash (String, nil)

    Content hash pushed (nil forces a future re-push)

  • document_id (String, nil)

    Remote document UUID (for later deletes)



# File 'lib/woods/unblocked/sync_manifest.rb', line 58

def record(uri:, hash:, document_id:)
  @documents[uri] = { 'hash' => hash, 'document_id' => document_id }
end

#save ⇒ Object

Persist the manifest atomically (temp file + rename) so an interrupted write never leaves a torn file in the CI cache.



# File 'lib/woods/unblocked/sync_manifest.rb', line 91

def save
  FileUtils.mkdir_p(File.dirname(@path))
  payload = JSON.generate(
    'version' => VERSION,
    'collection_id' => @collection_id,
    'documents' => @documents
  )
  tmp = "#{@path}.tmp"
  File.write(tmp, payload)
  File.rename(tmp, @path)
end
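
To make the on-disk shape concrete, the following sketch repeats the temp-file-then-rename pattern as a free function and reads the result back. `atomic_write_json` is a hypothetical helper name and the sample entry is made up; the three top-level keys are the ones `#save` writes.

```ruby
require "fileutils"
require "json"
require "tmpdir"

# Same pattern as #save: write a sibling temp file, then rename it over
# the target so readers never see a half-written manifest.
def atomic_write_json(path, data)
  FileUtils.mkdir_p(File.dirname(path))
  tmp = "#{path}.tmp"
  File.write(tmp, JSON.generate(data))
  File.rename(tmp, path)
end

payload = {
  "version" => 1,
  "collection_id" => "col-uuid",
  "documents" => {
    "docs/a.md" => { "hash" => "h1", "document_id" => "doc-1" } # sample entry
  }
}

written, tmp_left, round_trip = Dir.mktmpdir do |dir|
  path = File.join(dir, "woods", "unblocked_sync_manifest.json")
  atomic_write_json(path, payload)
  [File.exist?(path), File.exist?("#{path}.tmp"), JSON.parse(File.read(path))]
end

puts JSON.pretty_generate(round_trip)
```

Because the temp file sits in the same directory as the target, the rename stays on one filesystem, which is what lets it replace the old file in a single step on POSIX systems.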

#size ⇒ Integer

Returns the number of recorded documents.

Returns:

  • (Integer)

    number of recorded documents



# File 'lib/woods/unblocked/sync_manifest.rb', line 78

def size
  @documents.size
end

#stale_uris(current_uris) ⇒ Array<String>

Returns the recorded URIs that are absent from the current run’s set.

Parameters:

  • current_uris (Array<String>, Set)

    URIs that still exist this run

Returns:

  • (Array<String>)

    recorded URIs no longer present (deletion candidates)



# File 'lib/woods/unblocked/sync_manifest.rb', line 72

def stale_uris(current_uris)
  present = current_uris.to_a
  @documents.keys - present
end
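
A small illustration of the array difference, written as a free function over a plain Hash so it runs on its own (the two-argument `stale_uris` here is a stand-in for the instance method, and the sample URIs are made up). It shows that either an Array or a Set can be passed, since both respond to `to_a`.

```ruby
require "set"

# Stand-in for the instance method: recorded keys minus the current URIs.
def stale_uris(documents, current_uris)
  documents.keys - current_uris.to_a
end

documents = {
  "a.md" => { "hash" => "h1", "document_id" => "doc-1" },
  "b.md" => { "hash" => "h2", "document_id" => "doc-2" },
  "c.md" => { "hash" => "h3", "document_id" => "doc-3" }
}

from_array = stale_uris(documents, ["a.md"])
from_set = stale_uris(documents, Set["a.md", "c.md"])
```

URIs in the current set that were never recorded are simply ignored; only recorded keys can appear in the result.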

#unchanged?(uri, hash) ⇒ Boolean

Returns true when the recorded hash matches (safe to skip).

Parameters:

  • uri (String)

    Document URI

  • hash (String)

    Content hash of the document we would push now

Returns:

  • (Boolean)

    true when the recorded hash matches (safe to skip)



# File 'lib/woods/unblocked/sync_manifest.rb', line 48

def unchanged?(uri, hash)
  entry = @documents[uri]
  !entry.nil? && entry['hash'] == hash
end