Class: RSpecTracer::RemoteCache::S3Backend Private

Inherits: Object
Defined in:
lib/rspec_tracer/remote_cache/s3_backend.rb

Overview

This class is part of a private API. You should avoid using this class if possible, as it may be removed or be changed in the future.

S3 implementation of `RemoteCache::Backend`. Shells out to the `aws` / `awslocal` CLI for every operation - matches 1.x's behavior and avoids pulling `aws-sdk-s3` into the gem's runtime deps. Users on 1.x already have `aws` on PATH per the documented CI recipe; 2.0 asks nothing new.

Two-tier S3 layout (a change from 1.x's flat layout; paired with the schema_version bump - one cold run on upgrade). The cache payload is a single `cache.tar.gz` per ref (~15 JSON files + last_run.json packed together; ~4-6x smaller on the wire and 1 GET per download instead of 15):

s3://<bucket>/<prefix>/
  main/<sha>/[<test_suite_id>/]cache.tar.gz
  pr/<branch>/<sha>/[<test_suite_id>/]cache.tar.gz
  pr/<branch>/branch_refs.json

The local cache_path layout is unchanged - the archive is a transit boundary only. Users and external tooling continue to see the 15-file disk layout documented in `USER_FACING_SURFACE.md` section 6.
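The layout above can be sketched as key construction. The helper name and signature below are illustrative only, not the gem's private internals:

```ruby
# Illustrative sketch of the documented two-tier layout; archive_key
# is a hypothetical helper, not part of the gem's API.
def archive_key(prefix:, sha:, branch: nil, default_branch: 'main', test_suite_id: nil)
  tier = (branch.nil? || branch == default_branch) ? 'main' : File.join('pr', branch)
  parts = [prefix, tier, sha]
  parts << test_suite_id if test_suite_id
  parts << 'cache.tar.gz'
  File.join(*parts)
end

archive_key(prefix: 'rspec-tracer', sha: 'abc123')
# => "rspec-tracer/main/abc123/cache.tar.gz"
archive_key(prefix: 'rspec-tracer', sha: 'abc123', branch: 'feature/login')
# => "rspec-tracer/pr/feature/login/abc123/cache.tar.gz"
```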

Tier is determined from `branch` vs `default_branch` at construction. Main-branch builds write to the main tier; PR builds write to their branch-scoped pr tier. Download tries the backend's own tier first, then falls back to the main tier for the same ref (catches PRs cherry-picking from main).

Retention (closes issue #20 at the architectural layer, not just with a knob):

- `cache_retention_count N` keeps newest N refs per tier
  (main has N refs, each PR branch has N refs).
- `cache_retention_duration_seconds X` prunes refs older than
  X seconds in any tier the backend visits.
- `cache_retention_pr_branch_ttl_seconds X` deletes a PR branch
  entirely (including its branch_refs.json) when no ref has
  been touched in X seconds. Applied at upload time in the
  backend's own branch only; cross-branch cleanup is a separate
  Rake task.
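The first two knobs can be illustrated with a small standalone sketch; the helper name and the `[ref, last_touched_epoch]` data shape are hypothetical, and each knob applies independently, as described above:

```ruby
# Hypothetical sketch of count- and age-based retention over
# [ref, last_touched_epoch] pairs; not the gem's internals.
def refs_to_prune(refs, count: nil, duration_seconds: nil, now: Time.now.to_i)
  doomed = []
  if count&.positive?
    # keep the newest N refs, mark the rest for deletion
    doomed |= refs.sort_by { |_, ts| -ts }.drop(count).map(&:first)
  end
  if duration_seconds&.positive?
    cutoff = now - duration_seconds
    doomed |= refs.select { |_, ts| ts < cutoff }.map(&:first)
  end
  doomed
end

refs = [['old', 100], ['mid', 200], ['new', 300]]
refs_to_prune(refs, count: 2, now: 400)               # => ["old"]
refs_to_prune(refs, duration_seconds: 150, now: 400)  # => ["old", "mid"]
```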

Graceful-degradation contract:

- `download` returns false and never raises on wire/validation
  failure. Partial downloads are cleaned up.
- `upload` raises on wire failure; the Rake task catches.
- `branch_refs` returns `{}` on missing file.
- `prune!` returns count removed, never raises.
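A caller honoring this contract might look like the sketch below; `FakeBackend` and `restore_cache` are illustrative stand-ins, not the gem's orchestrator:

```ruby
# Sketch of consuming the graceful-degradation contract; FakeBackend
# stands in for the real S3 backend (which shells out to `aws`).
class FakeBackend
  def initialize(warm)
    @warm = warm
  end

  # Per the contract: returns true/false, never raises.
  def download(_ref, tree_sha: nil)
    @warm
  end
end

def restore_cache(backend, ref)
  backend.download(ref) ? :warm_start : :cold_run
end

restore_cache(FakeBackend.new(true), 'abc123')   # => :warm_start
restore_cache(FakeBackend.new(false), 'abc123')  # => :cold_run
```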

S3 access shells out via the `aws` CLI, so a single class is the natural unit of composition here. The class is large; splitting it would be cosmetic.

Defined Under Namespace

Classes: S3BackendError

Constant Summary

All constants below are internal and part of a private API; avoid relying on them, as they may be removed or changed in the future.

MAIN_TIER = 'main'
PR_TIER = 'pr'
BRANCH_REFS_FILENAME = 'branch_refs.json'
LAST_RUN_FILENAME = 'last_run.json'
CACHE_ARCHIVE_FILENAME = Archive::CACHE_FILENAME
ENCODING = 'UTF-8'
REQUIRED_OPTS = %i[bucket prefix branch default_branch cache_path].freeze

Instance Method Summary

Constructor Details

#initialize(bucket:, prefix:, branch:, default_branch:, cache_path:, test_suite_id: nil, local: false, logger: nil) ⇒ S3Backend




# File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 97

def initialize(bucket:, prefix:, branch:, default_branch:,
               cache_path:, test_suite_id: nil, local: false, logger: nil)
  validate_required!(bucket: bucket, prefix: prefix, branch: branch,
                     default_branch: default_branch, cache_path: cache_path)

  @bucket = bucket.to_s
  @prefix = trim_trailing_slashes(prefix.to_s)
  @branch = branch.to_s.chomp
  @default_branch = default_branch.to_s.chomp
  @test_suite_id = normalize_test_suite_id(test_suite_id)
  @cache_path = cache_path.to_s
  @cli_binary = local ? 'awslocal' : 'aws'
  @logger = logger
end
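The normalization the constructor applies can be illustrated standalone. `trim_trailing_slashes` is private and not shown on this page, so the regexp below is an assumption about its intent:

```ruby
# Illustrative normalization matching the constructor's intent; the
# exact trim_trailing_slashes implementation is an assumption.
prefix = 'team/rspec-tracer///'.sub(%r{/+\z}, '')  # strip trailing slashes
branch = "feature/login\n".chomp                   # strip trailing newline

prefix  # => "team/rspec-tracer"
branch  # => "feature/login"
```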

Instance Method Details

#branch_refs(branch_name) ⇒ Object


Read branch_refs for the given branch. Returns a `{ ref => ts_epoch }` hash, or `{}` when the file is missing / malformed. PR tier only - the main branch doesn't track branch_refs (rewrites are not expected on the default branch).



# File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 166

def branch_refs(branch_name)
  return {} if blank?(branch_name)

  local_tmp = File.join(@cache_path, ".branch_refs_download_#{Process.pid}.json")
  FileUtils.mkdir_p(@cache_path)

  ok, = aws_cp_silent(s3_url(s3_branch_refs_key(branch_name)), local_tmp)
  return {} unless ok

  parsed = JSON.parse(File.read(local_tmp, encoding: ENCODING))
  parsed.is_a?(Hash) ? parsed.transform_values(&:to_i) : {}
rescue StandardError => e
  log_debug("branch_refs read failed (#{e.class}: #{e.message}); treating as empty")
  {}
ensure
  FileUtils.rm_f(local_tmp) if defined?(local_tmp) && local_tmp
end

#download(ref, tree_sha: nil) ⇒ Object


Download the cache for `ref` into `cache_path`. Tries the backend's own tier first; on a miss, falls back to the main tier for the same ref. Validates the downloaded `last_run.json` via schema_version before declaring success.

When `tree_sha` is provided, first consults the tree-SHA secondary index (`<tier>/by_tree/<tree_sha>`) to resolve the tree to a commit ref - catches rebase / revert scenarios where the same tree lives at a different commit hash than the one the caller is asking about. The standard `<tier>/<ref>` lookup is still tried as a fallback when the tree pointer is absent or its resolved ref has no archive.

Returns true on validated success, false on any failure. Cleans up partially-downloaded files on failure so a subsequent fresh load doesn’t see stale data.



# File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 129

def download(ref, tree_sha: nil)
  return false if ref.nil? || ref.to_s.empty?

  attempts = build_download_attempts(ref, tree_sha)
  attempts.any? { |tier, candidate| try_download_from(tier, candidate) }
end
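`build_download_attempts` is private and not shown on this page; the sketch below is one plausible reconstruction of the ordering the description above implies, not the gem's actual implementation:

```ruby
# Hypothetical reconstruction: own tier first, then main-tier fallback;
# within each tier, the tree-SHA pointer (when given) is consulted
# before the plain ref lookup.
def download_attempts(own_tier, ref, tree_sha = nil)
  tiers = [own_tier, 'main'].uniq
  tiers.flat_map do |tier|
    candidates = []
    candidates << [tier, { tree_sha: tree_sha }] if tree_sha
    candidates << [tier, { ref: ref }]
    candidates
  end
end

download_attempts('pr/feature/login', 'abc123', 'tree9').length  # => 4
download_attempts('main', 'abc123').length                       # => 1
```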

#prune!(count: nil, duration_seconds: nil, pr_branch_ttl_seconds: nil) ⇒ Object


Apply retention policy to the backend’s own tier. Returns the number of refs removed. Never raises on a partial failure; logs and returns the count it managed to delete.

Semantics:

- count N: keep newest N refs, delete older.
- duration_seconds X: delete refs whose last_run.json is
  older than X seconds.
- pr_branch_ttl_seconds X: (PR tier only) if the backend's
  branch has no ref newer than X seconds, delete the entire
  pr/<branch>/ prefix (branch_refs.json included).

Any combination of parameters may be set; each applies independently. When all are nil or zero, the call is a no-op.



# File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 218

def prune!(count: nil, duration_seconds: nil, pr_branch_ttl_seconds: nil)
  removed = 0
  removed += prune_by_count!(count) if count&.positive?
  removed += prune_by_duration!(duration_seconds) if duration_seconds&.positive?
  removed += prune_dead_pr_branch!(pr_branch_ttl_seconds) if pr_tier? && pr_branch_ttl_seconds&.positive?
  removed
end

#prune_all!(pr_branch_ttl_seconds: nil) ⇒ Object


Cross-tier PR-branch cleanup. Enumerates every PR branch under the configured prefix by listing the `pr/` subtree, applies the TTL to each branch, and deletes dead branches whole. Returns total refs removed. No-op on nil / non-positive TTL. Never raises (graceful-degradation contract).



# File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 231

def prune_all!(pr_branch_ttl_seconds: nil)
  return 0 unless pr_branch_ttl_seconds&.positive?

  cutoff = Time.now.to_i - pr_branch_ttl_seconds.to_i
  branches = discover_pr_branches
  branches.sum { |branch| maybe_prune_branch(branch, cutoff) }
rescue StandardError => e
  log_warn("prune_all! failed (#{e.class}: #{e.message})")
  0
end

#unbounded_warning(warn_threshold: 500) ⇒ Object


Check whether the backend's own tier has accumulated more than `warn_threshold` refs without retention configured. Callable from the orchestrator for the "S3 growing unbounded" diagnostic.



# File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 245

def unbounded_warning(warn_threshold: 500)
  refs = list_own_tier_refs
  return nil unless refs.length > warn_threshold

  "rspec-tracer remote cache has #{refs.length} refs in #{own_tier_prefix}; " \
    'configure cache_retention_count or cache_retention_duration to cap growth'
end

#upload(ref, tree_sha: nil) ⇒ Object


Upload the local cache to this backend's own tier under `ref`. Packs the 15-file local layout into a single `cache.tar.gz` and uploads that one object. Raises on wire failure. Idempotent.

When `tree_sha` is provided, ALSO writes a small pointer file at `<tier>/by_tree/<tree_sha>` containing the commit SHA. The pointer is consumed by `download(ref, tree_sha: …)` to hit the cache when a different commit (rebase / revert) shares the same tree.

Raises:



# File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 145

def upload(ref, tree_sha: nil)
  raise S3BackendError, 'ref is required' if blank?(ref)

  run_id = read_local_run_id
  raise S3BackendError, "no local cache to upload (missing #{LAST_RUN_FILENAME})" if run_id.nil?

  archive_path = tmp_archive_path('upload')
  begin
    Archive.pack(cache_path: @cache_path, run_id: run_id, dest_path: archive_path)
    upload_file(archive_path, s3_archive_key(own_tier_prefix, ref))
    upload_tree_pointer(ref, tree_sha) unless blank?(tree_sha)
    log_debug("uploaded cache for #{ref} to #{own_tier_prefix} (#{File.size(archive_path)} bytes)")
  ensure
    FileUtils.rm_f(archive_path)
  end
end
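The secondary-index key described above (`<tier>/by_tree/<tree_sha>`) composes straightforwardly; the helper name below is illustrative:

```ruby
# Illustrative composition of the by_tree pointer key per the
# description above; the pointer object's body is the commit SHA.
def tree_pointer_key(tier_prefix, tree_sha)
  File.join(tier_prefix, 'by_tree', tree_sha)
end

tree_pointer_key('myprefix/main', 'deadbeef')
# => "myprefix/main/by_tree/deadbeef"
```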

#write_branch_refs(branch_name, refs) ⇒ Object


Persist branch_refs for the given branch. No-op for main-branch writes (the main branch doesn't use branch_refs). Raises on wire failure for the PR tier.



# File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 187

def write_branch_refs(branch_name, refs)
  return if blank?(branch_name)
  return if branch_name.to_s.chomp == @default_branch
  return if refs.nil? || refs.empty?

  FileUtils.mkdir_p(@cache_path)
  local_tmp = File.join(@cache_path, ".branch_refs_upload_#{Process.pid}.json")
  File.write(local_tmp, JSON.pretty_generate(refs), encoding: ENCODING)

  ok, _stdout, stderr = aws_cp_silent(local_tmp, s3_url(s3_branch_refs_key(branch_name)))
  raise S3BackendError, "Failed to upload branch_refs for #{branch_name}: #{stderr.chomp}" unless ok

  log_debug("wrote branch_refs for #{branch_name}")
ensure
  FileUtils.rm_f(local_tmp) if defined?(local_tmp) && local_tmp
end