Class: RSpecTracer::RemoteCache::S3Backend Private
- Inherits: Object
  - Object
  - RSpecTracer::RemoteCache::S3Backend
- Defined in: lib/rspec_tracer/remote_cache/s3_backend.rb
Overview
This class is part of a private API. You should avoid using this class if possible, as it may be removed or be changed in the future.
S3 implementation of `RemoteCache::Backend`. Shells out to the `aws` / `awslocal` CLI for every operation, which matches 1.x’s behavior and avoids pulling `aws-sdk-s3` into the gem’s runtime deps. Users on 1.x already have `aws` on PATH per the documented CI recipe; 2.0 asks nothing new.
Two-tier S3 layout (change from 1.x flat layout; paired with the schema_version bump - one cold run on upgrade). Cache payload is a single `cache.tar.gz` per ref (~15 JSON files + last_run.json packed together; ~4-6x smaller on the wire + 1 GET per download instead of 15):
    s3://<bucket>/<prefix>/
      main/<sha>/[<test_suite_id>/]cache.tar.gz
      pr/<branch>/<sha>/[<test_suite_id>/]cache.tar.gz
      pr/<branch>/branch_refs.json
Local cache_path layout is unchanged - the archive is a transit boundary only. Users and external tooling continue to see the 15-file disk layout documented in `USER_FACING_SURFACE.md` section 6.
Tier is determined from `branch` vs `default_branch` at construction. Main-branch builds write to main tier; PR builds write to their branch-scoped pr tier. Download tries the backend’s own tier first, then falls back to main tier for the same ref (catches PRs cherry-picking from main).
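As an illustration of the layout above, the following is a minimal sketch of how an archive key could be derived from the branch and ref. The `archive_key` helper and the sample values (`ci-cache`, `feature-x`, `abc123`, `unit`) are hypothetical; the class builds its keys through its own private helpers, which may differ.

```ruby
# Hypothetical helper for illustration only; not part of S3Backend.
# Picks the main tier when branch == default_branch, otherwise the
# branch-scoped pr tier, then appends the optional test_suite_id.
def archive_key(prefix:, branch:, default_branch:, sha:, test_suite_id: nil)
  tier = branch == default_branch ? "main/#{sha}" : "pr/#{branch}/#{sha}"
  [prefix, tier, test_suite_id, 'cache.tar.gz'].compact.join('/')
end

archive_key(prefix: 'ci-cache', branch: 'main', default_branch: 'main', sha: 'abc123')
# => "ci-cache/main/abc123/cache.tar.gz"

archive_key(prefix: 'ci-cache', branch: 'feature-x', default_branch: 'main',
            sha: 'abc123', test_suite_id: 'unit')
# => "ci-cache/pr/feature-x/abc123/unit/cache.tar.gz"
```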
Retention (closes issue #20 at the architectural layer, not just with a knob):
- `cache_retention_count N` keeps newest N refs per tier (main has N refs, each PR branch has N refs).
- `cache_retention_duration_seconds X` prunes refs older than X seconds in any tier the backend visits.
- `cache_retention_pr_branch_ttl_seconds X` deletes a PR branch entirely (including its branch_refs.json) when no ref has been touched in X seconds. Applied at upload time in the backend's own branch only; cross-branch cleanup is a separate Rake task.
Graceful-degradation contract:
- `download` returns false and never raises on wire/validation failure. Partial downloads are cleaned up.
- `upload` raises on wire failure; the Rake task catches.
- `branch_refs` returns `{}` on missing file.
- `prune!` returns count removed, never raises.
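The contract above implies a particular calling pattern: treat a false `download` as a cold run, and guard `upload` with a rescue. The sketch below shows that pattern against a stand-in class. `FakeBackend`, `restore_or_cold_run`, and `safe_upload` are hypothetical names used only for illustration; the real callers (the Rake task, the orchestrator) are not shown on this page.

```ruby
# Stand-in backend that mimics the documented contract; illustration only.
class FakeBackend
  def initialize(hit:, upload_ok:)
    @hit = hit
    @upload_ok = upload_ok
  end

  # Returns true/false and never raises, like the documented download.
  def download(_ref)
    @hit
  end

  # Raises on (simulated) wire failure, like the documented upload.
  def upload(_ref)
    raise 'wire failure' unless @upload_ok

    true
  end
end

# A false download simply means "run everything"; no rescue needed.
def restore_or_cold_run(backend, ref)
  backend.download(ref) ? :warm : :cold
end

# The caller owns the rescue for upload, as the Rake task is said to do.
def safe_upload(backend, ref)
  backend.upload(ref)
  :uploaded
rescue StandardError => e
  warn("cache upload skipped: #{e.message}")
  :skipped
end
```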
S3 access shells out via the `aws` CLI, so a single class is the natural unit of composition here. The class is large; splitting would be cosmetic.
Defined Under Namespace
Classes: S3BackendError
Constant Summary
These constants are internal and part of a private API. You should avoid using them if possible, as they may be removed or changed in the future.
- MAIN_TIER = 'main'
- PR_TIER = 'pr'
- BRANCH_REFS_FILENAME = 'branch_refs.json'
- LAST_RUN_FILENAME = 'last_run.json'
- CACHE_ARCHIVE_FILENAME = Archive::CACHE_FILENAME
- ENCODING = 'UTF-8'
- REQUIRED_OPTS = %i[bucket prefix branch default_branch cache_path].freeze
Instance Method Summary
- #branch_refs(branch_name) ⇒ Object (private)
  Read branch_refs for the given branch.
- #download(ref, tree_sha: nil) ⇒ Object (private)
  Download the cache for `ref` into `cache_path`.
- #initialize(bucket:, prefix:, branch:, default_branch:, cache_path:, test_suite_id: nil, local: false, logger: nil) ⇒ S3Backend (constructor, private)
- #prune!(count: nil, duration_seconds: nil, pr_branch_ttl_seconds: nil) ⇒ Object (private)
  Apply retention policy to the backend’s own tier.
- #prune_all!(pr_branch_ttl_seconds: nil) ⇒ Object (private)
  Cross-tier PR-branch cleanup.
- #unbounded_warning(warn_threshold: 500) ⇒ Object (private)
  Check whether the backend’s own tier has accumulated more than `warn_threshold` refs without retention configured.
- #upload(ref, tree_sha: nil) ⇒ Object (private)
  Upload the local cache to this backend’s own tier under `ref`.
- #write_branch_refs(branch_name, refs) ⇒ Object (private)
  Persist branch_refs for the given branch.
Constructor Details
#initialize(bucket:, prefix:, branch:, default_branch:, cache_path:, test_suite_id: nil, local: false, logger: nil) ⇒ S3Backend
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
    # File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 97
    def initialize(bucket:, prefix:, branch:, default_branch:, cache_path:,
                   test_suite_id: nil, local: false, logger: nil)
      validate_required!(bucket: bucket, prefix: prefix, branch: branch,
                         default_branch: default_branch, cache_path: cache_path)
      @bucket = bucket.to_s
      @prefix = trim_trailing_slashes(prefix.to_s)
      @branch = branch.to_s.chomp
      @default_branch = default_branch.to_s.chomp
      @test_suite_id = normalize_test_suite_id(test_suite_id)
      @cache_path = cache_path.to_s
      @cli_binary = local ? 'awslocal' : 'aws'
      @logger = logger
    end
Instance Method Details
#branch_refs(branch_name) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Read branch_refs for the given branch. Returns a Hash whose values are `ts_epoch` integers, or `{}` when the file is missing / malformed. PR tier only - main branch doesn’t track branch_refs (rewrites not expected on the default branch).
    # File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 166
    def branch_refs(branch_name)
      return {} if blank?(branch_name)

      local_tmp = File.join(@cache_path, ".branch_refs_download_#{Process.pid}.json")
      FileUtils.mkdir_p(@cache_path)
      ok, = aws_cp_silent(s3_url(s3_branch_refs_key(branch_name)), local_tmp)
      return {} unless ok

      parsed = JSON.parse(File.read(local_tmp, encoding: ENCODING))
      parsed.is_a?(Hash) ? parsed.transform_values(&:to_i) : {}
    rescue StandardError => e
      log_debug("branch_refs read failed (#{e.class}: #{e.message}); treating as empty")
      {}
    ensure
      FileUtils.rm_f(local_tmp) if defined?(local_tmp) && local_tmp
    end
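The parsing step in the method above can be tried in isolation. This is a minimal sketch that mirrors only the parse-and-coerce logic, with the S3 transfer left out; `parse_branch_refs` is a hypothetical name, and the `abc123` key is a made-up sample, since the page does not show what the keys of the file look like.

```ruby
require 'json'

# Hypothetical standalone version of the parse step: any non-Hash
# payload or parse error degrades to an empty Hash instead of raising.
def parse_branch_refs(raw)
  parsed = JSON.parse(raw)
  parsed.is_a?(Hash) ? parsed.transform_values(&:to_i) : {}
rescue StandardError
  {}
end

parse_branch_refs('{"abc123": "1700000000"}') # => {"abc123"=>1700000000}
parse_branch_refs('[1, 2]')                   # => {}
parse_branch_refs('not json')                 # => {}
```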
#download(ref, tree_sha: nil) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Download the cache for `ref` into `cache_path`. Tries the backend’s own tier first; on miss, falls back to the main tier for the same ref. Validates the downloaded `last_run.json` via schema_version before declaring success.
When `tree_sha` is provided, first consults the tree-SHA secondary index (`<tier>/by_tree/<tree_sha>`) to resolve the tree to a commit ref - catches rebase / revert scenarios where the same tree lives at a different commit hash than the one the caller is asking about. The standard `<tier>/<ref>` lookup is still tried as a fallback when the tree pointer is absent or its resolved ref has no archive.
Returns true on validated success, false on any failure. Cleans up partially-downloaded files on failure so a subsequent fresh load doesn’t see stale data.
    # File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 129
    def download(ref, tree_sha: nil)
      return false if ref.nil? || ref.to_s.empty?

      attempts = build_download_attempts(ref, tree_sha)
      attempts.any? { |tier, candidate| try_download_from(tier, candidate) }
    end
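The fallback chain can be pictured as an ordered list of `[tier, candidate]` pairs walked with `any?`, which stops at the first hit. The sketch below is an assumption about that shape, not the body of `build_download_attempts`: `STORE`, `download_attempts`, and `sketch_download` are hypothetical, the tier is simplified to a plain string, and the exact position of the tree-resolved ref in the real ordering is not confirmed by this page.

```ruby
# Hypothetical in-memory stand-in for S3: "<tier>/<ref>" => payload.
STORE = { 'main/abc123' => 'archive-bytes' }.freeze

# Ordered attempts: tree-resolved ref (if any), then the requested ref
# in the own tier, then the same ref in the main tier as a fallback.
def download_attempts(own_tier, ref, resolved_by_tree = nil)
  attempts = []
  attempts << [own_tier, resolved_by_tree] if resolved_by_tree
  attempts << [own_tier, ref]
  attempts << ['main', ref] unless own_tier == 'main'
  attempts
end

# any? short-circuits, so later tiers are only consulted after a miss.
def sketch_download(own_tier, ref, resolved_by_tree = nil)
  download_attempts(own_tier, ref, resolved_by_tree).any? do |tier, candidate|
    STORE.key?("#{tier}/#{candidate}")
  end
end

sketch_download('pr/feature-x', 'abc123') # => true (served from the main tier)
sketch_download('pr/feature-x', 'zzz999') # => false
```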
#prune!(count: nil, duration_seconds: nil, pr_branch_ttl_seconds: nil) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Apply retention policy to the backend’s own tier. Returns the number of refs removed. Never raises on a partial failure; logs and returns the count it managed to delete.
Semantics:
- count N: keep newest N refs, delete older.
- duration_seconds X: delete refs whose last_run.json is older than X seconds.
- pr_branch_ttl_seconds X: (PR tier only) if the backend's branch has no ref newer than X seconds, delete the entire pr/<branch>/ prefix (branch_refs.json included).
Two or more parameters may be set; each applies independently. All nil/0 => no-op.
    # File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 218
    def prune!(count: nil, duration_seconds: nil, pr_branch_ttl_seconds: nil)
      removed = 0
      removed += prune_by_count!(count) if count&.positive?
      removed += prune_by_duration!(duration_seconds) if duration_seconds&.positive?
      removed += prune_dead_pr_branch!(pr_branch_ttl_seconds) if pr_tier? && pr_branch_ttl_seconds&.positive?
      removed
    end
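To make the count and duration semantics concrete, here is a small sketch over an in-memory map of ref to epoch timestamp. `refs_to_prune` is a hypothetical helper: it returns the union of refs selected by either rule, whereas the method above applies each rule in turn against S3 and sums the deletions. The `&.positive?` guards are the same idea as in the source, so nil and 0 are both no-ops.

```ruby
# refs: { ref => epoch seconds of its last run }. Returns refs to delete.
# Hypothetical helper for illustration; not part of S3Backend.
def refs_to_prune(refs, now:, count: nil, duration_seconds: nil)
  doomed = []
  if count&.positive?
    # Keep the newest `count` refs, mark everything older.
    newest_first = refs.sort_by { |_ref, ts| -ts }.map(&:first)
    doomed |= newest_first.drop(count)
  end
  if duration_seconds&.positive?
    # Mark refs whose timestamp is older than the cutoff.
    cutoff = now - duration_seconds
    doomed |= refs.select { |_ref, ts| ts < cutoff }.keys
  end
  doomed
end

refs = { 'a' => 100, 'b' => 200, 'c' => 300 }
refs_to_prune(refs, now: 400, count: 2)              # => ["a"]
refs_to_prune(refs, now: 400, duration_seconds: 150) # => ["a", "b"]
refs_to_prune(refs, now: 400)                        # => []
```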
#prune_all!(pr_branch_ttl_seconds: nil) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Cross-tier PR-branch cleanup. Enumerates every PR branch under the configured prefix by listing the `pr/` subtree, applies the TTL to each branch, deletes dead branches whole. Returns total refs removed. No-op on nil / non-positive TTL. Never raises (graceful-degradation contract).
    # File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 231
    def prune_all!(pr_branch_ttl_seconds: nil)
      return 0 unless pr_branch_ttl_seconds&.positive?

      cutoff = Time.now.to_i - pr_branch_ttl_seconds.to_i
      branches = discover_pr_branches
      branches.sum { |branch| maybe_prune_branch(branch, cutoff) }
    rescue StandardError => e
      log_warn("prune_all! failed (#{e.class}: #{e.message})")
      0
    end
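The TTL rule can be sketched as a pure function: a branch is dead when none of its refs is at or after the cutoff. `dead_pr_branches` and the sample branch names are hypothetical, and treating a branch with no refs as dead is an assumption of this sketch; what `maybe_prune_branch` does in that case is not shown on this page.

```ruby
# branches: { branch name => { ref => epoch seconds } }.
# Hypothetical helper for illustration; returns the branches whose
# newest ref is older than the TTL cutoff.
def dead_pr_branches(branches, now:, ttl_seconds:)
  return [] unless ttl_seconds&.positive?

  cutoff = now - ttl_seconds
  branches.select { |_branch, refs| refs.values.none? { |ts| ts >= cutoff } }.keys
end

branches = {
  'feature-x' => { 'a' => 100 },            # last touched long ago
  'feature-y' => { 'b' => 390, 'c' => 120 } # one recent ref keeps it alive
}
dead_pr_branches(branches, now: 400, ttl_seconds: 50) # => ["feature-x"]
```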
#unbounded_warning(warn_threshold: 500) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Check whether the backend’s own tier has accumulated more than `warn_threshold` refs without retention configured. Callable from orchestrator for the “S3 growing unbounded” diagnostic.
    # File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 245
    def unbounded_warning(warn_threshold: 500)
      refs = list_own_tier_refs
      return nil unless refs.length > warn_threshold

      "rspec-tracer remote cache has #{refs.length} refs in #{own_tier_prefix}; " \
        'configure cache_retention_count or cache_retention_duration to cap growth'
    end
#upload(ref, tree_sha: nil) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Upload the local cache to this backend’s own tier under `ref`. Packs the 15-file local layout into a single `cache.tar.gz` and uploads that one object. Raises on wire failure. Idempotent.
When `tree_sha` is provided, ALSO writes a small pointer file at `<tier>/by_tree/<tree_sha>` containing the commit-SHA. The pointer is consumed by `download(ref, tree_sha: …)` to hit the cache when a different commit (rebase / revert) shares the same tree.
    # File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 145
    def upload(ref, tree_sha: nil)
      raise S3BackendError, 'ref is required' if blank?(ref)

      run_id = read_local_run_id
      raise S3BackendError, "no local cache to upload (missing #{LAST_RUN_FILENAME})" if run_id.nil?

      archive_path = tmp_archive_path('upload')
      begin
        Archive.pack(cache_path: @cache_path, run_id: run_id, dest_path: archive_path)
        upload_file(archive_path, s3_archive_key(own_tier_prefix, ref))
        upload_tree_pointer(ref, tree_sha) unless blank?(tree_sha)
        log_debug("uploaded cache for #{ref} to #{own_tier_prefix} (#{File.size(archive_path)} bytes)")
      ensure
        FileUtils.rm_f(archive_path)
      end
    end
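For readers unfamiliar with the single-archive idea, the sketch below packs a handful of small files into one gzip-compressed tar and unpacks them again, using only Ruby's standard library. It is not the gem's `Archive.pack`, whose implementation is not shown on this page; `pack_tar_gz`, `unpack_tar_gz`, and the sample file names other than `last_run.json` are hypothetical.

```ruby
require 'rubygems/package'
require 'stringio'
require 'zlib'

# Hypothetical helper: packs { filename => content } into tar.gz bytes.
def pack_tar_gz(files)
  tar_io = StringIO.new(''.b) # binary buffer for the raw tar stream
  Gem::Package::TarWriter.new(tar_io) do |tar|
    files.each do |name, content|
      tar.add_file_simple(name, 0o644, content.bytesize) { |io| io.write(content) }
    end
  end
  Zlib.gzip(tar_io.string)
end

# Hypothetical helper: restores { filename => content } from tar.gz bytes.
def unpack_tar_gz(bytes)
  files = {}
  Gem::Package::TarReader.new(StringIO.new(Zlib.gunzip(bytes))) do |tar|
    tar.each { |entry| files[entry.full_name] = entry.read if entry.file? }
  end
  files
end

files = { 'last_run.json' => '{"run_id":"r1"}', 'examples.json' => '{}' }
archive = pack_tar_gz(files)
restored = unpack_tar_gz(archive) # same names and contents as `files`
```

One object per ref means one transfer per upload or download, which is the trade the overview describes; the on-disk layout stays as separate files on both ends.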
#write_branch_refs(branch_name, refs) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Persist branch_refs for the given branch. No-op for main-branch writes (main-branch doesn’t use branch_refs). Raises on wire failure for PR tier.
    # File 'lib/rspec_tracer/remote_cache/s3_backend.rb', line 187
    def write_branch_refs(branch_name, refs)
      return if blank?(branch_name)
      return if branch_name.to_s.chomp == @default_branch
      return if refs.nil? || refs.empty?

      FileUtils.mkdir_p(@cache_path)
      local_tmp = File.join(@cache_path, ".branch_refs_upload_#{Process.pid}.json")
      File.write(local_tmp, JSON.pretty_generate(refs), encoding: ENCODING)
      ok, _stdout, stderr = aws_cp_silent(local_tmp, s3_url(s3_branch_refs_key(branch_name)))
      raise S3BackendError, "Failed to upload branch_refs for #{branch_name}: #{stderr.chomp}" unless ok

      log_debug("wrote branch_refs for #{branch_name}")
    ensure
      FileUtils.rm_f(local_tmp) if defined?(local_tmp) && local_tmp
    end