Class: RSpecTracer::Storage::JsonBackend Private
- Inherits: Object
  - Object
  - RSpecTracer::Storage::JsonBackend
- Defined in: lib/rspec_tracer/storage/json_backend.rb
Overview
This class is part of a private API. You should avoid using this class if possible, as it may be removed or be changed in the future.
JSON-on-disk storage backend. 1.x shipped this layout without a formal contract; 2.0 treats the FILENAMES list below as the authoritative user-facing surface.
External tooling (CI cache keys, debug scripts, report renderers) may reference these exact filenames, so additions or removals are breaking changes. The shared-examples contract in `spec/contracts/storage_backend.rb` enforces the list.
Commit point: `last_run.json` is written last via tmp + rename. If any of the per-run files (one per FILENAMES entry) fails to write, `last_run.json` stays pointed at the previous successful run and the partially written run-id directory is orphaned (harmless; `clear!` reaps it). Readers that see `last_run.json` therefore see a complete snapshot.
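The tmp + rename commit point can be sketched as follows. This is a minimal illustration, not the backend's actual `write_last_run_atomic`: the helper name `write_json_atomic` and the temp-file naming are assumptions.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helper (not part of JsonBackend): write to a temp file in the
# same directory, then rename it over the target. A reader sees either the old
# manifest or the new one, never a half-written file.
def write_json_atomic(path, payload)
  tmp = "#{path}.#{Process.pid}.tmp" # assumed naming scheme
  File.write(tmp, JSON.generate(payload), encoding: 'UTF-8')
  File.rename(tmp, path) # atomic on POSIX when both paths share a filesystem
ensure
  # Only reached with a leftover temp file when the write or rename failed.
  File.delete(tmp) if tmp && File.exist?(tmp)
end

Dir.mktmpdir do |cache|
  manifest = File.join(cache, 'last_run.json')
  write_json_atomic(manifest, 'schema_version' => 3, 'run_id' => 'run-a')
  write_json_atomic(manifest, 'schema_version' => 3, 'run_id' => 'run-b')
  puts JSON.parse(File.read(manifest, encoding: 'UTF-8'))['run_id'] # prints "run-b"
end
```

Writing the manifest last means a crash during the per-run writes leaves the previous manifest untouched, which is the property the paragraph above relies on.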
Concurrency: an exclusive flock on a sentinel file (`.rspec_tracer.lock` under cache_path) serializes writers. Readers do not take the lock - the atomic rename of `last_run.json` is their consistency model.
Corruption policy: `load_graph` never raises. Missing files, malformed JSON, wrong schema, and binary-garbage input all yield `nil` plus an info-level log. This is the invariant the fuzz spec asserts across 1000 iterations.
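A minimal sketch of the never-raises policy, assuming a hypothetical `read_manifest_or_nil` helper and an Array standing in for the logger. It mirrors the shape of the policy (rescue, log, return nil), not the backend's private reader.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical reader: returns the parsed manifest Hash, or nil on any problem.
# `log` is an assumed stand-in for the backend's info-level logger.
def read_manifest_or_nil(path, log: [])
  return nil unless File.file?(path)

  parsed = JSON.parse(File.read(path, encoding: 'UTF-8'))
  parsed.is_a?(Hash) ? parsed : nil
rescue StandardError => e
  log << "failed to load cache: #{e.class}; cold run"
  nil
end

Dir.mktmpdir do |cache|
  path = File.join(cache, 'last_run.json')
  File.binwrite(path, "\xFF\xFE\x00garbage".b) # binary garbage
  p read_manifest_or_nil(path) # nil, no exception

  File.write(path, '{"run_id": "run-a"}', encoding: 'UTF-8')
  p read_manifest_or_nil(path) # the parsed Hash
end
```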
Encoding: every read and write passes `encoding: 'UTF-8'`. This fixes the `Encoding::InvalidByteSequenceError` that bit the dogfood path when an example title contained a non-ASCII byte on a US-ASCII-defaulted filesystem.
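The explicit-encoding rule can be illustrated with a short round trip. The helper names `write_utf8` and `read_utf8` are assumptions for this sketch; only the `encoding: 'UTF-8'` option is taken from the description above.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helpers: both sides name the encoding explicitly instead of
# relying on Encoding.default_external.
def write_utf8(path, payload)
  File.write(path, JSON.generate(payload), encoding: 'UTF-8')
end

def read_utf8(path)
  JSON.parse(File.read(path, encoding: 'UTF-8'))
end

Dir.mktmpdir do |cache|
  path = File.join(cache, 'all_examples.json')
  # "caf\u00E9" is a title with one non-ASCII character.
  write_utf8(path, 'ex-1' => { 'description' => "handles caf\u00E9 orders" })
  title = read_utf8(path).dig('ex-1', 'description')
  puts title.encoding # prints "UTF-8"
end
```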
Defined Under Namespace
Modules: Merger Classes: FieldReader
Constant Summary
- FILENAMES =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
On-disk filenames under the default `:json` serializer. This is the user-facing surface documented in USER_FACING_SURFACE.md section 6 - external tooling that walks `rspec_tracer_cache/` relies on exactly these names. The `:msgpack` serializer substitutes `.msgpack.gz` for the `.json` suffix (one file per field on disk); the file stems and per-field semantics do not change.
`boot_set.json` lands at the end of the list - additive w.r.t. 1.x and v2 readers that walked this enumeration. It carries the project's transitive boot-load set (schema_version 3).
`wsi_snapshot.json` persists the WholeSuiteInvalidators digest_snapshot so warm runs can tell whether Gemfile.lock / .ruby-version / .rspec-tracer / tracer-gem identity changed since the previous run. Without it, warm runs always saw a nil previous and treated every run as a cold first run. Missing file deserializes to `{}` so older caches still load - the fallback path fires one full re-run (safe).
`env_snapshot.json` persists the `Tracker::EnvSnapshot` digest map for env-var values the per-example `tracks: { env: … }` DSL declares. Same missing-coerces-to-`{}` fallback as wsi_snapshot - no schema bump.
`env_dependency.json` persists the per-example tracked-env attribution map that reporters need for the Examples Dependency report. Missing file coerces to `{}`; older caches load without a cold re-run.
%w[ all_examples.json duplicate_examples.json interrupted_examples.json flaky_examples.json failed_examples.json pending_examples.json skipped_examples.json all_files.json dependency.json reverse_dependency.json examples_coverage.json boot_set.json wsi_snapshot.json env_snapshot.json env_dependency.json cache_hit_reason.json ].freeze
- LAST_RUN_FILENAME =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Internal constant.
'last_run.json'
- LOCK_FILENAME =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Internal constant.
'.rspec_tracer.lock'
- ENCODING =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Internal constant.
'UTF-8'
- FIELD_NAMES =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Known snapshot field symbols. Derived directly from FIELD_KINDS below (the write-side and read-side shape tables both enumerate the same set, so a divergence would already blow up write paths). Kept as an Array of Symbol so `#read_field` can dispatch without constructing a per-serializer filename table; the filename is computed as `"#{field}.#{@serializer.extension}"`.
%i[ all_examples duplicate_examples interrupted_examples flaky_examples failed_examples pending_examples skipped_examples all_files dependency reverse_dependency examples_coverage boot_set wsi_snapshot env_snapshot env_dependency cache_hit_reason ].freeze
- ID_SET_FIELDS =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Write-side field groups. Each group dispatches to one serializer (Hash pass-through, Set->sorted Array, or the Hash[id => Set<path>] -> Hash[id => Array<path>] flavor shared by dependency + reverse_dependency). Kept data-driven so a schema_version bump adds one entry instead of a new branch. Read-side uses FIELD_KINDS below.
%w[ interrupted_examples flaky_examples failed_examples pending_examples skipped_examples ].freeze
- HASH_FIELDS =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Internal constant.
%w[ all_examples duplicate_examples all_files examples_coverage boot_set wsi_snapshot env_snapshot env_dependency cache_hit_reason ].freeze
- DEPENDENCY_FIELDS =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Internal constant.
%w[dependency reverse_dependency].freeze
- FIELD_KINDS =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Read-side field -> deserializer-kind map. Drives `decode_field` so the lazy reader looks up one shape per field instead of spelling out a case/when that has to stay in sync with FILENAMES. `:symbolized` = Hash whose inner Hash values get symbolized keys (1.x's all_examples / all_files convention); `:dupe_examples` = same but Array-of-inner-Hash; `:id_set` = Array on disk -> Set in memory; `:dependency` = Hash[id => Array] -> Hash[id => Set]; `:plain_hash` = pass-through (examples_coverage, the digest maps, env_dependency).
{ all_examples: :symbolized, all_files: :symbolized, duplicate_examples: :dupe_examples, interrupted_examples: :id_set, flaky_examples: :id_set, failed_examples: :id_set, pending_examples: :id_set, skipped_examples: :id_set, dependency: :dependency, reverse_dependency: :dependency, examples_coverage: :plain_hash, boot_set: :plain_hash, wsi_snapshot: :plain_hash, env_snapshot: :plain_hash, env_dependency: :plain_hash, cache_hit_reason: :plain_hash }.freeze
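The kind-driven decode described for FIELD_KINDS, together with the Set -> sorted Array write side, might look like the sketch below. The kind names come from the table above; `SKETCH_DECODERS`, `decode_sketch`, and `encode_id_set` are hypothetical names, and the lambdas are assumptions about each kind's behavior rather than the real `decode_field`.

```ruby
require 'set'

# Assumed helper: symbolize the keys of one inner Hash.
SYMBOLIZE = ->(hash) { hash.transform_keys(&:to_sym) }

# Hypothetical kind => decoder table (one entry per kind, not per field).
SKETCH_DECODERS = {
  symbolized: ->(raw) { raw.transform_values(&SYMBOLIZE) },
  dupe_examples: ->(raw) { raw.transform_values { |list| list.map(&SYMBOLIZE) } },
  id_set: ->(raw) { Set.new(raw) },
  dependency: ->(raw) { raw.transform_values { |paths| Set.new(paths) } },
  plain_hash: ->(raw) { raw }
}.freeze

def decode_sketch(kind, raw)
  SKETCH_DECODERS.fetch(kind).call(raw)
end

# Write side for an ID-set field: Set in memory -> sorted Array on disk.
def encode_id_set(ids)
  ids.to_a.sort
end

on_disk = encode_id_set(Set['ex-2', 'ex-1'])
p on_disk                         # ["ex-1", "ex-2"]
p decode_sketch(:id_set, on_disk) # a Set holding the same two ids
```

Sorting on the write side keeps the on-disk Array stable across runs, since Set iteration order depends on insertion order.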
Instance Attribute Summary
-
#cache_path ⇒ Object
readonly
private
Internal attribute.
-
#serializer ⇒ Object
readonly
private
Internal attribute.
-
#serializer_name ⇒ Object
readonly
private
Internal attribute.
Instance Method Summary
-
#clear! ⇒ Object
private
Internal method on the tracer pipeline.
-
#field_filename(field) ⇒ Object
private
Per-serializer on-disk filename for a snapshot field.
-
#initialize(cache_path:, logger: nil, retention_local_count: nil, warn_per_file_mb: nil, warn_total_mb: nil, serializer: :json) ⇒ JsonBackend
constructor
private
-
#last_run_id ⇒ Object
private
Internal method on the tracer pipeline.
-
#load_graph(schema_version:) ⇒ Object
private
Internal method on the tracer pipeline.
-
#merge_from_peers(peer_cache_paths, schema_version:) ⇒ Object
private
Merge per-worker snapshots (written to `peer_cache_paths`) into this backend's top-level cache and persist via `save_graph`.
-
#prune_run_dirs!(keep:) ⇒ Object
private
Retain the `keep` most-recently-modified run-id directories under cache_path and delete older ones.
-
#read_field(dir, field) ⇒ Object
private
Read and deserialize one per-run field.
-
#save_graph(snapshot, schema_version:) ⇒ Object
private
Internal method on the tracer pipeline.
-
#transactional_save(&block) ⇒ Object
private
Internal method on the tracer pipeline.
Constructor Details
#initialize(cache_path:, logger: nil, retention_local_count: nil, warn_per_file_mb: nil, warn_total_mb: nil, serializer: :json) ⇒ JsonBackend
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 204

def initialize(cache_path:, logger: nil, retention_local_count: nil,
               warn_per_file_mb: nil, warn_total_mb: nil, serializer: :json)
  # rubocop:enable Metrics/ParameterLists
  @cache_path = File.expand_path(cache_path)
  @logger = logger
  @retention_local_count = retention_local_count
  @warn_per_file_mb = warn_per_file_mb
  @warn_total_mb = warn_total_mb
  @serializer = resolve_serializer(serializer)
  @serializer_name = serializer
end
Instance Attribute Details
#cache_path ⇒ Object (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal attribute.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 201

def cache_path
  @cache_path
end
#serializer ⇒ Object (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal attribute.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 201

def serializer
  @serializer
end
#serializer_name ⇒ Object (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal attribute.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 201

def serializer_name
  @serializer_name
end
Instance Method Details
#clear! ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal method on the tracer pipeline.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 349

def clear!
  return unless File.directory?(@cache_path)

  FileUtils.rm_rf(@cache_path)
end
#field_filename(field) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Per-serializer on-disk filename for a snapshot field. `:json` -> `all_examples.json`; `:msgpack` -> `all_examples.msgpack.gz`. Public so integration specs / reporters can resolve the expected path without reaching into @serializer.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 279

def field_filename(field)
  "#{field}.#{@serializer.extension}"
end
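For tooling that needs the expected path without constructing a backend, the same stem-plus-extension rule can be reproduced in a few lines. `SketchSerializer`, `SKETCH_SERIALIZERS`, and `field_filename_for` are hypothetical names; the two extensions are inferred from the `.json` and `.msgpack.gz` examples above.

```ruby
# Hypothetical stand-in for the serializer objects: only #extension matters here.
SketchSerializer = Struct.new(:extension)

# Assumed extension per serializer name, inferred from the documented filenames.
SKETCH_SERIALIZERS = {
  json: SketchSerializer.new('json'),
  msgpack: SketchSerializer.new('msgpack.gz')
}.freeze

# Same "#{field}.#{extension}" rule as field_filename, keyed by serializer name.
def field_filename_for(field, serializer_name)
  "#{field}.#{SKETCH_SERIALIZERS.fetch(serializer_name).extension}"
end

puts field_filename_for(:all_examples, :json)    # prints "all_examples.json"
puts field_filename_for(:all_examples, :msgpack) # prints "all_examples.msgpack.gz"
```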
#last_run_id ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal method on the tracer pipeline.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 218

def last_run_id
  manifest = read_last_run_manifest
  return nil unless manifest.is_a?(Hash)

  run_id = manifest['run_id']
  return nil if run_id.nil? || run_id.to_s.empty?

  run_id
end
#load_graph(schema_version:) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal method on the tracer pipeline.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 230

def load_graph(schema_version:)
  manifest = read_last_run_manifest
  return nil unless manifest.is_a?(Hash)

  stored = manifest['schema_version']
  unless Schema.supported?(stored) && stored == schema_version
    info("schema_version mismatch (stored=#{stored.inspect}, expected=#{schema_version}); cold run")
    return nil
  end

  run_id = manifest['run_id']
  return nil if run_id.nil? || run_id.to_s.empty?

  dir = File.join(@cache_path, run_id)
  return nil unless File.directory?(dir)

  LazySnapshot.new(
    schema_version: stored,
    run_id: run_id,
    reader: FieldReader.new(backend: self, dir: dir)
  )
rescue StandardError => e
  info("failed to load cache: #{e.class}: #{e.message}; cold run")
  nil
end
#merge_from_peers(peer_cache_paths, schema_version:) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Merge per-worker snapshots (written to `peer_cache_paths`) into this backend's top-level cache and persist via `save_graph`. Each peer is read via `load_graph` so the schema and corruption policy (missing files yield nil; malformed JSON logs and returns nil) flow through the same path as a normal load.
If there are no peers, or every peer loads as nil, this is a no-op that returns nil. Partial peers merge what's available; graceful degradation is the entire point of running this at at_exit time.
`schema_version` is passed through so peers saved under a different schema version are rejected without side effects (same semantics as a warm run under a mismatched cache).
# File 'lib/rspec_tracer/storage/json_backend.rb', line 368

def merge_from_peers(peer_cache_paths, schema_version:)
  peer_snapshots = peer_cache_paths.filter_map do |path|
    self.class.new(cache_path: path, logger: @logger, serializer: @serializer_name)
        .load_graph(schema_version: schema_version)
  end
  return nil if peer_snapshots.empty?

  merged = Merger.call(peer_snapshots, schema_version: schema_version)
  save_graph(merged, schema_version: schema_version)
  merged
end
#prune_run_dirs!(keep:) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Retain the `keep` most-recently-modified run-id directories under cache_path and delete older ones. Always preserves the run-id that `last_run.json` points at (deleting it would make the next reader cold-run). Returns the count removed. Never raises - a prune failure is logged at warn level and treated as best-effort cleanup, the same graceful-degradation contract the remote cache backends use.
A nil or non-positive `keep` is a no-op. Called automatically from `save_graph` when the backend was constructed with `retention_local_count:`; also exposed via `rake rspec_tracer:cache:gc` for one-off cleanup.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 319

def prune_run_dirs!(keep:)
  return 0 if keep.nil? || keep <= 0
  return 0 unless File.directory?(@cache_path)

  current = last_run_id
  candidates = collect_run_dirs
  return 0 if candidates.empty?

  _keep, pruned = partition_dirs_to_prune(candidates, keep: keep, current: current)
  pruned.each { |path| FileUtils.rm_rf(path) }
  pruned.size
rescue StandardError => e
  @logger&.warn("rspec-tracer cache gc: prune failed (#{e.class}: #{e.message})")
  0
end
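The private `partition_dirs_to_prune` helper is not shown on this page. One plausible reading of the retention rule is sketched below, under the assumption that the current run's directory is preserved in addition to the `keep` newest directories; `partition_for_prune` is a hypothetical name.

```ruby
require 'tmpdir'

# Hypothetical stand-in for partition_dirs_to_prune. Returns [kept, pruned]:
# the `keep` most recently modified directories survive, and so does the
# directory whose basename equals `current` (assumed to be the last run id).
def partition_for_prune(dirs, keep:, current:)
  newest_first = dirs.sort_by { |dir| File.mtime(dir) }.reverse
  kept, pruned = newest_first.each_with_index.partition do |dir, index|
    index < keep || File.basename(dir) == current
  end
  [kept.map(&:first), pruned.map(&:first)]
end

Dir.mktmpdir do |cache|
  now = Time.now
  dirs = %w[run-1 run-2 run-3 run-4].each_with_index.map do |name, age|
    path = File.join(cache, name)
    Dir.mkdir(path)
    File.utime(now, now - (age * 60), path) # run-1 is newest, run-4 oldest
    path
  end

  _kept, pruned = partition_for_prune(dirs, keep: 2, current: 'run-4')
  p pruned.map { |dir| File.basename(dir) } # ["run-3"]
end
```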
#read_field(dir, field) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Read and deserialize one per-run field. Public so `FieldReader` (constructed by `load_graph`) can dispatch. Missing file -> same default value the eager read previously produced (Set.new for ID-set fields, {} for hashes) - preserves the "malformed cache loads gracefully" contract.
`deep_intern` runs before the decode so String dedup happens once per on-disk path / example_id regardless of how many times the value appears in the parsed tree. RAM win on large caches is the whole point of this method; see json_backend_spec.rb "string interning" for the measurable assertion.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 267

def read_field(dir, field)
  raise ArgumentError, "unknown snapshot field: #{field.inspect}" unless FIELD_KINDS.key?(field)

  raw = read_run_file(dir, field_filename(field))
  decode_field(field, deep_intern(raw))
end
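The body of `deep_intern` is private and not shown here. A minimal sketch of the string-deduplication idea, assuming it is built on `String#-@` (which returns a frozen, deduplicated String), could look like this; `intern_strings_deep` is a hypothetical name.

```ruby
require 'json'

# Hypothetical deep interner: walks a parsed JSON tree and replaces every
# String (keys and values) with its frozen, deduplicated copy.
def intern_strings_deep(value)
  case value
  when String then -value
  when Array then value.map { |item| intern_strings_deep(item) }
  when Hash
    value.each_with_object({}) do |(key, inner), out|
      out[intern_strings_deep(key)] = intern_strings_deep(inner)
    end
  else value # numbers, booleans, nil pass through unchanged
  end
end

parsed = JSON.parse('{"ex-1": ["spec/a_spec.rb"], "ex-2": ["spec/a_spec.rb"]}')
interned = intern_strings_deep(parsed)

# Both entries now share one String object instead of two equal copies.
puts interned['ex-1'][0].equal?(interned['ex-2'][0]) # prints "true"
```

Repeated paths and example ids dominate a dependency graph, so sharing one object per distinct value is where the memory saving would come from.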
#save_graph(snapshot, schema_version:) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal method on the tracer pipeline.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 285

def save_graph(snapshot, schema_version:)
  raise ArgumentError, 'snapshot must not be nil' if snapshot.nil?
  unless Schema.supported?(schema_version)
    raise ArgumentError,
          "unsupported schema_version: #{schema_version.inspect}"
  end

  run_id = snapshot.run_id
  raise ArgumentError, 'snapshot.run_id must be a non-empty string' if run_id.nil? || run_id.to_s.empty?

  transactional_save do
    dir = File.join(@cache_path, run_id)
    FileUtils.mkdir_p(dir)
    write_run_files(dir, snapshot)
    write_last_run_atomic(schema_version: schema_version, run_id: run_id)
  end
  maybe_prune_after_save
  maybe_warn_size_budget(run_id)
  snapshot
end
#transactional_save(&block) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal method on the tracer pipeline.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 337

def transactional_save(&block)
  raise ArgumentError, 'block required' unless block

  FileUtils.mkdir_p(@cache_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX)
    yield
  end
end
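To see how the sentinel-file flock serializes writers, the sketch below holds the lock and probes it from a second file handle. `with_exclusive_lock` mirrors the method above; `lock_available?` is a hypothetical probe added for illustration, and the behavior shown assumes a local filesystem with standard flock semantics.

```ruby
require 'tmpdir'

# Same shape as transactional_save: the lock is released when the block
# returns and File.open closes the handle.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX)
    yield
  end
end

# Hypothetical probe: a non-blocking attempt returns false when another
# handle already holds the exclusive lock.
def lock_available?(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |probe|
    probe.flock(File::LOCK_EX | File::LOCK_NB) != false
  end
end

Dir.mktmpdir do |cache|
  lock_path = File.join(cache, '.rspec_tracer.lock')
  held = with_exclusive_lock(lock_path) { lock_available?(lock_path) }
  puts held                       # prints "false": a second writer would wait
  puts lock_available?(lock_path) # prints "true": released after the block
end
```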