Class: RSpecTracer::Storage::JsonBackend Private
- Inherits: Object
- Object
- RSpecTracer::Storage::JsonBackend
- Defined in:
- lib/rspec_tracer/storage/json_backend.rb
Overview
This class is part of a private API. You should avoid using this class if possible, as it may be removed or be changed in the future.
JSON-on-disk storage backend. 1.x shipped this layout without a formal contract; 2.0 treats the FILENAMES list below as the authoritative user-facing surface.
External tooling (CI cache keys, debug scripts, report renderers) may reference these exact filenames, so additions or removals are breaking changes. The shared-examples contract in `spec/contracts/storage_backend.rb` enforces the list.
Commit point: `last_run.json` is written last via tmp + rename. If any of the per-run files fails to write, `last_run.json` stays pointed at the previous successful run and the partially written run-id directory is orphaned (harmless; `clear!` reaps it). Readers that see `last_run.json` therefore see a complete snapshot.
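The tmp + rename commit described above can be sketched as follows. This is a minimal illustration, not the backend's code: `write_json_atomically` is a hypothetical helper, and the temp-file naming is an assumption. Only the `run_id` and `schema_version` manifest keys come from this page.

```ruby
require 'json'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of rspec-tracer): write the payload to a
# sibling temp file, then rename it over the destination in one step.
def write_json_atomically(path, payload)
  tmp = "#{path}.#{Process.pid}.tmp" # assumed naming scheme
  File.open(tmp, 'w', encoding: 'UTF-8') do |f|
    f.write(JSON.generate(payload))
    f.flush
    f.fsync # push the bytes to disk before the rename publishes them
  end
  File.rename(tmp, path) # atomic on POSIX when both paths share a filesystem
ensure
  # If anything failed before the rename, drop the leftover temp file.
  FileUtils.rm_f(tmp) if tmp && File.exist?(tmp)
end

Dir.mktmpdir do |cache|
  manifest = File.join(cache, 'last_run.json')
  write_json_atomically(manifest, 'schema_version' => 3, 'run_id' => 'run-a')
  write_json_atomically(manifest, 'schema_version' => 3, 'run_id' => 'run-b')
  puts JSON.parse(File.read(manifest, encoding: 'UTF-8'))['run_id'] # run-b
end
```

A reader that opens `last_run.json` at any moment sees either the old manifest or the new one, never a half-written file.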
Concurrency: an exclusive flock on a sentinel file (`.rspec_tracer.lock` under cache_path) serializes writers. Readers do not take the lock - `last_run.json`'s atomic rename is their consistency model.
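A small sketch of the writer lock, assuming a Linux or macOS host where `flock` conflicts between separate opens of the same file. `with_writer_lock` and `lock_free?` are hypothetical names used only for illustration.

```ruby
require 'tmpdir'

# Hypothetical helper: hold an exclusive flock on the sentinel file
# for the duration of the block.
def with_writer_lock(cache_path)
  lock_path = File.join(cache_path, '.rspec_tracer.lock')
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX)
    yield lock_path
  end # closing the handle releases the lock
end

# Hypothetical probe: true when no writer currently holds the lock.
# LOCK_NB makes flock return false instead of blocking.
def lock_free?(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |probe|
    probe.flock(File::LOCK_EX | File::LOCK_NB) != false
  end
end

Dir.mktmpdir do |cache|
  with_writer_lock(cache) do |lock_path|
    puts lock_free?(lock_path) # false while the writer holds the lock
  end
  puts lock_free?(File.join(cache, '.rspec_tracer.lock')) # true afterwards
end
```

Readers never call either helper; they rely on the atomic rename of `last_run.json` instead.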
Corruption policy: `load_graph` never raises. Missing files, malformed JSON, wrong schema, binary-garbage input all yield `nil` + an info log. This is the invariant the fuzz spec asserts across 1000 iterations.
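The never-raises policy can be approximated in a few lines. This sketch covers only the manifest read; `read_manifest_or_nil` and its `log:` parameter are hypothetical, and the real backend also checks the schema version and run directory.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical reader: returns the parsed Hash, or nil on any failure
# (missing file, malformed JSON, non-Hash payload, binary garbage).
def read_manifest_or_nil(path, log: ->(msg) { warn(msg) })
  parsed = JSON.parse(File.read(path, encoding: 'UTF-8'))
  parsed.is_a?(Hash) ? parsed : nil
rescue StandardError => e
  log.call("failed to load cache: #{e.class}: #{e.message}; cold run")
  nil
end

Dir.mktmpdir do |cache|
  path = File.join(cache, 'last_run.json')
  p read_manifest_or_nil(path, log: ->(_) {}) # nil, file is missing
  File.binwrite(path, "\xFF\x00\xFE".b)
  p read_manifest_or_nil(path, log: ->(_) {}) # nil, binary garbage
end
```

Callers treat `nil` as "no usable cache" and fall back to a cold run.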
Encoding: every read and write passes `encoding: 'UTF-8'`. This fixes the `Encoding::InvalidByteSequenceError` that bit the dogfood path when an example title contained a non-ASCII byte on a US-ASCII-defaulted filesystem.
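To illustrate why the explicit encoding matters, the sketch below reads the same bytes twice: once as UTF-8 and once as US-ASCII, which stands in for a US-ASCII default external encoding. `write_utf8` and `read_utf8` are hypothetical helpers, not the backend's methods.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helpers: always name the encoding instead of inheriting
# Encoding.default_external from the environment.
def write_utf8(path, text)
  File.write(path, text, encoding: 'UTF-8')
end

def read_utf8(path)
  File.read(path, encoding: 'UTF-8')
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'all_examples.json')
  write_utf8(path, JSON.generate('description' => "handles caf\u00E9 orders"))

  explicit = read_utf8(path)
  ascii_tagged = File.read(path, encoding: 'US-ASCII') # simulated US-ASCII default

  puts explicit.valid_encoding?     # true
  puts ascii_tagged.valid_encoding? # false
end
```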
Defined Under Namespace
Modules: Merger Classes: FieldReader
Constant Summary
- FILENAMES =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
On-disk filenames under the default `:json` serializer. This is the user-facing surface documented in USER_FACING_SURFACE.md section 6 - external tooling that walks `rspec_tracer_cache/` relies on exactly these names. The `:msgpack` serializer substitutes `.msgpack.gz` for the `.json` suffix (one file per field on disk); the file stems and per-field semantics do not change.
`boot_set.json` lands at the end of the list - additive w.r.t. 1.x and v2 readers that walked this enumeration. It carries the project's transitive boot-load set (schema_version 3).
`wsi_snapshot.json` persists the WholeSuiteInvalidators digest_snapshot so warm runs can tell whether Gemfile.lock / .ruby-version / .rspec-tracer / tracer-gem identity changed since the previous run. Without it, warm runs always saw a nil previous and treated every run as a cold first run. Missing file deserializes to `{}` so older caches still load - the fallback path fires one full re-run (safe).
`env_snapshot.json` persists the `Tracker::EnvSnapshot` digest map for env-var values the per-example `tracks: { env: … }` DSL declares. Same missing-coerces-to-`{}` fallback as wsi_snapshot - no schema bump.
`env_dependency.json` persists the per-example tracked-env attribution map that reporters need for the Examples Dependency report. Missing file coerces to `{}`; older caches load without a cold re-run.
%w[
  all_examples.json
  duplicate_examples.json
  interrupted_examples.json
  flaky_examples.json
  failed_examples.json
  pending_examples.json
  skipped_examples.json
  all_files.json
  dependency.json
  reverse_dependency.json
  examples_coverage.json
  boot_set.json
  wsi_snapshot.json
  env_snapshot.json
  env_dependency.json
  cache_hit_reason.json
  filtered_examples.json
].freeze
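For external tooling that walks a run directory, a sketch along these lines can compute the expected names per serializer. `FIELD_STEMS`, `EXTENSIONS`, `expected_filenames`, and `missing_files` are hypothetical names; the stems and the two suffixes are taken from the description above.

```ruby
# Hypothetical constants for an external script; the stems mirror the
# FILENAMES list and the suffixes mirror the documented serializers.
FIELD_STEMS = %w[
  all_examples duplicate_examples interrupted_examples flaky_examples
  failed_examples pending_examples skipped_examples all_files dependency
  reverse_dependency examples_coverage boot_set wsi_snapshot env_snapshot
  env_dependency cache_hit_reason filtered_examples
].freeze

EXTENSIONS = { json: 'json', msgpack: 'msgpack.gz' }.freeze

# Expected per-run filenames for the given serializer.
def expected_filenames(serializer = :json)
  ext = EXTENSIONS.fetch(serializer)
  FIELD_STEMS.map { |stem| "#{stem}.#{ext}" }
end

# Names from the expected list that are absent from run_dir.
def missing_files(run_dir, serializer = :json)
  expected_filenames(serializer).reject { |name| File.exist?(File.join(run_dir, name)) }
end

puts expected_filenames(:msgpack).first # all_examples.msgpack.gz
```

Several of these files are documented as optional on read, so a non-empty `missing_files` result does not by itself mean the cache is unusable.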
- LAST_RUN_FILENAME =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Internal constant.
'last_run.json'
- LOCK_FILENAME =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Internal constant.
'.rspec_tracer.lock'
- ENCODING =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Internal constant.
'UTF-8'
- FIELD_NAMES =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Known snapshot field symbols. Derived directly from FIELD_KINDS below (the write-side and read-side shape tables both enumerate the same set, so a divergence would already blow up write paths). Kept as an Array of Symbol so `#read_field` can dispatch without constructing a per-serializer filename table; the filename is computed as `"#{field}.#{@serializer.extension}"`.
%i[
  all_examples
  duplicate_examples
  interrupted_examples
  flaky_examples
  failed_examples
  pending_examples
  skipped_examples
  all_files
  dependency
  reverse_dependency
  examples_coverage
  boot_set
  wsi_snapshot
  env_snapshot
  env_dependency
  cache_hit_reason
  filtered_examples
].freeze
- ID_SET_FIELDS =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Write-side field groups. Each group dispatches to one serializer (Hash pass-through, Set->sorted Array, or the Hash[id => Set<path>] -> Hash[id => Array<path>] flavor shared by dependency + reverse_dependency). Kept data-driven so a schema_version bump adds one entry instead of a new branch. Read-side uses FIELD_KINDS below.
%w[
  interrupted_examples
  flaky_examples
  failed_examples
  pending_examples
  skipped_examples
].freeze
- HASH_FIELDS =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Internal constant.
%w[
  all_examples
  duplicate_examples
  all_files
  examples_coverage
  boot_set
  wsi_snapshot
  env_snapshot
  env_dependency
  cache_hit_reason
  filtered_examples
].freeze
- DEPENDENCY_FIELDS =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Internal constant.
%w[dependency reverse_dependency].freeze
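The data-driven write-side dispatch described for these three groups might look like the sketch below. `GROUPS`, `ENCODERS`, and `encode_field` are hypothetical names, and sorting the per-id path arrays is an assumption; the page only states that ID sets become sorted Arrays.

```ruby
require 'set'

# Hypothetical group table mirroring ID_SET_FIELDS, DEPENDENCY_FIELDS
# and HASH_FIELDS.
GROUPS = {
  id_set: %w[interrupted_examples flaky_examples failed_examples
             pending_examples skipped_examples],
  dependency: %w[dependency reverse_dependency],
  hash: %w[all_examples duplicate_examples all_files examples_coverage
           boot_set wsi_snapshot env_snapshot env_dependency
           cache_hit_reason filtered_examples]
}.freeze

# One encoder per group, so a new field is one table entry, not a new branch.
ENCODERS = {
  id_set: ->(set) { set.to_a.sort },
  # Sorting the paths here is an assumption made for stable output.
  dependency: ->(map) { map.transform_values { |paths| paths.to_a.sort } },
  hash: ->(hash) { hash }
}.freeze

def encode_field(name, value)
  kind, = GROUPS.find { |_, names| names.include?(name.to_s) }
  raise ArgumentError, "unknown snapshot field: #{name.inspect}" unless kind

  ENCODERS.fetch(kind).call(value)
end

p encode_field('failed_examples', Set['b', 'a']) # ["a", "b"]
```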
- FIELD_KINDS =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Read-side field -> deserializer-kind map. Drives `decode_field` so the lazy reader looks up one shape per field instead of spelling out a case/when that has to stay in sync with FILENAMES. `:symbolized` = Hash whose inner Hash values get symbolized keys (1.x's all_examples / all_files convention); `:dupe_examples` = same but Array-of-inner-Hash; `:id_set` = Array on disk -> Set in memory; `:dependency` = Hash[id => Array] -> Hash[id => Set]; `:plain_hash` = pass-through (examples_coverage, the digest maps, env_dependency).
{
  all_examples: :symbolized,
  all_files: :symbolized,
  duplicate_examples: :dupe_examples,
  interrupted_examples: :id_set,
  flaky_examples: :id_set,
  failed_examples: :id_set,
  pending_examples: :id_set,
  skipped_examples: :id_set,
  dependency: :dependency,
  reverse_dependency: :dependency,
  examples_coverage: :plain_hash,
  boot_set: :plain_hash,
  wsi_snapshot: :plain_hash,
  env_snapshot: :plain_hash,
  env_dependency: :plain_hash,
  cache_hit_reason: :plain_hash,
  filtered_examples: :plain_hash
}.freeze
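The five deserializer kinds can be sketched as a lookup table of lambdas. `DECODERS` and `decode` are hypothetical names, and the real `decode_field` may differ in detail (for example in how it handles missing files); the shapes follow the description above.

```ruby
require 'set'

# Hypothetical decoder table keyed by the kinds named in FIELD_KINDS.
DECODERS = {
  # Hash whose inner Hash values get symbolized keys.
  symbolized: ->(raw) { raw.transform_values { |inner| inner.transform_keys(&:to_sym) } },
  # Same, but each value is an Array of inner Hashes.
  dupe_examples: lambda { |raw|
    raw.transform_values { |list| list.map { |inner| inner.transform_keys(&:to_sym) } }
  },
  # Array on disk -> Set in memory.
  id_set: ->(raw) { Set.new(raw) },
  # Hash[id => Array] -> Hash[id => Set].
  dependency: ->(raw) { raw.transform_values { |paths| Set.new(paths) } },
  # Pass-through.
  plain_hash: ->(raw) { raw }
}.freeze

def decode(kind, raw)
  DECODERS.fetch(kind).call(raw)
end

p decode(:symbolized, { 'ex1' => { 'description' => 'works' } })
# {"ex1"=>{:description=>"works"}} on Ruby 3.3; newer versions print spaces around =>
```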
Instance Attribute Summary
-
#cache_path ⇒ Object
readonly
private
Internal attribute.
-
#serializer ⇒ Object
readonly
private
Internal attribute.
-
#serializer_name ⇒ Object
readonly
private
Internal attribute.
Instance Method Summary
-
#clear! ⇒ Object
private
Internal method on the tracer pipeline.
-
#field_filename(field) ⇒ Object
private
Per-serializer on-disk filename for a snapshot field.
-
#initialize(cache_path:, logger: nil, retention_local_count: nil, warn_per_file_mb: nil, warn_total_mb: nil, serializer: :json) ⇒ JsonBackend
constructor
private
-
#last_run_id ⇒ Object
private
Internal method on the tracer pipeline.
-
#load_graph(schema_version:) ⇒ Object
private
Internal method on the tracer pipeline.
-
#merge_from_peers(peer_cache_paths, schema_version:) ⇒ Object
private
Merge per-worker snapshots (written to `peer_cache_paths`) into this backend's top-level cache and persist via `save_graph`.
-
#prune_run_dirs!(keep:) ⇒ Object
private
Retain the `keep` most-recently-modified run-id directories under cache_path and delete older ones.
-
#read_field(dir, field) ⇒ Object
private
Read and deserialize one per-run field.
-
#save_graph(snapshot, schema_version:) ⇒ Object
private
Internal method on the tracer pipeline.
-
#transactional_save(&block) ⇒ Object
private
Internal method on the tracer pipeline.
Constructor Details
#initialize(cache_path:, logger: nil, retention_local_count: nil, warn_per_file_mb: nil, warn_total_mb: nil, serializer: :json) ⇒ JsonBackend
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 208
def initialize(cache_path:, logger: nil, retention_local_count: nil,
               warn_per_file_mb: nil, warn_total_mb: nil, serializer: :json)
  # rubocop:enable Metrics/ParameterLists
  @cache_path = File.expand_path(cache_path)
  @logger = logger
  @retention_local_count = retention_local_count
  @warn_per_file_mb = warn_per_file_mb
  @warn_total_mb = warn_total_mb
  @serializer = resolve_serializer(serializer)
  @serializer_name = serializer
end
Instance Attribute Details
#cache_path ⇒ Object (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal attribute.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 205
def cache_path
  @cache_path
end
#serializer ⇒ Object (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal attribute.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 205
def serializer
  @serializer
end
#serializer_name ⇒ Object (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal attribute.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 205
def serializer_name
  @serializer_name
end
Instance Method Details
#clear! ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal method on the tracer pipeline.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 353
def clear!
  return unless File.directory?(@cache_path)

  FileUtils.rm_rf(@cache_path)
end
#field_filename(field) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Per-serializer on-disk filename for a snapshot field. `:json` -> `all_examples.json`; `:msgpack` -> `all_examples.msgpack.gz`. Public so integration specs / reporters can resolve the expected path without reaching into @serializer.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 283
def field_filename(field)
  "#{field}.#{@serializer.extension}"
end
#last_run_id ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal method on the tracer pipeline.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 222
def last_run_id
  manifest = read_last_run_manifest
  return nil unless manifest.is_a?(Hash)

  run_id = manifest['run_id']
  return nil if run_id.nil? || run_id.to_s.empty?

  run_id
end
#load_graph(schema_version:) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal method on the tracer pipeline.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 234
def load_graph(schema_version:)
  manifest = read_last_run_manifest
  return nil unless manifest.is_a?(Hash)

  stored = manifest['schema_version']
  unless Schema.supported?(stored) && stored == schema_version
    info("schema_version mismatch (stored=#{stored.inspect}, expected=#{schema_version}); cold run")
    return nil
  end

  run_id = manifest['run_id']
  return nil if run_id.nil? || run_id.to_s.empty?

  dir = File.join(@cache_path, run_id)
  return nil unless File.directory?(dir)

  LazySnapshot.new(
    schema_version: stored,
    run_id: run_id,
    reader: FieldReader.new(backend: self, dir: dir)
  )
rescue StandardError => e
  info("failed to load cache: #{e.class}: #{e.message}; cold run")
  nil
end
#merge_from_peers(peer_cache_paths, schema_version:) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Merge per-worker snapshots (written to `peer_cache_paths`) into this backend's top-level cache and persist via `save_graph`. Each peer is read via `load_graph` so that schema + corruption policy (missing files yield nil, malformed JSON logs + returns nil) flow through the same path as a normal load.
No peers / every peer nil -> no-op, returns nil. Partial peers merge what's available; graceful degradation is the entire point of running this at `at_exit` time.
`schema_version` is passed through so peers saved under a different schema version are rejected without side effects (same semantics as a warm run under a mismatched cache).
# File 'lib/rspec_tracer/storage/json_backend.rb', line 372
def merge_from_peers(peer_cache_paths, schema_version:)
  peer_snapshots = peer_cache_paths.filter_map do |path|
    self.class.new(cache_path: path, logger: @logger, serializer: @serializer_name)
        .load_graph(schema_version: schema_version)
  end

  return nil if peer_snapshots.empty?

  merged = Merger.call(peer_snapshots, schema_version: schema_version)
  save_graph(merged, schema_version: schema_version)
  merged
end
#prune_run_dirs!(keep:) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Retain the `keep` most-recently-modified run-id directories under cache_path and delete older ones. Always preserves the run-id that `last_run.json` points at (deleting it would make the next reader cold-run). Returns the count removed. Never raises - a prune failure is logged at warn level and treated as best-effort cleanup, same graceful-degradation contract the remote cache backends use.
`keep` nil / non-positive -> no-op. Called automatically from `save_graph` when the backend was constructed with `retention_local_count:`; also exposed via `rake rspec_tracer:cache:gc` for one-off cleanup.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 323
def prune_run_dirs!(keep:)
  return 0 if keep.nil? || keep <= 0
  return 0 unless File.directory?(@cache_path)

  current = last_run_id
  candidates = collect_run_dirs
  return 0 if candidates.empty?

  _keep, pruned = partition_dirs_to_prune(candidates, keep: keep, current: current)
  pruned.each { |path| FileUtils.rm_rf(path) }
  pruned.size
rescue StandardError => e
  @logger&.warn("rspec-tracer cache gc: prune failed (#{e.class}: #{e.message})")
  0
end
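The retention rule (keep the newest `keep` directories, never drop the current run) can be sketched as a pure partition step. `partition_for_prune` is a hypothetical stand-in for the private `partition_dirs_to_prune`, whose real signature and input shape are not shown on this page. The sketch assumes `[path, mtime]` pairs as input and assumes the current run is kept in addition to the `keep` newest entries.

```ruby
# Hypothetical partition step. dirs_with_mtime is an Array of
# [path, mtime] pairs; current is the run-id named by last_run.json.
def partition_for_prune(dirs_with_mtime, keep:, current: nil)
  newest_first = dirs_with_mtime.sort_by { |_, mtime| mtime }.reverse.map(&:first)
  kept = newest_first.first(keep)

  # Assumption: the current run survives even when it is older than
  # the `keep` newest directories.
  current_path = newest_first.find { |path| File.basename(path) == current }
  kept |= [current_path] if current_path

  [kept, newest_first - kept]
end

dirs = [['cache/run-1', 1], ['cache/run-2', 2], ['cache/run-3', 3], ['cache/run-4', 4]]
kept, pruned = partition_for_prune(dirs, keep: 2, current: 'run-1')
p kept   # ["cache/run-4", "cache/run-3", "cache/run-1"]
p pruned # ["cache/run-2"]
```

Keeping the partition free of filesystem calls makes the rule easy to test; the caller can then delete each pruned path.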
#read_field(dir, field) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Read and deserialize one per-run field. Public so `FieldReader` (constructed by `load_graph`) can dispatch. Missing file -> same default value the eager read previously produced (Set.new for ID-set fields, {} for hashes) - preserves the "malformed cache loads gracefully" contract.
`deep_intern` runs before the decode so String dedup happens once per on-disk path / example_id regardless of how many times the value appears in the parsed tree. RAM win on large caches is the whole point of this method; see json_backend_spec.rb "string interning" for the measurable assertion.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 271
def read_field(dir, field)
  raise ArgumentError, "unknown snapshot field: #{field.inspect}" unless FIELD_KINDS.key?(field)

  raw = read_run_file(dir, field_filename(field))
  decode_field(field, deep_intern(raw))
end
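The body of `deep_intern` is not shown on this page. One plausible sketch, assuming it relies on `String#-@` (which returns a frozen, deduplicated String), walks the parsed tree like this:

```ruby
# Sketch of a recursive interner; the backend's private deep_intern
# may be implemented differently.
def deep_intern(value)
  case value
  when String then -value # frozen, deduplicated copy
  when Array then value.map { |item| deep_intern(item) }
  when Hash
    value.each_with_object({}) do |(key, inner), out|
      out[deep_intern(key)] = deep_intern(inner)
    end
  else value
  end
end

parsed = { 'ex1' => ['spec/a_spec.rb'.dup], 'ex2' => ['spec/a_spec.rb'.dup] }
interned = deep_intern(parsed)
puts interned['ex1'][0].equal?(interned['ex2'][0]) # true, one shared String object
```

Because equal paths collapse to a single object, a path referenced by thousands of examples is held in memory once.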
#save_graph(snapshot, schema_version:) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal method on the tracer pipeline.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 289
def save_graph(snapshot, schema_version:)
  raise ArgumentError, 'snapshot must not be nil' if snapshot.nil?
  unless Schema.supported?(schema_version)
    raise ArgumentError, "unsupported schema_version: #{schema_version.inspect}"
  end

  run_id = snapshot.run_id
  raise ArgumentError, 'snapshot.run_id must be a non-empty string' if run_id.nil? || run_id.to_s.empty?

  transactional_save do
    dir = File.join(@cache_path, run_id)
    FileUtils.mkdir_p(dir)
    write_run_files(dir, snapshot)
    write_last_run_atomic(schema_version: schema_version, run_id: run_id)
  end

  maybe_prune_after_save
  maybe_warn_size_budget(run_id)

  snapshot
end
#transactional_save(&block) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Internal method on the tracer pipeline.
# File 'lib/rspec_tracer/storage/json_backend.rb', line 341
def transactional_save(&block)
  raise ArgumentError, 'block required' unless block

  FileUtils.mkdir_p(@cache_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX)
    yield
  end
end