Class: RSpecTracer::Storage::JsonBackend Private

Inherits:
Object
  • Object
Defined in:
lib/rspec_tracer/storage/json_backend.rb

Overview

This class is part of a private API. You should avoid using this class if possible, as it may be removed or be changed in the future.

JSON-on-disk storage backend. 1.x shipped this layout without a formal contract; 2.0 treats the FILENAMES list below as the authoritative user-facing surface.

External tooling (CI cache keys, debug scripts, report renderers) may reference these exact filenames, so additions or removals are breaking changes. The shared-examples contract in `spec/contracts/storage_backend.rb` enforces the list.

Commit point: `last_run.json` is written last via tmp + rename. If any of the per-run files fails to write, `last_run.json` stays pointed at the previous successful run and the partially-written run-id directory is orphaned (harmless; `clear!` reaps it). Readers that see `last_run.json` therefore see a complete snapshot.
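The tmp + rename commit step can be sketched as follows. This is a minimal illustration, not the backend's own `write_last_run_atomic`: the helper name `atomic_write_json` and the temp-file naming are assumptions made for the example.

```ruby
require 'json'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of the library): write the payload to a temp
# file next to the target, then rename it over the target. A rename within one
# filesystem replaces the target in a single step on POSIX systems, so a
# reader sees either the old manifest or the new one, never a partial write.
def atomic_write_json(path, payload)
  tmp = "#{path}.#{Process.pid}.tmp"
  File.write(tmp, JSON.generate(payload), encoding: 'UTF-8')
  File.rename(tmp, path)
ensure
  # Only reached with a leftover temp file when the write or rename failed.
  FileUtils.rm_f(tmp) if tmp && File.exist?(tmp)
end

Dir.mktmpdir do |dir|
  target = File.join(dir, 'last_run.json')
  atomic_write_json(target, 'schema_version' => 3, 'run_id' => 'example-run')
  puts File.read(target, encoding: 'UTF-8')
end
```

Writing the temp file in the same directory as the target keeps both on one filesystem, which is what makes the rename a single-step replacement.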

Concurrency: an exclusive flock on a sentinel file (`.rspec_tracer.lock` under cache_path) serializes writers. Readers do not take the lock; the atomic rename of `last_run.json` is their consistency model.
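A standalone sketch of the writer lock, assuming a POSIX-style `flock`. The names `with_writer_lock` and `writer_active?` are hypothetical and exist only for this example; the probe shows that a second open of the sentinel file cannot take the lock while a writer holds it.

```ruby
require 'fileutils'
require 'tmpdir'

LOCK_NAME = '.rspec_tracer.lock'

# Hypothetical writer wrapper: take an exclusive lock on the sentinel file,
# run the block, and release the lock when the file is closed.
def with_writer_lock(cache_path)
  FileUtils.mkdir_p(cache_path)
  File.open(File.join(cache_path, LOCK_NAME), File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX) # blocks until any other writer releases
    yield
  end
end

# Hypothetical probe: try to take the lock without blocking. File#flock
# returns false when LOCK_NB is set and the lock is held elsewhere.
def writer_active?(cache_path)
  File.open(File.join(cache_path, LOCK_NAME), File::RDWR | File::CREAT, 0o644) do |probe|
    acquired = probe.flock(File::LOCK_EX | File::LOCK_NB)
    acquired == false
  end
end

Dir.mktmpdir do |dir|
  cache = File.join(dir, 'rspec_tracer_cache')
  with_writer_lock(cache) { puts "locked during write: #{writer_active?(cache)}" }
  puts "locked after write: #{writer_active?(cache)}"
end
```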

Corruption policy: `load_graph` never raises. Missing files, malformed JSON, wrong schema, binary-garbage input all yield `nil` + an info log. This is the invariant the fuzz spec asserts across 1000 iterations.
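The never-raise policy can be illustrated with a small reader. This is a sketch under assumptions: `safe_load_manifest` and its `log:` array are invented for the example and stand in for the backend's private manifest reader and logger.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical reader: returns the parsed Hash, or nil plus a log entry for
# anything unusable (missing file, malformed JSON, wrong top-level type).
def safe_load_manifest(path, log: [])
  parsed = JSON.parse(File.read(path, encoding: 'UTF-8'))
  return parsed if parsed.is_a?(Hash)

  log << "unexpected top-level #{parsed.class}; cold run"
  nil
rescue StandardError => e
  # Errno::ENOENT and JSON::ParserError are both StandardError subclasses.
  log << "failed to load cache: #{e.class}; cold run"
  nil
end

Dir.mktmpdir do |dir|
  log = []
  p safe_load_manifest(File.join(dir, 'missing.json'), log: log)
  puts log
end
```

Callers treat `nil` as "no usable cache" and fall back to a cold run, so a damaged cache costs one full run rather than a crash.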

Encoding: every read and write passes `encoding: 'UTF-8'`. This fixes the `Encoding::InvalidByteSequenceError` that bit the dogfood path when an example title contained a non-ASCII byte on a US-ASCII-defaulted filesystem.
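A small sketch of explicit-encoding I/O, independent of `Encoding.default_external`. The helper names `write_utf8` and `read_utf8` are hypothetical and are not the backend's private methods.

```ruby
require 'tmpdir'

# Hypothetical helpers: pass the encoding on every call instead of relying on
# the process-wide default, which may be US-ASCII in some environments.
def write_utf8(path, text)
  File.write(path, text, encoding: 'UTF-8')
end

def read_utf8(path)
  File.read(path, encoding: 'UTF-8')
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'all_examples.json')
  write_utf8(path, "{\"title\":\"handles caf\u00E9 orders\"}")
  text = read_utf8(path)
  puts text.encoding
  puts text.valid_encoding?
end
```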

Defined Under Namespace

Modules: Merger

Classes: FieldReader

Constant Summary

FILENAMES =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

On-disk filenames under the default `:json` serializer. This is the user-facing surface documented in USER_FACING_SURFACE.md section 6: external tooling that walks `rspec_tracer_cache/` relies on exactly these names. The `:msgpack` serializer substitutes `.msgpack.gz` for the `.json` suffix (one file per field on disk); the file stems and per-field semantics do not change.

`boot_set.json` was appended to the end of the list, so it is additive with respect to 1.x and v2 readers that walked this enumeration. It carries the project's transitive boot-load set (schema_version 3).

`wsi_snapshot.json` persists the WholeSuiteInvalidators digest_snapshot so warm runs can tell whether Gemfile.lock / .ruby-version / .rspec-tracer / tracer-gem identity changed since the previous run. Without it, warm runs always saw a nil previous and treated every run as a cold first run. A missing file deserializes to `{}` so older caches still load; the fallback path fires one full re-run (safe).

`env_snapshot.json` persists the `Tracker::EnvSnapshot` digest map for env-var values the per-example `tracks: { env: … }` DSL declares. It uses the same missing-coerces-to-`{}` fallback as wsi_snapshot, with no schema bump.

`env_dependency.json` persists the per-example tracked-env attribution map that reporters need for the Examples Dependency report. A missing file coerces to `{}`; older caches load without a cold re-run.

%w[
  all_examples.json
  duplicate_examples.json
  interrupted_examples.json
  flaky_examples.json
  failed_examples.json
  pending_examples.json
  skipped_examples.json
  all_files.json
  dependency.json
  reverse_dependency.json
  examples_coverage.json
  boot_set.json
  wsi_snapshot.json
  env_snapshot.json
  env_dependency.json
  cache_hit_reason.json
].freeze
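
External tooling that relies on these names might check a cache as sketched below. The helper `missing_cache_files` and the shortened `EXPECTED_FILES` list are assumptions for illustration; the layout (`last_run.json` with a `run_id` key pointing at a run-id directory under the cache path) follows the `load_graph` source on this page.

```ruby
require 'json'
require 'tmpdir'
require 'fileutils'

# Shortened copy of FILENAMES for the demo; a real script would list all of them.
EXPECTED_FILES = %w[all_examples.json dependency.json boot_set.json].freeze

# Hypothetical CI helper: resolve the current run directory through
# last_run.json and report which expected files are absent.
def missing_cache_files(cache_path, expected: EXPECTED_FILES)
  manifest_path = File.join(cache_path, 'last_run.json')
  manifest = JSON.parse(File.read(manifest_path, encoding: 'UTF-8'))
  run_dir = File.join(cache_path, manifest.fetch('run_id'))
  expected.reject { |name| File.file?(File.join(run_dir, name)) }
end

Dir.mktmpdir do |cache|
  run_dir = File.join(cache, 'example-run')
  FileUtils.mkdir_p(run_dir)
  File.write(File.join(cache, 'last_run.json'),
             JSON.generate('schema_version' => 3, 'run_id' => 'example-run'))
  File.write(File.join(run_dir, 'all_examples.json'), '{}')
  p missing_cache_files(cache)
end
```
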
LAST_RUN_FILENAME =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Internal constant.

'last_run.json'
LOCK_FILENAME =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Internal constant.

'.rspec_tracer.lock'
ENCODING =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Internal constant.

'UTF-8'
FIELD_NAMES =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Known snapshot field symbols. Derived directly from FIELD_KINDS below (the write-side and read-side shape tables both enumerate the same set, so a divergence would already blow up write paths). Kept as an Array of Symbol so `#read_field` can dispatch without constructing a per-serializer filename table; the filename is computed as `"#{field}.#{@serializer.extension}"`.

%i[
  all_examples
  duplicate_examples
  interrupted_examples
  flaky_examples
  failed_examples
  pending_examples
  skipped_examples
  all_files
  dependency
  reverse_dependency
  examples_coverage
  boot_set
  wsi_snapshot
  env_snapshot
  env_dependency
  cache_hit_reason
].freeze
ID_SET_FIELDS =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Write-side field groups. Each group dispatches to one serializer (Hash pass-through, Set->sorted Array, or the Hash[id => Set<path>] -> Hash[id => Array<path>] flavor shared by dependency + reverse_dependency). Kept data-driven so a schema_version bump adds one entry instead of a new branch. Read-side uses FIELD_KINDS below.

%w[
  interrupted_examples flaky_examples failed_examples pending_examples skipped_examples
].freeze
HASH_FIELDS =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Internal constant.

%w[
  all_examples duplicate_examples all_files examples_coverage
  boot_set wsi_snapshot env_snapshot env_dependency cache_hit_reason
].freeze
DEPENDENCY_FIELDS =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Internal constant.

%w[dependency reverse_dependency].freeze
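
The three write-side groups can be pictured as one encoder per group. This is a hedged sketch: `ENCODERS` and `encode_field` are hypothetical names, and the backend's private `write_run_files` may be structured differently.

```ruby
require 'json'
require 'set'

# Hypothetical encoders, one per write-side group described above.
ENCODERS = {
  # Set -> sorted Array, so the on-disk output is deterministic.
  id_set: ->(set) { set.to_a.sort },
  # Hash pass-through.
  hash: ->(hash) { hash },
  # Hash[id => Set<path>] -> Hash[id => Array<path>].
  dependency: ->(deps) { deps.transform_values { |paths| paths.to_a.sort } }
}.freeze

def encode_field(kind, value)
  JSON.generate(ENCODERS.fetch(kind).call(value))
end

puts encode_field(:id_set, Set['spec/b_spec.rb[1:1]', 'spec/a_spec.rb[1:1]'])
puts encode_field(:dependency, 'ex1' => Set['lib/z.rb', 'lib/a.rb'])
```

Keeping the table data-driven means a new field only needs to be added to the matching group.
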
FIELD_KINDS =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Read-side field -> deserializer-kind map. Drives `decode_field` so the lazy reader looks up one shape per field instead of spelling out a case/when that has to stay in sync with FILENAMES. The kinds are:

  • `:symbolized` = Hash whose inner Hash values get symbolized keys (1.x's all_examples / all_files convention)
  • `:dupe_examples` = same but Array-of-inner-Hash
  • `:id_set` = Array on disk -> Set in memory
  • `:dependency` = Hash[id => Array] -> Hash[id => Set]
  • `:plain_hash` = pass-through (examples_coverage, the digest maps, env_dependency)

{
  all_examples: :symbolized,
  all_files: :symbolized,
  duplicate_examples: :dupe_examples,
  interrupted_examples: :id_set,
  flaky_examples: :id_set,
  failed_examples: :id_set,
  pending_examples: :id_set,
  skipped_examples: :id_set,
  dependency: :dependency,
  reverse_dependency: :dependency,
  examples_coverage: :plain_hash,
  boot_set: :plain_hash,
  wsi_snapshot: :plain_hash,
  env_snapshot: :plain_hash,
  env_dependency: :plain_hash,
  cache_hit_reason: :plain_hash
}.freeze
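
A sketch of how a kind-keyed decoder table could turn parsed JSON back into the in-memory shapes. `DECODERS` and `decode` are hypothetical names; the backend's private `decode_field` is not shown on this page and may differ.

```ruby
require 'json'
require 'set'

# Hypothetical decoders keyed by the kinds used in FIELD_KINDS.
DECODERS = {
  symbolized: ->(raw) { raw.transform_values { |inner| inner.transform_keys(&:to_sym) } },
  dupe_examples: lambda { |raw|
    raw.transform_values { |list| list.map { |inner| inner.transform_keys(&:to_sym) } }
  },
  id_set: ->(raw) { raw.to_set },
  dependency: ->(raw) { raw.transform_values(&:to_set) },
  plain_hash: ->(raw) { raw }
}.freeze

def decode(kind, json_text)
  DECODERS.fetch(kind).call(JSON.parse(json_text))
end

p decode(:id_set, '["ex1","ex2"]')
p decode(:dependency, '{"ex1":["lib/a.rb","lib/b.rb"]}')
p decode(:symbolized, '{"ex1":{"file_name":"spec/a_spec.rb"}}')
```
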


Constructor Details

#initialize(cache_path:, logger: nil, retention_local_count: nil, warn_per_file_mb: nil, warn_total_mb: nil, serializer: :json) ⇒ JsonBackend

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

# File 'lib/rspec_tracer/storage/json_backend.rb', line 204

def initialize(cache_path:, logger: nil, retention_local_count: nil,
               warn_per_file_mb: nil, warn_total_mb: nil, serializer: :json)
  # rubocop:enable Metrics/ParameterLists
  @cache_path = File.expand_path(cache_path)
  @logger = logger
  @retention_local_count = retention_local_count
  @warn_per_file_mb = warn_per_file_mb
  @warn_total_mb = warn_total_mb
  @serializer = resolve_serializer(serializer)
  @serializer_name = serializer
end

Instance Attribute Details

#cache_path ⇒ Object (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Internal attribute.



# File 'lib/rspec_tracer/storage/json_backend.rb', line 201

def cache_path
  @cache_path
end

#serializer ⇒ Object (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Internal attribute.



# File 'lib/rspec_tracer/storage/json_backend.rb', line 201

def serializer
  @serializer
end

#serializer_name ⇒ Object (readonly)

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Internal attribute.



# File 'lib/rspec_tracer/storage/json_backend.rb', line 201

def serializer_name
  @serializer_name
end

Instance Method Details

#clear! ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Internal method on the tracer pipeline.



# File 'lib/rspec_tracer/storage/json_backend.rb', line 349

def clear!
  return unless File.directory?(@cache_path)

  FileUtils.rm_rf(@cache_path)
end

#field_filename(field) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Per-serializer on-disk filename for a snapshot field. `:json` -> `all_examples.json`; `:msgpack` -> `all_examples.msgpack.gz`. Public so integration specs / reporters can resolve the expected path without reaching into @serializer.



# File 'lib/rspec_tracer/storage/json_backend.rb', line 279

def field_filename(field)
  "#{field}.#{@serializer.extension}"
end

#last_run_id ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Internal method on the tracer pipeline.



# File 'lib/rspec_tracer/storage/json_backend.rb', line 218

def last_run_id
  manifest = read_last_run_manifest
  return nil unless manifest.is_a?(Hash)

  run_id = manifest['run_id']
  return nil if run_id.nil? || run_id.to_s.empty?

  run_id
end

#load_graph(schema_version:) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Internal method on the tracer pipeline.



# File 'lib/rspec_tracer/storage/json_backend.rb', line 230

def load_graph(schema_version:)
  manifest = read_last_run_manifest
  return nil unless manifest.is_a?(Hash)

  stored = manifest['schema_version']
  unless Schema.supported?(stored) && stored == schema_version
    info("schema_version mismatch (stored=#{stored.inspect}, expected=#{schema_version}); cold run")
    return nil
  end

  run_id = manifest['run_id']
  return nil if run_id.nil? || run_id.to_s.empty?

  dir = File.join(@cache_path, run_id)
  return nil unless File.directory?(dir)

  LazySnapshot.new(
    schema_version: stored, run_id: run_id,
    reader: FieldReader.new(backend: self, dir: dir)
  )
rescue StandardError => e
  info("failed to load cache: #{e.class}: #{e.message}; cold run")
  nil
end

#merge_from_peers(peer_cache_paths, schema_version:) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Merge per-worker snapshots (written to `peer_cache_paths`) into this backend's top-level cache and persist via `save_graph`. Each peer is read via `load_graph` so schema + corruption policy (missing files yield nil, malformed JSON logs + returns nil) flows through the same path as a normal load.

No peers / every peer nil -> no-op, returns nil. Partial peers merge what's available; graceful degradation is the entire point of running this at `at_exit` time.

`schema_version` is passed through so peers saved under a different schema version are rejected without side effects (same semantics as a warm run under a mismatched cache).



# File 'lib/rspec_tracer/storage/json_backend.rb', line 368

def merge_from_peers(peer_cache_paths, schema_version:)
  peer_snapshots = peer_cache_paths.filter_map do |path|
    self.class.new(cache_path: path, logger: @logger, serializer: @serializer_name)
      .load_graph(schema_version: schema_version)
  end

  return nil if peer_snapshots.empty?

  merged = Merger.call(peer_snapshots, schema_version: schema_version)
  save_graph(merged, schema_version: schema_version)
  merged
end

#prune_run_dirs!(keep:) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Retain the `keep` most-recently-modified run-id directories under cache_path and delete older ones. Always preserves the run-id that `last_run.json` points at (deleting it would make the next reader cold-run). Returns the count removed. Never raises: a prune failure is logged at warn level and treated as best-effort cleanup, the same graceful-degradation contract the remote cache backends use.

`keep` nil / non-positive -> no-op. Called automatically from `save_graph` when the backend was constructed with `retention_local_count:`; also exposed via `rake rspec_tracer:cache:gc` for one-off cleanup.



# File 'lib/rspec_tracer/storage/json_backend.rb', line 319

def prune_run_dirs!(keep:)
  return 0 if keep.nil? || keep <= 0
  return 0 unless File.directory?(@cache_path)

  current = last_run_id
  candidates = collect_run_dirs
  return 0 if candidates.empty?

  _keep, pruned = partition_dirs_to_prune(candidates, keep: keep, current: current)
  pruned.each { |path| FileUtils.rm_rf(path) }
  pruned.size
rescue StandardError => e
  @logger&.warn("rspec-tracer cache gc: prune failed (#{e.class}: #{e.message})")
  0
end
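
The private `partition_dirs_to_prune` helper is not shown on this page. One way such a partition could work is sketched below, purely as an assumption: `partition_by_recency` is a hypothetical name, and here the pinned current run is kept in addition to the `keep` newest directories, which may not match the library's counting.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical partition: the `keep` newest directories by mtime survive, and
# the directory whose basename equals `current` survives regardless of age.
# Returns [kept, pruned], newest first.
def partition_by_recency(dirs, keep:, current:)
  newest_first = dirs.sort_by { |path| -File.mtime(path).to_f }
  kept = newest_first.first(keep)
  pinned = newest_first.find { |path| File.basename(path) == current }
  kept |= [pinned] if pinned
  [kept, newest_first - kept]
end

Dir.mktmpdir do |cache|
  base = Time.now - 100
  dirs = %w[run-a run-b run-c].each_with_index.map do |name, index|
    path = File.join(cache, name)
    Dir.mkdir(path)
    File.utime(base + index, base + index, path) # run-c is the newest
    path
  end
  kept, pruned = partition_by_recency(dirs, keep: 1, current: 'run-a')
  puts "kept: #{kept.map { |path| File.basename(path) }.inspect}"
  puts "pruned: #{pruned.map { |path| File.basename(path) }.inspect}"
end
```
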

#read_field(dir, field) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Read and deserialize one per-run field. Public so `FieldReader` (constructed by `load_graph`) can dispatch. Missing file -> same default value the eager read previously produced (Set.new for ID-set fields, {} for hashes), which preserves the "malformed cache loads gracefully" contract.

`deep_intern` runs before the decode so String dedup happens once per on-disk path / example_id regardless of how many times the value appears in the parsed tree. RAM win on large caches is the whole point of this method; see json_backend_spec.rb "string interning" for the measurable assertion.

Raises:

  • (ArgumentError)


# File 'lib/rspec_tracer/storage/json_backend.rb', line 267

def read_field(dir, field)
  raise ArgumentError, "unknown snapshot field: #{field.inspect}" unless FIELD_KINDS.key?(field)

  raw = read_run_file(dir, field_filename(field))
  decode_field(field, deep_intern(raw))
end
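
The private `deep_intern` is not shown here. A minimal sketch of recursive string interning, assuming it relies on Ruby's `String#-@` (which returns a frozen, deduplicated instance), could look like this; the library's own version may differ.

```ruby
require 'json'

# Hypothetical recursive interning: every String in the parsed tree is
# replaced by its deduplicated frozen instance, so repeated paths and example
# ids share one object in memory.
def deep_intern(value)
  case value
  when String then -value
  when Array then value.map { |item| deep_intern(item) }
  when Hash then value.to_h { |key, val| [deep_intern(key), deep_intern(val)] }
  else value
  end
end

parsed = JSON.parse('{"ex1":["lib/a.rb","lib/b.rb"],"ex2":["lib/a.rb"]}')
interned = deep_intern(parsed)
puts interned['ex1'][0].equal?(interned['ex2'][0])
```
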

#save_graph(snapshot, schema_version:) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Internal method on the tracer pipeline.

Raises:

  • (ArgumentError)


# File 'lib/rspec_tracer/storage/json_backend.rb', line 285

def save_graph(snapshot, schema_version:)
  raise ArgumentError, 'snapshot must not be nil' if snapshot.nil?

  unless Schema.supported?(schema_version)
    raise ArgumentError, "unsupported schema_version: #{schema_version.inspect}"
  end

  run_id = snapshot.run_id
  raise ArgumentError, 'snapshot.run_id must be a non-empty string' if run_id.nil? || run_id.to_s.empty?

  transactional_save do
    dir = File.join(@cache_path, run_id)
    FileUtils.mkdir_p(dir)
    write_run_files(dir, snapshot)
    write_last_run_atomic(schema_version: schema_version, run_id: run_id)
  end

  maybe_prune_after_save
  maybe_warn_size_budget(run_id)
  snapshot
end

#transactional_save(&block) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Internal method on the tracer pipeline.

Raises:

  • (ArgumentError)


# File 'lib/rspec_tracer/storage/json_backend.rb', line 337

def transactional_save(&block)
  raise ArgumentError, 'block required' unless block

  FileUtils.mkdir_p(@cache_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX)
    yield
  end
end