Module: Bitfab::Replay

Defined in: lib/bitfab/replay.rb

Overview

Replay historical traces through a traced method and create a test run.

Class Method Summary

.build_mock_tree(root) ⇒ Object

Walk the children of a root span tree node depth-first and build a lookup keyed by "#{trace_function_key}:#{span_name}:#{call_index}".
.execute_item(item, receiver, method_name, test_run_id, input_source_span_id = nil, metrics = {}, input_source_trace_id: nil, mock_strategy: "none", mock_tree: nil, adapt_inputs: nil, adapt_ctx: nil, db_branch_lease: nil, source_bitfab_trace_id: nil, db_snapshot_ref: nil) ⇒ Object

Execute a single replay item: deserialize inputs, call method with replay context.
.extract_server_item_metrics(server_item) ⇒ Object

Pull durationMs / model from the start-replay server item.
.extract_span_data(span) ⇒ Object

Extract input/output data from an external span’s rawData.
.normalize_tokens(raw_tokens) ⇒ Object

Normalize a complete-replay tokens hash (string-keyed JSON) into the symbol-keyed shape the replay item exposes.
.process_items(http_client, server_items, receiver, method_name, test_run_id, max_concurrency, mock_strategy, adapt_inputs = nil, include_db_branch_lease = false, on_progress: nil) ⇒ Object

Process all replay items, optionally in parallel using threads.
.process_single_item(http_client, server_item, receiver, method_name, test_run_id, mock_strategy, adapt_inputs = nil, include_db_branch_lease = false) ⇒ Object

Fetch span data and execute a single replay item.
.release_db_branch_lease(http_client, lease) ⇒ Object

Delete the per-item Neon preview branch.
.run(client, receiver, method_name, trace_function_key:, limit: nil, trace_ids: nil, max_concurrency: 10, code_change_description: nil, code_change_files: nil, experiment_group_id: nil, dataset_id: nil, mock: "none", adapt_inputs: nil, environment: nil, on_progress: nil) ⇒ Hash

Replay historical traces through a method and create a test run.

Class Method Details

.build_mock_tree(root) ⇒ `Object`

Walk the children of a root span tree node depth-first and build a lookup keyed by "#{trace_function_key}:#{span_name}:#{call_index}".

The root node itself is excluded: at replay time the runtime root span never queries the mock tree.

The compound (key, name) match disambiguates same-key spans that come from the fluent `client.get_function(key).wrap(…)` pattern: every wrapped method shares trace_function_key but differs in span_name. The counter is kept per (key, name) pair, so repeated same-name calls (including recursion) are still ordered by occurrence. This mirrors the Python and TypeScript SDKs after HVT-2078: keying by trace_function_key alone produced the wrong historical output for fluent-API span sets.

# File 'lib/bitfab/replay.rb', line 388

def build_mock_tree(root)
  spans = {}
  counters = {}

  walk = lambda do |node|
    key = node["traceFunctionKey"]
    if key && !key.empty?
      name = node["spanName"]
      name = key if name.nil? || name.empty?
      counter_key = "#{key}:#{name}"
      index = counters[counter_key] || 0
      counters[counter_key] = index + 1
      spans["#{counter_key}:#{index}"] = {
        source_span_id: node["sourceSpanId"],
        output: node["output"],
        output_meta: node["outputMeta"]
      }
    end
    (node["children"] || []).each { |child| walk.call(child) }
  end

  (root["children"] || []).each { |child| walk.call(child) }

  spans
end
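
To make the key format concrete, the sketch below restates the same walk as a standalone method and runs it on a small, made-up span tree. The names `mock_tree_keys` and `sample_root`, and every key, name, and nesting in the sample, are hypothetical; only the field names and the key format are taken from the method above.

```ruby
# Standalone restatement of the keying walk, for illustration only.
# Returns the lookup keys in the order the walk would create them.
def mock_tree_keys(root)
  counters = Hash.new(0)
  keys = []

  walk = lambda do |node|
    key = node["traceFunctionKey"]
    if key && !key.empty?
      name = node["spanName"]
      name = key if name.nil? || name.empty? # fall back to the key
      pair = "#{key}:#{name}"
      keys << "#{pair}:#{counters[pair]}"
      counters[pair] += 1
    end
    (node["children"] || []).each { |child| walk.call(child) }
  end

  # Start from the children: the root node itself is never keyed.
  (root["children"] || []).each { |child| walk.call(child) }
  keys
end

# Hypothetical tree: three fluent-style spans sharing the key "llm".
def sample_root
  {
    "traceFunctionKey" => "checkout",
    "spanName" => "checkout",
    "children" => [
      {"traceFunctionKey" => "llm", "spanName" => "summarize", "children" => [
        {"traceFunctionKey" => "llm", "spanName" => "classify"}
      ]},
      {"traceFunctionKey" => "llm", "spanName" => "summarize"},
      {"traceFunctionKey" => "llm", "spanName" => ""},
      {"traceFunctionKey" => "", "spanName" => "ignored"}
    ]
  }
end

p mock_tree_keys(sample_root)
# => ["llm:summarize:0", "llm:classify:0", "llm:summarize:1", "llm:llm:0"]
```

The two "summarize" spans get indices 0 and 1 even though a "classify" span with the same key sits between them, which is the behavior the per-pair counter is meant to provide.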

.execute_item(item, receiver, method_name, test_run_id, input_source_span_id = nil, metrics = {}, input_source_trace_id: nil, mock_strategy: "none", mock_tree: nil, adapt_inputs: nil, adapt_ctx: nil, db_branch_lease: nil, source_bitfab_trace_id: nil, db_snapshot_ref: nil) ⇒ `Object`

Execute a single replay item: deserialize inputs, call method with replay context.

# File 'lib/bitfab/replay.rb', line 456

def execute_item(item, receiver, method_name, test_run_id, input_source_span_id = nil, metrics = {},
  input_source_trace_id: nil, mock_strategy: "none", mock_tree: nil, adapt_inputs: nil, adapt_ctx: nil,
  db_branch_lease: nil, source_bitfab_trace_id: nil, db_snapshot_ref: nil)
  args, kwargs = Serialize.deserialize_inputs(item)

  fn_result = nil
  fn_error = nil
  sdk_trace_id = SecureRandom.uuid
  # Collects the root span's persistence threads (span uploads + trace
  # completion). Joined below so this item's trace is on the server
  # before run() calls complete_replay: otherwise the server's trace-ID
  # mapping races the uploads and the item's trace_id comes back nil.
  pending_persistence = []

  ReplayContext.with_context(
    test_run_id:,
    input_source_span_id:,
    input_source_trace_id:,
    trace_id: sdk_trace_id,
    mock_tree:,
    mock_strategy:,
    pending_persistence:,
    db_branch_lease:,
    source_bitfab_trace_id:
  ) do
    # Reshape recorded inputs onto the current signature when an adapter is
    # supplied. Inside the rescue so a raising adapter surfaces on this
    # item's :error instead of crashing the run; args is reported on :input.
    if adapt_inputs
      ctx = adapt_ctx || {trace_id: nil, source_span_id: input_source_span_id}
      args, kwargs = adapt_inputs.call(args, kwargs, ctx)
    end
    fn_result = if kwargs.empty?
      receiver.send(method_name, *args)
    else
      receiver.send(method_name, *args, **kwargs)
    end
  rescue => e
    fn_error = e.message
  end

  # Wait for this item's trace (spans + completion) to be fully persisted
  # before the item resolves. Runs on the error path too: a raising
  # method still emits a root span whose trace must land before
  # complete_replay. Joins are bounded by the HTTP layer's own timeouts.
  pending_persistence.each(&:join)

  {
    input: args,
    result: fn_result,
    original_output: item["output"],
    error: fn_error,
    duration_ms: metrics[:duration_ms],
    tokens: metrics[:tokens],
    model: metrics[:model],
    trace_id: sdk_trace_id,
    db_snapshot_ref:
  }
end
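
The `adapt_inputs` hook used above is any callable that takes `(args, kwargs, ctx)` and returns `[new_args, new_kwargs]`. As a minimal sketch, suppose a method was recorded as `summarize(text, max_words)` and now has the signature `summarize(text, max_words:, tone: "neutral")`. The method, its parameters, and the helper name `build_adapter` are all hypothetical, and the sketch assumes deserialized kwargs use symbol keys.

```ruby
# Hypothetical adapter: moves a recorded second positional argument into the
# max_words: keyword and supplies a default for a keyword added later.
def build_adapter
  lambda do |args, kwargs, _ctx|
    text, max_words = args
    new_kwargs = {tone: "neutral"}.merge(kwargs) # recorded kwargs win
    new_kwargs[:max_words] = max_words unless max_words.nil?
    [[text], new_kwargs]
  end
end

adapter = build_adapter
# ctx has the shape execute_item passes: { trace_id:, source_span_id: }.
new_args, new_kwargs = adapter.call(
  ["long text", 50], {}, {trace_id: nil, source_span_id: "span-1"}
)
p new_args   # => ["long text"]
p new_kwargs # => {:tone=>"neutral", :max_words=>50} (inspect format varies by Ruby version)
```

Because the hook runs inside the same rescue as the method call, an adapter that raises on an unexpected input shape only marks that one item as errored.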

.extract_server_item_metrics(server_item) ⇒ `Object`

Pull durationMs / model from the start-replay server item. Nil-safe defaults so older servers without these fields still produce a consistent shape. Tokens are intentionally NOT read from the start item (it carries the ORIGINAL trace’s tokens); the replayed run’s tokens are filled in by run() from the complete-replay response once spans are aggregated server-side, and stay nil here and on older servers.

# File 'lib/bitfab/replay.rb', line 433

def extract_server_item_metrics(server_item)
  {
    duration_ms: server_item["durationMs"],
    tokens: nil,
    model: server_item["model"]
  }
end

.extract_span_data(span) ⇒ `Object`

Extract input/output data from an external span’s rawData.

# File 'lib/bitfab/replay.rb', line 415

def extract_span_data(span)
  raw_data = span["rawData"] || {}
  span_data = raw_data["span_data"] || {}

  {
    "input" => span_data["input"],
    "output" => span_data["output"],
    "inputSerialized" => span_data["input_serialized"],
    "outputSerialized" => span_data["output_serialized"]
  }
end

.normalize_tokens(raw_tokens) ⇒ `Object`

Normalize a complete-replay tokens hash (string-keyed JSON) into the symbol-keyed shape the replay item exposes. Nil when the server reported no token data for this trace.

# File 'lib/bitfab/replay.rb', line 444

def normalize_tokens(raw_tokens)
  return nil unless raw_tokens.is_a?(Hash)

  {
    input: raw_tokens["input"],
    output: raw_tokens["output"],
    cached: raw_tokens["cached"],
    total: raw_tokens["total"]
  }
end
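
As a quick illustration of the shape change, the sketch below restates the normalization compactly and applies it to a made-up tokens payload. The name `normalize_tokens_sketch` and the sample numbers are hypothetical; the four field names come from the method above.

```ruby
# Compact restatement of the string-key to symbol-key normalization.
def normalize_tokens_sketch(raw_tokens)
  return nil unless raw_tokens.is_a?(Hash)

  %w[input output cached total].to_h { |field| [field.to_sym, raw_tokens[field]] }
end

# A field the server omitted comes back as nil rather than being dropped.
tokens = normalize_tokens_sketch({"input" => 120, "output" => 30, "total" => 150})
p tokens[:total]  # => 150
p tokens[:cached] # => nil

# Anything that is not a Hash (including nil) normalizes to nil.
p normalize_tokens_sketch(nil) # => nil
```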

.process_items(http_client, server_items, receiver, method_name, test_run_id, max_concurrency, mock_strategy, adapt_inputs = nil, include_db_branch_lease = false, on_progress: nil) ⇒ `Object`

Process all replay items, optionally in parallel using threads.

# File 'lib/bitfab/replay.rb', line 243

def process_items(http_client, server_items, receiver, method_name, test_run_id, max_concurrency, mock_strategy,
  adapt_inputs = nil, include_db_branch_lease = false, on_progress: nil)
  concurrency = max_concurrency || server_items.length

  # Reports running totals once per item as it settles. In the parallel
  # path it runs from worker threads, so the mutex both makes the counter
  # updates safe and serializes the user's callback (never called
  # concurrently). A raising callback is swallowed: progress UI must never
  # crash the run.
  total = server_items.length
  progress_mutex = Mutex.new
  completed = 0
  succeeded = 0
  errored = 0
  report = lambda do |result|
    return unless on_progress

    progress_mutex.synchronize do
      completed += 1
      result[:error].nil? ? (succeeded += 1) : (errored += 1)
      begin
        on_progress.call({completed:, total:, succeeded:, errored:})
      rescue => e
        warn "Bitfab: replay on_progress callback raised: #{e.message}"
      end
    end
  end

  if concurrency <= 1
    server_items.map do |item|
      result = process_single_item(http_client, item, receiver, method_name, test_run_id, mock_strategy,
        adapt_inputs, include_db_branch_lease)
      report.call(result)
      result
    end
  else
    results_mutex = Mutex.new
    results = []
    work_queue = server_items.each_with_index.to_a
    work_mutex = Mutex.new

    workers = [concurrency, server_items.length].min.times.map do
      Thread.new do
        loop do
          item, idx = work_mutex.synchronize { work_queue.shift }
          break unless item

          result = process_single_item(http_client, item, receiver, method_name, test_run_id, mock_strategy,
            adapt_inputs, include_db_branch_lease)
          results_mutex.synchronize { results[idx] = result }
          report.call(result)
        end
      end
    end

    workers.each(&:join)
    results.compact
  end
end
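
The parallel branch above is a shared work queue drained by a fixed number of threads, with each result written back at its original index so output order matches input order. The sketch below isolates that pattern under simplifying assumptions: a single mutex guards the queue, the results, and the callback, and the name `run_pool` and its block-based interface are hypothetical rather than part of the SDK.

```ruby
# Simplified worker pool: threads pull [item, index] pairs off a shared queue
# and store each result at the item's original index.
def run_pool(items, concurrency, on_progress: nil, &work)
  results = []
  queue = items.each_with_index.to_a
  lock = Mutex.new
  completed = 0

  workers = [concurrency, items.length].min.times.map do
    Thread.new do
      loop do
        pair = lock.synchronize { queue.shift }
        break unless pair # queue drained

        item, idx = pair
        value = work.call(item)
        lock.synchronize do
          results[idx] = value
          completed += 1
          # Called under the lock, so the callback never runs concurrently.
          on_progress&.call({completed: completed, total: items.length})
        end
      end
    end
  end

  workers.each(&:join)
  results
end

seen = []
squares = run_pool([1, 2, 3, 4], 2, on_progress: ->(pr) { seen << pr[:completed] }) { |n| n * n }
p squares # => [1, 4, 9, 16]
p seen    # => [1, 2, 3, 4]
```

Items may finish in any order, but the callback still observes a strictly increasing `completed` count because the increment and the call happen inside the same critical section.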

.process_single_item(http_client, server_item, receiver, method_name, test_run_id, mock_strategy, adapt_inputs = nil, include_db_branch_lease = false) ⇒ `Object`

Fetch span data and execute a single replay item.

Any error while fetching the span, building the mock tree, or deserializing inputs is captured on the returned item’s :error rather than propagated, so one bad trace never aborts the whole replay run (mirrors the TypeScript and Python SDKs’ per-item rescue).

# File 'lib/bitfab/replay.rb', line 309

def process_single_item(http_client, server_item, receiver, method_name, test_run_id, mock_strategy,
  adapt_inputs = nil, include_db_branch_lease = false)
  metrics = extract_server_item_metrics(server_item)
  # The server resolves a Neon preview branch per item during /replay/start
  # (only when include_db_branch_lease was sent). Release it in the +ensure+
  # below so any raise (span fetch, mock-tree build, or the replayed
  # method) frees the Neon resource. Items whose source trace had no
  # snapshot ref, or whose resolve failed server-side, arrive without a
  # lease (env.active? is false for those).
  lease = include_db_branch_lease ? server_item["dbBranchLease"] : nil

  span = http_client.get_external_span(server_item["externalSpanId"])
  item_data = extract_span_data(span)

  mock_tree = nil
  if mock_strategy == "all" || mock_strategy == "marked"
    tree = http_client.get_span_tree(server_item["externalSpanId"])
    mock_tree = build_mock_tree(tree["root"] || {})
  end

  adapt_ctx = {trace_id: server_item["traceId"], source_span_id: server_item["externalSpanId"]}

  execute_item(
    item_data,
    receiver,
    method_name,
    test_run_id,
    span["id"],
    metrics,
    input_source_trace_id: span["externalTraceId"],
    mock_strategy:,
    mock_tree:,
    adapt_inputs:,
    adapt_ctx:,
    db_branch_lease: lease,
    source_bitfab_trace_id: server_item["traceId"],
    db_snapshot_ref: server_item["dbSnapshotRef"]
  )
rescue => e
  warn "Bitfab: replay item for span #{server_item["externalSpanId"]} failed before execution: #{e.message}"
  {
    input: [],
    result: nil,
    original_output: nil,
    error: e.message,
    duration_ms: metrics&.dig(:duration_ms),
    tokens: metrics&.dig(:tokens),
    model: metrics&.dig(:model),
    trace_id: nil,
    db_snapshot_ref: server_item["dbSnapshotRef"]
  }
ensure
  release_db_branch_lease(http_client, lease) if lease
end

.release_db_branch_lease(http_client, lease) ⇒ `Object`

Delete the per-item Neon preview branch. Best-effort: a failure is reported as a warning but never raised, because the server-side TTL janitor reaps orphaned branches.

# File 'lib/bitfab/replay.rb', line 366

def release_db_branch_lease(http_client, lease)
  neon_branch_id = lease["neonBranchId"]
  return unless neon_branch_id

  http_client.release_db_branch_lease(neon_branch_id)
rescue => e
  warn "Bitfab: failed to release DB branch #{neon_branch_id} (TTL janitor will catch it): #{e.message}"
end

.run(client, receiver, method_name, trace_function_key:, limit: nil, trace_ids: nil, max_concurrency: 10, code_change_description: nil, code_change_files: nil, experiment_group_id: nil, dataset_id: nil, mock: "none", adapt_inputs: nil, environment: nil, on_progress: nil) ⇒ `Hash`

Replay historical traces through a method and create a test run.

Fetches the last N traces for the given trace function key, re-runs each through the provided receiver and method, and returns comparison data.

Parameters:

client (Bitfab::Client) —

the Bitfab client instance
receiver (Object, Class) —

an instance for instance methods, or a Class for class methods
method_name (Symbol) —

the method to replay
trace_function_key (String) —

the trace function key for this method
limit (Integer, nil) (defaults to: nil) —

maximum number of traces to replay. When nil, 5 traces are replayed. Ignored (with a warning) when trace_ids is passed, because an explicit ID list already determines how many traces replay.
trace_ids (Array<String>, nil) (defaults to: nil) —

optional list of trace IDs to replay (max 100)
max_concurrency (Integer, nil) (defaults to: 10) —

maximum number of threads for parallel replay. A value of 1 or less replays items sequentially; nil starts one thread per item.
code_change_description (String, nil) (defaults to: nil) —

optional rationale for the code change being tested in this replay (stored on the experiment)
code_change_files (Array<Hash>, nil) (defaults to: nil) —

optional list of edited files, each as { path:, before:, after: } (empty string for new/deleted files)
experiment_group_id (String, nil) (defaults to: nil) —

optional UUID grouping multiple replay runs into a single experiment batch
dataset_id (String, nil) (defaults to: nil) —

optional UUID of the dataset this replay runs against, stored on the resulting experiment for durable attribution
mock (String) (defaults to: "none") —

mock strategy for child spans: “none” (default), “all”, or “marked”. “all” mocks every child span; “marked” only mocks spans declared with mock_on_replay: true.
adapt_inputs (#call, nil) (defaults to: nil) —

optional hook to reshape recorded inputs onto the method’s current signature when its shape changed after the traces were captured. Receives (args, kwargs, ctx) where ctx is { trace_id:, source_span_id: }, and returns [new_args, new_kwargs]. Runs per item inside the same rescue as the method, so a raising adapter sets that item’s :error rather than crashing the run.
on_progress (#call, nil) (defaults to: nil) —

optional callback invoked once per item as it finishes, with a running-totals hash { completed:, total:, succeeded:, errored: }. Use it to render replay progress (e.g. a terminal progress bar). A raising callback never crashes the run.
environment (Object, nil) (defaults to: nil) —

optional environment for the replay. When non-nil, run() asks the server for a per-item DB branch lease (include_db_branch_lease), which process_single_item releases once the item finishes.

Returns:

(Hash) —

with :items, :test_run_id, :test_run_url
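
A call using the parameters above might look like the following sketch. The wrapper `replay_summarizer`, the `:summarize` method, and the key "summarize" are hypothetical placeholders; `progress_line` is an illustrative helper that formats the running-totals hash passed to on_progress.

```ruby
# Formats the on_progress running totals as a single status line.
def progress_line(progress)
  format("%d/%d done (%d ok, %d failed)",
    progress[:completed], progress[:total], progress[:succeeded], progress[:errored])
end

# Hypothetical wrapper: `client` is a Bitfab::Client and `summarizer` is an
# object whose :summarize method is traced under the made-up key "summarize".
def replay_summarizer(client, summarizer)
  Bitfab::Replay.run(
    client,
    summarizer,
    :summarize,
    trace_function_key: "summarize",
    limit: 20,          # replay the last 20 traces
    max_concurrency: 4, # at most four worker threads
    mock: "marked",     # mock only spans declared with mock_on_replay: true
    on_progress: ->(progress) { warn progress_line(progress) }
  )
end

puts progress_line({completed: 3, total: 20, succeeded: 2, errored: 1})
# prints "3/20 done (2 ok, 1 failed)"
```

Since the callback is serialized by the run, a formatter like this does not need locking of its own.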

# File 'lib/bitfab/replay.rb', line 108

def run(client, receiver, method_name, trace_function_key:, limit: nil, trace_ids: nil, max_concurrency: 10,
  code_change_description: nil, code_change_files: nil, experiment_group_id: nil, dataset_id: nil, mock: "none",
  adapt_inputs: nil, environment: nil, on_progress: nil)
  unless MOCK_STRATEGIES.include?(mock.to_s)
    raise ArgumentError, "Invalid mock strategy '#{mock}'. Must be one of: #{MOCK_STRATEGIES.join(", ")}"
  end
  if trace_ids
    raise ArgumentError, "trace_ids must contain at least one trace ID." if trace_ids.empty?
    if trace_ids.length > 100
      raise ArgumentError, "trace_ids supports at most 100 trace IDs per replay (got #{trace_ids.length})."
    end
  end
  if limit && trace_ids
    warn "Bitfab: limit is ignored when trace_ids is passed: the explicit trace ID list already " \
      "determines how many traces replay."
  end

  # Reject a trace_function_key that contradicts the method's declared key.
  # replay() fetches historical traces by trace_function_key but records the
  # replayed spans under the method's own bitfab_span key (via send below),
  # so a mismatch produces an incoherent test run (traces fetched for one
  # function, recorded under another). Only fires when the method's key is
  # introspectable; an untraced method falls through to the persistence
  # check in complete_replay. Mirrors the TypeScript/Python SDKs.
  declared_key = Traceable.trace_function_key_for(receiver, method_name)
  if declared_key && declared_key != trace_function_key
    raise ArgumentError,
      "Method #{method_name} is traced under trace function key '#{declared_key}' but replay was " \
      "called with '#{trace_function_key}'. Pass trace_function_key: '#{declared_key}' to replay it, " \
      "or point at the method traced under '#{trace_function_key}'."
  end

  http_client = client.instance_variable_get(:@http_client)

  # limit is meaningless with explicit trace_ids (the ID list determines
  # the count), so it's omitted from the request entirely.
  effective_limit = trace_ids ? nil : (limit || 5)

  include_db_branch_lease = !environment.nil?

  replay_data = http_client.start_replay(
    trace_function_key,
    effective_limit,
    trace_ids:,
    code_change_description:,
    code_change_files:,
    experiment_group_id:,
    include_db_branch_lease:,
    dataset_id:
  )
  test_run_id = replay_data["testRunId"]
  test_run_url = replay_data["testRunUrl"]
  server_items = replay_data["items"] || []

  result_items = if server_items.any?
    process_items(http_client, server_items, receiver, method_name, test_run_id, max_concurrency, mock.to_s,
      adapt_inputs, include_db_branch_lease, on_progress:)
  else
    []
  end

  # Every item joined its own trace-persistence threads (span uploads +
  # completion) in execute_item, so all replay traces are on the server
  # by now: no flush needed, and complete_replay's trace-ID mapping is
  # deterministic. complete_replay failures propagate: a missing mapping
  # means verdicts can't be persisted, which callers must hear about
  # loudly.
  complete_response = http_client.complete_replay(test_run_id)
  trace_id_map = complete_response&.dig("traceIds")
  # Per-replay-trace token usage keyed by server trace id: the REPLAYED
  # run's tokens (span-aggregated server-side), used below to fill each
  # item's :tokens.
  replay_tokens = complete_response&.dig("tokens") || {}

  if trace_id_map.nil?
    # Older servers don't return the mapping. Preserve the legacy
    # nil-trace_id behavior but say why.
    warn "Bitfab: server did not return replay trace IDs; item trace_id " \
      "will be nil (server upgrade required for verdict persistence)"
    result_items.each { |item| item[:trace_id] = nil }
  else
    # Map each item's locally-generated trace ID to the server's trace
    # row ID. A completed item with no mapping means its trace was sent
    # but the server has no record: a nil trace_id blocks verdict
    # persistence and the Studio experiments view downstream, so this
    # must never be silent.
    #
    # Severity splits on scope:
    # - ALL completed items missing: systemic (the replayed method is
    #   not traced, or uploads are wholesale broken). Raise; the run's
    #   results are unusable for persistence.
    # - SOME completed items missing: per-item upload failure (transient
    #   network blip, one oversized payload). Nil those items and warn
    #   loudly, but return the run so callers can persist verdicts for
    #   the items that landed.
    missing = []
    completed_count = 0
    result_items.each do |item|
      next unless item[:trace_id]

      mapped = trace_id_map[item[:trace_id]]
      if item[:error].nil?
        completed_count += 1
        missing << item[:trace_id] if mapped.nil?
      end
      # Pull this item's replayed-run tokens by its server trace id, before
      # :trace_id is overwritten with that id below.
      item[:tokens] = normalize_tokens(replay_tokens[mapped]) if mapped
      item[:trace_id] = mapped
    end
    if missing.any?
      trace_count = complete_response["traceCount"]
      server_count = trace_count.nil? ? "" : " The server persisted #{trace_count} trace(s) for this run."
      if missing.length == completed_count
        raise "Replay completed but the server has no persisted trace for " \
          "any of the #{completed_count} completed item(s) " \
          "(test_run_id #{test_run_id}).#{server_count} Trace uploads were " \
          "joined, so either the uploads failed or the replayed method is " \
          "not traced (no root span was emitted)."
      end
      warn "Bitfab: server has no persisted trace for #{missing.length} of " \
        "#{completed_count} completed replay item(s) " \
        "(test_run_id #{test_run_id}).#{server_count} Their trace_id is nil " \
        "and verdicts cannot be persisted for them. Missing: #{missing.join(", ")}"
    end
  end

  {
    items: result_items,
    test_run_id:,
    test_run_url: "#{client.service_url}#{test_run_url}"
  }
end
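
Because per-item failures and missing trace mappings are reported on the items instead of being raised, callers usually need to inspect the returned hash. The sketch below shows one way to do that; `summarize_replay` and the sample data are hypothetical, and only the item fields (:error, :trace_id, :tokens) come from the code above.

```ruby
# Splits replay items into errored and completed, then counts completed items
# whose trace_id is nil (verdicts cannot be persisted for those).
def summarize_replay(result)
  errored, completed = result[:items].partition { |item| item[:error] }
  {
    completed: completed.length,
    errored: errored.map { |item| item[:error] },
    unpersisted: completed.count { |item| item[:trace_id].nil? },
    # :tokens is nil on older servers, so dig and default to 0.
    total_tokens: completed.sum { |item| item.dig(:tokens, :total).to_i }
  }
end

sample = {
  test_run_id: "run-1",
  test_run_url: "https://example.invalid/runs/run-1",
  items: [
    {error: nil, trace_id: "t-1", tokens: {input: 100, output: 20, cached: 0, total: 120}},
    {error: nil, trace_id: nil, tokens: nil},
    {error: "boom", trace_id: nil, tokens: nil}
  ]
}

summary = summarize_replay(sample)
p summary[:completed]    # => 2
p summary[:errored]      # => ["boom"]
p summary[:unpersisted]  # => 1
p summary[:total_tokens] # => 120
```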
