Module: Bitfab::Replay

Defined in: lib/bitfab/replay.rb

Overview

Replay historical traces through a traced method and create a test run.

Class Method Summary

.build_mock_tree(root) ⇒ Object

Walk the children of a root span tree node depth-first and build a lookup keyed by “#{trace_function_key}:#{span_name}:#{call_index}”.
.execute_item(item, receiver, method_name, test_run_id, input_source_span_id = nil, metrics = {}, input_source_trace_id: nil, mock_strategy: "none", mock_tree: nil, adapt_inputs: nil, adapt_ctx: nil) ⇒ Object

Execute a single replay item: deserialize inputs, then call the method inside a replay context.
.extract_server_item_metrics(server_item) ⇒ Object

Pull durationMs / tokens / model from the start-replay server item.
.extract_span_data(span) ⇒ Object

Extract input/output data from an external span’s rawData.
.process_items(http_client, server_items, receiver, method_name, test_run_id, max_concurrency, mock_strategy, adapt_inputs = nil) ⇒ Object

Process all replay items, optionally in parallel using threads.
.process_single_item(http_client, server_item, receiver, method_name, test_run_id, mock_strategy, adapt_inputs = nil) ⇒ Object

Fetch span data and execute a single replay item.
.run(client, receiver, method_name, trace_function_key:, limit: nil, trace_ids: nil, max_concurrency: 10, code_change_description: nil, code_change_files: nil, experiment_group_id: nil, mock: "none", adapt_inputs: nil) ⇒ Hash

Replay historical traces through a method and create a test run.

Class Method Details

.build_mock_tree(root) ⇒ `Object`

Walk the children of a root span tree node depth-first and build a lookup keyed by “#{trace_function_key}:#{span_name}:#{call_index}”.

The root node itself is excluded — at replay time the runtime root span never queries the mock tree.

The compound (key, name) match disambiguates same-key spans that come from the fluent `client.get_function(key).wrap(…)` pattern: every wrapped method shares trace_function_key but differs in span_name. The counter is kept per (key, name) pair, so repeated same-name calls (including recursion) are still ordered by occurrence. Mirrors the Python and TypeScript SDKs after HVT-2078: keying by trace_function_key alone returned the wrong historical output for fluent-API span sets.

# File 'lib/bitfab/replay.rb', line 287

def build_mock_tree(root)
  spans = {}
  counters = {}

  walk = lambda do |node|
    key = node["traceFunctionKey"]
    if key && !key.empty?
      name = node["spanName"]
      name = key if name.nil? || name.empty?
      counter_key = "#{key}:#{name}"
      index = counters[counter_key] || 0
      counters[counter_key] = index + 1
      spans["#{counter_key}:#{index}"] = {
        source_span_id: node["sourceSpanId"],
        output: node["output"],
        output_meta: node["outputMeta"]
      }
    end
    (node["children"] || []).each { |child| walk.call(child) }
  end

  (root["children"] || []).each { |child| walk.call(child) }

  spans
end
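
To make the resulting keys concrete, the sketch below repeats the same walk in a standalone helper and runs it on a made-up span tree. The helper name `mock_keys` and the tree contents are assumptions for illustration; only the field names (`traceFunctionKey`, `spanName`, `children`) come from the source above.

```ruby
# Hypothetical standalone copy of the keying logic, for illustration only.
# It returns the keys in visit order instead of building the full lookup.
def mock_keys(root)
  counters = Hash.new(0)
  keys = []
  walk = lambda do |node|
    key = node["traceFunctionKey"]
    if key && !key.empty?
      name = node["spanName"]
      name = key if name.nil? || name.empty? # fall back to the key
      pair = "#{key}:#{name}"
      keys << "#{pair}:#{counters[pair]}"
      counters[pair] += 1
    end
    (node["children"] || []).each { |child| walk.call(child) }
  end
  # Start from the children so the root itself is never keyed.
  (root["children"] || []).each { |child| walk.call(child) }
  keys
end

# Made-up tree: "fetch" is called twice, "rank" once, and one span has no name.
SAMPLE_TREE = {
  "traceFunctionKey" => "pipeline", "spanName" => "run",
  "children" => [
    {"traceFunctionKey" => "search", "spanName" => "fetch", "children" => [
      {"traceFunctionKey" => "search", "spanName" => "rank"}
    ]},
    {"traceFunctionKey" => "search", "spanName" => "fetch"},
    {"traceFunctionKey" => "search", "spanName" => ""}
  ]
}

puts mock_keys(SAMPLE_TREE)
```

The two "fetch" spans share a counter and get indexes 0 and 1, while "rank" starts its own counter at 0 even though all three spans share one trace function key.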

.execute_item(item, receiver, method_name, test_run_id, input_source_span_id = nil, metrics = {}, input_source_trace_id: nil, mock_strategy: "none", mock_tree: nil, adapt_inputs: nil, adapt_ctx: nil) ⇒ `Object`

Execute a single replay item: deserialize inputs, then call the method inside a replay context.

# File 'lib/bitfab/replay.rb', line 348

def execute_item(item, receiver, method_name, test_run_id, input_source_span_id = nil, metrics = {},
  input_source_trace_id: nil, mock_strategy: "none", mock_tree: nil, adapt_inputs: nil, adapt_ctx: nil)
  args, kwargs = Serialize.deserialize_inputs(item)

  fn_result = nil
  fn_error = nil
  sdk_trace_id = SecureRandom.uuid
  # Collects the root span's persistence threads (span uploads + trace
  # completion). Joined below so this item's trace is on the server
  # before run() calls complete_replay — otherwise the server's trace-ID
  # mapping races the uploads and the item's trace_id comes back nil.
  pending_persistence = []

  ReplayContext.with_context(
    test_run_id:,
    input_source_span_id:,
    input_source_trace_id:,
    trace_id: sdk_trace_id,
    mock_tree:,
    mock_strategy:,
    pending_persistence:
  ) do
    # Reshape recorded inputs onto the current signature when an adapter is
    # supplied. Inside the rescue so a raising adapter surfaces on this
    # item's :error instead of crashing the run; args is reported on :input.
    if adapt_inputs
      ctx = adapt_ctx || {trace_id: nil, source_span_id: input_source_span_id}
      args, kwargs = adapt_inputs.call(args, kwargs, ctx)
    end
    fn_result = if kwargs.empty?
      receiver.send(method_name, *args)
    else
      receiver.send(method_name, *args, **kwargs)
    end
  rescue => e
    fn_error = e.message
  end

  # Wait for this item's trace (spans + completion) to be fully persisted
  # before the item resolves. Runs on the error path too — a raising
  # method still emits a root span whose trace must land before
  # complete_replay. Joins are bounded by the HTTP layer's own timeouts.
  pending_persistence.each(&:join)

  {
    input: args,
    result: fn_result,
    original_output: item["output"],
    error: fn_error,
    duration_ms: metrics[:duration_ms],
    tokens: metrics[:tokens],
    model: metrics[:model],
    trace_id: sdk_trace_id
  }
end
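
The dispatch and error capture above can be tried in isolation. The sketch below is a simplified stand-in, not the SDK's code: `Greeter` and `call_recorded` are hypothetical names, there is no replay context or persistence, and the keyword arguments are assumed to arrive with symbol keys.

```ruby
# Hypothetical target used only for this sketch.
class Greeter
  def greet(name, punctuation: "!")
    "Hello, #{name}#{punctuation}"
  end

  def self.shout(name)
    raise ArgumentError, "name is empty" if name.empty?
    name.upcase
  end
end

# Same dispatch shape as execute_item: positional-only call when there are
# no keyword arguments, and any StandardError is captured as a message.
def call_recorded(receiver, method_name, args, kwargs)
  result = if kwargs.empty?
    receiver.send(method_name, *args)
  else
    receiver.send(method_name, *args, **kwargs)
  end
  {result: result, error: nil}
rescue => e
  {result: nil, error: e.message}
end

# An instance receiver for an instance method, a Class for a class method.
p call_recorded(Greeter.new, :greet, ["Ada"], {punctuation: "?"})
p call_recorded(Greeter, :shout, [""], {})
```

The second call raises inside the target, so the error message lands in `:error` and `:result` stays nil, which is the same shape the replay item reports.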

.extract_server_item_metrics(server_item) ⇒ `Object`

Pull durationMs / tokens / model from the start-replay server item. Normalizes to symbol-keyed tokens hash and nil-safe defaults so older servers without these fields still produce a consistent shape.

# File 'lib/bitfab/replay.rb', line 329

def extract_server_item_metrics(server_item)
  raw_tokens = server_item["tokens"]
  tokens = if raw_tokens.is_a?(Hash)
    {
      input: raw_tokens["input"],
      output: raw_tokens["output"],
      cached: raw_tokens["cached"],
      total: raw_tokens["total"]
    }
  end

  {
    duration_ms: server_item["durationMs"],
    tokens:,
    model: server_item["model"]
  }
end

.extract_span_data(span) ⇒ `Object`

Extract input/output data from an external span’s rawData.

# File 'lib/bitfab/replay.rb', line 314

def extract_span_data(span)
  raw_data = span["rawData"] || {}
  span_data = raw_data["span_data"] || {}

  {
    "input" => span_data["input"],
    "output" => span_data["output"],
    "inputSerialized" => span_data["input_serialized"],
    "outputSerialized" => span_data["output_serialized"]
  }
end

.process_items(http_client, server_items, receiver, method_name, test_run_id, max_concurrency, mock_strategy, adapt_inputs = nil) ⇒ `Object`

Process all replay items, optionally in parallel using threads.

# File 'lib/bitfab/replay.rb', line 194

def process_items(http_client, server_items, receiver, method_name, test_run_id, max_concurrency, mock_strategy,
  adapt_inputs = nil)
  concurrency = max_concurrency || server_items.length

  if concurrency <= 1
    server_items.map do |item|
      process_single_item(http_client, item, receiver, method_name, test_run_id, mock_strategy, adapt_inputs)
    end
  else
    results_mutex = Mutex.new
    results = []
    work_queue = server_items.each_with_index.to_a
    work_mutex = Mutex.new

    workers = [concurrency, server_items.length].min.times.map do
      Thread.new do
        loop do
          item, idx = work_mutex.synchronize { work_queue.shift }
          break unless item

          result = process_single_item(http_client, item, receiver, method_name, test_run_id, mock_strategy,
            adapt_inputs)
          results_mutex.synchronize { results[idx] = result }
        end
      end
    end

    workers.each(&:join)
    results.compact
  end
end
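
The threaded branch is a bounded worker pool that writes each result back at its original index, so output order matches input order regardless of which thread finishes first. A generic sketch of that pattern follows; `parallel_map` is a name chosen for this example and is not part of the SDK.

```ruby
# Generic bounded worker pool; results keep the order of the input items.
def parallel_map(items, max_threads, &block)
  queue = items.each_with_index.to_a
  queue_lock = Mutex.new
  results = Array.new(items.length)
  results_lock = Mutex.new

  # Never start more threads than there are items.
  workers = [max_threads, items.length].min.times.map do
    Thread.new do
      loop do
        pair = queue_lock.synchronize { queue.shift }
        break unless pair # queue drained

        item, index = pair
        value = block.call(item)
        results_lock.synchronize { results[index] = value }
      end
    end
  end

  workers.each(&:join)
  results
end

p parallel_map([1, 2, 3, 4, 5], 2) { |n| n * n }
```

Unlike process_items, this sketch does not call `compact` on the results, so a block that returns nil keeps its slot.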

.process_single_item(http_client, server_item, receiver, method_name, test_run_id, mock_strategy, adapt_inputs = nil) ⇒ `Object`

Fetch span data and execute a single replay item.

Any error while fetching the span, building the mock tree, or deserializing inputs is captured on the returned item’s :error rather than propagated, so one bad trace never aborts the whole replay run (mirrors the TypeScript and Python SDKs’ per-item rescue).

# File 'lib/bitfab/replay.rb', line 232

def process_single_item(http_client, server_item, receiver, method_name, test_run_id, mock_strategy,
  adapt_inputs = nil)
  metrics = extract_server_item_metrics(server_item)

  span = http_client.get_external_span(server_item["externalSpanId"])
  item_data = extract_span_data(span)

  mock_tree = nil
  if mock_strategy == "all" || mock_strategy == "marked"
    tree = http_client.get_span_tree(server_item["externalSpanId"])
    mock_tree = build_mock_tree(tree["root"] || {})
  end

  adapt_ctx = {trace_id: server_item["traceId"], source_span_id: server_item["externalSpanId"]}

  execute_item(
    item_data,
    receiver,
    method_name,
    test_run_id,
    span["id"],
    metrics,
    input_source_trace_id: span["externalTraceId"],
    mock_strategy:,
    mock_tree:,
    adapt_inputs:,
    adapt_ctx:
  )
rescue => e
  warn "Bitfab: replay item for span #{server_item["externalSpanId"]} failed before execution: #{e.message}"
  {
    input: [],
    result: nil,
    original_output: nil,
    error: e.message,
    duration_ms: metrics&.dig(:duration_ms),
    tokens: metrics&.dig(:tokens),
    model: metrics&.dig(:model),
    trace_id: nil
  }
end

.run(client, receiver, method_name, trace_function_key:, limit: nil, trace_ids: nil, max_concurrency: 10, code_change_description: nil, code_change_files: nil, experiment_group_id: nil, mock: "none", adapt_inputs: nil) ⇒ `Hash`

Replay historical traces through a method and create a test run.

Fetches the last N traces for the given trace function key, re-runs each through the provided receiver and method, and returns comparison data.

Parameters:

client (Bitfab::Client) —

the Bitfab client instance
receiver (Object, Class) —

an instance for instance methods, or a Class for class methods
method_name (Symbol) —

the method to replay
trace_function_key (String) —

the trace function key for this method
limit (Integer, nil) (defaults to: nil) —

maximum number of traces to replay. When nil and trace_ids is not given, 5 is used. Mutually exclusive with trace_ids: an explicit ID list already determines how many traces replay, so passing both raises ArgumentError.
trace_ids (Array<String>, nil) (defaults to: nil) —

optional list of trace IDs to replay (max 100)
max_concurrency (Integer, nil) (defaults to: 10) —

maximum number of threads for parallel replay (default: 10). A value of 1 or less replays items sequentially; nil uses one thread per item.
code_change_description (String, nil) (defaults to: nil) —

optional rationale for the code change being tested in this replay (stored on the experiment)
code_change_files (Array<Hash>, nil) (defaults to: nil) —

optional list of edited files, each as { path:, before:, after: } (empty string for new/deleted files)
experiment_group_id (String, nil) (defaults to: nil) —

optional UUID grouping multiple replay runs into a single experiment batch
mock (String) (defaults to: "none") —

mock strategy for child spans: “none” (default), “all”, or “marked”. “all” mocks every child span; “marked” only mocks spans declared with mock_on_replay: true.
adapt_inputs (#call, nil) (defaults to: nil) —

optional hook to reshape recorded inputs onto the method’s current signature when its shape changed after the traces were captured. Receives (args, kwargs, ctx) where ctx is { trace_id:, source_span_id: }, and returns [new_args, new_kwargs]. Runs per item inside the same rescue as the method, so a raising adapter sets that item’s :error rather than crashing the run.

Returns:

(Hash) —

with :items, :test_run_id, :test_run_url
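
As an illustration of the adapt_inputs contract, the sketch below assumes a replayed method that used to take (query, limit) positionally and now takes query plus a limit: keyword. That signature change, the constant name, and the ctx values are all hypothetical; only the (args, kwargs, ctx) call shape and the [new_args, new_kwargs] return shape come from the parameter description.

```ruby
# Hypothetical adapter: recorded calls look like search(query, limit),
# while the current method is search(query, limit: 10).
ADAPT_SEARCH_INPUTS = lambda do |args, kwargs, ctx|
  query, limit = args
  new_kwargs = kwargs.dup
  # Only move the limit across when the old call actually passed one.
  new_kwargs[:limit] = limit unless limit.nil?
  [[query], new_kwargs]
end

# ctx has the documented shape; the IDs here are placeholders.
ctx = {trace_id: "trace-123", source_span_id: "span-456"}
new_args, new_kwargs = ADAPT_SEARCH_INPUTS.call(["ruby threads", 3], {}, ctx)
p new_args
p new_kwargs
```

A lambda like this would be passed to run as `adapt_inputs: ADAPT_SEARCH_INPUTS`; because it runs inside the per-item rescue, an exception raised here shows up on that item's :error.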

# File 'lib/bitfab/replay.rb', line 85

def run(client, receiver, method_name, trace_function_key:, limit: nil, trace_ids: nil, max_concurrency: 10,
  code_change_description: nil, code_change_files: nil, experiment_group_id: nil, mock: "none",
  adapt_inputs: nil)
  unless MOCK_STRATEGIES.include?(mock.to_s)
    raise ArgumentError, "Invalid mock strategy '#{mock}'. Must be one of: #{MOCK_STRATEGIES.join(", ")}"
  end
  if trace_ids
    raise ArgumentError, "trace_ids must contain at least one trace ID." if trace_ids.empty?
    if trace_ids.length > 100
      raise ArgumentError, "trace_ids supports at most 100 trace IDs per replay (got #{trace_ids.length})."
    end
  end
  if limit && trace_ids
    raise ArgumentError,
      "Pass either limit or trace_ids, not both: an explicit trace ID list already determines how many traces replay."
  end

  http_client = client.instance_variable_get(:@http_client)

  # limit is meaningless with explicit trace_ids (the ID list determines
  # the count), so it's omitted from the request entirely.
  effective_limit = trace_ids ? nil : (limit || 5)

  replay_data = http_client.start_replay(
    trace_function_key,
    effective_limit,
    trace_ids:,
    code_change_description:,
    code_change_files:,
    experiment_group_id:
  )
  test_run_id = replay_data["testRunId"]
  test_run_url = replay_data["testRunUrl"]
  server_items = replay_data["items"] || []

  result_items = if server_items.any?
    process_items(http_client, server_items, receiver, method_name, test_run_id, max_concurrency, mock.to_s,
      adapt_inputs)
  else
    []
  end

  # Every item joined its own trace-persistence threads (span uploads +
  # completion) in execute_item, so all replay traces are on the server
  # by now — no flush needed, and complete_replay's trace-ID mapping is
  # deterministic. complete_replay failures propagate: a missing mapping
  # means verdicts can't be persisted, which callers must hear about
  # loudly.
  complete_response = http_client.complete_replay(test_run_id)
  trace_id_map = complete_response&.dig("traceIds")

  if trace_id_map.nil?
    # Older servers don't return the mapping. Preserve the legacy
    # nil-trace_id behavior but say why.
    warn "Bitfab: server did not return replay trace IDs; item trace_id " \
      "will be nil (server upgrade required for verdict persistence)"
    result_items.each { |item| item[:trace_id] = nil }
  else
    # Map each item's locally-generated trace ID to the server's trace
    # row ID. A completed item with no mapping means its trace was sent
    # but the server has no record — a nil trace_id blocks verdict
    # persistence and the Studio experiments view downstream, so this
    # must never be silent.
    #
    # Severity splits on scope:
    # - ALL completed items missing: systemic (the replayed method is
    #   not traced, or uploads are wholesale broken). Raise; the run's
    #   results are unusable for persistence.
    # - SOME completed items missing: per-item upload failure (transient
    #   network blip, one oversized payload). Nil those items and warn
    #   loudly, but return the run so callers can persist verdicts for
    #   the items that landed.
    missing = []
    completed_count = 0
    result_items.each do |item|
      next unless item[:trace_id]

      mapped = trace_id_map[item[:trace_id]]
      if item[:error].nil?
        completed_count += 1
        missing << item[:trace_id] if mapped.nil?
      end
      item[:trace_id] = mapped
    end
    if missing.any?
      trace_count = complete_response["traceCount"]
      server_count = trace_count.nil? ? "" : " The server persisted #{trace_count} trace(s) for this run."
      if missing.length == completed_count
        raise "Replay completed but the server has no persisted trace for " \
          "any of the #{completed_count} completed item(s) " \
          "(test_run_id #{test_run_id}).#{server_count} Trace uploads were " \
          "joined, so either the uploads failed or the replayed method is " \
          "not traced (no root span was emitted)."
      end
      warn "Bitfab: server has no persisted trace for #{missing.length} of " \
        "#{completed_count} completed replay item(s) " \
        "(test_run_id #{test_run_id}).#{server_count} Their trace_id is nil " \
        "and verdicts cannot be persisted for them. Missing: #{missing.join(", ")}"
    end
  end

  {
    items: result_items,
    test_run_id:,
    test_run_url: "#{client.service_url}#{test_run_url}"
  }
end
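
A caller typically inspects the returned Hash item by item. The sketch below summarizes a made-up result that has the documented shape; `summarize_replay` and the sample values are assumptions, and comparing :result to :original_output with `!=` is a simplification, since the recorded output may need normalizing before it is comparable.

```ruby
# Sample data shaped like the documented return value; all values are made up.
SAMPLE_RUN = {
  items: [
    {input: ["a"], result: "A", original_output: "A", error: nil, trace_id: "srv-1"},
    {input: ["b"], result: "B!", original_output: "B", error: nil, trace_id: nil},
    {input: [], result: nil, original_output: nil, error: "span fetch failed", trace_id: nil}
  ],
  test_run_id: "run-1",
  test_run_url: "https://example.invalid/runs/run-1"
}

# Hypothetical helper: counts failures, changed outputs, and completed items
# whose trace_id is nil (verdicts cannot be persisted for those).
def summarize_replay(run)
  items = run[:items]
  completed = items.reject { |item| item[:error] }
  {
    total: items.length,
    failed: items.length - completed.length,
    changed: completed.count { |item| item[:result] != item[:original_output] },
    unpersisted: completed.count { |item| item[:trace_id].nil? }
  }
end

p summarize_replay(SAMPLE_RUN)
```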
