Deja

Deja is a testing toolkit for code that calls LLM apis. Your tests make real calls to real LLMs, so you can confirm the model actually gives the results you expect — you're checking genuine model behavior, not a stub you wrote. To keep that fast and repeatable, Deja records each real call once and replays it from a cache on every later run, so your suite stays deterministic and free to run in CI.

Overview

What Deja allows

Deja allows you to add the following coverage to your test.

I have application code that generates arguments for an LLM api. I want to assert on the arguments that were provided to the LLM api.
When I pass certain arguments (i.e. a certain prompt) to an LLM, the result is non-deterministic. Even so, I have certain requirements as to what the result should be. I want to assert that the LLM's response meets those requirements.

With this functionality, you can do the following.

I want to change my code and be sure that the changes I made did not affect the arguments passed to the LLM api.
I want to iterate on my application code in ways that will change a prompt sent to an LLM until the response meets certain requirements.
I want to change my code in a way that will change a prompt sent to an LLM and be sure that the response still meets existing requirements.
I want to upgrade to a new model and be sure that all of my existing calls still meet existing requirements.

How Deja works

You run a test locally with ALLOW_LLM_CALL=1.
- When your test hits application code that triggers a call to an LLM api, the call is actually made via http.
- You assert on the arguments that were sent to the LLM api
- You assert in a fuzzy way on the response, like "The response should say that..."
- Deja caches the response, keyed off of the exact set of arguments. The cached response is stored in a generated file, which you store in version control.
You run the test again
- When the LLM api call is triggered, the test finds the response in the cache, skipping the http call to the LLM api
- Your assertions ensure that your code still sent the expected arguments
You push your code and tests run on CI
- Since the cached response is stored in version control, CI has access to it and runs the test without making any actual LLM calls
You update code and re-run the test locally with ALLOW_LLM_CALL=1.
- The updated code can change the prompt, the LLM model, or anything else that will change the arguments sent to the LLM api.
- Since there is no cached response for the new arguments, the call to the LLM api is actually made via http.
- The new response is cached, replacing the old one
- Your fuzzy assertion ensures that the new response still matches your requirements.

LLM support

Today Deja targets the Anthropic SDK. Support for other SDKs is coming.

The Anthropic-specific bits — the response value objects and the serialize/deserialize that backs the cache — live in a separate gem, llm_mock_anthropic (built on the shared llm_mock contract). Deja pulls it in automatically. If a test stubs the Anthropic client directly (rather than recording/replaying), use that gem's builders — e.g. LlmMock::Anthropic.message([...]) — to construct canned responses.

Usage

Installation

# Gemfile
group :test do
  gem "deja"
end

Setup

Require the RSpec integration, point Deja at a cache directory, and register a provider — telling it how to swap your app's client for Deja's caching stub:

# spec/support/deja.rb (or spec/rails_helper.rb)
require "deja/rspec"

Deja.configure do |c|
  # Whatever your app calls to get an Anthropic client. Deja hands you its
  # caching stub; you return it from that accessor for the duration of the test.
  c.register :anthropic,
    install: ->(mock_anthropic_client) { allow(AnthropicClient).to receive(:client).and_return(mock_anthropic_client) }

  # Required only if you use the `meet_requirements` matcher: the client Deja
  # uses to judge a value against its requirements. Deja picks provider-specific
  # defaults from the client's type (Anthropic is built in).
  c.judge_client { Anthropic::Client.new }

  # Optional: override the judge's defaults (model, max_tokens, system prompt) or
  # pass provider-specific args. These are merged into the judge's
  # messages.create call over its built-in defaults.
  c.judge_attrs = { model: "claude-sonnet-4-5" }
end

Recorded cache files go under spec/support/deja_cache by default. To put them somewhere else, set cache_root (a String or Pathname, resolved under your project root):

Deja.configure { |c| c.cache_root = Rails.root.join("spec/support/cache") }

That assumes your app funnels LLM access through a single seam. e.g., In this example, Deja will mock out calls to AnthropicClient.client:

class AnthropicClient
  def self.client
    Anthropic::Client.new
  end
end

Optional: Deja doesn't require WebMock — it intercepts calls at the client seam, not at the HTTP layer. But if your suite already uses WebMock, allow the Anthropic host so recording can reach it (and keep the allowlist tight so a forgotten stub surfaces as a blocked request rather than a silent live call):

WebMock.disable_net_connect!(allow_localhost: true, allow: ["api.anthropic.com"])

Assert on LLM api arguments and response

it "summarizes an article" do
  use_llm_cache("2026-04-30_17-03") # one cache file for this test

  summary = ArticleSummarizer.new(article).call # makes LLM calls — routed through Deja

  kwargs = expect_llm_called          # exactly one call happened
  expect(kwargs[:system]).to include("You are a summarization assistant")

  expect(summary).to meet_requirements(<<~REQ)
    A single sentence under 200 characters that indicates that the article is about The Hitchhiker's Guide to the Galaxy
  REQ
end

Run it three ways:

# 1. First run — nothing cached yet:
bundle exec rspec spec/integration/article_summarizer_spec.rb
#    => Deja::MissingCacheError: "Set ALLOW_LLM_CALL=1 to make the call and record it."

# 2. Record — makes the real calls and writes YAML fixtures:
ALLOW_LLM_CALL=1 bundle exec rspec spec/integration/article_summarizer_spec.rb

# 3. Every run after — replays from cache, no network:
bundle exec rspec spec/integration/article_summarizer_spec.rb

Commit the YAML files under cache_root. They're the recorded fixtures; CI replays them with no API key.

Use LLM response in a subsequent assertion

When your code acts on what the model returned, you'll often want to assert it used that output correctly. But the output is non-deterministic, so you can't hardcode it. cached_llm_value reads the actual recorded response out of the cache file by walking keys and array indices — so your assertion and the recording stay in sync every time you re-record.

it "stores the topics the model extracted" do
  use_llm_cache("2026-05-01_09-15")

  # Asks the model to extract topics via a tool call, then saves them.
  ArticleTagger.new(article).call

  # Read what the model actually returned (a tool_use input) from the cache and
  # assert your code persisted exactly that — no hardcoded expectation.
  topics = cached_llm_value("2026-05-01_09-15",
    "calls", 0, "response", "tool_uses", 0, "input", "topics")

  expect(Article.last.topics).to eq(topics)
end

The path mirrors the recorded YAML (see How it caches): calls → the first call → its response → the first tool_uses entry → that tool call's input → the topics key.

DSL reference

Helper	What it does
`use_llm_cache(id)`	Installs the caching stub and sets the per-test cache id. Call once at the top of an example.
`expect_llm_called`	Asserts exactly one LLM call happened; returns its kwargs. Currently only useful where there is a single llm call in a test.
`forbid_llm_calls`	Installs a client that raises on any access — proves a code path never reaches the LLM.
`cached_llm_value(id, *path)`	Reads a value out of a recorded YAML file by walking keys/indices.
`meet_requirements(text)`	Matcher: asserts a value satisfies free-text requirements (judged once, cached).

How it caches

One YAML file per test, keyed by the id you pass to use_llm_cache:

<cache_root>/cached_calls/<spec/path>/<id>.yaml        # recorded responses
<cache_root>/meets_requirements/<spec/path>/<id>.yaml  # confirmed meet_requirements values

Each request is fingerprinted with a 12-char hash of its canonicalized kwargs. On replay, a miss prints a unified diff against the closest recorded request so you can see exactly what drifted. Re-recording (ALLOW_LLM_CALL=1) prunes any cached entry the test no longer reaches.

Environment variables

Variable	Effect
`ALLOW_LLM_CALL=1`	Make real calls and record/update the cache. Your real client must be able to authenticate (the Anthropic SDK reads `ANTHROPIC_API_KEY` by default).
`DISABLE_LLM_CACHE=1`	Bypass the cache entirely and always call live (debugging).

Configuration

Setting	Default	Purpose
`cache_root`	`spec/support/deja_cache`	Directory for recorded YAML (under `project_root`).
`register(provider, install:, real_client:, as:)`	— (≥1 required)	Register a provider. `install` swaps your app's client for Deja's stub; `real_client` (optional) builds a live client for recording.
`project_root`	`Dir.pwd`	Base for relative paths in error messages.
`judge_client { ... }`	— (required for `meet_requirements`)	Live client used by the `meet_requirements` judge. No default.
`judge_attrs = { ... }`	`{}`	Attrs merged into the `meet_requirements` judge's `messages.create` call, overriding the judge's own defaults (model, token limit, system prompt). `messages` and `output_config` are reserved by the matcher.

License

MIT — see LICENSE.