promptscrub
Strip PII from LLM prompts. Rehydrate it in responses. Your users see real data. Your LLM provider never does.
Drop-in Faraday middleware for OpenAI, Anthropic, Gemini — and any LLM library built on Faraday (RubyLLM, langchainrb, llm.rb).
Your App               PromptScrub                  LLM API
│                           │                          │
│ "SSN is 123-45-6789"      │                          │
│──────────────────────────►│ redact                   │
│                           │ "SSN is <SSN_001>"       │
│                           │─────────────────────────►│
│                           │                          │ generate
│                           │◄─────────────────────────│
│                           │ "Your SSN <SSN_001>..."  │
│                           │ rehydrate                │
│ "Your SSN 123-45-6789..." │                          │
│◄──────────────────────────│                          │
No infra to deploy. No gateway to operate. Just middleware.
Installation
Add this line to your Gemfile, then run bundle install:
gem "promptscrub"
Quick start
require "faraday"
require "promptscrub"

conn = Faraday.new("https://api.openai.com") do |f|
  f.use PromptScrub::Middleware
  f.request :json
  f.response :json
  f.adapter Faraday.default_adapter
end

# PII is stripped before the request leaves your app.
# Tokens are rehydrated in the response. Transparent to your code.
response = conn.post("/v1/chat/completions", {
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize claim for SSN 234-56-7890, card 4532015112830366" }]
})
With RubyLLM
RubyLLM.configure do |c|
  c.faraday do |f|
    f.use PromptScrub::Middleware
  end
end
Built-in detectors
| Type | Detects | Token example |
|---|---|---|
| EMAIL | john.doe+tag@sub-domain.co.uk | <EMAIL_001> |
| SSN | 123-45-6789 (invalid ranges excluded) | <SSN_001> |
| CARD | 13–19 digit numbers (Luhn-validated) | <CARD_001> |
| PHONE | US numbers in all common formats | <PHONE_001> |
The same value always maps to the same token within a request, so alice@corp.com appearing twice becomes <EMAIL_001> both times.
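The table mentions two validation steps that go beyond pattern matching: a Luhn check for card numbers and the exclusion of invalid SSN ranges. Here is a minimal sketch of both in plain Ruby. The helper names `luhn_valid?` and `plausible_ssn?` are hypothetical and not part of promptscrub's API, and the SSN rules shown (area 000, 666 or 900-999, group 00, serial 0000) are the commonly cited never-issued ranges, which may differ from what the gem actually checks.

```ruby
# Hypothetical helpers for illustration; not promptscrub's actual implementation.

# Luhn checksum: walking from the rightmost digit, double every second digit,
# subtract 9 from any doubled value above 9, and require the total to be
# divisible by 10.
def luhn_valid?(number)
  digits = number.to_s.scan(/\d/).map(&:to_i)
  return false unless (13..19).cover?(digits.length)

  sum = digits.reverse.each_with_index.sum do |digit, index|
    next digit if index.even?

    doubled = digit * 2
    doubled > 9 ? doubled - 9 : doubled
  end
  (sum % 10).zero?
end

# Rejects dash-delimited SSNs whose area, group, or serial falls in a range
# that is never issued (assumed rule set, see the paragraph above).
def plausible_ssn?(ssn)
  match = /\A(\d{3})-(\d{2})-(\d{4})\z/.match(ssn)
  return false unless match

  area, group, serial = match.captures
  return false if area == "000" || area == "666" || area.start_with?("9")
  return false if group == "00" || serial == "0000"

  true
end

puts luhn_valid?("4532015112830366") # card number from the quick start example
puts plausible_ssn?("234-56-7890")   # SSN from the quick start example
puts plausible_ssn?("666-12-3456")   # area 666 is rejected
```

Checks like these cut false positives: a random 16-digit order number will usually fail the Luhn test and stay in the prompt untouched.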
Configuration
PromptScrub.configure do |config|
  # Add a custom detector
  config.add_detector(:zip, /\b\d{5}(-\d{4})?\b/)

  # Opt out of a built-in
  config.disable_detector(:phone)

  # Redact only outbound (skip rehydration)
  config.scrub_response = false
end
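Before registering a custom detector it can help to exercise the pattern on its own. The sketch below runs the `:zip` pattern from the configuration above through `String#scan`, with no dependency on promptscrub. The only change is a non-capturing group, because `scan` returns the captures rather than the whole match when a pattern contains capture groups. The sample string is made up.

```ruby
# The :zip pattern from the configuration example, with a non-capturing group
# so that String#scan returns whole matches.
zip_pattern = /\b\d{5}(?:-\d{4})?\b/

sample = "Ship to 94103 or 10001-1234, ref 1234567, qty 50000"
matches = sample.scan(zip_pattern)

puts matches.inspect
# → ["94103", "10001-1234", "50000"]
```

Note that the quantity 50000 is matched as well: any standalone five-digit number looks like a ZIP code to this pattern, so a broad custom detector may redact more than intended.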
Streaming (SSE)
For streaming responses where your app processes chunks directly, use StreamRehydrator to wrap your callback:
vault = PromptScrub::Vault.new
redactor = PromptScrub::Redactor.new(vault, PromptScrub.configuration.detectors)
rehydrator = PromptScrub::StreamRehydrator.new(vault) do |clean_chunk|
  print clean_chunk # user sees real values
end

# Before streaming request:
redacted_prompt = redactor.scrub(user_prompt)

# For each SSE chunk received:
rehydrator.call(raw_chunk)

# After stream ends:
rehydrator.flush
StreamRehydrator buffers partial tokens at chunk boundaries (e.g. <EMAIL_ split across two chunks) and flushes them correctly when the token completes.
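To make the chunk-boundary behaviour concrete, here is a toy version of the buffering idea in plain Ruby. It is a sketch, not the gem's `StreamRehydrator`: the class name `ToyStreamRehydrator`, both regular expressions, and the plain Hash standing in for the vault are assumptions made for illustration. Only the `<TYPE_NNN>` token shape and the `call`/`flush` interface come from this README.

```ruby
# Toy illustration of chunk-boundary buffering; not promptscrub's StreamRehydrator.
class ToyStreamRehydrator
  # A complete token such as <EMAIL_001>.
  TOKEN = /<[A-Z]+_\d+>/
  # A trailing fragment that could still grow into a token: "<", "<EMA", "<EMAIL_0".
  PARTIAL_TAIL = /<(?:[A-Z]+(?:_\d*)?)?\z/

  def initialize(mapping, &on_chunk)
    @mapping = mapping # token => original value (stands in for the vault)
    @on_chunk = on_chunk
    @buffer = +""
  end

  # Emits everything up to a possible partial token and keeps the rest buffered.
  def call(chunk)
    @buffer << chunk
    tail_at = @buffer =~ PARTIAL_TAIL
    ready = @buffer.slice!(0, tail_at || @buffer.length)
    emit(ready)
  end

  # Emits whatever is still buffered once the stream has ended.
  def flush
    emit(@buffer.slice!(0, @buffer.length))
  end

  private

  def emit(text)
    return if text.empty?

    @on_chunk.call(text.gsub(TOKEN) { |token| @mapping.fetch(token, token) })
  end
end

out = []
rehydrator = ToyStreamRehydrator.new({ "<EMAIL_001>" => "alice@corp.com" }) { |text| out << text }
rehydrator.call("Contact <EMA") # "<EMA" is held back
rehydrator.call("IL_001> today") # token completes and is substituted
rehydrator.flush
puts out.join
# → Contact alice@corp.com today
```

The trade-off in this approach is a small amount of added latency: text after an unmatched `<` is delayed until the next chunk shows whether it belongs to a token.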
How it works
- Redact: on every outgoing request, Redactor scans the body string with all registered detectors and replaces matches with <TYPE_NNN> tokens. Each unique value gets a stable token stored in a per-request Vault.
- Send: the redacted body hits the LLM API. The model never sees real PII.
- Rehydrate: on the response, Rehydrator scans for token patterns and substitutes original values from the vault. Your application code receives the real data.
The vault is in-memory and scoped to a single request — no persistence, no shared state between requests.
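The redact and rehydrate steps can be sketched end to end in plain Ruby. This is a toy model under stated assumptions, not the gem's code: `ToyVault`, `redact`, and `rehydrate` are hypothetical names, and the two detector patterns are simplified stand-ins for the built-in ones. The LLM response is simulated with a hard-coded string.

```ruby
# Toy model of the redact -> send -> rehydrate cycle; all names are hypothetical.
class ToyVault
  def initialize
    @token_for = {}         # original value => token
    @value_for = {}         # token => original value
    @counters = Hash.new(0) # per-type sequence numbers
  end

  # Returns the existing token for a value, or mints the next one for its type.
  def token(type, value)
    @token_for[value] ||= begin
      @counters[type] += 1
      new_token = format("<%s_%03d>", type.to_s.upcase, @counters[type])
      @value_for[new_token] = value
      new_token
    end
  end

  # Looks up the original value; unknown tokens are returned unchanged.
  def value(token)
    @value_for.fetch(token, token)
  end
end

# Simplified detector patterns (assumptions, not the gem's built-ins).
detectors = {
  email: /\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/,
  ssn: /\b\d{3}-\d{2}-\d{4}\b/
}

def redact(text, vault, detectors)
  detectors.reduce(text) do |current, (type, pattern)|
    current.gsub(pattern) { |match| vault.token(type, match) }
  end
end

def rehydrate(text, vault)
  text.gsub(/<[A-Z]+_\d{3}>/) { |token| vault.value(token) }
end

vault = ToyVault.new # one vault per request, discarded afterwards
outbound = redact("Email alice@corp.com about SSN 123-45-6789, cc alice@corp.com", vault, detectors)
puts outbound
# → Email <EMAIL_001> about SSN <SSN_001>, cc <EMAIL_001>

simulated_reply = "Sent to <EMAIL_001> regarding <SSN_001>."
inbound = rehydrate(simulated_reply, vault)
puts inbound
# → Sent to alice@corp.com regarding 123-45-6789.
```

Creating a fresh vault for each request is what keeps token numbering from leaking between requests: `<EMAIL_001>` in one request has no relation to `<EMAIL_001>` in the next.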
Security notes
- Tokens are not encrypted. The vault lives in your process memory for the duration of a request.
- Detection is regex-based. It will catch well-formed PII; obfuscated or unusual formats may slip through.
- For high-assurance use cases (HIPAA, PCI-DSS), add custom detectors for your specific data patterns and review false-negative rates in your domain.
- promptscrub is client-side middleware. It does not replace network-level controls or data governance policies.
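The regex caveat above is easy to check for your own data. The sketch below runs an assumed dash-delimited SSN pattern (the gem's real pattern may differ) against a few hand-written variants and reports which ones it would catch.

```ruby
# Assumed dash-delimited SSN pattern for illustration; the gem's pattern may differ.
ssn_pattern = /\b\d{3}-\d{2}-\d{4}\b/

samples = {
  "123-45-6789" => "well-formed",
  "123 45 6789" => "space-delimited",
  "123456789" => "undelimited",
  "123-45-67 89" => "stray whitespace"
}

results = samples.map do |text, label|
  status = text.match?(ssn_pattern) ? "caught" : "missed"
  puts format("%-14s %-18s %s", text, label, status)
  [text, status]
end.to_h
```

Only the well-formed variant is caught here. If formats such as the space-delimited one occur in your data, a custom detector registered with `config.add_detector` is one way to cover them.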
Contributing
Bug reports and pull requests are welcome on GitHub.
License
MIT — see LICENSE.