activestorage-aws-record
Run Active Storage on Amazon DynamoDB — via aws-record
instead of Active Record.
It is a metadata backend: blob bytes still flow through a normal Active
Storage Service (Disk, S3, Mirror, …); only the Blob, Attachment, and
VariantRecord metadata lives in DynamoDB. Everything else — analyzers,
previewers, variants, jobs, direct uploads, controllers — is reused from Active
Storage unchanged.
It implements Active Storage's generic (non-ActiveRecord) custom-backend contract, and doubles as a reference implementation of that contract.
Why
Apps built on DynamoDB + aws-record (no relational database) still want
Active Storage's attachment ergonomics. This gem provides them without
introducing Active Record, following Single Table Design: every Active
Storage item lives in one table you already own.
Highlights
- Single Table Design: Blob, Attachment, and VariantRecord items share one application-provided table, keyed by #-separated composite keys under a configurable namespace. No gem-owned tables.
- Zero-config key discovery: the partition/sort key attribute names and types are auto-detected from the live table at boot. Usually the only thing you configure is the table name.
- Adapts to your key types: a String range key needs no GSI (all reads strongly consistent); a numeric range key is supported automatically by routing the adjacency through a string-keyed GSI.
- Safe shared blobs: a strongly-consistent, transactional reference count with a conditional-delete foreign-key guard (no wrongful purge of a shared blob, no zombie/orphan rows).
- Atomic multi-attachment changes: clearing, replacing, or detaching a has_many commits every row delete (and a coalesced refcount decrement per blob) in one DynamoDB transaction, so it can never delete some rows and leave others. A change over DynamoDB's 100-action limit fails closed rather than partially.
- Fiber-safe: Falcon-ready, with an eager mutex, a mutex-guarded client, and a read-only post-boot schema cache.
Requirements
- Ruby >= 3.4
- Active Storage with generic custom-backend support: currently the rails/rails#57537 branch (not yet released). Until it ships in a Rails release, point your Gemfile at that branch; installing against a released activestorage will not work.
- aws-record ~> 2.15, aws-sdk-dynamodb ~> 1
- A DynamoDB table with a composite primary key (String partition key + sort key)
Installation
# Gemfile
gem "activestorage-aws-record"
Configuration
In a Rails app, configure via config.activestorage_aws_record (e.g. in
config/application.rb or an initializer):
config.activestorage_aws_record.table_name = "my_app" # the single table (required)
config.activestorage_aws_record.namespace = "ActiveStorage" # key prefix; change to avoid collisions
# Optional:
# config.activestorage_aws_record.separator = "#"
# config.activestorage_aws_record.client_options = { region: "eu-central-1" }
# config.activestorage_aws_record.client = Aws::DynamoDB::Client.new(...)
# config.activestorage_aws_record.manage_table = Rails.env.local? # create the table if missing (dev/test)
# config.activestorage_aws_record.index_name = "active_storage_index" # which GSI to use in Mode B
| Setting | Default | Meaning |
|---|---|---|
| table_name | "active_storage" | The single shared table. |
| namespace | "ActiveStorage" | First segment of every key; isolates Active Storage items from your own. |
| separator | "#" | Key segment delimiter. |
| client / client_options | — / {} | Provide a client, or options for one. |
| manage_table | false | Create the table (and, in dev, point at it) if missing. Production tables are app-managed. |
| index_name | "active_storage_index" | The GSI to use when the range key is numeric (Mode B). |
The partition/sort key attribute names are not configured — they are detected from the table.
The table
The gem stores its items in your existing single table. It only assumes a composite primary key: a String partition key and a sort key. The sort key's type selects the mode automatically.
Mode A — String sort key (recommended)
All access patterns live on the base table, so every read is strongly consistent and no GSI is required. A minimal standalone table:
client.create_table(
table_name: "my_app",
attribute_definitions: [
{ attribute_name: "pk", attribute_type: "S" },
{ attribute_name: "sk", attribute_type: "S" }
],
key_schema: [
{ attribute_name: "pk", key_type: "HASH" },
{ attribute_name: "sk", key_type: "RANGE" }
],
billing_mode: "PAY_PER_REQUEST"
)
(In development/test, manage_table = true creates exactly this for you.)
Mode B — numeric sort key (e.g. a single-table app keyed hash_key + version)
A numeric sort key cannot hold the #-composite strings, so the gem routes
listing through a string-keyed GSI (auto-detected; its key names can be
anything). Point lookups, the reference count, and the foreign-key guard remain
strong on the base table; listing (an owner's attachments, a blob's variant
records) is eventually consistent via the GSI. Your table needs a GSI named
by index_name with a (String, String) key projecting ALL:
global_secondary_indexes: [{
index_name: "active_storage_index",
key_schema: [
{ attribute_name: "as_index_pk", key_type: "HASH" },
{ attribute_name: "as_index_sk", key_type: "RANGE" }
],
projection: { projection_type: "ALL" }
}]
If the range key is numeric and that GSI is missing, the gem raises a
ConfigurationError describing exactly what to add (it never mutates a
production table's indexes itself).
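The mode selection described above can be sketched as a pure function over a DescribeTable-shaped hash. This is an illustration under stated assumptions, not the gem's actual code: detect_schema, Schema, and the local ConfigurationError class are hypothetical names, and the input is assumed to look like client.describe_table(table_name: ...).table.to_h.

```ruby
# Hypothetical sketch: derive key names and the mode from a DescribeTable-shaped hash.
class ConfigurationError < StandardError; end # stand-in for the gem's error class

Schema = Struct.new(:partition_key, :sort_key, :mode, :index_name, keyword_init: true)

def detect_schema(table, index_name: "active_storage_index")
  # Map attribute name => DynamoDB type ("S", "N", "B").
  types = table[:attribute_definitions].to_h { |d| [d[:attribute_name], d[:attribute_type]] }
  hash  = table[:key_schema].find { |k| k[:key_type] == "HASH" }
  range = table[:key_schema].find { |k| k[:key_type] == "RANGE" }

  raise ConfigurationError, "table needs a composite primary key" unless hash && range

  pk = hash[:attribute_name]
  sk = range[:attribute_name]
  raise ConfigurationError, "partition key must be a String" unless types[pk] == "S"

  case types[sk]
  when "S"
    # String sort key: every access pattern lives on the base table (Mode A).
    Schema.new(partition_key: pk, sort_key: sk, mode: :a, index_name: nil)
  when "N"
    # Numeric sort key: Mode B needs the named GSI, (String, String) keyed, projecting ALL.
    gsi = Array(table[:global_secondary_indexes]).find { |g| g[:index_name] == index_name }
    usable = gsi &&
             gsi[:key_schema].size == 2 &&
             gsi[:key_schema].all? { |k| types[k[:attribute_name]] == "S" } &&
             gsi.dig(:projection, :projection_type) == "ALL"
    unless usable
      raise ConfigurationError,
            "numeric sort key: add a GSI named #{index_name.inspect} with a (String, String) key projecting ALL"
    end
    Schema.new(partition_key: pk, sort_key: sk, mode: :b, index_name: index_name)
  else
    raise ConfigurationError, "unsupported sort key type #{types[sk].inspect}"
  end
end
```

The sketch only reads the table description and never alters the table, matching the rule that production indexes are left to the application.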
Key layout
ns = namespace. Items are distinguished by key values, not separate tables:
| Entity | partition | sort |
|---|---|---|
| Blob | ns#Blob#<id> | ns#Blob#<id> |
| VariantRecord | ns#Blob#<blob_id> | ns#VariantRecord#<digest> |
| Attachment | ns#Owner#<record_type>#<record_id> | ns#Attachment#<name>#<id> |
All non-key attributes are stored under namespaced names (as_filename,
as_blob_id, …) so they never collide with your table's own key attributes.
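As a rough illustration of the layout above, the composite keys can be built by joining the namespace and the entity segments with the separator. KeyLayout and its method names are hypothetical and exist only for this sketch; the gem's internal key builder may look different.

```ruby
# Hypothetical key builder that follows the key layout table.
KeyLayout = Struct.new(:namespace, :separator) do
  # A blob item uses the same value for both keys.
  def blob(id)
    key = join("Blob", id)
    { partition: key, sort: key }
  end

  # Variant records share the blob's partition, so they sit next to it.
  def variant_record(blob_id, digest)
    { partition: join("Blob", blob_id), sort: join("VariantRecord", digest) }
  end

  # Attachments are grouped under their owner's partition.
  def attachment(record_type, record_id, name, id)
    { partition: join("Owner", record_type, record_id), sort: join("Attachment", name, id) }
  end

  def join(*segments)
    [namespace, *segments].join(separator)
  end
end

layout = KeyLayout.new("ActiveStorage", "#")
layout.attachment("User", "42", "avatar", "a1")
# partition is "ActiveStorage#Owner#User#42", sort is "ActiveStorage#Attachment#avatar#a1"
```

Because a blob and its variant records share a partition value, and an owner's attachments share another, each listing is a single query on one partition value.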
Usage
Greenfield model — Owner
For an aws-record model with no persistence of its own, include Owner. It
provides an aws-record save/destroy wired into Active Storage's callbacks, plus
all the contract glue. Include Aws::Record before Owner:
class User
include Aws::Record
include ActiveStorage::AwsRecord::Owner
string_attr :id, hash_key: true
string_attr :name
has_one_attached :avatar
has_many_attached :documents
end
Model with its own persistence — Attachable
If your model already defines save/destroy (its own versioning, events,
search — e.g. a shared BaseModel), do not include Owner — it would
override that persistence. Include Attachable instead: it adds only the contract
glue (callback chains, a changed? bridge, owner resolution, the attachment
macros) and never touches your save/destroy. Your persistence just needs to
run Active Storage's callback chains — wrap it with the provided helpers:
class Document < BaseModel # BaseModel already defines save/destroy
include ActiveStorage::AwsRecord::Attachable
has_many_attached :files
def save(*) = { super }
def destroy(*) = { delete! if persisted? }
end
Active Storage resolves an owner from the bare id it stores; since aws-record's
#find is key-hash-based, Attachable supplies an active_storage_find(id)
adapter for that — without shadowing your model's own #find. Owners must be
single-hash-key. If you define :commit callbacks, you must run them (Active
Storage moves uploads to after_commit once they exist); with none, uploads
happen in after_save.
Then use Active Storage exactly as you would with the Active Record backend:
user.avatar.attach(io: file, filename: "me.png", content_type: "image/png")
user.avatar.attached? # => true
user.avatar.url # served by your configured Service
user.documents.attach(blob1, blob2)
user.avatar.variant(resize_to_limit: [100, 100]).processed
Owners keep their own key schema — the single-table key scheme above applies only to the gem's three entities.
Consistency
- Mode A: every read is strongly consistent (base table only).
- Mode B: point lookups, the reference count, and the foreign-key guard are strong; listing (owner→attachments, blob→variant sweep) is eventually consistent (GSI). Active Storage's in-memory change tracking masks this within a request.
Atomic grouped changes
Active Storage drives a has_many clear/replace/detach through
attachment_class.transaction. The adapter makes that a real, fiber-local
DynamoDB transaction: every row delete and a single coalesced ADD per blob
(so the same blob attached twice produces one -2, not two rejected ops) commit
in one transact_write_items. Either all rows go or none do — no partial clear.
Creates stay per-row and synchronous, so Active Storage's own failed-save
cleanup of new blob/attachment records is unaffected. A single buffered delete
keeps the per-row idempotent recovery (duplicate purge, orphaned blob).
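The coalescing step can be sketched as follows. This is a simplified illustration, not the adapter's code: build_clear_transaction, the attachment hash shape, the local TransactionTooLarge class, and the as_refcount attribute name are all assumptions. Only the delete/update item shapes follow the aws-sdk-dynamodb transact_write_items parameters.

```ruby
# Hypothetical sketch: turn a grouped clear into one list of transaction items.
MAX_TRANSACTION_ACTIONS = 100 # DynamoDB's TransactWriteItems action limit

class TransactionTooLarge < StandardError; end # stand-in for the gem's error class

# attachments: array of { key: {...}, blob_key: {...} } hashes (illustrative shape).
# refcount_attr is an assumed attribute name, not necessarily the gem's.
def build_clear_transaction(table_name, attachments, refcount_attr: "as_refcount")
  # One Delete per attachment row.
  deletes = attachments.map do |att|
    { delete: { table_name: table_name, key: att[:key] } }
  end

  # One ADD per distinct blob: a blob attached twice gets a single -2,
  # because a transaction may not contain two operations on the same item.
  decrements = attachments.map { |att| att[:blob_key] }.tally.map do |blob_key, count|
    {
      update: {
        table_name: table_name,
        key: blob_key,
        update_expression: "ADD #rc :delta",
        expression_attribute_names: { "#rc" => refcount_attr },
        expression_attribute_values: { ":delta" => -count }
      }
    }
  end

  items = deletes + decrements
  if items.size > MAX_TRANSACTION_ACTIONS
    # Fail before any write instead of committing a partial clear.
    raise TransactionTooLarge, "#{items.size} actions exceed the limit of #{MAX_TRANSACTION_ACTIONS}"
  end

  items # would be passed as client.transact_write_items(transact_items: items)
end
```

The size check runs before anything is sent, so an oversized change fails without writing anything.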
Known limitations
- Blob#attachments (the blob→attachment reverse lookup) is unsupported on a persisted blob: the design is deliberately GSI-free for that direction; the shared-blob count answers "is this still referenced?". Active Storage's generic path only reaches it for a non-persisted blob (which has no rows).
- Mode B variant cleanup uses the eventually-consistent GSI, so a variant created within the GSI's propagation window of a blob purge may be missed (Mode A is strong).
- Two requests processing the same variant concurrently can briefly see the variant record before its image is uploaded. This is a transient state that resolves once the winner finishes (inherent to create-then-upload).
- Blob service keys are random 145-bit tokens; like the reference backend there is no DB-level unique-key constraint (collision is negligible).
- An atomic grouped change is capped by DynamoDB's 100-action transaction limit (#rows + #distinct blobs). A larger clear/replace/detach raises ActiveStorage::AwsRecord::TransactionTooLarge before any write rather than chunking (which would reintroduce the partial-clear bug); split it into smaller batches.
- A has_one replace is a synchronous create plus one buffered orphan delete. It is fully atomic only on an Active Storage that carries the widened CreateOne#save rescue (wrapping the whole attachment_class.transaction, so a commit-time delete failure rolls back the new record), shipped together with this adapter as part of the generic-backend work. Without it, a transient failure on the old-blob delete can leave the new attachment plus an unpurged old blob.
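For the 100-action cap, one way to size smaller batches is to count each attachment as one row delete plus one extra action the first time its blob appears in a batch. The helper below is a hypothetical sketch of that counting only. batches_within_limit and the :blob_id field are assumed names, and how each batch is then detached depends on your model.

```ruby
require "set"

# Hypothetical helper: slice attachments into groups whose action count
# (rows + distinct blobs) stays within the transaction limit.
# Each group can commit atomically on its own; the groups are NOT atomic
# with each other.
def batches_within_limit(attachments, limit: 100)
  batches = []
  current = []
  blobs = Set.new

  attachments.each do |att|
    # A row delete costs 1 action; a blob not yet in this batch adds 1 more.
    cost = blobs.include?(att[:blob_id]) ? 1 : 2
    if !current.empty? && current.size + blobs.size + cost > limit
      batches << current
      current = []
      blobs = Set.new
    end
    current << att
    blobs << att[:blob_id]
  end

  batches << current unless current.empty?
  batches
end
```

With 120 attachments that each reference a different blob, every row costs two actions, so this yields batches of 50, 50, and 20.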
Development
bin/setup # bundle install + start DynamoDB Local (docker compose)
bundle exec rake test # Minitest suite (Mode A)
bundle exec rake smoke # standalone end-to-end smoke scripts (Mode A, Mode B, schema discovery)
bundle exec rake rubocop # lint (Rails Omakase); `rake rubocop:autocorrect` to format
bundle exec rake build # package the gem into pkg/
The suite talks to DynamoDB Local over DYNAMODB_ENDPOINT (default
http://localhost:8000). To use a different instance — e.g. to avoid clashing
with another DynamoDB Local already on :8000:
DYNAMODB_PORT=8002 docker compose up -d
DYNAMODB_ENDPOINT=http://localhost:8002 bundle exec rake test
It isolates itself with a per-process table name and never touches other tables on the endpoint.
See PLAN.md for the full design rationale.
License
MIT.