archive_storage
Zero-downtime archival storage for CarrierWave uploads.
archive_storage moves older uploaded files from one storage backend to another, keeps a registry of the current file location, and routes reads to the right backend. It currently integrates with CarrierWave; support for other uploader libraries can be added later without changing the registry model.
Supported storage adapters:
- S3-compatible object storage, including MinIO and AWS S3
- filesystem/NFS
- memory adapter for tests
Typical use cases:
- main S3/MinIO bucket -> archive_001 cold bucket
- archive_001 -> archive_002 when the first archive fills up
- NFS/local disk -> S3-compatible archive storage
Features
- model-first DSL: archive_storage_for :file
- automatic CarrierWave storage wiring
- ActiveRecord registry table: archive_storage_files
- dry-run planning
- scheduled enqueueing
- background migration jobs
- copy, verify, read switch, fallback read, delayed source cleanup
- optional CarrierWave versions/thumbs migration
- GoodJob, ActiveJob, Sidekiq, sidekiq-cron, and sidekiq-scheduler support
Installation
Add the gem:
gem "archive_storage"
For S3-compatible storage:
gem "aws-sdk-s3"
Install the registry table:
bin/rails generate archive_storage:install
bin/rails db:migrate
Configuration
Define the storage backends and scheduled archive jobs.
# config/initializers/archive_storage.rb
ArchiveStorage.configure do |config|
  config.storage :main do |s|
    s.provider = :s3
    s.endpoint = ENV.fetch("MAIN_STORAGE_ENDPOINT")
    s.bucket = "production-main"
    s.access_key_id = ENV.fetch("MAIN_STORAGE_ACCESS_KEY")
    s.secret_access_key = ENV.fetch("MAIN_STORAGE_SECRET_KEY")
    s.region = "us-east-1"
    s.path_style = true
  end

  config.storage :archive_001 do |s|
    s.provider = :s3
    s.endpoint = ENV.fetch("ARCHIVE_001_ENDPOINT")
    s.bucket = "production-archive-001"
    s.access_key_id = ENV.fetch("ARCHIVE_001_ACCESS_KEY")
    s.secret_access_key = ENV.fetch("ARCHIVE_001_SECRET_KEY")
    s.region = "us-east-1"
    s.path_style = true
  end

  config.storage :archive_002 do |s|
    s.provider = :s3
    s.endpoint = ENV.fetch("ARCHIVE_002_ENDPOINT")
    s.bucket = "production-archive-002"
    s.access_key_id = ENV.fetch("ARCHIVE_002_ACCESS_KEY")
    s.secret_access_key = ENV.fetch("ARCHIVE_002_SECRET_KEY")
    s.region = "us-east-1"
    s.path_style = true
  end

  config.schedule :archive_documents,
    cron: "0 0-6,22,23 * * 1-5",
    model: "ProjectDocument",
    mounted_as: :file,
    migration_rate: 10_000

  # Optional defaults:
  #
  # config.job_backend = :active_job # :active_job, :good_job, :sidekiq, or :inline
  # config.migration_queue = :default
  # config.schedule_queue = :default
  # config.default_batch_size = 500
  # config.verification_strategy = :auto
  # config.delete_source_enabled = false
  # config.default_cleanup_delay = 7.days
end
Filesystem/NFS storage can be mixed with S3-compatible storage:
config.storage :nfs_main do |s|
  s.provider = :filesystem
  s.root_path = "/mnt/uploads"
end
Model Policy
Put archive policy next to the model that owns the file.
class ProjectDocument < ApplicationRecord
  mount_uploader :file, DocumentUploader

  archive_storage_for :file do
    primary :main

    archive :archive_001,
      after: 90.days,
      scope: :ready_for_archive,
      if: ->(record) { record.closed? }

    archive :archive_002,
      after: 2.years,
      scope: ->(records) { records.where(priority: "low") },
      if: ->(record) { record.closed? }

    read_fallbacks :main, :archive_001, :archive_002

    # Optional:
    #
    # delete_source_after verification: true, delay: 7.days
    # include_versions true
    # versions :thumb, :preview
    # timestamp_attribute :created_at
  end
end
archive_storage_for automatically wires the mounted CarrierWave uploader to storage :archive_storage. The uploader can stay focused on path, filename, and version behavior:
class DocumentUploader < CarrierWave::Uploader::Base
  def store_dir
    "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
  end
end
Policy notes:
- primary is where new uploads are stored.
- archive rules are checked in order; the last eligible rule wins.
- scope narrows the model relation before records are scanned. It can be a model scope name, a relation, or a callable that receives the current relation.
- read_fallbacks is the read-recovery order when registry metadata is missing or a configured fallback error is raised.
- By default only the original CarrierWave file is planned. Use include_versions true or versions ... when thumbnails/previews must move too.
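The "last eligible rule wins" note can be illustrated with a minimal plain-Ruby sketch. ArchiveRule, RULES, and target_storage_for are hypothetical names made up for this example and are not part of the gem's API; durations are approximated in seconds (2 years as 730 days) so the snippet runs without ActiveSupport.

```ruby
# Hypothetical stand-in for an archive rule; not the gem's internal class.
ArchiveRule = Struct.new(:storage, :after_seconds, :condition, keyword_init: true)

DAY = 24 * 60 * 60

# Mirrors the two archive rules from the model policy above.
RULES = [
  ArchiveRule.new(storage: :archive_001, after_seconds: 90 * DAY,
                  condition: ->(record) { record[:closed] }),
  ArchiveRule.new(storage: :archive_002, after_seconds: 730 * DAY,
                  condition: ->(record) { record[:closed] })
].freeze

# Checks rules in declaration order and returns the storage of the last
# eligible one, or nil when no rule applies (the file stays on primary).
def target_storage_for(record, rules: RULES, now: Time.now)
  age = now - record[:created_at]
  eligible = rules.select do |rule|
    age >= rule.after_seconds && rule.condition.call(record)
  end
  eligible.last&.storage
end
```

With these rules, a closed record that is 100 days old resolves to :archive_001, while one that is 800 days old matches both rules and resolves to :archive_002.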
Scheduled Jobs
Schedules are declared in global configuration:
ArchiveStorage.configure do |config|
  config.schedule :archive_documents,
    cron: "0 0-6,22,23 * * 1-5",
    model: "ProjectDocument",
    mounted_as: :file,
    migration_rate: 10_000
end
migration_rate means at most this many files are enqueued by one scheduled run.
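To make the schedule concrete: the cron expression "0 0-6,22,23 * * 1-5" fires at minute 0 of hours 0 to 6, 22, and 23, Monday to Friday. The sketch below restates that window and the migration_rate cap in plain Ruby. scheduled_run? and files_for_run are hypothetical helpers for illustration only, and the time zone the scheduler evaluates the expression in depends on your job backend.

```ruby
# Hours matched by the cron field "0-6,22,23".
ARCHIVE_HOURS = [*0..6, 22, 23].freeze

# Days matched by the cron field "1-5".
# Time#wday uses 0 for Sunday, so 1..5 is Monday to Friday.
ARCHIVE_WEEKDAYS = (1..5).freeze

# True when the given time falls on a tick of "0 0-6,22,23 * * 1-5".
def scheduled_run?(time)
  time.min.zero? &&
    ARCHIVE_HOURS.include?(time.hour) &&
    ARCHIVE_WEEKDAYS.cover?(time.wday)
end

# Caps one scheduled run at migration_rate candidates.
def files_for_run(candidates, migration_rate:)
  candidates.first(migration_rate)
end
```

For example, a Monday at 22:00 is inside the window, a Monday at 12:00 is not, and a run over 25,000 candidates with migration_rate: 10_000 picks up only the first 10,000.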
archive_storage registers scheduler entries automatically. You do not need to merge ArchiveStorage.good_job_cron or ArchiveStorage.sidekiq_cron into your application config.
GoodJob
When good_job is present, archive_storage appends its entries to config.good_job.cron after Rails initialization. Existing GoodJob cron entries are preserved.
Enable GoodJob cron in the app environment where the scheduler should run:
# config/environments/production.rb
Rails.application.configure do
  config.good_job.enable_cron = true
end
Sidekiq
Use Sidekiq for migration jobs:
# config/initializers/archive_storage.rb
ArchiveStorage.configure do |config|
  config.job_backend = :sidekiq
end
Add one scheduler gem:
gem "sidekiq-cron"
# or
gem "sidekiq-scheduler"
On Sidekiq server startup, archive_storage adds its own schedules without deleting existing jobs:
- with sidekiq-cron, it uses non-destructive Sidekiq::Cron::Job.load_from_hash
- with sidekiq-scheduler, it uses Sidekiq.set_schedule and reloads the scheduler
Existing jobs from sidekiq.yml, config/schedule.yml, or custom initializers remain in place.
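The idea of adding schedules without touching existing ones can be sketched with plain hashes. add_schedules, the job class names, and the entry layout below are hypothetical and only illustrate the non-destructive merge; how the gem resolves a name collision is not documented here, so this sketch assumes the existing entry is kept.

```ruby
# Adds new schedule entries to an existing schedule without removing or
# overwriting anything that is already there. Returns a new hash.
def add_schedules(existing, additions)
  additions.each_with_object(existing.dup) do |(name, entry), merged|
    # Assumption: on a name collision the application's own entry wins.
    merged[name] = entry unless merged.key?(name)
  end
end

# Hypothetical application schedule and gem-provided entry.
app_schedule = {
  "nightly_cleanup" => { "cron" => "0 3 * * *", "class" => "CleanupJob" }
}
archive_schedule = {
  "archive_documents" => {
    "cron" => "0 0-6,22,23 * * 1-5",
    "class" => "HypotheticalArchiveScheduleJob"
  }
}

combined = add_schedules(app_schedule, archive_schedule)
```

Here combined contains both nightly_cleanup and archive_documents, and app_schedule itself is left unchanged.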
Commands
bin/rails archive_storage:plan MODEL=ProjectDocument MOUNT=file
bin/rails archive_storage:enqueue MODEL=ProjectDocument MOUNT=file
bin/rails archive_storage:migrate MODEL=ProjectDocument MOUNT=file
bin/rails archive_storage:verify
bin/rails archive_storage:cleanup_source
bin/rails archive_storage:status
Options:
MODEL=ProjectDocument
MOUNT=file
OLDER_THAN=90d
LIMIT=10000
INLINE=true
ESTIMATE_SIZES=false
UPLOADER=DocumentUploader is still accepted for advanced/legacy uploader-level configurations.
Command behavior:
- plan prints a dry-run plan.
- enqueue and migrate enqueue migration jobs by default.
- migrate INLINE=true runs migration inline.
- verify re-checks already migrated files.
- cleanup_source deletes verified source copies that are past the cleanup delay.
- status prints registry counters.
Migration Flow
source only
source + destination copied
destination verified
registry points reads to destination
reads can fall back to source
source deleted later when cleanup is enabled
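The flow above can be read as a small state machine. The sketch below is one way to model it in plain Ruby; the state names, advance, and readable_from are hypothetical and chosen for this example, so the registry may track progress differently (for instance with timestamps rather than a single state column).

```ruby
# Hypothetical state names mirroring the flow above, in order.
TRANSITIONS = {
  source_only: :copied,
  copied: :verified,
  verified: :reads_switched,
  reads_switched: :source_deleted
}.freeze

# Moves a file one step forward. The final step (deleting the source) is
# only taken when source deletion has been enabled.
def advance(state, delete_source_enabled: false)
  next_state = TRANSITIONS[state]
  return state if next_state.nil? # already at the terminal state
  return state if next_state == :source_deleted && !delete_source_enabled

  next_state
end

# Storages that can serve a read in each state, in preference order.
def readable_from(state)
  case state
  when :source_only, :copied, :verified then [:source]
  when :reads_switched then [:destination, :source]
  when :source_deleted then [:destination]
  else raise ArgumentError, "unknown state: #{state.inspect}"
  end
end
```

The point of the ordering is that reads never depend on the destination until it has been verified, and the source stays available as a fallback until cleanup is explicitly enabled.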
Source deletion is disabled by default:
config.delete_source_enabled = false
Turn it on only after the migration path has been verified in production:
config.delete_source_enabled = true
Per-mount cleanup delay:
archive_storage_for :file do
  delete_source_after verification: true, delay: 7.days
end
Verification
The default strategy is :auto.
archive_storage does not blindly trust S3 ETags. Multipart S3 uploads produce ETags of the form hash-3 (a hash followed by the part count), and uploading the same bytes to another storage backend can produce a different ETag.
Strategies:
- :auto - size check, then checksum when available, then non-multipart ETag, otherwise size-only
- :checksum - require matching checksums
- :safe_etag - require matching non-multipart ETags
- :etag - require matching ETags, including multipart-looking values
- :byte_compare - compare full file bytes after size check
- :size - compare content length only
ArchiveStorage.configure do |config|
  config.verification_strategy = :auto
end
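The :auto cascade described in the strategies list can be sketched as follows. This is a minimal illustration under stated assumptions, not the gem's implementation: ObjectInfo, multipart_etag?, and verify_auto are hypothetical names, and the multipart check relies only on the "-N" suffix mentioned above.

```ruby
# Hypothetical object metadata; field names are illustrative.
ObjectInfo = Struct.new(:size, :checksum, :etag, keyword_init: true)

# Multipart ETags end in "-<part count>", for example "9b2cf5...-3".
MULTIPART_ETAG = /-\d+\z/

# S3 often returns ETags wrapped in double quotes; strip them first.
def normalize_etag(etag)
  etag.to_s.delete('"')
end

def multipart_etag?(etag)
  normalize_etag(etag).match?(MULTIPART_ETAG)
end

# Size check, then checksum when both sides have one, then ETag when
# neither looks multipart, otherwise accept on size alone.
def verify_auto(source, target)
  return false unless source.size == target.size

  if source.checksum && target.checksum
    return source.checksum == target.checksum
  end

  etags = [source.etag, target.etag]
  if etags.all? && etags.none? { |etag| multipart_etag?(etag) }
    return normalize_etag(source.etag) == normalize_etag(target.etag)
  end

  true # size-only fallback
end
```

Note the trade-off in the last branch: when only a multipart ETag is available, this cascade degrades to a size comparison, which is why stricter strategies such as :checksum or :byte_compare exist.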
Registry
The generated migration creates archive_storage_files.
The registry stores:
- model identity: record_type, record_id, mounted_as, uploader
- object identity: identifier, storage_key, source/target keys
- storage state: current_storage, source_storage, target_storage
- migration state: enqueue, migration, verification, cleanup timestamps
- metadata: byte size, checksum, content type, attempts, last error
Business tables do not need extra columns for archive location.
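How a registry lookup and read_fallbacks might combine on the read path can be sketched in plain Ruby. MemoryStorage and read_file are hypothetical and written only for this example; a simple hash stands in for the archive_storage_files table, mapping a storage key to its current_storage.

```ruby
# Hypothetical in-memory adapter, similar in spirit to a test adapter.
class MemoryStorage
  class NotFound < StandardError; end

  def initialize
    @objects = {}
  end

  def write(key, bytes)
    @objects[key] = bytes
  end

  def read(key)
    @objects.fetch(key) { raise NotFound, "missing #{key}" }
  end
end

# Tries the storage named by the registry first, then each fallback in
# order. A missing registry entry simply means fallbacks are used.
def read_file(key, registry:, storages:, fallbacks:)
  order = [registry[key], *fallbacks].compact.uniq
  order.each do |name|
    begin
      return storages.fetch(name).read(key)
    rescue MemoryStorage::NotFound
      next
    end
  end
  raise MemoryStorage::NotFound, "#{key} not found in #{order.join(', ')}"
end
```

Because the location lives in the registry, the caller only needs the storage key; the business record carries no archive-specific columns.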
CarrierWave Versions
Migration of CarrierWave versions is disabled by default. To include all versions:
archive_storage_for :file do
  include_versions true
end
To migrate only selected versions:
archive_storage_for :file do
  versions :thumb, :preview
end
Use this only when those files are stored and read as part of the same archival policy. It can multiply the number of objects planned for migration.
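As a rough back-of-the-envelope check of that multiplication, the sketch below estimates the object count for a run. planned_objects is a hypothetical helper, and it assumes every record has exactly one original file plus one object per listed version.

```ruby
# Estimates how many objects a plan covers: one original per record plus
# one object for each migrated version (assumed present on every record).
def planned_objects(record_count, versions: [])
  record_count * (1 + versions.size)
end

originals_only = planned_objects(10_000)
with_versions  = planned_objects(10_000, versions: %i[thumb preview])
```

Under that assumption, 10,000 records become 30,000 objects once thumb and preview are included, which is worth keeping in mind when choosing migration_rate.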
Current Scope
This MVP is focused on Rails, ActiveRecord, and CarrierWave. The storage and registry layers are not CarrierWave-specific, so other uploader integrations can be added later.