activerecord-materialized
Materialized views for Rails apps on databases that don't have them — precompute an expensive query into a cache table, refresh it in the background when the underlying data changes, and read it through a transparent ActiveRecord API.
Use case: Your reporting page runs a 12-second join across six tables. Users visit once a day. MySQL has no native materialized views. This gem gives you PostgreSQL-style semantics in application code — writes trigger refresh, reads never pay for it.
Why use this?
- Reads stay fast — queries hit a small precomputed table, not a multi-second join.
- Freshness is automatic — a write to a
depends_onmodel schedules background maintenance; you don't refresh by hand. - Nothing blocks on a rebuild — refresh is incremental and on-write, never on-read, and a full rebuild only ever happens when you explicitly ask for it.
- It's just ActiveRecord —
where,find,count, and scopes work unchanged; an unbuilt view still returns correct results by reading through to the source. - It's portable — works on MySQL, MariaDB, and SQLite, which have no native materialized views.
🚀 New here? Start with the Getting started tutorial — a hands-on, fully tested walkthrough from install to refresh-on-write.
🧪 Want to feel it? A runnable Rails demo lives in
demo/— compare raw vs. materialized timings side by side, mutate the data, and watch the view go stale and catch up.
Author: Michael Avrukin · License: MIT
Table of contents
- Why this exists
- How it works
- Research background
- Features
- Gotchas and trade-offs
- Installation
- Getting started tutorial
- Quick start
- Configuration
- API reference
- Benchmark results
- When to use (and when not to)
- Comparison with native materialized views
- Versioning
- Development
- Contributing
Why this exists
Many Rails applications on MySQL, MariaDB, or SQLite hit the same wall:
| Symptom | Example |
|---|---|
| Complex joins + aggregations | GROUP BY, DISTINCT, correlated subqueries |
| Seconds per query even with indexes | Dashboards, admin reports, analytics APIs |
| Read-heavy, write-light | Thousands of reads/day, dozens of writes/day |
| No native MV support | Unlike PostgreSQL's CREATE MATERIALIZED VIEW |
Materialized views solve this by storing query results as a physical table and refreshing that snapshot when source data changes. High-end databases (PostgreSQL, Oracle, SQL Server) provide this natively. When your database cannot, activerecord-materialized implements the same read/refresh split in Ruby — without changing how developers query data.
The problem with refresh-on-read
A naive approach refreshes the view on the first read after data changes. That punishes the unlucky user whose visit triggers a 10-second rebuild — and on a large database an implicit full rebuild can be catastrophic. This gem never rebuilds implicitly: a full materialization happens only via an explicit rebuild!(confirm: true). Routine freshness is incremental, on write (dependency changes schedule partition-local maintenance after commit), and an unbuilt view stays correct via read-through to the source query until you build it.
How it works
Architecture
flowchart TB
subgraph writes ["Write path — routine maintenance"]
W["INSERT / UPDATE / DELETE on depends_on model"]
DT["DependencyTrackable after_*_commit callbacks"]
DR["DependencyRegistry.publish_write_change!"]
MS["MaintenanceDeltaBuilder / SummaryDeltaBuilder + MaintenanceStore"]
RS[RefreshScheduler]
AR["AsyncRefresher or RefreshJob"]
RFR["Refresher dispatch"]
DM["DeltaMaintainer — signed summary deltas (distributive views)"]
IM["IncrementalMaintainer — scoped delete + re-aggregate (fallback)"]
W --> DT --> DR --> MS --> RS --> AR --> RFR
RFR --> DM
RFR --> IM
end
subgraph bootstrap ["Bootstrap once (explicit rebuild!)"]
RF["Refresher — RelationCacheWriter INSERT … SELECT + atomic swap"]
end
subgraph reads ["Read path — always fast"]
Q["SalesSummary queries"]
CT[("mv_sales_summary cache table")]
Q --> CT
end
subgraph meta [Metadata]
MD[("ar_materialized_view_metadata")]
DM --> CT
IM --> CT
DM --> MD
IM --> MD
MS --> MD
RF --> CT
RF --> MD
end
DM -.->|"applies partition deltas in place"| CT
IM -.->|"re-aggregates affected partitions"| CT
RF -.->|"initial snapshot"| CT
Refresh lifecycle
- Define a view class with a
materialized_fromblock (returning anActiveRecord::Relation) anddepends_onmodels. - Build — an explicit
rebuild!(confirm: true)materializes the source relation into the cache table viaRelationCacheWriter+ atomic swap. This is the only full-scan path and never fires implicitly; until it runs, reads fall through to the source (cold_read :read_through). - Write — any create/update/destroy on a
depends_onmodel fires anafter_*_commitcallback (installed byDependencyTrackable) that callsDependencyRegistry.publish_write_change!. - Accumulate — for each affected view,
MaintenanceDeltaBuilderrecords affectedGROUP BYpartition keys inMaintenanceStore(widens to all partitions when scope is unknown). - Defer —
after_*_commitfires only once the writing transaction commits, so changes are batched naturally and a rolled-back transaction schedules nothing. - Debounce — rapid writes coalesce into one maintenance pass (configurable window).
- Maintain — distributive views (
SUM/COUNT/COUNT(*)) apply signed summary deltas straight to the affected cache rows without re-reading base rows (DeltaMaintainer); everything else (AVG,MIN,MAX,COUNT(DISTINCT), joins,HAVING) re-aggregates only the affected partitions (IncrementalMaintainer). Neither path does DDL or an atomic swap on the hot path. - Read — once built,
where,find,count, scopes query the cache table directly; reads before maintenance completes return the previous snapshot, reads after see updated partitions. Before the view is built, reads transparently fall through to the source query.
Core components
| Component | Role |
|---|---|
ActiveRecord::Materialized::View |
Base model; DSL and query interface |
DependencyTrackable |
Installs after_*_commit callbacks on depends_on models |
DependencyRegistry |
Maps tables → view classes; publishes commit writes to affected views |
RefreshScheduler |
Dispatches :async, :immediate, or :manual strategies |
AsyncRefresher |
Debounced in-process background maintenance (tests: flush!) |
RefreshJob |
Optional ActiveJob wrapper for production workers |
ViewDefinition |
Inspects source relations for GROUP BY maintenance keys |
AggregateAnalysis |
Classifies a view's aggregates; decides if it is summary-delta maintainable |
MaintenanceDeltaBuilder |
Maps ActiveRecord change payloads to affected partition keys (scoped recompute) |
SummaryDeltaBuilder / SummaryDelta |
Compute and accumulate signed per-partition aggregate deltas (distributive views) |
MaintenanceStore |
Persists pending maintenance (delta or scope) in metadata |
DeltaMaintainer |
Hot path for distributive views: applies summary deltas in place, no base re-read |
IncrementalMaintainer |
Fallback hot path: partition delete + re-aggregate in the existing cache table |
Refresher |
Orchestrates explicit bootstrap/full refresh and dispatches incremental maintenance |
RelationCacheWriter |
Materializes the relation via INSERT … SELECT; atomic table swap on full refresh |
QueryExpressions |
Portable Arel helpers (sum_as, count_distinct_as, …) for view definitions |
Metadata |
Tracks dirty, maintenance_payload, last_refreshed_at, row_count, errors |
Research background
This gem applies decades of materialized-view and incremental-maintenance research to the application layer.
Foundational surveys
| Topic | Reference |
|---|---|
| Materialized views monograph | Chirkova & Yang, Materialized Views (Foundations and Trends in Databases, 2012) — definitions, refresh strategies, view selection, query rewriting |
| View maintenance taxonomy | Gupta & Mumick, Maintenance of Materialized Views: Problems, Techniques, and Applications (IEEE Data Engineering Bulletin, 1995) — when full vs incremental refresh is appropriate |
Incremental view maintenance
| Topic | Reference |
|---|---|
| Warehousing & decoupled sources | Zhuge et al., View Maintenance in a Warehousing Environment (SIGMOD 1995) — maintaining views when base data lives outside the warehouse |
| Higher-order deltas | Ahmad et al., DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views (VLDB 2012) — recursive finite-differencing for low-latency view refresh |
| Factorized IVM (F-IVM) | Nikolic & Olteanu, Incremental View Maintenance with Triple Lock Factorization Benefits (SIGMOD 2018) — factorized higher-order maintenance for conjunctive queries and aggregates |
| IVM survey (recent) | Olteanu, Recent Increments in Incremental View Maintenance (PODS 2024 Gems) — fine-grained complexity and modern IVM engines |
Systems & dataflow approaches
| Topic | Reference |
|---|---|
| Differential dataflow | McSherry et al., Differential Dataflow (CIDR 2013) — incremental computation over changing data with multi-version state |
| Application-layer precomputation | Gjengset et al., Noria: dynamic, partially-stateful data-flow for high-performance web applications (OSDI 2018) — partially-stateful dataflow that incrementally maintains query results for web backends |
Practical references
| Topic | Reference |
|---|---|
| Production reference | PostgreSQL: REFRESH MATERIALIZED VIEW — CONCURRENTLY refresh, separate read/refresh paths |
| Benchmark schema | Leis et al., How Good Are Query Optimizers, Really? (VLDB 2015) — Join Order Benchmark used in this repo's benchmark suite |
Design choice: After a one-time bootstrap, routine refresh uses incremental view maintenance (IVM) by default. Following Gupta & Mumick, aggregate views with GROUP BY are maintained by recomputing only affected partitions (group keys) and merging them into the existing cache table — no table rebuild, no atomic swap on the hot path. Writes on depends_on models accumulate partition keys from ActiveRecord change payloads; maintenance deletes stale partition rows and inserts freshly aggregated replacements. Use refresh_mode :full when a view cannot be maintained incrementally.
Features
- Refresh on write — dependency changes schedule background refresh; reads never block on rebuild
- Transparent ActiveRecord API —
where,find,count, scopes, associations on cache tables - Relation-based sources —
materialized_fromblocks returnActiveRecord::Relation(no raw SQL strings) - Portable aggregations —
QueryExpressionshelpers build Arel forSUM,COUNT,AVG, etc. - Incremental maintenance by default — summary-delta IVM for distributive
GROUP BYviews (signed deltas, no base re-scan) with partition-local re-aggregation as the always-correct fallback; no cache-table rebuild on routine refresh - Atomic table swap on bootstrap only — initial full materialization + rename when the cache is first built or on
refresh_mode :full - Debounced async refresh — coalesce rapid writes (PostgreSQL NOTIFY + worker pattern)
- ActiveJob integration — offload refresh to Sidekiq, GoodJob, Solid Queue, etc.
- Dependency tracking —
depends_onmodels; ActiveRecord commit callbacks detect writes - Metadata table —
last_refreshed_at,dirty,row_count,refresh_duration_ms, errors - Staleness safety net — optional
max_staleness+ rake tasks for cron-driven refresh - Rails generators —
activerecord_materialized:install,:view, and:migration(cache-table migration inferred from the source relation) - Rake tasks —
materialized:refresh_all,:refresh_stale,:rebuild,:verify,:warm_up - Benchmark suite — JOB-schema SQLite database with multi-second analytical queries
Gotchas and trade-offs
| Gotcha | Detail |
|---|---|
| Eventual consistency | Between a write and background refresh completing, reads return the previous snapshot. Same trade-off as REFRESH MATERIALIZED VIEW CONCURRENTLY in PostgreSQL. |
depends_on is required |
The gem cannot infer dependencies from a relation. Declare every model (or table) whose writes should trigger refresh. Prefer model classes (depends_on LineItem) so commit callbacks are wired automatically. |
| Maintenance scope | Partition keys are taken from ActiveRecord change payloads when possible (create/update/destroy with equality on GROUP BY columns). Unbounded writes widen to all partitions (in-place, still no DDL). |
| Non-aggregate views | Views without GROUP BY fall back to full refresh (refresh_mode :full or atomic swap). Join-heavy maintenance (Larson & Zhou) is not automatic yet. |
| Full refresh escape hatch | rebuild!(confirm: true) (or refresh_mode :full) rebuilds via atomic swap — use for recovery or non-maintainable views. refresh! is always incremental and never rebuilds. |
Table-name-only depends_on |
Symbol/string table names work, but refresh-on-write requires a resolvable ActiveRecord model for that table. Raw SQL writes bypass callbacks and will not trigger refresh. |
| SQLite vs MySQL in dev | The benchmark uses SQLite. Production behavior is adapter-agnostic, but test atomic swap on your target database. |
| In-process async default | Default refresh_dispatcher: :async uses a background thread. Use ActiveJob in production so refresh work runs on job workers, not Puma threads. |
| No automatic indexes | Cache tables are created from query results. Add indexes on cache columns you filter/sort on. |
| Storage | Cache tables duplicate data. Plan disk usage accordingly. |
| Nested transactions | Refresh is scheduled on the transaction where the write occurred; rollback clears pending refreshes for that transaction. |
| Bulk writes | Each committed row to a depends_on model runs the maintenance bookkeeping once. Use :async (with a non-zero debounce, the default) or :manual, not refresh_debounce 0 or :immediate. Pending scope that spans more than max_tracked_partitions distinct partitions collapses to one full recompute. insert_all/upsert_all bypass after_commit, so the view won't be notified — call refresh! (or mark_dependencies_changed!) yourself after a callback-skipping bulk load. |
Installation
Add to your Gemfile:
gem "activerecord-materialized"
Install the metadata migration:
bin/rails generate activerecord_materialized:install
bin/rails db:migrate
Getting started tutorial
The Getting started tutorial is the recommended first read: a hands-on walkthrough that goes from bundle install to a view that refreshes itself on write — defining a view, reading through before it's built, building it, querying it, and watching background maintenance update it. Every example in it is executed by the test suite (spec/docs/getting_started_tutorial_spec.rb), so the code and the numbers are guaranteed to work.
The condensed reference version follows below.
Quick start
Generate a view model:
bin/rails generate activerecord_materialized:view SalesSummary
Define the view:
class SalesSummary < ActiveRecord::Materialized::View
extend ActiveRecord::Materialized::QueryExpressions
self.table_name = "mv_sales_summary"
materialized_from do
line_items = LineItem.arel_table
orders = Order.arel_table
products = Product.arel_table
LineItem
.joins(:order, :product)
.group(products[:category])
.select(
products[:category],
sum_as(line_items[:amount], as: :revenue),
count_distinct_as(orders[:id], as: :order_count)
)
end
depends_on LineItem, Order, Product
refresh_on_change :async
refresh_debounce 30.seconds
max_staleness 12.hours
before_refresh { Rails.logger.info("Refreshing #{name}") }
end
Sources must be ActiveRecord::Relation objects built with standard query APIs and Arel — not raw SQL strings. Extract complex relations to a module or class method when a view definition grows large (see spec/support/view_sources.rb and benchmark/support/source_relations.rb in this repo).
Provision the (empty) cache table with a migration generated from the relation, so it exists at deploy time:
bin/rails generate activerecord_materialized:migration SalesSummary
bin/rails db:migrate
Build the view once (e.g. in a deploy task) — the only full-scan path, never implicit:
SalesSummary.rebuild!(confirm: true)
Then query like any ActiveRecord model:
# Served from the mv_sales_summary cache table — never triggers a rebuild.
# (Before the view is built, this reads through to the source query instead.)
SalesSummary.where("revenue > ?", 10_000).order(revenue: :desc)
Refresh strategies:
| Strategy | Behavior |
|---|---|
:async (default) |
After commit, debounced, via background thread or ActiveJob |
:immediate |
Synchronous refresh on each write (blocks writers) |
:manual |
Mark dirty only; call refresh! or rake tasks explicitly |
Incremental maintenance (default)
For GROUP BY aggregate views, no extra configuration is required. The gem:
- Inspects the
materialized_fromrelation to derive maintenance partition keys (GROUP BYcolumns). - Accumulates affected partition keys from dependency writes (via ActiveRecord commit callbacks).
- On refresh, deletes and re-inserts only those partitions in the existing cache table.
Optional overrides when you need explicit control:
class SalesSummary < ActiveRecord::Materialized::View
incremental_keys :category # override inferred GROUP BY keys
refresh_mode :full # opt out of incremental maintenance
# incremental_from { ... } # optional: override auto-scoped maintenance relation
end
Configuration
# config/initializers/activerecord_materialized.rb
ActiveRecord::Materialized.configure do |config|
config.default_refresh_strategy = :async
config.default_refresh_debounce = 30.seconds
config.refresh_dispatcher = :active_job # :async for in-process thread
config.refresh_queue_name = :materialized_views
config.default_max_staleness = 12.hours
config.default_cold_read_strategy = :read_through # :serve_stale or :raise
config.atomic_swap_refresh = true
config.max_tracked_partitions = 1_000 # collapse to a full recompute past this
config. = "ar_materialized_view_metadata"
end
API reference
Class methods
| Method | Description |
|---|---|
rebuild!(confirm: true) |
Explicit full materialization via in-database INSERT … SELECT (the only full-scan path; never fires implicitly, never buffers rows in Ruby) |
warm_up! |
Materialize the configured warm_up partitions ahead of traffic |
refresh! |
Incremental maintenance only (no-op on an unbuilt view); never rebuilds |
refresh_if_stale! |
Incremental maintenance when materialized and stale |
materialized? |
Whether the view has been built (warm) and reads serve from the cache |
dirty? |
Whether a dependency change is pending maintenance |
stale? |
Whether view is dirty or exceeds max_staleness |
last_refreshed_at |
Timestamp of last successful refresh |
refreshing? |
Whether a refresh is in progress |
resolved_source |
The current ActiveRecord::Relation used for refresh |
DSL
| Macro | Description |
|---|---|
materialized_from { relation } |
Block returning the source ActiveRecord::Relation |
depends_on(*models_or_tables) |
Register dependencies; writes trigger refresh |
refresh_on_change(strategy) |
:async, :immediate, or :manual |
refresh_debounce(duration) |
Coalesce rapid writes before refreshing |
refresh_mode(mode) |
:incremental (default) or :full |
cold_read(strategy) |
Read behavior before the view is built: :read_through (default), :serve_stale, or :raise |
warm_up { [relations] } |
Representative queries whose partitions warm_up! materializes ahead of traffic |
incremental_from { relation } |
Optional override for scoped maintenance relation |
incremental_keys(*columns) |
Optional override for inferred GROUP BY keys |
max_staleness(duration) |
Optional time-based safety refresh via rake/cron |
before_refresh / after_refresh |
Refresh lifecycle callbacks |
QueryExpressions
Include or extend ActiveRecord::Materialized::QueryExpressions when defining aggregations:
| Helper | Arel equivalent |
|---|---|
sum_as(attr, as: :name) |
SUM(...) |
avg_as(attr, as: :name) |
AVG(...) |
count_as(attr, as: :name) |
COUNT(...) |
count_distinct_as(attr, as: :name) |
COUNT(DISTINCT ...) |
count_all_as(as: :name) |
COUNT(*) |
min_as / max_as |
MIN / MAX |
Rake tasks
bin/rails materialized:refresh_all # incremental maintenance pass
bin/rails materialized:refresh_stale
bin/rails materialized:rebuild # intentional full materialization (in-DB INSERT … SELECT)
bin/rails materialized:verify # raise on cache-table schema drift
bin/rails materialized:warm_up # materialize configured warm_up partitions
Benchmark results
The included benchmark uses a Join Order Benchmark-style schema on SQLite. On the xlarge dataset (~2M cast_info rows):
| Query | Source relation | MV read | Speedup |
|---|---|---|---|
gender_pairing_stats |
~7.4s | ~0.3ms | ~21,000× |
company_movie_cross |
~7.4s | ~0.4ms | ~20,000× |
person_movie_network |
~13.3s | ~0.7ms | ~20,000× |
cast_coappearance |
~19.7s | ~0.4ms | ~49,000× |
Run locally:
bundle install
JOB_SCALE=xlarge bundle exec rake benchmark:setup # ~few minutes
bundle exec rake benchmark:slow
bundle exec rake benchmark:verify_updates # refresh-on-write proof
See benchmark/DATA.md for dataset scales and setup details.
When to use (and when not to)
Good fit:
- Expensive read-mostly reporting queries on MySQL/MariaDB/SQLite
- Dashboards and admin pages where sub-second reads matter
- Infrequent or batched writes to underlying tables
- Acceptable eventual consistency between write and background refresh
Poor fit:
- Real-time, strongly consistent reads (use live queries or replicas)
- Very frequent writes where full refresh cost exceeds query cost
- Tiny queries where materialization overhead isn't worth it
- Views where you cannot enumerate all
depends_ontables
Comparison with native materialized views
| Capability | PostgreSQL native | activerecord-materialized |
|---|---|---|
| Precomputed snapshot | ✅ | ✅ |
| Transparent reads | ✅ (query rewrite or direct) | ✅ (ActiveRecord model) |
| Refresh on dependency change | Manual / trigger / pg_cron | ✅ automatic via depends_on |
| Background refresh | REFRESH ... CONCURRENTLY |
✅ async / ActiveJob |
| Incremental refresh | Limited (IVM extensions) | ✅ default partition-local IVM for GROUP BY views |
| Atomic swap during refresh | ✅ CONCURRENTLY | ✅ table rename |
| Database portability | PostgreSQL only | ✅ any ActiveRecord adapter |
Versioning
This gem follows Semantic Versioning. Given MAJOR.MINOR.PATCH:
- MAJOR — incompatible public-API changes (DSL macros, configuration keys, the
Viewquery surface). - MINOR — backward-compatible features.
- PATCH — backward-compatible bug fixes.
Until 1.0.0, the API may still change between minor releases; pin a version if you depend on it. Every change is recorded in CHANGELOG.md.
Development
git clone https://github.com/mavrukin/activerecord-materialized.git
cd activerecord-materialized
bin/setup # bundle install + git hooks + Sorbet RBIs
bin/ci # RuboCop, Sorbet, and the full test suite
bundle exec rake benchmark:setup
bundle exec rake benchmark
API documentation is published at rubydoc.info/gems/activerecord-materialized (generated from YARD doc comments, with types pulled from the Sorbet signatures via yard-sorbet). Build it locally with:
bundle exec yard doc # generates HTML into doc/
bundle exec yard server # browse at http://localhost:8808
Maintainers: see RELEASING.md for the gem publishing process.
Contributing
Bug reports and pull requests are welcome at github.com/mavrukin/activerecord-materialized.
License
MIT © Michael Avrukin