File: README — Documentation by YARD 0.9.38

activerecord-materialized

Materialized views for Rails apps on databases that don't have them — precompute an expensive query into a cache table, refresh it in the background when the underlying data changes, and read it through a transparent ActiveRecord API.

Use case: Your reporting page runs a 12-second join across six tables. Users visit once a day. MySQL has no native materialized views. This gem gives you PostgreSQL-style semantics in application code — writes trigger refresh, reads never pay for it.

Why use this?

Reads stay fast — queries hit a small precomputed table, not a multi-second join.
Freshness is automatic — a write to a depends_on model schedules background maintenance; you don't refresh by hand.
Nothing blocks on a rebuild — refresh is incremental and on-write, never on-read, and a full rebuild only ever happens when you explicitly ask for it.
It's just ActiveRecord — where, find, count, and scopes work unchanged; an unbuilt view still returns correct results by reading through to the source.
It's portable — works on MySQL, MariaDB, and SQLite, which have no native materialized views.

🚀 New here? Start with the Getting started tutorial — a hands-on, fully tested walkthrough from install to refresh-on-write.

🧪 Want to feel it? A runnable Rails demo lives in demo/ — compare raw vs. materialized timings side by side, mutate the data, and watch the view go stale and catch up.

Author: Michael Avrukin · License: MIT

Why this exists
How it works
Research background
Features
Gotchas and trade-offs
Installation
Getting started tutorial
Quick start
Configuration
API reference
Benchmark results
When to use (and when not to)
Comparison with native materialized views
Versioning
Development
Contributing

Why this exists

Many Rails applications on MySQL, MariaDB, or SQLite hit the same wall:

Symptom	Example
Complex joins + aggregations	`GROUP BY`, `DISTINCT`, correlated subqueries
Seconds per query even with indexes	Dashboards, admin reports, analytics APIs
Read-heavy, write-light	Thousands of reads/day, dozens of writes/day
No native MV support	Unlike PostgreSQL's `CREATE MATERIALIZED VIEW`

Materialized views solve this by storing query results as a physical table and refreshing that snapshot when source data changes. High-end databases (PostgreSQL, Oracle, SQL Server) provide this natively. When your database cannot, activerecord-materialized implements the same read/refresh split in Ruby — without changing how developers query data.

The problem with refresh-on-read

A naive approach refreshes the view on the first read after data changes. That punishes the unlucky user whose visit triggers a 10-second rebuild — and on a large database an implicit full rebuild can be catastrophic. This gem never rebuilds implicitly: a full materialization happens only via an explicit rebuild!(confirm: true). Routine freshness is incremental, on write (dependency changes schedule partition-local maintenance after commit), and an unbuilt view stays correct via read-through to the source query until you build it.

How it works

Architecture

flowchart TB
    subgraph writes ["Write path — routine maintenance"]
        W["INSERT / UPDATE / DELETE on depends_on model"]
        DT["DependencyTrackable after_*_commit callbacks"]
        DR["DependencyRegistry.publish_write_change!"]
        MS["MaintenanceDeltaBuilder / SummaryDeltaBuilder + MaintenanceStore"]
        RS[RefreshScheduler]
        AR["AsyncRefresher or RefreshJob"]
        RFR["Refresher dispatch"]
        DM["DeltaMaintainer — signed summary deltas (distributive views)"]
        IM["IncrementalMaintainer — scoped delete + re-aggregate (fallback)"]
        W --> DT --> DR --> MS --> RS --> AR --> RFR
        RFR --> DM
        RFR --> IM
    end

    subgraph bootstrap ["Bootstrap once (explicit rebuild!)"]
        RF["Refresher — RelationCacheWriter INSERT … SELECT + atomic swap"]
    end

    subgraph reads ["Read path — always fast"]
        Q["SalesSummary queries"]
        CT[("mv_sales_summary cache table")]
        Q --> CT
    end

    subgraph meta [Metadata]
        MD[("ar_materialized_view_metadata")]
        DM --> CT
        IM --> CT
        DM --> MD
        IM --> MD
        MS --> MD
        RF --> CT
        RF --> MD
    end

    DM -.->|"applies partition deltas in place"| CT
    IM -.->|"re-aggregates affected partitions"| CT
    RF -.->|"initial snapshot"| CT

Refresh lifecycle

Define a view class with a materialized_from block (returning an ActiveRecord::Relation) and depends_on models.
Build — an explicit rebuild!(confirm: true) materializes the source relation into the cache table via RelationCacheWriter + atomic swap. This is the only full-scan path and never fires implicitly; until it runs, reads fall through to the source (cold_read :read_through).
Write — any create/update/destroy on a depends_on model fires an after_*_commit callback (installed by DependencyTrackable) that calls DependencyRegistry.publish_write_change!.
Accumulate — for each affected view, MaintenanceDeltaBuilder records affected GROUP BY partition keys in MaintenanceStore (widens to all partitions when scope is unknown).
Defer — after_*_commit fires only once the writing transaction commits, so changes are batched naturally and a rolled-back transaction schedules nothing.
Debounce — rapid writes coalesce into one maintenance pass (configurable window).
Maintain — distributive views (SUM/COUNT/COUNT(*)) apply signed summary deltas straight to the affected cache rows without re-reading base rows (DeltaMaintainer); everything else (AVG, MIN, MAX, COUNT(DISTINCT), joins, HAVING) re-aggregates only the affected partitions (IncrementalMaintainer). Neither path does DDL or an atomic swap on the hot path.
Read — once built, where, find, count, scopes query the cache table directly; reads before maintenance completes return the previous snapshot, reads after see updated partitions. Before the view is built, reads transparently fall through to the source query.

Core components

Component	Role
`ActiveRecord::Materialized::View`	Base model; DSL and query interface
`DependencyTrackable`	Installs `after_*_commit` callbacks on `depends_on` models
`DependencyRegistry`	Maps tables → view classes; publishes commit writes to affected views
`RefreshScheduler`	Dispatches `:async`, `:immediate`, or `:manual` strategies
`AsyncRefresher`	Debounced in-process background maintenance (tests: `flush!`)
`RefreshJob`	Optional ActiveJob wrapper for production workers
`ViewDefinition`	Inspects source relations for `GROUP BY` maintenance keys
`AggregateAnalysis`	Classifies a view's aggregates; decides if it is summary-delta maintainable
`MaintenanceDeltaBuilder`	Maps ActiveRecord change payloads to affected partition keys (scoped recompute)
`SummaryDeltaBuilder` / `SummaryDelta`	Compute and accumulate signed per-partition aggregate deltas (distributive views)
`MaintenanceStore`	Persists pending maintenance (delta or scope) in metadata
`DeltaMaintainer`	Hot path for distributive views: applies summary deltas in place, no base re-read
`IncrementalMaintainer`	Fallback hot path: partition delete + re-aggregate in the existing cache table
`Refresher`	Orchestrates explicit bootstrap/full refresh and dispatches incremental maintenance
`RelationCacheWriter`	Materializes the relation via `INSERT … SELECT`; atomic table swap on full refresh
`QueryExpressions`	Portable Arel helpers (`sum_as`, `count_distinct_as`, …) for view definitions
`Metadata`	Tracks `dirty`, `maintenance_payload`, `last_refreshed_at`, `row_count`, errors

Research background

This gem applies decades of materialized-view and incremental-maintenance research to the application layer.

Foundational surveys

Topic	Reference
Materialized views monograph	Chirkova & Yang, Materialized Views (Foundations and Trends in Databases, 2012) — definitions, refresh strategies, view selection, query rewriting
View maintenance taxonomy	Gupta & Mumick, Maintenance of Materialized Views: Problems, Techniques, and Applications (IEEE Data Engineering Bulletin, 1995) — when full vs incremental refresh is appropriate

Incremental view maintenance

Topic	Reference
Warehousing & decoupled sources	Zhuge et al., View Maintenance in a Warehousing Environment (SIGMOD 1995) — maintaining views when base data lives outside the warehouse
Higher-order deltas	Ahmad et al., DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views (VLDB 2012) — recursive finite-differencing for low-latency view refresh
Factorized IVM (F-IVM)	Nikolic & Olteanu, Incremental View Maintenance with Triple Lock Factorization Benefits (SIGMOD 2018) — factorized higher-order maintenance for conjunctive queries and aggregates
IVM survey (recent)	Olteanu, Recent Increments in Incremental View Maintenance (PODS 2024 Gems) — fine-grained complexity and modern IVM engines

Systems & dataflow approaches

Topic	Reference
Differential dataflow	McSherry et al., Differential Dataflow (CIDR 2013) — incremental computation over changing data with multi-version state
Application-layer precomputation	Gjengset et al., Noria: dynamic, partially-stateful data-flow for high-performance web applications (OSDI 2018) — partially-stateful dataflow that incrementally maintains query results for web backends

Practical references

Topic	Reference
Production reference	PostgreSQL: REFRESH MATERIALIZED VIEW — `CONCURRENTLY` refresh, separate read/refresh paths
Benchmark schema	Leis et al., How Good Are Query Optimizers, Really? (VLDB 2015) — Join Order Benchmark used in this repo's benchmark suite

Design choice: After a one-time bootstrap, routine refresh uses incremental view maintenance (IVM) by default. Following Gupta & Mumick, aggregate views with GROUP BY are maintained by recomputing only affected partitions (group keys) and merging them into the existing cache table — no table rebuild, no atomic swap on the hot path. Writes on depends_on models accumulate partition keys from ActiveRecord change payloads; maintenance deletes stale partition rows and inserts freshly aggregated replacements. Use refresh_mode :full when a view cannot be maintained incrementally.

Features

Refresh on write — dependency changes schedule background refresh; reads never block on rebuild
Transparent ActiveRecord API — where, find, count, scopes, associations on cache tables
Relation-based sources — materialized_from blocks return ActiveRecord::Relation (no raw SQL strings)
Portable aggregations — QueryExpressions helpers build Arel for SUM, COUNT, AVG, etc.
Incremental maintenance by default — summary-delta IVM for distributive GROUP BY views (signed deltas, no base re-scan) with partition-local re-aggregation as the always-correct fallback; no cache-table rebuild on routine refresh
Atomic table swap on bootstrap only — initial full materialization + rename when the cache is first built or on refresh_mode :full
Debounced async refresh — coalesce rapid writes (PostgreSQL NOTIFY + worker pattern)
ActiveJob integration — offload refresh to Sidekiq, GoodJob, Solid Queue, etc.
Dependency tracking — depends_on models; ActiveRecord commit callbacks detect writes
Metadata table — last_refreshed_at, dirty, row_count, refresh_duration_ms, errors
Staleness safety net — optional max_staleness + rake tasks for cron-driven refresh
Rails generators — activerecord_materialized:install, :view, and :migration (cache-table migration inferred from the source relation)
Rake tasks — materialized:refresh_all, :refresh_stale, :rebuild, :verify, :warm_up
Benchmark suite — JOB-schema SQLite database with multi-second analytical queries

Gotchas and trade-offs

Gotcha	Detail
Eventual consistency	Between a write and background refresh completing, reads return the previous snapshot. Same trade-off as `REFRESH MATERIALIZED VIEW CONCURRENTLY` in PostgreSQL.
`depends_on` is required	The gem cannot infer dependencies from a relation. Declare every model (or table) whose writes should trigger refresh. Prefer model classes (`depends_on LineItem`) so commit callbacks are wired automatically.
Maintenance scope	Partition keys are taken from ActiveRecord change payloads when possible (`create`/`update`/`destroy` with equality on `GROUP BY` columns). Unbounded writes widen to all partitions (in-place, still no DDL).
Non-aggregate views	Views without `GROUP BY` fall back to full refresh (`refresh_mode :full` or atomic swap). Join-heavy maintenance (Larson & Zhou) is not automatic yet.
Full refresh escape hatch	`rebuild!(confirm: true)` (or `refresh_mode :full`) rebuilds via atomic swap — use for recovery or non-maintainable views. `refresh!` is always incremental and never rebuilds.
Table-name-only `depends_on`	Symbol/string table names work, but refresh-on-write requires a resolvable ActiveRecord model for that table. Raw SQL writes bypass callbacks and will not trigger refresh.
SQLite vs MySQL in dev	The benchmark uses SQLite. Production behavior is adapter-agnostic, but test atomic swap on your target database.
In-process async default	Default `refresh_dispatcher: :async` uses a background thread. Use ActiveJob in production so refresh work runs on job workers, not Puma threads.
No automatic indexes	Cache tables are created from query results. Add indexes on cache columns you filter/sort on.
Storage	Cache tables duplicate data. Plan disk usage accordingly.
Nested transactions	Refresh is scheduled on the transaction where the write occurred; rollback clears pending refreshes for that transaction.
Bulk writes	Each committed row to a `depends_on` model runs the maintenance bookkeeping once. Use `:async` (with a non-zero debounce, the default) or `:manual`, not `refresh_debounce 0` or `:immediate`. Pending scope that spans more than `max_tracked_partitions` distinct partitions collapses to one full recompute. `insert_all`/`upsert_all` bypass `after_commit`, so the view won't be notified — call `refresh!` (or `mark_dependencies_changed!`) yourself after a callback-skipping bulk load.

Installation

Add to your Gemfile:

gem "activerecord-materialized"

Install the metadata migration:

bin/rails generate activerecord_materialized:install
bin/rails db:migrate

Getting started tutorial

The Getting started tutorial is the recommended first read: a hands-on walkthrough that goes from bundle install to a view that refreshes itself on write — defining a view, reading through before it's built, building it, querying it, and watching background maintenance update it. Every example in it is executed by the test suite (spec/docs/getting_started_tutorial_spec.rb), so the code and the numbers are guaranteed to work.

The condensed reference version follows below.

Quick start

Generate a view model:

bin/rails generate activerecord_materialized:view SalesSummary

Define the view:

class SalesSummary < ActiveRecord::Materialized::View
  extend ActiveRecord::Materialized::QueryExpressions

  self.table_name = "mv_sales_summary"

  materialized_from do
    line_items = LineItem.arel_table
    orders = Order.arel_table
    products = Product.arel_table

    LineItem
      .joins(:order, :product)
      .group(products[:category])
      .select(
        products[:category],
        sum_as(line_items[:amount], as: :revenue),
        count_distinct_as(orders[:id], as: :order_count)
      )
  end

  depends_on LineItem, Order, Product
  refresh_on_change :async
  refresh_debounce 30.seconds
  max_staleness 12.hours

  before_refresh { Rails.logger.info("Refreshing #{name}") }
end

Sources must be ActiveRecord::Relation objects built with standard query APIs and Arel — not raw SQL strings. Extract complex relations to a module or class method when a view definition grows large (see spec/support/view_sources.rb and benchmark/support/source_relations.rb in this repo).

Provision the (empty) cache table with a migration generated from the relation, so it exists at deploy time:

bin/rails generate activerecord_materialized:migration SalesSummary
bin/rails db:migrate

Build the view once (e.g. in a deploy task) — the only full-scan path, never implicit:

SalesSummary.rebuild!(confirm: true)

Then query like any ActiveRecord model:

# Served from the mv_sales_summary cache table — never triggers a rebuild.
# (Before the view is built, this reads through to the source query instead.)
SalesSummary.where("revenue > ?", 10_000).order(revenue: :desc)

Refresh strategies:

Strategy	Behavior
`:async` (default)	After commit, debounced, via background thread or ActiveJob
`:immediate`	Synchronous refresh on each write (blocks writers)
`:manual`	Mark dirty only; call `refresh!` or rake tasks explicitly

Incremental maintenance (default)

For GROUP BY aggregate views, no extra configuration is required. The gem:

Inspects the materialized_from relation to derive maintenance partition keys (GROUP BY columns).
Accumulates affected partition keys from dependency writes (via ActiveRecord commit callbacks).
On refresh, deletes and re-inserts only those partitions in the existing cache table.

Optional overrides when you need explicit control:

class SalesSummary < ActiveRecord::Materialized::View
  incremental_keys :category # override inferred GROUP BY keys
  refresh_mode :full         # opt out of incremental maintenance
  # incremental_from { ... } # optional: override auto-scoped maintenance relation
end

Configuration

# config/initializers/activerecord_materialized.rb
ActiveRecord::Materialized.configure do |config|
  config.default_refresh_strategy = :async
  config.default_refresh_debounce = 30.seconds
  config.refresh_dispatcher = :active_job   # :async for in-process thread
  config.refresh_queue_name = :materialized_views
  config.default_max_staleness = 12.hours
  config.default_cold_read_strategy = :read_through  # :serve_stale or :raise
  config.atomic_swap_refresh = true
  config.max_tracked_partitions = 1_000             # collapse to a full recompute past this
  config.metadata_table_name = "ar_materialized_view_metadata"
end

API reference

Class methods

Method	Description
`rebuild!(confirm: true)`	Explicit full materialization via in-database `INSERT … SELECT` (the only full-scan path; never fires implicitly, never buffers rows in Ruby)
`warm_up!`	Materialize the configured `warm_up` partitions ahead of traffic
`refresh!`	Incremental maintenance only (no-op on an unbuilt view); never rebuilds
`refresh_if_stale!`	Incremental maintenance when materialized and stale
`materialized?`	Whether the view has been built (warm) and reads serve from the cache
`dirty?`	Whether a dependency change is pending maintenance
`stale?`	Whether view is dirty or exceeds `max_staleness`
`last_refreshed_at`	Timestamp of last successful refresh
`refreshing?`	Whether a refresh is in progress
`resolved_source`	The current `ActiveRecord::Relation` used for refresh

DSL

Macro	Description
`materialized_from { relation }`	Block returning the source `ActiveRecord::Relation`
`depends_on(*models_or_tables)`	Register dependencies; writes trigger refresh
`refresh_on_change(strategy)`	`:async`, `:immediate`, or `:manual`
`refresh_debounce(duration)`	Coalesce rapid writes before refreshing
`refresh_mode(mode)`	`:incremental` (default) or `:full`
`cold_read(strategy)`	Read behavior before the view is built: `:read_through` (default), `:serve_stale`, or `:raise`
`warm_up { [relations] }`	Representative queries whose partitions `warm_up!` materializes ahead of traffic
`incremental_from { relation }`	Optional override for scoped maintenance relation
`incremental_keys(*columns)`	Optional override for inferred `GROUP BY` keys
`max_staleness(duration)`	Optional time-based safety refresh via rake/cron
`before_refresh` / `after_refresh`	Refresh lifecycle callbacks

QueryExpressions

Include or extend ActiveRecord::Materialized::QueryExpressions when defining aggregations:

Helper	Arel equivalent
`sum_as(attr, as: :name)`	`SUM(...)`
`avg_as(attr, as: :name)`	`AVG(...)`
`count_as(attr, as: :name)`	`COUNT(...)`
`count_distinct_as(attr, as: :name)`	`COUNT(DISTINCT ...)`
`count_all_as(as: :name)`	`COUNT(*)`
`min_as` / `max_as`	`MIN` / `MAX`

Rake tasks

bin/rails materialized:refresh_all     # incremental maintenance pass
bin/rails materialized:refresh_stale
bin/rails materialized:rebuild         # intentional full materialization (in-DB INSERT … SELECT)
bin/rails materialized:verify          # raise on cache-table schema drift
bin/rails materialized:warm_up         # materialize configured warm_up partitions

Benchmark results

The included benchmark uses a Join Order Benchmark-style schema on SQLite. On the xlarge dataset (~2M cast_info rows):

Query	Source relation	MV read	Speedup
`gender_pairing_stats`	~7.4s	~0.3ms	~21,000×
`company_movie_cross`	~7.4s	~0.4ms	~20,000×
`person_movie_network`	~13.3s	~0.7ms	~20,000×
`cast_coappearance`	~19.7s	~0.4ms	~49,000×

Run locally:

bundle install
JOB_SCALE=xlarge bundle exec rake benchmark:setup   # ~few minutes
bundle exec rake benchmark:slow
bundle exec rake benchmark:verify_updates           # refresh-on-write proof

See benchmark/DATA.md for dataset scales and setup details.

When to use (and when not to)

Good fit:

Expensive read-mostly reporting queries on MySQL/MariaDB/SQLite
Dashboards and admin pages where sub-second reads matter
Infrequent or batched writes to underlying tables
Acceptable eventual consistency between write and background refresh

Poor fit:

Real-time, strongly consistent reads (use live queries or replicas)
Very frequent writes where full refresh cost exceeds query cost
Tiny queries where materialization overhead isn't worth it
Views where you cannot enumerate all depends_on tables

Comparison with native materialized views

Capability	PostgreSQL native	activerecord-materialized
Precomputed snapshot	✅	✅
Transparent reads	✅ (query rewrite or direct)	✅ (ActiveRecord model)
Refresh on dependency change	Manual / trigger / pg_cron	✅ automatic via `depends_on`
Background refresh	`REFRESH ... CONCURRENTLY`	✅ async / ActiveJob
Incremental refresh	Limited (IVM extensions)	✅ default partition-local IVM for `GROUP BY` views
Atomic swap during refresh	✅ CONCURRENTLY	✅ table rename
Database portability	PostgreSQL only	✅ any ActiveRecord adapter

Versioning

This gem follows Semantic Versioning. Given MAJOR.MINOR.PATCH:

MAJOR — incompatible public-API changes (DSL macros, configuration keys, the View query surface).
MINOR — backward-compatible features.
PATCH — backward-compatible bug fixes.

Until 1.0.0, the API may still change between minor releases; pin a version if you depend on it. Every change is recorded in CHANGELOG.md.

Development

git clone https://github.com/mavrukin/activerecord-materialized.git
cd activerecord-materialized
bin/setup            # bundle install + git hooks + Sorbet RBIs
bin/ci               # RuboCop, Sorbet, and the full test suite
bundle exec rake benchmark:setup
bundle exec rake benchmark

API documentation is published at rubydoc.info/gems/activerecord-materialized (generated from YARD doc comments, with types pulled from the Sorbet signatures via yard-sorbet). Build it locally with:

bundle exec yard doc      # generates HTML into doc/
bundle exec yard server   # browse at http://localhost:8808

Maintainers: see RELEASING.md for the gem publishing process.

Contributing

Bug reports and pull requests are welcome at github.com/mavrukin/activerecord-materialized.