ActiveReporter 
ActiveReporter is a framework for aggregating data about
Rails models backed by
PostgreSQL, MySQL, or
SQLite databases. It's designed to be flexible
enough to accommodate many use cases, but opinionated enough to avoid the need
for boilerplate.
Requires Ruby >= 3.3 and Rails 7.1–8.x. Upgrading from a pre-1.0 release? See Upgrading to v1.
ActiveReporter is based on the repor gem by Andrew Ross https://github.com/asross/repor
Contents
- Basic usage
- Building reports
- Defining reports
- Serializing reports
- Upgrading to v1
- Contributing
- License
Basic usage
Here are some examples of how to define, run, and serialize a ActiveReporter::Report:
class PostReport < ActiveReporter::Report
report_on :Post
category_dimension :author, relation: ->(r) { r.joins(:author) },
expression: 'users.name'
number_dimension :likes
time_dimension :created_at
count_aggregator :number_of_posts
sum_aggregator :total_likes, expression: 'posts.likes'
array_aggregator :post_ids, expression: 'posts.id'
end
# show me # published posts from 2014-2015 with at least 4 likes, by author
report = PostReport.new(
relation: Post.published,
groupers: [:author],
aggregators: [:number_of_posts],
dimensions: {
likes: {
only: { min: 4 }
},
created_at: {
only: { min: '2014', max: '2015' }
}
}
)
puts report.data
# => [
# { key: 'James Joyce', value: 10 },
# { key: 'Margaret Atwood', value: 4 }
# { key: 'Toni Morrison', value: 5 }
# ]
# show me likes on specific authors' posts by author and year, from 1985-1987
report = PostReport.new(
groupers: [:author, :created_at],
aggregators: [:total_likes],
dimensions: {
created_at: {
only: { min: '1985', max: '1987' },
bin_width: 'year'
},
author: {
only: ['Edith Wharton', 'James Baldwin']
}
}
)
puts report.data
# => [{
# key: { min: Tue, 01 Jan 1985 00:00:00 UTC +00:00,
# max: Wed, 01 Jan 1986 00:00:00 UTC +00:00 },
# values: [
# { key: 'Edith Wharton', value: 35 },
# { key: 'James Baldwin', value: 13 }
# ]
# }, {
# key: { min: Wed, 01 Jan 1986 00:00:00 UTC +00:00,
# max: Thu, 01 Jan 1987 00:00:00 UTC +00:00 },
# values: [
# { key: 'Edith Wharton', value: 0 },
# { key: 'James Baldwin', value: 0 }
# ]
# }, {
# key: { min: Thu, 01 Jan 1987 00:00:00 UTC +00:00,
# max: Fri, 01 Jan 1988 00:00:00 UTC +00:00 },
# values: [
# { key: 'Edith Wharton', value: 0 },
# { key: 'James Baldwin', value: 19 }
# ]
# }]
csv_serializer = ActiveReporter::Serializer::Csv.new(report)
puts csv_serializer.csv_text
# => csv text string
chart_serializer = ActiveReporter::Serializer::Highcharts.new(report)
puts chart_serializer.
# => highcharts options hash
To define a report, you declare dimensions (which represent attributes of your
data) and aggregators (which represent quantities you want to measure). To
run a report, you instantiate it with one aggregator and at least one dimension,
then inspect its data. You can also wrap it in a serializer to get results in
useful formats.
Building reports
Just call ReportClass.new(params), where params is a hash with these keys:
aggregators(required) is a list of the names of the aggregator(s) to aggregate bygroupers(required) is a list of the names of the dimension(s) to group byrelation(optional) provides an initial scope for the datadimensions(optional) holds dimension-specific filter or grouping options
See below for more details about dimension-specific parameters.
Defining reports
Base relation
A ActiveReporter::Report either needs to know what ActiveRecord class it is reporting
on, or it needs to know a table_name and a base_relation.
You can specify an ActiveRecord class by calling the report_on class method
with a class or class name, or if you prefer, you can override the other two as
instance methods.
By default, it will try to infer an ActiveRecord class from the report class
name by dropping /Report$/ and constantizing.
class PostReport < ActiveReporter::Report
end
PostReport.new.table_name
# => 'posts'
PostReport.new.base_relation
# => Post.all
class PostStructuralReport < ActiveReporter::Report
report_on :Post
def base_relation
super.where(author: 'Foucault')
end
end
PostStructuralReport.new.table_name
# => 'posts'
PostStructuralReport.new.base_relation
# => Post.where(author: 'Foucault')
Finally, you can also use autoreport_on if you'd like to automatically infer
dimensions from your columns and associations. autoreport_on will try to map
most columns to dimensions, and if the column in question is for a belongs_to
association, will even try to join and report on the association's name:
class PostReport < ActiveReporter::Report
autoreport_on Post
end
PostReport.new.dimensions.keys
# => %i[:created_at, :updated_at, :likes, :title, :author]
PostReport.new.dimensions[:author].expression
# => 'users.name'
autoreport_on maps each column to a dimension type: enum columns become
Enum dimensions, boolean columns become Boolean, date/time columns become
Time, numeric columns become Number, belongs_to associations become a
joined Category on the association's name column, and everything else becomes
Category.
Autoreport behavior can be customized by overriding certain methods; see the
ActiveReporter::Report code for more information.
Dimensions (x-axes)
You define dimensions on your ActiveReporter::Report to represent attributes of your
data you're interested in. Dimensions objects can filter or group your relation
by a SQL expression, and accept/return simple Ruby values of various types.
There are several built-in types of dimensions:
Category- Groups/filters the relation by the discrete values of the
expression
- Groups/filters the relation by the discrete values of the
Enum- Like
Category, but for Railsenumcolumns: filters/groups by the enum name while mapping to and from the underlying stored value
- Like
Boolean- Like
Category, but for boolean columns: casts filter values to real booleans and normalizes grouped values totrue/false/nilacross adapters
- Like
Number- Groups/filters the relation by binning a continuous numeric
expression
- Groups/filters the relation by binning a continuous numeric
Time- Like number dimensions, but the bins are increments of time
You define dimensions in your report class like this:
class PostReport < ActiveReporter::Report
category_dimension :status
number_dimension :author_rating, expression: 'users.rating',
relation: ->(r) { r.joins(:author) }
time_dimension :publication_date, expression: 'posts.published_at'
end
The SQL expression a dimension uses defaults to
"#{report.table_name}.#{dimension.name}". For a plain column, point it
elsewhere with attribute: (a different column), and model: or table_name:
(a different table):
category_dimension :author, attribute: :name, model: :user,
relation: ->(r) { r.joins(:author) }
For anything that isn't a simple table.column reference — SQL functions,
CASE expressions, arithmetic across columns — pass a raw expression:
instead (it is used verbatim):
category_dimension :author, expression: "users.name",
relation: ->(r) { r.joins(:author) }
Additionally, if the filtering or grouping requires joins or other SQL
operations, a custom relation proc can be passed, which will be called
beforehand.
Filtering by dimensions
All dimensions can be filtered to one or more values by passing in
params[:dimensions][<dimension name>][:only].
Category#only should be passed the exact values you'd like to filter
to (or what will map to them after connection adapter quoting).
Number and Time are "bin" dimensions, and their onlys
should be passed one or more bin ranges. Bin ranges should be hashes of at
least one of min and max, or they should just be nil to explicitly select
rows for which expression is null. Bin range filtering is min-inclusive but
max-exclusive. For Number, the bin values should be numbers or
strings of digits. For Time, the bin values should be dates/times or
Time.zone.parse-able strings.
Grouping by dimensions
To group by a dimension, pass its name to params[:groupers].
For bin dimensions (Number and Time), where the values
being grouped by are ranges of numbers or times, you can specify additional
options to control the width and distribution of those bins. In particular,
you can pass values to:
params[:dimensions][<name>][:bins],params[:dimensions][<name>][:bin_count], orparams[:dimensions][<name>][:bin_width]
bins is the most general option; you can use it to divide the full domain of
the data into non-uniform, overlapping, and even null bin ranges. It should be
passed an array of the same min/max hashes or nil used in filtering.
bin_count will divide the domain of the data into a fixed number of bins. It
should be passed a positive integer.
bin_width will tile the domain with bins of a fixed width. It should be
passed a positive number for Numbers and a "duration" for
Times. Durations can either be strings of a number followed by a time
increment (minutes, hours, days, weeks, months, years), or they can be hashes
suitable for use with
ActiveSupport::TimeWithZone#advance.
E.g.:
params[:dimensions][<time dimension>][:bin_width] = '1 month'
params[:dimensions][<time dimension>][:bin_width] = { months: 2, hours: 2 }
Numbers will default to using 10 bins and Times will
default to using a sensible increment of time given the domain; you can
customize this by overriding methods in those classes.
Note that when you inspect report.data after grouping by a bin dimension, you
will see the dimension values are actually ActiveReporter::Bin::Base objects,
which respond to min, max, and various json/Hash methods. These are meant
to provide a common interface for the different types of bins (double-bounded,
unbounded on one side, null) and handle mapping between SQL and Ruby
representations of their values. You may find bin objects useful in working
with report data, and they can also be customized.
If you want to change how ActiveReporter maps SQL values to the dimension values of
report.data, you can override YourDimension#sanitize_sql_value.
Customizing dimensions
You can define custom dimension classes by inheriting from one of the existing ones:
class CaseInsensitiveCategoryDimension < ActiveReporter::Dimension::Category
def order_expression
"UPPER(#{super})"
end
end
You can then use it in the definition of a report class like this:
class UserReport < ActiveReporter::Report
dimension :last_name, CaseInsensitiveCategoryDimension
end
Common methods to override include order_expression, sanitize_sql_value,
validate_params!, group_values, and default_bin_width.
Note that if you inherit directly from ActiveReporter::Dimension::Base, you
will need to implement (at a minimum) filter(relation), group(relation), and
group_values. See the base dimension class for more details.
If you want custom behavior for bins, you can define Set and Table
classes nested inside your custom dimension classes (or override methods
directly on ActiveReporter::Dimension::Bin::Set(Table),
ActiveReporter::Dimension::Time::Set(Table), etc). See the relevant classes for more
details.
Aggregators (y-axes)
Aggregators take your groups and reduce them down to a single value. They represent the quantities you're looking to measure across your dimensions.
There are several built-in types of aggregators:
Aggregator::Count- counts the number of distinct records in each group
Aggregator::Sum- sums an
expressionover each distinct record in each group
- sums an
Aggregator::Average- sum divided by count
Aggregator::Min- finds the minimum value of
expressionin each group
- finds the minimum value of
Aggregator::Max- finds the maximum value of
expressionin each group
- finds the maximum value of
Aggregator::Array- returns an array of
expressionvalues in each group (PostgreSQL only) - useful if you want to drill down into the data behind an aggregation
- returns an array of
Aggregator::CountIf- counts records matching a SQL condition; pass the condition as
expression:and the value(s) to match asvalue:/values:
- counts records matching a SQL condition; pass the condition as
Aggregator::Ratio- divides a
numeratoraggregate by adenominatoraggregate (PostgreSQL only)
- divides a
In addition to aggregators, reports support calculators (compute values
against a parent_report, e.g. a ratio of a row to its parent total),
trackers (compute values between sequential bin rows, e.g. a period-over-period
delta), and evaluators (block_evaluator runs an arbitrary Ruby block per
row). See the lib/active_reporter/{calculator,tracker,evaluator} classes for
details.
Customizing aggregators
By default, the expression will default to the aggregator name, but you can
achieve some level of customization by passing in expression or relation:
max_aggregator :max_likes, expression: 'posts.likes'
sum_aggregator :total_cost,
expression: 'invoices.hours_worked * invoices.hourly_rate'
avg_aggregator :mean_author_age, expression: 'AGE(users.dob)',
relation: ->(r) { r.joins(:author) }
You can also define your own aggregator type if none of the existing ones meet your needs:
class LengthAggregator < ActiveReporter::Aggregators::BaseAggregator
def aggregate(grouped_relation)
# check out the other aggregators for examples of what to do here.
end
end
# then:
aggregator :name_length, LengthAggregator, expression: 'posts.name'
Serializing reports
After defining and running a report, you can wrap it in a serializer to get its data in a more useful format.
Serializer::Table defines caption, headers, and each_row, which can be
used to construct a table. It also wraps dimension and aggregator names and
values in formatting methods, which can be overridden, e.g. if you would like to
use I18n for date or enum column formatting. You can override these methods on
Serializer::Base if you would like them to apply everywhere.
Serializer::Csv dumps the data from Serializer::Table to a CSV string or file.
Serializer::Highcharts can map reports with 1-3 grouping dimensions to options
for passing into the Highcharts charting library. Extra options included with
the raw data makes it easy to implement features like detailed tooltips and
drilldown.
Serializer::FormField represents report parameters as HTML form fields. Likely
you will want to implement your own form logic specific to your report class
and application design, but it provides an easy and somewhat extensible way to
get up and running.
See the serializer class files for more documentation.
Upgrading to v1
v1 modernizes the gem for current Ruby/Rails and fixes a number of latent bugs. See the CHANGELOG for the full list. The points below are the only ones that may require action.
Required changes
- Ruby >= 3.3 and Rails 7.1–8.x.
required_ruby_versionis now>= 3.3. - If you subclass a dimension/aggregator or read the registration hash, the
options accessor was renamed
opts→options:ruby # before dimension.opts report_model.dimensions[:author][:opts] # after dimension.options report_model.dimensions[:author][:options] - If you pin
deeply_enumerable, it is now~> 2.0(was< 2.0).
:expression, :attribute, :model, and :table_name are unchanged — existing
dimension/aggregator definitions keep working. (:attribute/:model/:table_name
are the convenient way to target a column; :expression remains the escape hatch
for raw/computed SQL.)
Behavior changes to be aware of
These need no code changes, but the output may differ:
autoreport_onnow mapsenumcolumns toDimension::Enumand boolean columns toDimension::Boolean(previouslyNumber/Category).Serializer::Highcharts#seriesnow emits one series per aggregator with correctly-populatedyvalues (previously a single series withy: 0.0). If you consume the series structure directly, re-check it. (#tooltip_foralso takes the aggregator as a second argument now.)Report#paramsnow strips blank values (empty strings, empty arrays/hashes), not justnils.Tracker::Valuenow returns the prior period's value (it was previously non-functional).
Everything else in the CHANGELOG's "Fixed" section is an internal bug fix that requires no action on your part.
Contributing
If you have suggestions for how to make any part of this library better, or if you want to contribute extra dimensions, aggregators, serializers, please submit them in a pull request (with test coverage).
To work on developing ActiveReporter, you will need Ruby (>= 3.3) and
PostgreSQL, MySQL, or SQLite3 installed. Then clone the repository and run:
bundle install
# run against one adapter (the test database is created and loaded automatically):
DB=<your db type> bundle exec rspec
# or run against all three adapters at once:
bundle exec rake spec:all
The options for DB are sqlite, mysql, and postgres (the default).
Preferably you should run against all three, as CI does.
To see the dummy application in development mode, you can run:
cd spec/dummy
DB=<your db type> bundle exec rake db:setup
DB=<your db type> bundle exec rails server