activerecord-trino-adapter
A read-only ActiveRecord SQL adapter for Trino, built on top of the trino-client gem.
Lets a Rails application query a Trino data warehouse using familiar ActiveRecord scopes and where chains while preventing accidental writes. Designed for analytical use cases where the warehouse is the source of truth and the application only needs to read from it.
Features
- Read-only by design. All write paths (
insert,update,delete, transactions, migrations, schema changes) raiseActiveRecord::Trino::ReadOnlyError. - ActiveRecord-native. Plugs into Rails 7.1+ multi-database via
database.ymlandconnects_to. - Opinionated safety belts.
find_each/find_in_batchesare banned (they don't fit Trino's pagination model); hard query timeouts default to 150 seconds; slow queries emitActiveSupport::Notificationsfor any subscriber to pick up. - SQL-injection conscious. Trino has no parameterized queries, so every literal flows through a tight, fuzz-tested
quoteimplementation. - Schema introspection via Trino's
information_schema.columns, with a small but practical type map (varchar, integer, decimal, boolean, date, timestamp, timestamp with time zone, json, etc.).
Installation
Add this line to your application's Gemfile:
gem "activerecord-trino-adapter"
Then in your config/database.yml:
warehouse:
adapter: trino
host: <%= ENV["TRINO_HOST"] %>
port: <%= ENV.fetch("TRINO_PORT", 8080) %>
ssl: <%= ENV.fetch("TRINO_SSL", "false") %>
user: <%= ENV["TRINO_USER"] %>
password: <%= ENV["TRINO_PASSWORD"] %>
catalog: <%= ENV["TRINO_CATALOG"] %>
schema: <%= ENV["TRINO_SCHEMA"] %>
query_timeout: 150
plan_timeout: 30
And an abstract record that connects to it:
class WarehouseRecord < ActiveRecord::Base
self.abstract_class = true
connects_to database: { reading: :warehouse }
end
class SalesByDay < WarehouseRecord
self.table_name = "sales_by_day"
end
Now standard AR queries work against Trino:
SalesByDay.where(month: "2026-04").order(:territory_id).limit(100)
What is and is not supported
Supported:
where,order,limit,offset,select,pluck,find_by,count,sum,average- Scopes, including chained scopes
- Type-cast reads for varchar, integer (all widths), real/double, decimal, boolean, date, timestamp, timestamp with time zone, time, json, uuid
Not supported (raises):
- Any write path:
save,update,delete,destroy,insert,create_table, transactions, savepoints find_each/find_in_batches— Trino's pagination model is incompatible; use explicitLIMIT/OFFSETorpluckaggregates- Trino composite types (
array,map,row) in result casting — select scalar columns or extract via Trino SQL (element_at, dot access)
Out of design scope:
joinsare not a design goal. The adapter passes SQL through to Trino, so a join across two Trino-backed models technically works, but it is not tested and the intended usage pattern is to query flat denormalized warehouse tables.- Cross-database joins (e.g., a MySQL model joined to a Trino model) do not work — Rails 7.1+ disallows joins across connection handles.
Configuration options
All keys are read from the database.yml entry:
| Key | Default | Description |
|---|---|---|
host |
required | Trino server hostname (e.g. trino.example.com) — no scheme, no port |
port |
8080 (HTTP) / 443 (HTTPS) |
Trino server port |
ssl |
false |
Whether to use HTTPS. When true, also passing password requires the connection to be HTTPS (trino-client policy) |
user |
required | Trino user |
password |
nil | Optional basic-auth password |
catalog |
required | Default Trino catalog |
schema |
required | Default Trino schema |
query_timeout |
150 |
Hard ceiling on query duration, in seconds. Cap lower for user-facing paths and higher for backfills |
plan_timeout |
30 |
Ceiling on Trino query-planning phase, in seconds |
slow_query_threshold_seconds |
20 |
Threshold above which an active_record_trino.slow_query notification is emitted |
persistent |
false |
Reuse one keep-alive HTTP connection per adapter instance instead of opening a fresh TCP+TLS connection for every request. See Persistent HTTP connections |
gzip |
nil | When true, requests gzip-compressed HTTP response bodies from Trino |
bulk_column_reflection |
false |
Reflect every table's columns in a single information_schema.columns query instead of one per table. See Bulk column reflection |
static_schema |
false |
Serve table existence from columns declared via ActiveRecord::Trino.define_columns instead of running SHOW TABLES. See Static schema declarations |
Persistent HTTP connections
A single Trino query is 4-6 HTTP requests (POST /v1/statement, then repeated
nextUri polls), and by default each one pays a full TCP + TLS handshake. With
persistent: true the adapter keeps one keep-alive connection per adapter
instance (Rails checks out one adapter instance per thread, so no locking is
involved) and reuses it across requests and queries. Against a TLS-fronted
cluster this typically saves several hundred milliseconds per query.
Idle keep-alive sockets are recycled after 100 seconds, below typical
load-balancer idle timeouts, and Trino protocol GET polls are retried
transparently if the server closes a kept-alive socket. disconnect! shuts the
pool down; reconnect! rebuilds it.
Bulk column reflection
By default the adapter reflects a model's columns with a single-table
information_schema.columns query the first time ActiveRecord loads its schema.
With bulk_column_reflection: true, the first reflection instead issues one
information_schema.columns query for the whole catalog/schema and groups the
result by table, memoized per connection. Since each Trino query carries a few
hundred milliseconds of fixed overhead, reflecting N tables this way costs one
round trip instead of N — useful when a process touches many warehouse tables.
The cache is cleared on reconnect! and by ActiveRecord::Trino.reset_schema_cache!.
Static schema declarations
If your team owns the warehouse schema, you can declare a table's columns in
code and skip reflection entirely. Register columns with
ActiveRecord::Trino.define_columns:
ActiveRecord::Trino.define_columns("sales_by_day", [
{ name: :territory_id, sql_type: "integer", null: false },
{ name: :month, sql_type: "date" },
{ name: :amount, sql_type: "decimal(18, 2)" }, # null defaults to true
])
When a table is registered, #columns serves the declared definitions and never
queries information_schema. Use the Trino SQL type strings (bigint,
integer, date, decimal(18, 2), varchar, timestamp(3), boolean, …) —
the same values information_schema would return — so type casting is identical.
Declaration is per table: undeclared tables still reflect.
Setting static_schema: true additionally serves table existence
(#data_sources) from the registered tables, so SHOW TABLES is never run. With
this enabled, every table the app queries must be declared, or ActiveRecord will
treat it as nonexistent.
Declarations become the source of truth: a column added, dropped, or retyped in the warehouse is not picked up until you update the declaration and redeploy (unlike reflection, which self-corrects on reconnect). Best suited to schemas your team controls and changes deliberately.
Instrumentation
The adapter uses ActiveRecord's standard AbstractAdapter#log for query instrumentation, so any ActiveSupport::Notifications subscriber on sql.active_record picks up Trino queries automatically.
In addition, queries exceeding slow_query_threshold_seconds emit an active_record_trino.slow_query notification with payload { sql:, duration:, query_id:, info_uri: }. The info_uri deep-links to the query's stats page in the Trino web UI, which is handy for diagnosing slow paths.
Diagnostics
For one-off latency investigation, ActiveRecord::Trino::Diagnostics.profile(model_class) runs a sample query and reports where the time went, broken down between the Ruby/AR side and Trino's own per-query stats:
ActiveRecord::Trino::Diagnostics.profile(SalesByDay)
# => {
# schema_time: 0.45, # Ruby-side seconds for information_schema.columns
# query_time: 1.12, # Ruby-side seconds for the sample SELECT
# query_id: "20260520_...", # Trino query_id (deep-link via info_uri)
# info_uri: "https://...", # URL to the query's stats page
# queued_time_ms: 50, # Trino-side: queued waiting for resources
# elapsed_time_ms: 800, # Trino-side: total wall clock
# cpu_time_ms: 200, # Trino-side: CPU time spent
# wall_time_ms: 750, # Trino-side: execution wall time
# state: "FINISHED"
# }
The same metadata is available on the connection after any query:
SalesByDay.first
connection = SalesByDay.connection
connection.last_query_id # Trino query_id of the most recent query
connection.last_query_info_uri # Direct URL to the Trino UI for that query
connection.last_query_stats # Hash of state, queued_time_millis, elapsed_time_millis, etc.
Development
- Clone the repository.
- Install dependencies:
bundle install. - Run the test suite:
bundle exec rspec. - Run the linter:
bundle exec rubocop.
Testing across Rails versions
Use Appraisal to run the suite against each supported Rails minor:
bundle exec appraisal install
bundle exec appraisal rspec
Testing changes locally in another app
# In the consuming application's Gemfile
gem "activerecord-trino-adapter", path: "path/to/activerecord-trino-adapter"
License
This project is licensed under the MIT License — see the LICENSE.txt file for details.