DataFor
Queryable, read-only Ruby Data models backed by Rails config files.
Country = Data.define(:id, :name) do
  include DataFor::Model
  config :countries
end
Country.find("US") #=> #<data Country id="US", name="United States">
Country.find!("ZZ") # raises DataFor::RecordNotFound
Country.find_by(name: "Canada") #=> #<data Country id="CA", name="Canada">
Country.where(name: "Canada") #=> [#<data Country id="CA", name="Canada">]
DataFor turns YAML in your Rails config directory into queryable, read-only
models — countries, currencies, plans, and other reference data that rarely
changes and doesn't belong in your database. It's built on Ruby's native
Data.define and Rails'
config_for, so the records are immutable value objects and the
storage is a YAML file you already know how to edit.
Why?
Not all of an application's data must live in the database. Reference data — countries, currencies, subscription tiers, feature flags, lookup tables — has a few properties that distinguish it from "real" data:
- It rarely changes (changes are deployments, not user actions).
- It is not user-editable.
- It needs to be versioned alongside the code that depends on it.
- It is small enough to live in memory.
Database vs. DataFor
| | Database | DataFor |
|---|---|---|
| Writeable at runtime | yes | no |
| Schema migrations required | yes | no |
| Versioned with application code | no | yes |
| Joins | yes | no |
| Editable without a redeploy | yes | no |
| Cost per query | network | in-memory hash/array |
If your data is on the right-hand side of that table, DataFor is probably a
better fit than a table.
Installation
Add to your Gemfile:
gem "data_for"
Then:
bundle install
DataFor requires Ruby >= 3.4 (for the implicit it block parameter) and
Rails >= 6.1 (for config_for).
Usage
1. Put your data in config/
Rails' config_for reads config/<name>.yml files. The top-level key
shared provides defaults across environments; per-environment keys
(development, production, etc.) override them.
Example config/countries.yml:
shared:
  - id: AU
    name: Australia
    states:
      - { id: ACT, name: "Australian Capital Territory", country_id: AU }
      - { id: NSW, name: "New South Wales", country_id: AU }
      # ...
  - id: CA
    name: Canada
    states:
      - { id: AB, name: Alberta, country_id: CA }
      # ...
  - id: US
    name: "United States"
    states:
      - { id: AL, name: Alabama, country_id: US }
      # ...
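To make the shared and per-environment fallback concrete, here is a simplified plain-Ruby sketch for list-shaped data like the file above. It is not Rails' implementation: `records_for`, `YAML_SOURCE`, and the `test` entry are hypothetical names made up for illustration, and Rails' real config_for additionally deep-merges the two sections when both are hashes.

```ruby
require "yaml"

# Hypothetical config contents: a shared list plus a test-only override.
YAML_SOURCE = <<~YAML
  shared:
    - id: AU
      name: Australia
    - id: CA
      name: Canada
  test:
    - id: XX
      name: Testland
YAML

# Simplified lookup: use the environment's own section when it exists,
# otherwise fall back to the shared section.
def records_for(env, source = YAML_SOURCE)
  all = YAML.safe_load(source, symbolize_names: true)
  all[env.to_sym] || all[:shared]
end

records_for("production").map { |row| row[:id] } # => ["AU", "CA"]
records_for("test").map { |row| row[:id] }       # => ["XX"]
```

For list data, an environment section replaces the shared list wholesale rather than appending to it, so most reference data can live under shared alone.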
2. Define a model
Use Ruby's Data.define to declare the value object's members, then
include DataFor::Model to make it queryable:
Country = Data.define(:id, :name, :states) do
  include DataFor::Model
  config :countries
end
3. Query
Country.find("AU")
# => #<data Country id="AU", name="Australia", states=[...]>
Country.find!("ZZ")
# raises DataFor::RecordNotFound
Country.find_by(name: "United States")
# => #<data Country id="US", name="United States", states=[...]>
Country.find_by!(name: "Atlantis")
# raises DataFor::RecordNotFound
Country.where(name: "Canada")
# => [#<data Country id="CA", ...>]
Country.all
# => [#<data Country id="AU", ...>, #<data Country id="CA", ...>, ...]
find uses a primary-key index and is O(1). find_by and where scan
linearly across the record set — fine for the size of data that belongs in
DataFor in the first place.
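The difference between the two lookup paths can be sketched in plain Ruby. This is an illustration of the general technique (a hash index for the primary key, a linear scan for everything else), not the gem's source; `RECORDS`, `INDEX`, and the top-level methods are hypothetical stand-ins.

```ruby
# Stand-in record type; a real DataFor model would include DataFor::Model.
Country = Data.define(:id, :name)

RECORDS = [
  Country.new(id: "AU", name: "Australia"),
  Country.new(id: "CA", name: "Canada"),
  Country.new(id: "US", name: "United States")
].freeze

# Built once: every later find is a single hash lookup.
INDEX = RECORDS.to_h { |record| [record.id, record] }.freeze

def find(id)
  INDEX[id]
end

# Returns every record whose members match all the given conditions.
def where(**conditions)
  RECORDS.select do |record|
    conditions.all? { |member, value| record.public_send(member) == value }
  end
end

# Same scan as where, but stops at the first match.
def find_by(**conditions)
  RECORDS.find do |record|
    conditions.all? { |member, value| record.public_send(member) == value }
  end
end

find("US").name                      # => "United States"
where(name: "Canada").map(&:id)      # => ["CA"]
```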
Custom primary keys
The default primary key is :id. Override it with self.primary_key= when
your data uses a different identifier:
Book = Data.define(:isbn, :title, :author) do
  include DataFor::Model
  config :books
  self.primary_key = :isbn
end
Book.find("978-0-13-468599-1")
Casting attributes with cast_<member>
For every member of your Data class, DataFor::Model generates a private
cast_<member> method that runs at construction time. The default
implementation is the identity function. Override the cast method to coerce,
parse, or nest:
Country = Data.define(:id, :name, :states) do
  include DataFor::Model
  config :countries

  private

  def cast_states(data)
    Array(data).map { State[**it] }
  end
end
Now every Country#states returns an array of State value objects rather
than raw hashes:
Country.find("US").states.first
# => #<data State id="AL", name="Alabama", country_id="US">
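In plain Ruby terms, a cast hook amounts to transforming a member before Data stores it. The sketch below reproduces that effect without the gem by overriding initialize, which Ruby's Data supports. It illustrates the idea only and makes no claim about how DataFor::Model wires up cast_<member> internally.

```ruby
State = Data.define(:id, :name, :country_id)

Country = Data.define(:id, :name, :states) do
  # Data routes positional and keyword construction through initialize
  # with keyword arguments, so a single override covers both call styles.
  def initialize(id:, name:, states: nil)
    cast = Array(states).map { |attrs| State.new(**attrs) }.freeze
    super(id: id, name: name, states: cast)
  end
end

us = Country.new(
  id: "US",
  name: "United States",
  states: [{ id: "AL", name: "Alabama", country_id: "US" }]
)
us.states.first.name # => "Alabama"
```

Because Data compares by value, the nested State objects also make equality checks between records meaningful.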
Reprojecting one config to drive multiple models
config accepts a project: proc that transforms the loaded data before it
becomes the model's record set. This lets a single YAML file power multiple
query surfaces:
State = Data.define(:id, :name, :country_id) do
  include DataFor::Model
  config :countries, project: -> { it.pluck(:states).flatten }
end
State.where(country_id: "US")
# => [#<data State id="AL", ...>, #<data State id="AK", ...>, ...]
Country.find("US").states == State.where(country_id: "US")
# => true
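The reshaping that the project: proc performs can be tried on its own with plain hashes. In this sketch, `COUNTRIES` and `project_states` are hypothetical names, and flat_map stands in for the pluck(:states).flatten call above, since pluck on an Enumerable comes from Active Support.

```ruby
# Nested input, shaped like the loaded countries config.
COUNTRIES = [
  { id: "AU", name: "Australia",
    states: [{ id: "NSW", name: "New South Wales", country_id: "AU" }] },
  { id: "US", name: "United States",
    states: [{ id: "AL", name: "Alabama", country_id: "US" },
             { id: "AK", name: "Alaska", country_id: "US" }] }
].freeze

# Collect each country's states and flatten them into one list.
project_states = ->(countries) { countries.flat_map { |country| country[:states] } }

states = project_states.call(COUNTRIES)
states.map { |state| state[:id] } # => ["NSW", "AL", "AK"]
```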
Loading data outside of Rails
By default, config reads data via Rails.application.config_for(filename).
Pass a loader: proc to load data from somewhere else:
Plan = Data.define(:id, :name, :price_cents) do
  include DataFor::Model
  config :plans, loader: ->(name) { YAML.load_file("data/#{name}.yml") }
end
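One detail worth checking in a custom loader is key type: config_for returns symbol keys, while YAML.load_file returns string keys by default. Whether DataFor normalizes keys from a custom loader is not stated here, so the sketch below symbolizes them explicitly as a precaution. `yaml_loader` and the temporary plans.yml are hypothetical, created only to exercise the lambda.

```ruby
require "yaml"
require "tmpdir"

# Builds a loader with the same ->(name) shape as the loader: proc above,
# returning symbol-keyed rows to match what config_for produces.
def yaml_loader(dir)
  lambda do |name|
    path = File.join(dir, "#{name}.yml")
    YAML.safe_load(File.read(path), symbolize_names: true)
  end
end

# Write a throwaway file and load it through the lambda.
PLAN_ROWS = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "plans.yml"), <<~YAML)
    - id: pro
      name: Pro
      price_cents: 2900
  YAML
  yaml_loader(dir).call(:plans)
end

PLAN_ROWS.first[:price_cents] # => 2900
```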
Caching & reloading
config reads the file once, applies the project: proc, freezes the
result, and caches it in a class instance variable. Subsequent queries hit
in-memory data structures — no file I/O, no parsing.
- In production (with config.cache_classes = true), models load at boot and stay until the process restarts.
- In development, Rails reloads model classes on every request, which re-evaluates the Data.define block and re-runs config. Edits to your YAML files take effect on the next request, with no server restart.
- In test, behavior matches production by default.
If you call config a second time on the same class (rare), the internal
@all and @index memos are cleared so the new data takes over cleanly.
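The memoize-then-reset pattern described above can be sketched with class instance variables. `ReferenceData` and its block-taking `config` are hypothetical, written to show the caching idea rather than the gem's actual internals.

```ruby
class ReferenceData
  class << self
    # Store the loader and clear any memoized data from a previous config.
    def config(&loader)
      @loader = loader
      @all = nil
      @index = nil
    end

    # The loader runs once; the frozen result is reused afterwards.
    def all
      @all ||= @loader.call.freeze
    end

    # Primary-key index, built lazily from the cached records.
    def index
      @index ||= all.to_h { |row| [row[:id], row] }.freeze
    end
  end
end

calls = 0
ReferenceData.config { calls += 1; [{ id: "AU" }] }
ReferenceData.all
ReferenceData.index
calls # => 1
```

Calling config again swaps in the new loader and drops both memos, so stale records never outlive a reconfiguration.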
Alternatives
DataFor is not the first gem in this space. The closest neighbors:
data_for (this gem)
- Storage: Rails config_for (YAML)
- Record type: Ruby Data.define (immutable value object)
- Distinct features: idiomatic Rails configuration, immutable records, and
the project: proc, which reshapes one config file into multiple query surfaces.
frozen_record
- Storage: YAML / JSON / custom backends
- Record type: ActiveRecord-style class
- Distinct features: the most mature gem in this space, broadest query API,
pluggable backends. Records are AR-style classes, not Ruby
Data.
active_hash
- Storage: in-memory hashes (active_hash), YAML (active_yaml), JSON, ENUM
- Record type: ActiveRecord-style class
- Distinct features: AR-compatible API, supports associations to AR models. Long-established.
yaml_record
- Storage: YAML
- Record type: custom class
- Status: older, less active.
If you need AR-style associations across both real DB tables and your
reference data, active_hash is the best fit. If you want the broadest
query API and don't care about value-object semantics, frozen_record is
the most mature option. DataFor is the smallest gem of the bunch and the
only one built on native Ruby Data objects with Rails-idiomatic
configuration.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/OutlawAndy/data_for.
License
DataFor is available as open source under the terms of the
MIT License.