Fibrio
fibrio is a small Ruby gem for reading large JSON array, NDJSON, and CSV inputs one record at a time. It keeps Fiber usage inside Fibrio::Stream, so callers can use the normal Enumerable API.
Installation
Add the gem to your Gemfile:
gem "fibrio"
Then require it:
require "fibrio"
Usage
Fibrio.open("data.json", format: :json) do |stream|
  stream.each do |record|
    process(record)
  end
end
each returns an Enumerator when no block is given, so lazy chains work as expected:
stream = Fibrio.open("data.json", format: :json)
top10 = stream.each.lazy.select { |record| record["active"] }.first(10)
stream.close
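To illustrate why an each that returns an Enumerator composes with lazy chains, here is a minimal sketch of a Fiber-backed reader built only on the standard library. SketchStream, DONE, and lines_read are hypothetical names introduced for this example; this is not Fibrio::Stream's actual implementation, only one way a Fiber can be hidden behind Enumerable.

```ruby
require "json"
require "stringio"

# Hypothetical stand-in for a stream class; not Fibrio's actual code.
class SketchStream
  include Enumerable

  DONE = Object.new.freeze # sentinel returned when the input is exhausted

  attr_reader :lines_read

  def initialize(io)
    @io = io
    @lines_read = 0
  end

  # Yields one parsed record per line; returns an Enumerator without a block.
  def each
    return enum_for(:each) unless block_given?

    # The Fiber is the producer: it parses one line, then suspends.
    producer = Fiber.new do
      @io.each_line do |line|
        @lines_read += 1
        Fiber.yield JSON.parse(line)
      end
      DONE
    end

    # The consumer resumes the producer only when it wants another record.
    until (record = producer.resume).equal?(DONE)
      yield record
    end
    self
  end
end

input = StringIO.new(
  %({"id":1,"active":false}\n{"id":2,"active":true}\n) +
  %({"id":3,"active":true}\n{"id":4,"active":true}\n)
)
stream = SketchStream.new(input)
picked = stream.each.lazy.select { |record| record["active"] }.first(2)
p picked.map { |record| record["id"] } # => [2, 3]
p stream.lines_read                    # => 3
```

Because first(2) stops the chain as soon as it has two matches, the fourth line is never read or parsed in this sketch.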
CSV with no header row returns arrays:
Fibrio.open("data.csv", format: :csv, headers: false) do |stream|
  stream.each { |row| p row }
end
A string argument that does not name an existing file is treated as the data itself:
Fibrio.open("[1,2,3]", format: :json) do |stream|
  stream.each { |number| p number }
end
Top-level JSON objects can stream an array nested at a known path:
Fibrio.open('{"payload":{"records":[{"id":1},{"id":2}]}}', format: :json, path: %w[payload records]) do |stream|
  stream.each { |record| p record["id"] }
end
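As a point of comparison, the same selection can be written eagerly with the standard json library. This sketch assumes that path: lists object keys from the document root down to the array, as the example above suggests. Unlike a streaming reader, it parses the whole document before the first record is available.

```ruby
require "json"

document = '{"payload":{"records":[{"id":1},{"id":2}]}}'
path = %w[payload records]

# Eager equivalent: parse everything, then walk the keys in order.
records = JSON.parse(document).dig(*path)
records.each { |record| p record["id"] }
```

This prints 1 and then 2, the same ids the streaming example yields.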
NDJSON uses one JSON value per non-empty line:
Fibrio.open(%({"id":1}\n{"id":2}\n), format: :ndjson) do |stream|
  stream.each { |record| p record["id"] }
end
Supported Formats
- JSON: top-level arrays, or object-contained arrays selected with path:.
- NDJSON: blank lines are skipped. Each non-empty line is parsed with Ruby's standard json library.
- CSV: headers: true by default yields hashes. headers: false yields arrays. Quoted newlines are supported.
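The NDJSON and CSV behaviors listed above can be reproduced with Ruby's standard json and csv libraries. The sketch below is eager and only illustrates the record shapes. That Fibrio's CSV handling matches the csv library is an assumption, and the sample data is made up for the example.

```ruby
require "json"
require "csv"

ndjson = %({"id":1}\n\n{"id":2}\n)

# NDJSON: skip blank lines, parse every other line as one JSON value.
records = ndjson.each_line
                .reject { |line| line.strip.empty? }
                .map { |line| JSON.parse(line) }
p records # => [{"id"=>1}, {"id"=>2}]

# The quoted field in the second row contains a real newline.
csv = %(name,note\nAda,"line one\nline two"\n)

# headers: true turns each data row into a CSV::Row; to_h gives a plain hash.
hashes = CSV.parse(csv, headers: true).map(&:to_h)
p hashes # => [{"name"=>"Ada", "note"=>"line one\nline two"}]

# headers: false leaves every row, including the first, as an array.
arrays = CSV.parse(csv, headers: false)
p arrays # => [["name", "note"], ["Ada", "line one\nline two"]]
```

Note that without headers the first line is returned as an ordinary row, so callers have to skip it themselves if it holds column names.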
Memory Benchmark
From a source checkout, run the benchmark with:
ruby benchmark/memory.rb 250000
The benchmark generates temporary files, reads them in a child process, and has the parent process poll the child's RSS and record the peak. In the table below, the Fibrio rows iterate through records without retaining them, while the eager rows keep the whole parsed collection in memory. Peak RSS includes the Ruby VM baseline, so absolute numbers vary by Ruby version and platform.
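The measurement approach described above can be sketched as follows. This is not the code in benchmark/memory.rb. The helper names rss_kib and measure_peak_rss are hypothetical, and the sketch assumes either a Linux /proc filesystem or a ps command that accepts -o rss= -p.

```ruby
require "rbconfig"

# Resident set size of a process in KiB, or nil if it cannot be read.
def rss_kib(pid)
  status = "/proc/#{pid}/status"
  if File.readable?(status)
    line = File.foreach(status).find { |l| l.start_with?("VmRSS:") }
    return line[/\d+/].to_i if line
  end
  out = `ps -o rss= -p #{pid} 2>/dev/null`
  out.strip.empty? ? nil : out.to_i
rescue SystemCallError
  nil # the process exited mid-read, or ps is unavailable
end

# Runs a command in a child process and polls its RSS until it exits.
def measure_peak_rss(*command, interval: 0.01)
  pid = Process.spawn(*command)
  peak = 0
  until Process.wait(pid, Process::WNOHANG)
    sample = rss_kib(pid)
    peak = sample if sample && sample > peak
    sleep interval
  end
  { peak_kib: peak, success: $?.success? }
end

# Example: a child that allocates some strings and then idles briefly.
child_code = "data = Array.new(200_000) { |i| i.to_s }; sleep 0.2"
result = measure_peak_rss(RbConfig.ruby, "-e", child_code)
puts format("peak RSS: %.1f MiB", result[:peak_kib] / 1024.0)
```

Polling can miss a spike that falls between two samples, so a peak measured this way is a lower bound on the true peak.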
Example result on Ruby 4.0.0 arm64-darwin24 with 250,000 records:
| Format | Reader | Input MiB | Records | Seconds | Peak RSS MiB |
|---|---|---|---|---|---|
| JSON | Fibrio | 20.07 | 250,000 | 14.710 | 39.4 |
| JSON | JSON.parse(File.read) | 20.07 | 250,000 | 0.069 | 105.4 |
| NDJSON | Fibrio | 20.07 | 250,000 | 0.220 | 25.6 |
| NDJSON | File.readlines + JSON.parse | 20.07 | 250,000 | 0.182 | 127.6 |
| CSV | Fibrio | 9.10 | 250,000 | 2.640 | 33.3 |
| CSV | CSV.read(headers: true) | 9.10 | 250,000 | 0.826 | 192.8 |
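The tradeoff is intentional: Fibrio prioritizes bounded memory use for large inputs over loading everything as fast as possible. In the run above it is slower than the eager reader for every format, most of all for JSON (14.710 s versus 0.069 s), while its peak RSS stays between 25.6 and 39.4 MiB against 105.4 to 192.8 MiB for the eager readers.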
Known Limitations
- Each individual record must fit in memory.