rbxl
An openpyxl-inspired Ruby gem for large-ish .xlsx files.
Current scope is intentionally small:
- write_only workbook generation
- read_only row streaming
- close() for read-only workbooks
- minimal openpyxl-like API
- optional C extension (rbxl/native) for maximum performance
Out of scope for this MVP:
- preserving arbitrary workbook structure on save
- rich style round-tripping
- formulas, images, charts, comments
Usage
# Write a workbook in write-only mode
require "rbxl"
book = Rbxl.new(write_only: true)
sheet = book.add_sheet("Report")
sheet.append(["id", "name", "score"])
sheet.append([1, "alice", 100])
sheet.append([2, "bob", 95.5])
book.save("report.xlsx")

# Stream the rows back in read-only mode
require "rbxl"
book = Rbxl.open("report.xlsx", read_only: true)
sheet = book.sheet("Report")
sheet.each_row do |row|
  p row.values
end
p sheet.calculate_dimension
book.close
write_only workbooks are save-once by design. This matches the optimized
mode tradeoff: low flexibility in exchange for simpler memory behavior.
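To make the save-once tradeoff concrete, here is a toy sketch in plain Ruby. It is not rbxl's implementation: the class name `SaveOnceWriter`, the CSV-like output, and the use of `IOError` are all assumptions chosen for illustration. The idea it shows is that the writer keeps only serialized output, releases it on save, and therefore cannot save or append again.

```ruby
require "stringio"

# Hypothetical illustration only; not rbxl's classes or error types.
class SaveOnceWriter
  def initialize
    @buffer = StringIO.new # serialized rows only, no cell object graph
    @saved = false
  end

  # Serialize the row immediately instead of keeping Ruby objects around.
  def append(values)
    raise IOError, "workbook already saved" if @saved
    @buffer << values.join(",") << "\n"
    self
  end

  # Flush the serialized rows to io, then drop the buffer.
  def save(io)
    raise IOError, "workbook already saved" if @saved
    io << @buffer.string
    @buffer = nil
    @saved = true
  end
end
```

Because nothing but the serialized rows is retained, there is no in-memory model left to edit or re-save after the first `save`.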
Native C Extension
Add a single require to opt in to the libxml2-based C extension for
significantly faster read and write performance:
require "rbxl"
require "rbxl/native" # opt-in
# Same API, backed by C extension
book = Rbxl.open("large.xlsx", read_only: true)
book.sheet("Data").rows(values_only: true).each { |row| process(row) }
book.close
The C extension is opt-in by design:
- Portability first: require "rbxl" alone works everywhere Ruby and Nokogiri run, with zero native compilation required. This is the default.
- Performance when you need it: require "rbxl/native" activates the libxml2 SAX2 backend for read/write hot paths. If the .so was not built (e.g. libxml2 headers missing at install time), you get a clear LoadError rather than a silent degradation.
- Same API, same output: switching between the two paths changes nothing about behavior or output format. The test suite runs both paths and compares results cell-by-cell to guarantee parity.
- Fallback is automatic at build time: gem install rbxl attempts to compile the C extension. If libxml2 is not found, compilation is silently skipped and the gem installs successfully without it. You only notice when you try require "rbxl/native".
- Current boundary cost is explicit: worksheet ZIP entries are still inflated into a Ruby string before crossing into C. The extension removes XML parse overhead, but not ZIP I/O or that intermediate buffer.
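Since a missing extension surfaces as a LoadError at require time, an application that wants to prefer the native path but still run without it can rescue that error itself. A minimal sketch, assuming nothing beyond standard Ruby `require` semantics; `load_optional` is a hypothetical helper name and is not part of rbxl:

```ruby
# Try to load an optional feature and report whether it is available.
# `load_optional` is a hypothetical helper, not an rbxl API.
def load_optional(feature)
  require feature
  true
rescue LoadError
  false
end

# Possible use in an application that prefers the native backend:
#   require "rbxl"
#   NATIVE = load_optional("rbxl/native")
#   warn "rbxl: native extension unavailable, using pure Ruby" unless NATIVE
```

Keeping the fallback in application code makes the choice visible to whoever deploys the application, which fits the gem's preference for explicit failure over silent degradation.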
Requirements for the C extension:
- libxml2 development headers (libxml2-dev / libxml2-devel), or
- Nokogiri with bundled libxml2 (headers are detected automatically)
Design Notes
- Writer avoids a full workbook object graph and streams rows into sheet XML.
- Reader uses a pull parser for worksheet XML so it can iterate rows without building the full DOM.
- Strings written by the MVP use inlineStr to avoid shared string bookkeeping during generation.
- Reader supports both shared strings and inline strings.
- The native extension uses libxml2 SAX2 directly, bypassing Nokogiri's per-node Ruby object allocation overhead.
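The writer-side notes above can be illustrated with a small sketch that streams one row straight into an IO as SpreadsheetML, using `t="inlineStr"` cells for text so that no shared string table has to be maintained. This is an illustration of the technique under stated assumptions, not rbxl's actual code: the helper names `column_name`, `escape_xml`, and `write_row` are hypothetical.

```ruby
require "stringio"

# Convert a zero-based column index to a column name (0 -> "A", 26 -> "AA").
def column_name(index)
  name = +""
  n = index + 1
  while n > 0
    n, rem = (n - 1).divmod(26)
    name.prepend((65 + rem).chr)
  end
  name
end

# Escape the characters that are not allowed in XML text content.
def escape_xml(text)
  text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Write one <row> element directly to io without building a cell object graph.
# Numbers become plain <v> cells; everything else becomes an inline string.
def write_row(io, row_number, values)
  io << %(<row r="#{row_number}">)
  values.each_with_index do |value, i|
    next if value.nil? # empty cells are simply omitted
    ref = "#{column_name(i)}#{row_number}"
    if value.is_a?(Numeric)
      io << %(<c r="#{ref}"><v>#{value}</v></c>)
    else
      io << %(<c r="#{ref}" t="inlineStr"><is><t>#{escape_xml(value.to_s)}</t></is></c>)
    end
  end
  io << "</row>"
end
```

With shared strings the writer would instead have to assign each distinct string an index and emit the table as a separate workbook part, which is the bookkeeping the inline form avoids at the cost of repeating duplicate strings in the sheet XML.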
Development
bundle install
cd benchmark && npm install && cd ..
# Run tests (pure Ruby)
ruby -Ilib -Itest test/rbxl_test.rb
# Run tests (with native extension)
cd ext/rbxl_native && ruby extconf.rb && make && cd ../..
ruby -Ilib -Itest -r rbxl/native test/rbxl_test.rb
ruby -Ilib -Itest test/fast_ext_test.rb
# Benchmarks
ruby -Ilib benchmark/compare.rb # pure Ruby
ruby -Ilib -r rbxl/native benchmark/compare.rb # with native
RBXL_BENCH_WARMUP=1 RBXL_BENCH_ITERATIONS=5 ruby -Ilib benchmark/read_modes.rb
Benchmarks
The performance story is primarily about rbxl/native.
require "rbxl" remains the portability-first default: no native extension is
required, the API stays the same, and the fallback path is still useful for
environments where native builds are inconvenient. But the numbers below are
best read as:
- rbxl = portable baseline
- rbxl/native = performance mode
5000 rows x 10 columns, Ruby 3.4 / Python 3.13 / Node 24:

Portable Baseline (require "rbxl")
| benchmark | real (s) |
|---|---|
| rbxl write | 0.08 |
| rbxl read | 0.33 |
| rbxl read values | 0.23 |
| exceljs write | 0.08 |
| exceljs read | 0.17 |
| sheetjs write | 0.13 |
| sheetjs read | 0.19 |
| openpyxl write | 0.35 |
| openpyxl read | 0.22 |
| openpyxl read values | 0.18 |
Performance Mode (require "rbxl/native")
| benchmark | real (s) | vs exceljs/openpyxl |
|---|---|---|
| rbxl write | 0.04 | about 2x / 9x faster |
| rbxl read | 0.07 | about 2.6x / 3.2x faster |
| rbxl read values | 0.03 | about 6.8x faster than openpyxl values |
The comparison script uses these libraries when available:
- rbxl for write/read
- exceljs for write/read
- sheetjs for write/read
- excelize (Go) for write/read
- rust_xlsxwriter (Rust) for write
- calamine (Rust) for read
- rubyXL for full workbook read
- openpyxl as a Python reference point when openpyxl or uv is available

Benchmark notes:
- RBXL_BENCH_WARMUP and RBXL_BENCH_ITERATIONS control warmup and repeated runs.
- Read comparisons use the same rbxl.xlsx fixture for rbxl, roo, rubyXL, and openpyxl.
- JS comparisons use the same rbxl.xlsx fixture for exceljs and sheetjs.
- Write comparisons still measure each library producing its own workbook.
- rss_delta_kb is best-effort process RSS on Linux and should be treated as directional.

Install JS benchmark dependencies with cd benchmark && npm install.
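For readers who want to reproduce a warmup-plus-iterations measurement outside the bundled scripts, the following sketch shows one way the RBXL_BENCH_WARMUP and RBXL_BENCH_ITERATIONS variables could drive a timing loop. It is not the code in benchmark/: the `measure` helper, the default values of 1 and 5, and the placeholder workload are assumptions for illustration.

```ruby
require "benchmark"

# Run the block `warmup` times untimed, then time it `iterations` times.
# `measure` is a hypothetical helper, not part of rbxl's benchmark scripts.
def measure(warmup:, iterations:)
  warmup.times { yield }
  times = Array.new(iterations) { Benchmark.realtime { yield } }
  { min: times.min, mean: times.sum / times.size, runs: times.size }
end

# Defaults here are assumptions; the real scripts may use different ones.
warmup = Integer(ENV.fetch("RBXL_BENCH_WARMUP", "1"))
iterations = Integer(ENV.fetch("RBXL_BENCH_ITERATIONS", "5"))

# Placeholder workload; substitute a workbook read or write to benchmark it.
result = measure(warmup: warmup, iterations: iterations) { (1..10_000).sum }
puts format("min %.6fs, mean %.6fs over %d runs", result[:min], result[:mean], result[:runs])
```

Reporting the minimum alongside the mean helps separate steady-state cost from one-off noise such as GC pauses or cold file caches.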