rbxl
Fast, memory-friendly Ruby gem for row-by-row .xlsx reads, surgical edits
of existing .xlsx files, and append-only writes.
rbxl is built for the three workbook workflows that scale cleanly:
- read-only row-by-row iteration
- read-modify-save surgical edits ("template fill-in") that round-trip every untouched part byte-for-byte
- write-only workbook generation
The API is intentionally small and openpyxl-inspired, with an optional
native extension for faster XML parsing when you need more throughput.
Supported:
- write-only workbook generation
- read-only row-by-row iteration
- read-modify-save surgical edits via
Rbxl.open(path, edit: true)— byte-for-byte preservation of untouched parts (styles, drawings, charts, comments, pivot caches, custom XML, untouched sheets) - opt-in date/time conversion driven by the workbook's
numFmtstyles - optional C extension (
rbxl/native) for maximum performance
Out of scope:
- bulk rewrites of a worksheet's data area in edit mode — use the
write-only mode (
Rbxl.new) for that. Edit mode is the right tool for template fill-ins (a handful of cells in a templated workbook); the touched sheet is loaded into a Nokogiri DOM, so memory scales with that sheet's on-disk size - inserting / deleting / reordering / duplicating sheets
- editing styles, formulas, named ranges, drawings, or shared strings
Date/Time/DateTimewrites via edit mode (raiseRbxl::EditableCellTypeError); convert to a numeric Excel serial yourself if you need a date cell- legacy
.xls(BIFF/CFB) input — rbxl reads OOXML.xlsxonly. Convert first, e.g.libreoffice --headless --convert-to xlsx file.xlsorssconvert file.xls file.xlsx(Gnumeric).Rbxl.opendetects the OLE compound-file magic on open and raisesRbxl::UnsupportedFormatErrorwith the conversion hint rather than surfacing an opaque ZIP parse error from rubyzip. - preserving arbitrary workbook structure on write-only save (edit mode preserves every untouched part)
- rich style round-tripping when generating new workbooks
- formulas, images, charts, comments — readable in edit mode, but not introspected or edited
Usage
Rbxl.open defaults to read-only, Rbxl.open(path, edit: true) opens an
existing workbook for surgical edits, and Rbxl.new defaults to
write-only. The mode is selected by the wrapper at the module level so the
call site doesn't have to juggle backend classes.
Writing a new workbook
require "rbxl"
book = Rbxl.new
sheet = book.add_sheet("Report")
sheet.append(["id", "name", "score"])
sheet.append([1, "alice", 100])
sheet.append([2, "bob", 95.5])
book.save("report.xlsx")
Write-only workbooks follow three rules:
- Append-only within a sheet.
sheet.append(row)is the only way to add data. There is no random-access cell write, no mid-stream edit of a previously appended row. - Save-once per workbook.
saveflushes the full.xlsxpackage in a single pass and then closes the workbook. Callingsaveoradd_sheetagain raisesRbxl::WorkbookAlreadySavedError. To produce another file, start a newRbxl.new. - No read-modify-save. rbxl cannot open an existing
.xlsxand write back to it (see Out of scope above).
This is the tradeoff that keeps memory flat: rbxl buffers rows per sheet and never materializes a full workbook object graph.
Reading a workbook
require "rbxl"
book = Rbxl.open("report.xlsx")
sheet = book.sheet("Report")
sheet.each_row do |row|
p row.values
end
p sheet.calculate_dimension
book.close
Reading recipes
Plain value arrays (fastest path). Use values_only: true when you
only care about the cell values, not their coordinates. Rows come back as
frozen Array<Object>:
book.sheet("Data").each_row(values_only: true) do |values|
id, name, score = values
# ...
end
Cell objects with coordinates. Default each_row yields a
Rbxl::Row wrapping Rbxl::ReadOnlyCells. Use this when you need the
Excel coordinate alongside the value:
book.sheet("Data").each_row do |row|
row.index # => 2 (1-based worksheet row number)
row[0].coordinate # => "A2"
row[0].value # => "alice"
row.values # => ["alice", 100, true]
end
Skip the header row. each_row without a block returns an
Enumerator, so chain drop:
book.sheet("Data").each_row(values_only: true).drop(1).each do |row|
# ...
end
Peek at the first N rows. rows(...) is an enumerator-returning
alias that composes well with take, first, lazy, etc.:
book.sheet("Data").rows(values_only: true).first(5)
Know the data range up-front. When the workbook has a stored
dimension, these are O(1) lookups; otherwise pass force: true to scan:
sheet = book.sheet("Data")
sheet.max_row # => 500
sheet.max_column # => 12
sheet.calculate_dimension # => "A1:L500"
Pad sparse rows to the sheet width. Without pad_cells, a row
containing only A1 and C1 yields two cells. With pad_cells: true,
missing cells are filled with Rbxl::EmptyCell (or nil in values-only
mode), aligned to max_column:
book.sheet("Sparse").each_row(pad_cells: true, values_only: true).first
# => ["left", nil, "right"]
Leading empty columns aren't padded. Both default and pad_cells: true
rows align to the first populated column, not to column A. On a sheet
whose dimension is B1:N100, every row has 13 entries (columns B–N), not
max_columnstill reports14(column N, 1-based) — the gap is on the left, not the right. If you need column-A alignment, inspectcalculate_dimensionand prepend the missingnils yourself:
sheet = book.sheet("LeftOffset")
sheet.calculate_dimension # => "B1:N100"
leading_pad = Array.new(1, nil) # B starts at column 2, so 1 nil
sheet.each_row(values_only: true, pad_cells: true) do |row|
aligned = leading_pad + row # => [nil, "first B-value", ...]
end
Expand merged cells. Excel leaves the anchor cell populated and the
rest of the merge range empty. Pass expand_merged: true to propagate
the anchor value across the full range; combine with pad_cells: true
when you want the result aligned to the sheet's width:
sheet = book.sheet("Merged")
sheet.rows(values_only: true).to_a
# => [["group", "solo"], ["tail"]]
sheet.rows(values_only: true, pad_cells: true, expand_merged: true).to_a
# => [["group", "group", "solo", nil],
# ["group", "group", "solo", "tail"]]
List sheets before opening any. Sheet XML is only read on first iteration; enumerating names is cheap:
book.sheet_names # => ["Summary", "Detail", "Raw"]
book.sheet("Detail").each_row(values_only: true) { |row| ... }
Locate a bad input. All rbxl exceptions inherit from Rbxl::Error
and the messages carry the workbook path and (where relevant) the sheet
name, XML entry, or cell coordinate. Rescue at the sheet level:
begin
book.sheet("Raw").each_row(values_only: true) { |row| ... }
rescue Rbxl::WorksheetFormatError, Rbxl::WorkbookFormatError => e
warn e.message # includes workbook path and sheet/entry
rescue Rbxl::CellValueError => e
warn e.message # includes workbook path, sheet, and coordinate
end
Rbxl::CellValueError is raised by the cell decoder when
date_conversion: true is active. The reader is forward-only, so rescue
terminates iteration rather than skipping to the next row.
Date / time conversion
Numeric cells in .xlsx files are serial days since 1899-12-31; whether
they display as 44562, 2022-01-01, or 12:00 depends on the cell's
numFmt style. rbxl leaves cells as raw Float by default so the read
path stays allocation-light. Pass date_conversion: true to opt into
interpreting the style:
require "rbxl"
book = Rbxl.open("schedule.xlsx", date_conversion: true)
book.sheet("Timeline").each_row(values_only: true) do |row|
row.each { |v| p v } # => Date / Time / Float / String / ...
end
book.close
With the flag on, rbxl parses xl/styles.xml once at first use and
converts numeric cells whose style maps to a built-in date numFmtId
(14–22, 27–36, 45–47, 50–58) or to a custom formatCode containing date
tokens. Whole-number serials return Date; fractional serials return
Time so the time-of-day portion is preserved. The flag is off by
default; leaving it off skips the styles parse entirely and keeps the
native fast path in use. Turning it on routes reads through the pure-Ruby
worksheet parser.
Editing an existing workbook
Open a workbook in edit mode to surgically replace cell values without rebuilding the file from scratch. The classic use case is template fill-in: open a stylized template, write a handful of named cells, save back. Every part you don't touch — styles, drawings, charts, comments, pivot caches, custom XML, untouched worksheets — round-trips byte-for-byte straight from the source ZIP, so unknown OOXML extensions and PowerPoint-style add-ins survive the save.
require "rbxl"
Rbxl.open("template.xlsx", edit: true) do |book|
sheet = book.sheet("Invoice")
sheet["B2"].value = "Acme Inc."
sheet["B3"].value = Date.today.strftime("%Y-%m-%d") # Strings are fine
sheet["E10"].value = 1_250.0 # Numbers are fine
book.save("invoice-acme.xlsx")
end
book.save with no argument overwrites the original file via temp file
plus atomic rename, so a crash mid-save never produces a half-written
workbook:
Rbxl.open("template.xlsx", edit: true) do |book|
book.sheet(0)["A1"].value = "Q3 results"
book.save # in-place, atomic
end
Inside an edited worksheet, only the targeted <c> element is rewritten;
sibling cells, the row's other attributes, <mergeCells>,
<conditionalFormatting>, <dataValidations>, and any unknown OOXML
extensions remain in place. The cell's s (style index) attribute is
preserved when you overwrite an existing cell, so template formatting
(number format, font, fill, alignment) carries through. New cells (and
their enclosing rows) are inserted in column- and row-sorted positions.
EditableCell#value= accepts:
| Ruby | XLSX representation |
|---|---|
nil |
empty cell (preserves s style) |
String |
t="inlineStr" with <is><t/> |
Integer, Float |
<v> numeric (no t attribute) |
true, false |
t="b" with <v>1</v>/<v>0</v> |
Date, Time |
raises Rbxl::EditableCellTypeError |
Strings always round-trip as inline strings — xl/sharedStrings.xml is
never mutated, so the SST entries that the cells you didn't touch still
reference stay byte-identical, and the touched cells get their text
inlined. The trade-off is that overwriting a previously-shared-string
cell leaves an orphaned SST entry; for template fill-ins that's
negligible, and it's the simplest design that guarantees deterministic
output.
Date/Time/DateTime writes raise Rbxl::EditableCellTypeError in
1.4.0 — the cell's numFmt style would also have to be the right
date-pattern style for Excel to render the value, and silently picking
one is the kind of magic this design promise is built to avoid. Convert
to an Excel serial yourself if you need a date cell, and rely on the
template's existing date-formatted style index to render it.
Out of scope for edit mode
- inserting / deleting / reordering / duplicating sheets
- editing styles, formulas, named ranges, drawings, or shared strings
- recomputing the worksheet
<dimension>when a write expands the bounds (Excel recomputes on open; openpyxl-style normalization may arrive in a later release) - bulk rewrites of a worksheet's data area — touched sheets are loaded
into a Nokogiri DOM, so memory scales with that sheet's size on disk.
For data-area rewrites, use
Rbxl.newinstead.
Native C Extension
Add a single require to opt-in to the libxml2-based C extension for
significantly faster read and write performance:
require "rbxl"
require "rbxl/native" # opt-in
# Same API, backed by C extension
book = Rbxl.open("large.xlsx", read_only: true)
book.sheet("Data").rows(values_only: true).each { |row| process(row) }
book.close
For large worksheets where peak memory matters more than squeezing out the last few percent of throughput, opt into chunk-fed worksheet inflation:
require "rbxl"
require "rbxl/native"
Rbxl.max_worksheet_bytes = 64 * 1024 * 1024
book = Rbxl.open("large.xlsx", read_only: true, streaming: true)
book.sheet("Data").rows(values_only: true).each { |row| process(row) }
book.close
The C extension is opt-in by design:
- Portability first:
require "rbxl"alone works everywhere Ruby and Nokogiri run, with zero native compilation required. This is the default. - Performance when you need it:
require "rbxl/native"activates the libxml2 SAX2 backend for read/write hot paths. If the.sowas not built (e.g. libxml2 headers missing at install time), you get a clearLoadErrorrather than a silent degradation. - Same API, same output: switching between the two paths changes nothing about behavior or output format. The test suite runs both paths and compares results cell-by-cell to guarantee parity.
- Fallback is automatic at build time:
gem install rbxlattempts to compile the C extension. If libxml2 is not found, compilation is silently skipped and the gem installs successfully without it. You only notice when you tryrequire "rbxl/native". - Default path buffers the worksheet: the worksheet ZIP entry is inflated into a Ruby string before crossing into C. The extension removes XML parse overhead, but not ZIP I/O or that intermediate buffer.
- Opt-in streaming: passing
streaming: truetoRbxl.openfeeds the worksheet XML to the native parser in 64 KiB chunks pulled from the ZIP input stream, so peak memory stays roughly independent of sheet size. Pair withRbxl.max_worksheet_bytesto cap uncompressed worksheet inflation and stop high-compression zip-bomb style entries mid-inflate. Throughput is usually within a few percent of the default path. Withoutrequire "rbxl/native", the flag is accepted but the pure-Ruby reader still takes the buffered path.
Requirements for the C extension:
- libxml2 development headers (
libxml2-dev/libxml2-devel), or - Nokogiri with bundled libxml2 (headers are detected automatically)
Design Notes
- Writer avoids a full workbook object graph; rows are buffered per sheet and the XML is emitted in a single pass at
save. - Reader uses a pull parser for worksheet XML so it can iterate rows without building the full DOM.
- Strings written by the MVP use
inlineStrto avoid shared string bookkeeping during generation. - Reader supports both shared strings and inline strings.
- The native extension uses libxml2 SAX2 directly, bypassing Nokogiri's per-node Ruby object allocation overhead.
Development
Development in this repository assumes Ruby 3.4.8 (.ruby-version).
bundle install
cd benchmark && npm install && cd ..
# Run tests (pure Ruby)
bundle exec ruby -Ilib -Itest test/rbxl_test.rb
# Run tests (with native extension)
cd ext/rbxl_native && ruby extconf.rb && make && cd ../..
bundle exec ruby -Ilib -Itest -r rbxl/native test/rbxl_test.rb
bundle exec ruby -Ilib -Itest test/fast_ext_test.rb
# Benchmarks
bundle exec ruby -Ilib benchmark/compare.rb # pure Ruby
bundle exec ruby -Ilib -r rbxl/native benchmark/compare.rb # with native
RBXL_BENCH_WARMUP=1 RBXL_BENCH_ITERATIONS=5 bundle exec ruby -Ilib benchmark/read_modes.rb
# Generate API docs
bundle exec rake rdoc
Benchmarks
The performance story is primarily about rbxl/native.
require "rbxl" remains the portability-first default: no native extension is
required, the API stays the same, and the fallback path is still useful for
environments where native builds are inconvenient. But the numbers below are
best read as:
rbxl= portable baselinerbxl/native= performance mode
5000 rows x 10 columns, Ruby 3.4 / Python 3.13 / Node 24:

Portable Baseline (require "rbxl")
| benchmark | real (s) |
|---|---|
| rbxl write | 0.08 |
| rbxl read | 0.29 |
| rbxl read values | 0.22 |
| fast_excel write | 0.18 |
| fast_excel write constant | 0.12 |
| exceljs write | 0.08 |
| exceljs read | 0.19 |
| sheetjs write | 0.13 |
| sheetjs read | 0.20 |
| openpyxl write | 0.36 |
| openpyxl read | 0.21 |
| openpyxl read values | 0.18 |
| excelize write | 0.15 |
| excelize read | 0.14 |
Performance Mode (require "rbxl/native")
| benchmark | real (s) | vs exceljs/openpyxl |
|---|---|---|
| rbxl write | 0.05 | about 1.8x faster than exceljs, 2.5x faster than fast_excel constant, 7.7x faster than openpyxl |
| rbxl read | 0.09 | about 2.3x faster than exceljs, 2.4x faster than openpyxl |
| rbxl read values | 0.04 | about 4.8x faster than openpyxl values |
The comparison script uses these libraries when available:
Benchmark notes:
RBXL_BENCH_WARMUPandRBXL_BENCH_ITERATIONScontrol warmup and repeated runs.- Read comparisons use the same
rbxl.xlsxfixture forrbxl,roo,rubyXL, andopenpyxl. fast_exceladds write-only comparisons for both its default mode andconstant_memory: true.- JS comparisons use the same
rbxl.xlsxfixture forexceljsandsheetjs. - Write comparisons still measure each library producing its own workbook.
rss_delta_kbis best-effort process RSS on Linux and should be treated as directional.Install JS benchmark dependencies with
cd benchmark && npm install.rbxlfor write/readfast_excelfor write / constant-memory writeexceljsfor write/readsheetjsfor write/readexcelize(Go) for write/readrust_xlsxwriter(Rust) for writecalamine(Rust) for readrubyXLfor full workbook readopenpyxlas a Python reference point whenopenpyxloruvis available