win_excel_db
A Ruby gem that treats Excel workbooks as a relational database via win32ole.
Provides an ORM-like layer with buffered writes, left/inner joins, and selections — all backed by .xlsx/.xlsb files on Windows.
Why Excel as a database?
This is not the idea that Excel is better or faster than a real database or SQLite. It isn't.
The idea is that in office environments, colleagues who need to contribute data often only understand Excel. win_excel_db lets you start cleanly in an Excel framework — colleagues fill in and maintain the sheets they know, while you lay complex Ruby logic on top, far more easily and efficiently than Excel macros ever allow.
Once the colleagues have done their part, you process everything in Ruby. All data and results stay in Excel, readable and editable by anyone — even when the Ruby scripts aren't running. Ruby just picks up whatever it finds.
This project grew out of planning a training event for 25 colleagues. What started as a manual Excel file kept growing — more courses, more people to manage — until it gradually scaled into full automation, one step at a time.
Requirements
- Windows (win32ole is Windows-only)
- Ruby with win32ole (included in standard Windows Ruby distribution)
- Microsoft Excel installed
Excel file structure
Your workbooks must follow this layout for win_excel_db to read them correctly:
- Row 1: column headers (unique names per sheet)
- Row 2 onwards: uninterrupted data rows — no empty rows between data
- Multiple sheets in one workbook are supported; each sheet is an independent table
Example:
| EventID | Name | Year |
|---|---|---|
| E001 | Workshop | 2024 |
| E002 | Conference | 2024 |
Installation
Add to your Gemfile:
gem 'win_excel_db'
Or install directly:
gem install win_excel_db
Source & Issues
- Source: https://github.com/bachmarc/win_exl_db
- Bugs & feature requests: https://github.com/bachmarc/win_exl_db/issues
Quick start
require 'win_excel_db'
db = WinExcelDb::Handler.new('C:\data\myfile.xlsx')
db.interactive(false) # run Excel in background
sheet = db.get_row_sheet('Events')
range = sheet.
range.each do |row|
puts row.read('Events.EventID')
end
db.close
Core classes
| Class | Purpose |
|---|---|
WinExcelDb::Handler |
Opens/creates workbooks, manages the Excel COM process |
WinExcelDb::RowSheet |
One worksheet as a queryable table |
WinExcelDb::DataRow |
One data row with lazy-load value cache |
WinExcelDb::MetaRow |
Combines DataRows from multiple sheets; field access via "Sheet.Field" |
WinExcelDb::MetaRange |
Collection of MetaRows with join/filter/sort operations |
WinExcelDb::Run |
Per-handler readonly/writing mode switch |
Writing data
Create a new sheet with headers
sheet = db.create_row_sheet('Products', ['ID', 'Name', 'Price'])
row = sheet.
row.write('Products.ID', 'P001')
row.write('Products.Name', 'Apple')
row.write('Products.Price', 0.99)
db.save_as('C:\data\output.xlsx')
db.close
Cell-by-cell vs. buffered writes
Cell-by-cell issues one Excel COM call per write — fine for small changes:
row.write('Products.Name', 'Banana')
For bulk writes, buffered mode accumulates changes in memory and flushes everything in a single Excel range operation — significantly faster:
sheet.buffered_mode = true
rows.each_with_index do |row, i|
row.write('Products.ID', "P#{i}")
row.write('Products.Name', names[i])
row.write('Products.Price', prices[i])
end
sheet.commit_buffer
sheet.buffered_mode = false
Querying
Field references always use the format "SheetName.FieldName".
Filter (selection)
# exact match
range.selection!('Events.Year', '2024')
# regex
range.selection!('Events.EventID', /^E0\d{2}$/)
# block
range.selection! { |row| row.read('Events.Year').to_i > 2022 }
# non-destructive variant (returns new MetaRange, original unchanged)
filtered = range.selection('Events.Year', '2024')
Joins
events = db.get_row_sheet('Events').
programs = db.get_row_sheet('Programs').
# left join — 1 key (all events kept, unmatched get empty strings for Program fields)
events.leftjoin!('Events.ProgramID', programs, 'Programs.ProgramID')
# left join — 2 keys (composite key)
events.leftjoin!('Events.UserID', 'Events.JourneyID',
fellows,
'Fellows.UserID', 'Fellows.JourneyID')
# inner join — only matched rows are kept
events.innerjoin!('Events.ProgramID', programs, 'Programs.ProgramID')
# non-destructive variants
new_range = events.leftjoin('Events.ProgramID', programs, 'Programs.ProgramID')
After a join, fields from all joined sheets are accessible on the same MetaRow:
events.each do |row|
puts row.read('Events.EventID')
puts row.read('Programs.ProgramName')
end
Sort
range.sort_by! { |row| row.read('Events.StartDate') }
Excel instance management
Run in background
db = WinExcelDb::Handler.new(path)
db.interactive(false) # hides window, disables screen updates
Always close cleanly
Each Handler instance starts its own Excel process. Without close, the process stays in memory:
db.close # sets Saved=true, calls Workbook.Close and Excel.Quit
Kill orphaned instances
If a previous script crashed without calling close, orphaned Excel automation processes accumulate. Call this at the start of your script to clean up:
WinExcelDb::Handler.kill_orphaned_instances
This uses WMI to find and kill only Excel processes started with the /automation flag — it will not affect Excel windows opened by the user.
Deleting rows
# Safe pattern: collect rows to delete first, then delete outside the iteration.
# delete_row reindexes the underlying data_rows array immediately,
# so deleting while iterating is unsafe.
to_delete = []
range.each { |r| to_delete << r if r.read('Items.Keep') == 'no' }
to_delete.each { |r| r.delete_row('Items') }
Column limit
By default win_excel_db reads/writes up to WinExcelDb::MAX_COLUMNS (76) columns per row,
corresponding to Excel columns A through BX. If your sheets are wider, change the constant
before opening any workbooks:
WinExcelDb::MAX_COLUMNS = 100
Running the examples
cd examples
ruby 01_create_and_write.rb
ruby 02_joins.rb
ruby 03_commit_buffer.rb
ruby 04_delete_rows.rb
ruby 05_instance_management.rb
Each example creates its own Excel files in examples/tmp/.
Running the tests
bundle exec rake test
The test suite uses stub objects and does not require Excel to be installed.
License
MIT