win_excel_db

A Ruby gem that treats Excel workbooks as a relational database via win32ole. Provides an ORM-like layer with buffered writes, left/inner joins, and selections — all backed by .xlsx/.xlsb files on Windows.

Why Excel as a database?

This is not the idea that Excel is better or faster than a real database or SQLite. It isn't.

The idea is that in office environments, colleagues who need to contribute data often only understand Excel. win_excel_db lets you start cleanly in an Excel framework — colleagues fill in and maintain the sheets they know, while you lay complex Ruby logic on top, far more easily and efficiently than Excel macros ever allow.

Once the colleagues have done their part, you process everything in Ruby. All data and results stay in Excel, readable and editable by anyone — even when the Ruby scripts aren't running. Ruby just picks up whatever it finds.

This project grew out of planning a training event for 25 colleagues. What started as a manual Excel file kept growing — more courses, more people to manage — until it gradually scaled into full automation, one step at a time.

Requirements

  • Windows (win32ole is Windows-only)
  • Ruby with win32ole (included in standard Windows Ruby distribution)
  • Microsoft Excel installed

Excel file structure

Your workbooks must follow this layout for win_excel_db to read them correctly:

  • Row 1: column headers (unique names per sheet)
  • Row 2 onwards: uninterrupted data rows — no empty rows between data
  • Multiple sheets in one workbook are supported; each sheet is an independent table

Example:

EventID Name Year
E001 Workshop 2024
E002 Conference 2024

Installation

Add to your Gemfile:

gem 'win_excel_db'

Or install directly:

gem install win_excel_db

Source & Issues

Quick start

require 'win_excel_db'

db = WinExcelDb::Handler.new('C:\data\myfile.xlsx')
db.interactive(false)   # run Excel in background

sheet = db.get_row_sheet('Events')
range = sheet.meta_range

range.each do |row|
  puts row.read('Events.EventID')
end

db.close

Core classes

Class Purpose
WinExcelDb::Handler Opens/creates workbooks, manages the Excel COM process
WinExcelDb::RowSheet One worksheet as a queryable table
WinExcelDb::DataRow One data row with lazy-load value cache
WinExcelDb::MetaRow Combines DataRows from multiple sheets; field access via "Sheet.Field"
WinExcelDb::MetaRange Collection of MetaRows with join/filter/sort operations
WinExcelDb::Run Per-handler readonly/writing mode switch

Writing data

Create a new sheet with headers

sheet = db.create_row_sheet('Products', ['ID', 'Name', 'Price'])
row = sheet.create_meta_row
row.write('Products.ID',    'P001')
row.write('Products.Name',  'Apple')
row.write('Products.Price', 0.99)
db.save_as('C:\data\output.xlsx')
db.close

Cell-by-cell vs. buffered writes

Cell-by-cell issues one Excel COM call per write — fine for small changes:

row.write('Products.Name', 'Banana')

For bulk writes, buffered mode accumulates changes in memory and flushes everything in a single Excel range operation — significantly faster:

sheet.buffered_mode = true
rows.each_with_index do |row, i|
  row.write('Products.ID',    "P#{i}")
  row.write('Products.Name',  names[i])
  row.write('Products.Price', prices[i])
end
sheet.commit_buffer
sheet.buffered_mode = false

Querying

Field references always use the format "SheetName.FieldName".

Filter (selection)

# exact match
range.selection!('Events.Year', '2024')

# regex
range.selection!('Events.EventID', /^E0\d{2}$/)

# block
range.selection! { |row| row.read('Events.Year').to_i > 2022 }

# non-destructive variant (returns new MetaRange, original unchanged)
filtered = range.selection('Events.Year', '2024')

Joins

events   = db.get_row_sheet('Events').meta_range
programs = db.get_row_sheet('Programs').meta_range

# left join — 1 key (all events kept, unmatched get empty strings for Program fields)
events.leftjoin!('Events.ProgramID', programs, 'Programs.ProgramID')

# left join — 2 keys (composite key)
events.leftjoin!('Events.UserID', 'Events.JourneyID',
                 fellows,
                 'Fellows.UserID', 'Fellows.JourneyID')

# inner join — only matched rows are kept
events.innerjoin!('Events.ProgramID', programs, 'Programs.ProgramID')

# non-destructive variants
new_range = events.leftjoin('Events.ProgramID', programs, 'Programs.ProgramID')

After a join, fields from all joined sheets are accessible on the same MetaRow:

events.each do |row|
  puts row.read('Events.EventID')
  puts row.read('Programs.ProgramName')
end

Sort

range.sort_by! { |row| row.read('Events.StartDate') }

Excel instance management

Run in background

db = WinExcelDb::Handler.new(path)
db.interactive(false)   # hides window, disables screen updates

Always close cleanly

Each Handler instance starts its own Excel process. Without close, the process stays in memory:

db.close   # sets Saved=true, calls Workbook.Close and Excel.Quit

Kill orphaned instances

If a previous script crashed without calling close, orphaned Excel automation processes accumulate. Call this at the start of your script to clean up:

WinExcelDb::Handler.kill_orphaned_instances

This uses WMI to find and kill only Excel processes started with the /automation flag — it will not affect Excel windows opened by the user.

Deleting rows

# Safe pattern: collect rows to delete first, then delete outside the iteration.
# delete_row reindexes the underlying data_rows array immediately,
# so deleting while iterating is unsafe.

to_delete = []
range.each { |r| to_delete << r if r.read('Items.Keep') == 'no' }
to_delete.each { |r| r.delete_row('Items') }

Column limit

By default win_excel_db reads/writes up to WinExcelDb::MAX_COLUMNS (76) columns per row, corresponding to Excel columns A through BX. If your sheets are wider, change the constant before opening any workbooks:

WinExcelDb::MAX_COLUMNS = 100

Running the examples

cd examples
ruby 01_create_and_write.rb
ruby 02_joins.rb
ruby 03_commit_buffer.rb
ruby 04_delete_rows.rb
ruby 05_instance_management.rb

Each example creates its own Excel files in examples/tmp/.

Running the tests

bundle exec rake test

The test suite uses stub objects and does not require Excel to be installed.

License

MIT