lithos
A small embedded, ordered, crash-safe key-value store for Ruby — no external database.
lithos is a from-scratch storage engine: a log-structured merge (LSM) tree,
the same design behind LevelDB and RocksDB, written as a native Ruby extension
with no external dependencies. You get:
- Durable writes: every put/delete is appended to a CRC-checked write-ahead log and fsync'd before it returns (in the default sync: true mode).
- Crash recovery: on open, the WAL is replayed; a torn or corrupted tail is detected and discarded.
- Ordered keys: keys are stored sorted in unsigned-byte order, so each, each_key, and range scans come for free.
- Binary safe: keys and values are arbitrary byte strings (embedded NULs are fine).
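To make "unsigned-byte order" concrete: Ruby's own String#<=> compares strings byte by byte as unsigned values and, when one string is a prefix of the other, puts the shorter one first. The plain-Ruby sketch below (no lithos involved) uses that to show the key order you should expect from each, assuming lithos compares keys the same memcmp-style way.

```ruby
# Keys as raw bytes (ASCII-8BIT), like lithos's binary-safe keys.
keys = ["b", "a", "ab", "\xff", "\x00", "A", "a\x00"].map(&:b)

# String#<=> compares the bytes as unsigned values (memcmp-style) and then by
# length, so a plain sort yields unsigned-byte order.
sorted = keys.sort

sorted.each { |k| p k.bytes }
# prints, one per line: [0], [65], [97], [97, 0], [97, 98], [98], [255]
```

Note that "\xff" sorts last and "A" sorts before "a": the order is by byte value, not alphabetical or locale-aware.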
The niche it fills: other ordered, crash-safe Ruby KV stores are typically bindings
to external C/C++ libraries (LMDB, LevelDB, RocksDB) that are a pain to build on a
native-MSVC Windows Ruby. lithos is self-contained and builds with just the MSVC
toolchain (a vcvars developer environment); no prebuilt database library is needed.
Requirements
- Windows with a native MSVC (mswin) Ruby. Not supported on MinGW/UCRT.
- Visual Studio 2017 or later (or the standalone Build Tools) with the "Desktop development with C++" workload.
Install
gem install lithos
Usage
require "lithos"

Lithos.open("data/mydb") do |db|
  db["alpha"] = "one"
  db.put("beta", "two")
  db.put("\x00raw\xff".b, "binary ok")   # arbitrary binary keys + values

  db["alpha"]                   # => "one"
  db["missing"]                 # => nil
  db.key?("beta")               # => true
  db.fetch("nope", "default")   # => "default" (without a default, raises KeyError)
  db.delete("beta")             # => true (the key existed)

  # ordered iteration (ascending unsigned-byte key order)
  db.each { |k, v| puts "#{k}=#{v}" }
  db.each_key { |k| puts k }

  # ordered range scans: gt/lt are exclusive, gte/lte inclusive
  db.scan(gte: "a", lt: "m") { |k, v| puts "#{k}=#{v}" }
  db.range("a", "m") { |k, v| puts "#{k}=#{v}" }   # half-open [a, m)
  db.scan(gte: "a").to_a        # without a block, scan returns an Enumerator

  db.size      # live key count
  db.flush     # seal the memtable into an SSTable
  db.compact   # merge SSTables, dropping shadowed keys + tombstones
end            # closed automatically (flush + fsync)
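The scan bounds combine as you would expect from their names: gte/gt set the lower edge, lte/lt the upper edge, and range(a, m) presumably behaves like scan(gte: a, lt: m). The plain-Ruby model below filters a sorted array of pairs so you can check what a given combination selects. scan_pairs is a hypothetical helper for illustration only; a real store seeks into its sorted tables rather than filtering everything.

```ruby
# Hypothetical model of lithos-style scan bounds over sorted [key, value] pairs.
# gte/lte are inclusive, gt/lt exclusive; a bound left as nil is open.
def scan_pairs(pairs, gte: nil, gt: nil, lte: nil, lt: nil)
  pairs.select do |k, _v|
    (gte.nil? || k >= gte) &&
      (gt.nil? || k > gt) &&
      (lte.nil? || k <= lte) &&
      (lt.nil? || k < lt)
  end
end

pairs = [["a", 1], ["b", 2], ["m", 3], ["z", 4]]   # already sorted by key
p scan_pairs(pairs, gte: "a", lt: "m")   # => [["a", 1], ["b", 2]]   half-open [a, m)
p scan_pairs(pairs, gt: "a", lte: "m")   # => [["b", 2], ["m", 3]]   (a, m]
```

Because the keys are sorted, the selected pairs always form one contiguous run, which is what lets a store answer a scan with a seek plus a sequential read.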
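Open without a block when you want to manage the lifetime yourself; close it in an ensure so it is closed (flush + fsync) even if an exception is raised:
db = Lithos.open("data/mydb", sync: true)
begin
  db["k"] = "v"
ensure
  db.close
end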
Data persists across reopens and survives a crash (a process kill, or a power loss once a write's fsync has completed): on reopen, the WAL is replayed and any partial tail is discarded.
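Detecting a torn tail only requires a self-describing record frame. lithos's real WAL layout and checksum are not documented here, so this is a hypothetical sketch: each record is a CRC-32 (Zlib.crc32 from Ruby's standard zlib) over a length-prefixed payload, and replay stops at the first record that is truncated or fails its checksum, treating everything from that point on as the discarded tail.

```ruby
require "zlib"
require "stringio"

# Hypothetical WAL frame: [crc32 (4 bytes)][payload length (4 bytes)][payload].
# Both integers are little-endian unsigned 32-bit ("V"); the CRC covers length + payload.
def encode_record(payload)
  body = [payload.bytesize].pack("V") + payload.b
  [Zlib.crc32(body)].pack("V") + body
end

# Replay records until EOF, a short read, or a CRC mismatch.
# Returns [payloads, valid_bytes]; anything past valid_bytes is the torn tail.
def replay(io)
  payloads = []
  valid = 0
  loop do
    header = io.read(8)
    break if header.nil? || header.bytesize < 8          # EOF or torn header
    crc, len = header.unpack("VV")
    payload = io.read(len) || ""
    break if payload.bytesize < len                       # torn payload
    break if Zlib.crc32(header[4, 4] + payload) != crc    # corrupted record
    payloads << payload
    valid += 8 + len
  end
  [payloads, valid]
end

log = encode_record("put alpha=one") + encode_record("put beta=two")
torn = log + encode_record("put gamma=three")[0, 10]   # simulate a crash mid-append
payloads, valid = replay(StringIO.new(torn))
p payloads                 # => ["put alpha=one", "put beta=two"]
p valid == log.bytesize    # => true
```

After such a replay, a writer would presumably truncate the log to valid_bytes before appending again, so new records never land behind garbage.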
How it works
put/delete ──> WAL (append + CRC + fsync) ──> memtable (sorted std::map)
                                                  │ (size threshold)
                                                  ▼
                                              SSTable (immutable, mmap'd:
                                               sorted data + bloom + sparse index)

get     ──> memtable ──> SSTables newest→oldest (bloom-filtered)
compact ──> k-way merge of all SSTables ──> one SSTable (drops tombstones)
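Compaction as a k-way merge can be sketched in plain Ruby: every SSTable is already sorted, so the merge walks all of them together, keeps only the newest version of each key, and drops keys whose newest version is a tombstone. The TOMBSTONE marker, the in-memory arrays (ordered newest table first), and the linear minimum search in place of a heap are all simplifying assumptions, not lithos internals.

```ruby
TOMBSTONE = Object.new.freeze  # hypothetical deletion marker

# tables: sorted [[key, value], ...] arrays, newest table first.
# Returns one sorted table with shadowed versions and tombstones removed.
def merge_tables(tables)
  cursors = Array.new(tables.size, 0)
  out = []
  loop do
    # Current key of each table that still has entries.
    heads = tables.each_index.filter_map { |i| tables[i][cursors[i]]&.first }
    break if heads.empty?

    key = heads.min
    found = false
    winner = nil
    # Visit tables newest first: the first one holding `key` supplies the value,
    # and every table positioned on `key` advances past it (older copies drop).
    tables.each_with_index do |table, i|
      entry = table[cursors[i]]
      next unless entry && entry[0] == key

      unless found
        winner = entry[1]
        found = true
      end
      cursors[i] += 1
    end
    out << [key, winner] unless winner.equal?(TOMBSTONE)
  end
  out
end

newest = [["a", "A2"], ["c", TOMBSTONE]]
oldest = [["a", "A1"], ["b", "B1"], ["c", "C1"]]
p merge_tables([newest, oldest])  # => [["a", "A2"], ["b", "B1"]]
```

Dropping tombstones outright is only safe because this merge includes every SSTable: no older table remains in which a deleted key could reappear. A merge over a subset of tables would have to keep them. Production merges usually pick the next key with a priority queue rather than scanning all heads.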
The set of live SSTables + the active WAL is recorded in a MANIFEST that is
rewritten atomically (temp file + FlushFileBuffers + atomic rename), so the
on-disk catalog is never observed half-updated.
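The temp-file-then-rename pattern can be sketched in Ruby, with IO#fsync playing the role FlushFileBuffers plays in lithos's native code. This is an illustration of the pattern, not lithos's implementation; atomic_write is a hypothetical helper and the MANIFEST line format is invented for the demo. On POSIX, rename(2) replaces the target atomically; on Windows the guarantees come from the underlying move-file call.

```ruby
require "tmpdir"

# Replace `path` with `contents`: write a temp file in the same directory,
# force it to stable storage, then rename it over the old file, so a reader
# sees either the old file or the new one, never a partial write.
def atomic_write(path, contents)
  tmp = "#{path}.tmp"
  File.open(tmp, "wb") do |f|
    f.write(contents)
    f.flush   # Ruby's buffer -> OS
    f.fsync   # OS cache -> disk
  end         # closed before the rename (required on Windows)
  File.rename(tmp, path)
end

Dir.mktmpdir do |dir|
  manifest = File.join(dir, "MANIFEST")
  atomic_write(manifest, "wal 000003.log\nsst 000001.sst\n")
  atomic_write(manifest, "wal 000004.log\nsst 000001.sst\nsst 000002.sst\n")
  puts File.read(manifest)
end
```

Keeping the temp file in the same directory matters: a rename is only atomic within one filesystem volume.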
Notes
- Durability vs speed: sync: true (the default) fsyncs every write. For bulk loads, Lithos.open(path, sync: false) skips the per-write fsync (call #flush or #close to make data durable); it is faster, but a crash can lose recent writes.
- Single writer: a store takes an exclusive directory lock on open, and a store object is not safe to share across threads. Give each thread its own store (in its own directory), or serialize access to a shared one with a Mutex.
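For the Mutex option, one approach is a thin wrapper that serializes every call on a shared store. SyncedStore is a hypothetical helper, not part of lithos; it forwards only a few of the methods from the Usage section, and the demo passes a plain Hash as a stand-in for a Lithos store so it runs without the gem.

```ruby
# Hypothetical wrapper: funnels all access to one store through a Mutex,
# so only one thread touches the underlying store at a time.
class SyncedStore
  def initialize(store)
    @store = store
    @lock = Mutex.new
  end

  def [](key)
    @lock.synchronize { @store[key] }
  end

  def []=(key, value)
    @lock.synchronize { @store[key] = value }
  end

  def delete(key)
    @lock.synchronize { @store.delete(key) }
  end

  # Exclusive access for multi-step work such as read-modify-write.
  # The block receives the raw store; calling the wrapper inside it would
  # deadlock, because Ruby's Mutex is not reentrant.
  def synchronize
    @lock.synchronize { yield @store }
  end
end

shared = SyncedStore.new({})   # a Hash stands in for Lithos.open(...)
threads = 4.times.map do |t|
  Thread.new do
    100.times do |i|
      shared["t#{t}-#{i}"] = i.to_s
      shared.synchronize { |s| s["count"] = (s["count"].to_i + 1).to_s }
    end
  end
end
threads.each(&:join)
puts shared["count"]   # => 400
```

Wrapping single calls is not enough for compound updates like the counter; those need the whole read-modify-write inside one synchronize block.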
License
MIT.