lithos

A small embedded, ordered, crash-safe key-value store for Ruby — no external database.

lithos is a from-scratch storage engine: a log-structured merge (LSM) tree, the same design behind LevelDB and RocksDB, written as a native Ruby extension with no external dependencies. You get:

  • Durable writes: every put/delete is appended to a CRC-checked write-ahead log (WAL) and fsync'd before it returns (in the default sync: true mode).
  • Crash recovery: on open, the WAL is replayed; a torn or corrupted tail is detected and discarded.
  • Ordered keys: keys are stored sorted in unsigned-byte order, so each, each_key, and range scans come for free.
  • Binary safe: keys and values are arbitrary byte strings (embedded NULs are fine).
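
To make "unsigned-byte order" concrete, here is a small sketch in plain Ruby. It does not call lithos. It assumes the store compares keys byte by byte the way memcmp does, with a shorter key sorting before any longer key it is a prefix of; byte_compare is a hypothetical helper name.

```ruby
# Illustration only: plain Ruby strings, no lithos calls.
# Compare two keys as raw unsigned bytes, ignoring their encodings.
def byte_compare(a, b)
  a.b <=> b.b   # String#b returns a binary (ASCII-8BIT) copy
end

keys = ["b", "a\xff".b, "a", "\x00raw".b, "ab", "B"]
sorted = keys.sort { |x, y| byte_compare(x, y) }

# Print each key as its byte values to make the ordering visible.
sorted.each { |k| p k.bytes }
```

Under this ordering, "B" (0x42) sorts before "a" (0x61), and "a\xff" sorts after "ab" because 0xff is the largest byte value.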

The niche it fills: every other ordered + crash-safe Ruby KV store is a binding to an external C/C++ library (LMDB/LevelDB/RocksDB) that's a pain to build on a native-MSVC Windows Ruby. lithos is self-contained and builds with vcvars — no prebuilt database needed.

Requirements

  • Windows with a native MSVC (mswin) Ruby. Not supported on MinGW/UCRT.
  • Visual Studio 2017+ / Build Tools with the Desktop development with C++ workload.

Install

gem install lithos

Usage

require "lithos"

Lithos.open("data/mydb") do |db|
  db["alpha"] = "one"
  db.put("beta", "two")
  db.put("\x00raw\xff", "binary ok")     # arbitrary binary keys + values

  db["alpha"]            # => "one"
  db["missing"]          # => nil
  db.key?("beta")        # => true
  db.fetch("nope", "default")            # => "default" (without a default, raises KeyError)
  db.delete("beta")      # => true (existed)

  # ordered iteration (ascending unsigned-byte key order)
  db.each      { |k, v| puts "#{k}=#{v}" }
  db.each_key  { |k|    puts k }

  # ordered range scans
  db.scan(gte: "a", lt: "m") { |k, v| puts "#{k}=#{v}" }   # gt/lt exclusive, gte/lte inclusive
  db.range("a", "m")         { |k, v| puts "#{k}=#{v}" }   # half-open ["a", "m")
  db.scan(gte: "a").to_a                       # Enumerator without a block

  db.size                # live key count
  db.flush               # seal the memtable into an SSTable
  db.compact             # merge SSTables, dropping shadowed keys + tombstones
end                      # closed automatically (flush + fsync)

Open without a block when you want to manage the lifetime yourself:

db = Lithos.open("data/mydb", sync: true)
db["k"] = "v"
db.close

Data persists across reopens and survives a crash (a killed process, or power loss, to the extent that the OS honors the flush): reopening replays the WAL and discards any partial tail.
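
The replay-and-discard behavior can be sketched in plain Ruby. The record layout below (4-byte length, 4-byte CRC32, payload) is an assumption made for illustration, not lithos's actual on-disk format, and encode_record and replay are hypothetical helper names.

```ruby
require "zlib"

# Hypothetical record layout (NOT lithos's real format):
#   4-byte little-endian payload length | 4-byte CRC32 of payload | payload
def encode_record(payload)
  payload = payload.b
  [payload.bytesize, Zlib.crc32(payload)].pack("VV") + payload
end

# Returns the payloads of every intact record, stopping at the first
# truncated or corrupted one (the "torn tail").
def replay(log)
  log = log.b
  records = []
  pos = 0
  while pos + 8 <= log.bytesize
    len, crc = log.byteslice(pos, 8).unpack("VV")
    payload = log.byteslice(pos + 8, len)
    break if payload.nil? || payload.bytesize < len   # truncated record
    break if Zlib.crc32(payload) != crc               # corrupted record
    records << payload
    pos += 8 + len
  end
  records
end

log  = encode_record("put alpha one") + encode_record("put beta two")
torn = log + encode_record("put gamma three")[0, 12]   # simulate a crash mid-append
p replay(torn)
```

In this sketch the third record is cut off after its header and four payload bytes, so replay keeps the first two records and drops the partial one.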

How it works

put/delete ──> WAL (append + CRC + fsync) ──> memtable (sorted std::map)
                                                  │  (size threshold)
                                                  ▼
                                          SSTable (immutable, mmap'd:
                                          sorted data + bloom + sparse index)
get ──> memtable ──> SSTables newest→oldest (bloom-filtered)
compact ──> k-way merge of all SSTables ──> one SSTable (drops tombstones)
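
The get and compact paths in the diagram can be modeled with in-memory stand-ins. This is a simplified sketch, not lithos internals: Hashes play the role of the memtable and SSTables, bloom filters are left out, the :tombstone marker is hypothetical, and Hash#merge! stands in for a real streaming k-way merge.

```ruby
# Illustration only: Hashes stand in for the memtable and SSTables.
# A deleted key is represented by the hypothetical marker :tombstone.

# memtable: Hash; sstables: Array of Hashes ordered newest first.
def lookup(key, memtable, sstables)
  [memtable, *sstables].each do |table|
    next unless table.key?(key)
    value = table[key]
    return value == :tombstone ? nil : value   # newest entry wins
  end
  nil
end

# Merge all tables into one sorted table. Newer entries shadow older
# ones, and tombstones are dropped because nothing older is left to hide.
def compact(sstables)
  merged = {}
  sstables.reverse_each { |table| merged.merge!(table) }   # oldest first
  merged.reject { |_, v| v == :tombstone }.sort_by { |k, _| k }.to_h
end

memtable = { "a" => "1" }
sstables = [
  { "b" => :tombstone, "c" => "new" },             # newest SSTable
  { "b" => "old", "c" => "old", "d" => "4" }       # oldest SSTable
]
p lookup("c", memtable, sstables)
p compact(sstables)
```

Here "b" is hidden by the newer tombstone, so a lookup returns nil and compaction removes the key entirely.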

The set of live SSTables + the active WAL is recorded in a MANIFEST that is rewritten atomically (temp file + FlushFileBuffers + atomic rename), so the on-disk catalog is never observed half-updated.
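
The write-then-rename pattern can be sketched in Ruby as follows. This is an illustration of the general technique, not lithos's native code: atomic_write is a hypothetical helper, the file names inside the manifest are made up, and IO#fsync is used as the portable stand-in for the FlushFileBuffers call mentioned above.

```ruby
require "tmpdir"

# Hypothetical helper, not part of lithos: replace `path` with `contents`
# so that a reader sees either the old file or the new one, never a mix.
def atomic_write(path, contents)
  tmp = "#{path}.tmp"
  File.open(tmp, "wb") do |f|
    f.write(contents)
    f.fsync              # push the bytes to disk before the rename
  end
  File.rename(tmp, path) # swap the finished file into place
end

Dir.mktmpdir do |dir|
  manifest = File.join(dir, "MANIFEST")
  atomic_write(manifest, "wal=000001.log\n")                    # made-up contents
  atomic_write(manifest, "wal=000002.log\nsst=000001.sst\n")
  puts File.binread(manifest)
end
```

The ordering is the point of the pattern: the temp file is made durable first, so a crash before the rename leaves the old manifest untouched.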

Notes

  • Durability vs speed: sync: true (default) fsyncs every write. For bulk loads, Lithos.open(path, sync: false) skips per-write fsync (call #flush or #close to make data durable) — faster, but a crash can lose recent writes.
  • Single writer: a store takes an exclusive directory lock on open. A single store object is not safe to share across threads without synchronization: give each thread its own store, or guard the shared one with a Mutex.
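
One way to apply the Mutex advice is a thin wrapper that serializes every call. The sketch below is an assumption, not part of lithos: LockedStore is a hypothetical class, and a plain Hash stands in for the store so the example runs without the gem.

```ruby
# Hypothetical wrapper, not part of lithos: serializes access to any
# object that responds to [] and []=. A Hash stands in for the store.
class LockedStore
  def initialize(store)
    @store = store
    @lock = Mutex.new
  end

  def [](key)
    @lock.synchronize { @store[key] }
  end

  def []=(key, value)
    @lock.synchronize { @store[key] = value }
  end
end

shared = LockedStore.new({})   # with lithos, pass the object returned by Lithos.open
threads = 4.times.map do |t|
  Thread.new { 100.times { |i| shared["t#{t}-#{i}"] = i.to_s } }
end
threads.each(&:join)
p shared["t3-99"]
```

Only [] and []= are wrapped here; any other method used from several threads (scans, delete, flush) would need the same treatment.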

License

MIT.