# SwissHash

A Swiss Table hash map implemented as a Ruby C extension. Based on the design principles of Google's Abseil `flat_hash_map`, Rust's hashbrown, and Go 1.24's Swiss Table maps, with the architecture adapted to Ruby's object system.
## Installation

```sh
gem install swiss_hash
```
## Usage

```ruby
require "swiss_hash"

h = SwissHash::Hash.new
h["key"] = "value"
h["key"]        # => "value"
h.delete("key")
h.stats         # => { capacity: 16, size: 0, ... }
```
## Performance Results
Benchmarks on Ruby 3.1.7 / arm64-darwin24 (Apple Silicon, NEON SIMD).
Methodology: 21 iterations per test, 5 warmup runs, IQR-filtered mean, interleaved Ruby/SwissHash measurements per iteration with alternating start order to cancel out thermal drift and scheduling noise. Per-side coefficient of variation reported to distinguish real deltas from noise.
### N = 100,000
| Operation | Ruby Hash | SwissHash | Delta |
|---|---|---|---|
| Insert (string keys) | 17.6 ms | 11.3 ms | −35.8% ⚡ |
| Delete + reinsert 25% | 9.8 ms | 8.9 ms | −9.0% |
| Insert (sequential int) | 7.0 ms | 6.4 ms | −8.7% |
| Mixed (70% read / 20% write / 10% delete) | 21.4 ms | 19.6 ms | −8.6% |
| Insert (random int) | 6.7 ms | 6.4 ms | −3.5% |
| Lookup (string keys) | 20.3 ms | 20.8 ms | +2.5% |
| Lookup (sequential int) | 11.7 ms | 12.2 ms | +4.4% |
### N = 10,000
| Operation | Ruby Hash | SwissHash | Delta |
|---|---|---|---|
| Insert (string keys) | 1.63 ms | 1.10 ms | −32.8% ⚡ |
| Lookup (string keys) | 1.71 ms | 1.62 ms | −5.2% ⚡ |
| Mixed | 1.93 ms | 1.90 ms | −1.6% |
| Delete + reinsert | 0.91 ms | 0.89 ms | −2.0% |
| Insert (sequential int) | 0.66 ms | 0.67 ms | +2.5% |
| Lookup (sequential int) | 1.05 ms | 1.12 ms | +6.0% |
### N = 1,000
Ruby Hash uses an AR-table (flat array, linear search) for small hashes — SwissHash doesn't have this small-map regime, so for very small integer-keyed workloads Ruby wins. String workloads still favour SwissHash due to wyhash and the lookup fast path.
| Operation | Ruby Hash | SwissHash | Delta |
|---|---|---|---|
| Insert (string keys) | 0.183 ms | 0.118 ms | −35.6% ⚡ |
| Lookup (string keys) | 0.184 ms | 0.155 ms | −15.5% ⚡ |
| Insert (sequential int) | 0.063 ms | 0.074 ms | +17.5% |
| Delete + reinsert | 0.094 ms | 0.102 ms | +8.2% |
## Summary
- Faster on 5 of 7 operations at N = 100k, some substantially (−36% string insert, −9% mixed workload, −9% delete + reinsert).
- Strictly faster for string keys at every size (25–35% faster inserts; break-even to 15% faster lookups).
- Near parity on lookups at N = 100k (+2.5% on strings, +4.4% on ints) — the remaining gap stems from the Ruby VM's opcode specialization for `Hash#[]`, not the data structure.
## Memory Usage
For 100,000 integer keys:
- SwissHash: 2,176 KB contiguous native memory, 4 GC slots
- Ruby Hash: managed via GC slots (not directly measurable)
- Load factor: 76.3% actual (max 87.5%)
- GC pressure: zero GC runs during insertion
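The reported load factor and footprint follow directly from the resize rule. A quick back-of-the-envelope check — the 17-bytes-per-slot figure (two 8-byte `VALUE`s plus one control byte) is an assumption for illustration, not taken from the source:

```c
#include <stdint.h>

/* With a max load factor of 7/8, inserting n keys forces the capacity to
   the next power of two at or above ceil(n * 8 / 7). */
static uint64_t next_pow2(uint64_t x) {
    uint64_t p = 1;
    while (p < x) p <<= 1;
    return p;
}

static uint64_t capacity_for(uint64_t n) {
    return next_pow2((n * 8 + 6) / 7);   /* ceil(n * 8 / 7) */
}
```

For n = 100,000 this gives a capacity of 131,072, an actual load of 100,000 / 131,072 ≈ 76.3%, and — under the assumed 17 bytes per slot — 131,072 × 17 bytes = 2,176 KB, matching the figures above.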
## Features
- SIMD-optimized probing: SSE2 (16-byte groups) on x86_64, NEON (8-byte groups) on ARM64, SWAR fallback elsewhere
- Memory efficient: Swiss Table layout with 87.5% max load factor
- Tombstone compaction: Automatic cleanup of deleted entries during resize
- Ruby compatibility: Supports frozen string keys, all Ruby object types
- Thread safety: Prevents reentrant modifications during callbacks
## API

```ruby
hash = SwissHash::Hash.new(capacity = 16)

# Basic operations
hash[key] = value
hash[key]            # get; returns nil if absent
hash.delete(key)     # returns the old value, or nil

# Enumeration
hash.each { |k, v| ... }
hash.keys
hash.values

# Size and status
hash.size            # also: length
hash.empty?
hash.key?(key)       # also: has_key?, include?

# Maintenance
hash.clear
hash.compact!        # drop tombstones without reallocating

# Debugging
hash.stats  # => { capacity:, size:, num_groups:, load_factor:,
            #      memory_bytes:, growth_left:, tombstones:, simd: }
```
## Usage Recommendations
Use SwissHash when:
- Your hash keys are strings — inserts are 25–35% faster, lookups are on par or faster
- Your hash holds 10,000+ entries with any mix of reads, writes, and deletes
- You do heavy delete/reinsert churn — tombstone compaction handles it without pathological slowdown
- You need predictable native memory instead of scattered GC allocations
Stick with Ruby's built-in Hash when:
- Your hash is small (≤ a few hundred entries) and mostly lookup-heavy with integer keys — Ruby's AR-table wins for small integer-keyed workloads
- You depend on Hash-specific semantics: default blocks, `compare_by_identity`, full insertion-order guarantees, or the complete `Hash` API
## Architecture

### Swiss Table core
- Open addressing with a 7-bit `H2` metadata byte per slot; SIMD rejects non-matching slots in parallel
- Group size 16 on SSE2 (the full `_mm_movemask_epi8` width, matching Abseil / hashbrown); group size 8 on NEON and the portable SWAR fallback — matching hashbrown's deliberate ARM choice (NEON's multi-cycle movemask latency makes 16-wide groups lose to 8-wide SWAR)
- Triangular probing — offsets `i(i+1)/2` — guarantees full coverage of power-of-2 capacities
- Max load factor 87.5% (7/8)
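For the SWAR fallback, the probing machinery above can be sketched as follows. This is a minimal illustration, not the gem's actual code; it assumes a little-endian host, and — like hashbrown's SWAR path — the match can report rare false positives that the subsequent full key comparison filters out:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define GROUP 8  /* SWAR: one 64-bit word holds 8 control bytes */

/* Set the high bit of every byte of the group that equals h2, using the
   classic SWAR zero-byte trick. Bytes just above a matching byte may be
   flagged spuriously; callers verify candidates with a full key compare. */
static inline uint64_t group_match(const uint8_t *ctrl, uint8_t h2) {
    uint64_t group;
    memcpy(&group, ctrl, sizeof group);                /* unaligned-safe load */
    uint64_t x = group ^ (0x0101010101010101ULL * h2); /* matching bytes -> 0 */
    return (x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL;
}

/* Triangular probing: the stride grows by one group per step, so group
   offsets follow i(i+1)/2 and visit every group of a power-of-2 table. */
static inline size_t probe_next(size_t pos, size_t *stride, size_t mask) {
    *stride += GROUP;
    return (pos + *stride) & mask;
}
```

Each set high bit in the returned mask names a candidate slot; extracting them with a count-trailing-zeros loop replaces a byte-by-byte scan of the group.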
### Ruby-specific adaptations
- wyhash for string keys — faster than Ruby's SipHash on the short strings that dominate typical workloads
- Fibonacci multiplicative hash for Fixnum and Symbol keys — their low bits are already well-distributed, so avalanche mixers would be wasted work
- ASCII-7bit fast path in key equality: frozen string keys have their coderange pre-computed on insert, so subsequent lookup comparisons skip `rb_enc_compatible` entirely and go straight to `memcmp`
- Encoding-index equality check as the first fast path in key comparison — avoids `rb_enc_compatible` in the common case of matching encodings
- Inlined `RTYPEDDATA_DATA` on hot methods (`[]`, `[]=`, `delete`, `key?`) — skips the type-check overhead of `TypedData_Get_Struct` on every operation
- Prefetch of `slots[off]` right after the control-byte load, so the DRAM fetch overlaps with SIMD match extraction
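The Fibonacci multiplicative hash for integer keys amounts to a single multiply. A hedged sketch — the constant and the hash-splitting convention shown here follow hashbrown; the gem's exact choices may differ:

```c
#include <stdint.h>

/* Multiply by 2^64 / phi (the golden ratio). Sequential keys, whose entropy
   sits in the low bits, end up spread across the high bits. */
#define FIB_MULT 0x9E3779B97F4A7C15ULL

static inline uint64_t fib_hash(uint64_t key) {
    return key * FIB_MULT;
}

/* Swiss Table split of the hash: H1 picks the probe start within the table,
   H2 is the 7-bit control tag (taken from the top bits, hashbrown-style). */
static inline uint64_t h1(uint64_t hash, uint64_t mask) { return (hash >> 7) & mask; }
static inline uint8_t  h2(uint64_t hash)                { return hash >> 57; }
```

Because the multiply is a single-cycle operation on modern cores, this is markedly cheaper than running integer keys through a full avalanche mixer.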
### Memory layout

- Separate control-byte array and slot array (hashbrown-style) — the tight control array scans well through L1/L2
- No zero-initialization of slot memory (`malloc` instead of `calloc`) — slots are only ever read after their control byte confirms they're live
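A sketch of that layout. Names here are illustrative: `slot_t` stands in for the gem's key/value pair, and the empty-byte encoding is an assumption borrowed from Abseil-style tables:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GROUP      8
#define CTRL_EMPTY 0x80  /* assumed encoding: high bit set = empty slot */

typedef struct { uint64_t key, value; } slot_t;  /* stand-in for two VALUEs */

typedef struct {
    uint8_t *ctrl;     /* capacity + GROUP bytes; the mirror group lets probes wrap */
    slot_t  *slots;    /* capacity slots */
    size_t   capacity; /* power of two */
} swiss_table;

static int table_init(swiss_table *t, size_t capacity) {
    t->capacity = capacity;
    t->ctrl  = malloc(capacity + GROUP);
    t->slots = malloc(capacity * sizeof *t->slots);  /* no calloc: slots stay
                                                        uninitialized until their
                                                        control byte marks them live */
    if (!t->ctrl || !t->slots) { free(t->ctrl); free(t->slots); return -1; }
    memset(t->ctrl, CTRL_EMPTY, capacity + GROUP);   /* only ctrl needs init */
    return 0;
}
```

Only the control array is touched at allocation time, so a fresh table costs one small `memset` rather than zeroing the full slot region.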
## Build

```sh
rake compile
```
## Benchmarking

The included `benchmark.rb` produces statistically honest results:

```sh
bundle exec ruby benchmark.rb
```
Key features that make it trustworthy:
- Interleaved Ruby/SwissHash measurements per iteration with alternating start order — thermal throttling and fluctuating background load hit both sides equally
- 21 iterations with IQR-filtered mean (trims top and bottom 25%) — more robust than median on noisy laptop hardware
- 5 warmup runs to settle JIT, caches, and branch predictor
- Per-side coefficient of variation (`±X.X%`) displayed so you can distinguish a real 5% delta from 5% noise
- A correctness smoke test runs before measurement
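The IQR-filtered mean is simple to state precisely: sort the samples, drop the top and bottom quarter, and average the middle half. A sketch of the statistic (not the script's code, which lives in Ruby):

```c
#include <stdlib.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Interquartile mean: sort, discard the top and bottom 25% of samples,
   average what remains. Robust to the outliers a busy laptop produces. */
static double iqr_mean(double *samples, size_t n) {
    qsort(samples, n, sizeof *samples, cmp_double);
    size_t lo = n / 4, hi = n - n / 4;  /* keep the middle half */
    double sum = 0.0;
    for (size_t i = lo; i < hi; i++) sum += samples[i];
    return sum / (double)(hi - lo);
}
```

With 21 iterations this keeps the middle 11 samples, so a handful of GC pauses or scheduler hiccups cannot drag the reported figure.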
## Profiling

For profiling on macOS:

```sh
bundle exec ruby simp.rb              # runs an infinite lookup loop, prints its PID
sample <PID> 60 -f /tmp/swiss.sample
filtercalltree /tmp/swiss.sample | head -100
```
## Design References
- Matt Kulukundis, "Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step" — CppCon 2017
- Abseil: SwissTables design
- rust-lang/hashbrown — reference for SSE2/NEON/SWAR strategy choices
- Go 1.24 maps — probing and resize design trade-offs
- Aria Beingessner, "Swisstable, a Quick and Dirty Description" — implementer's notes
## License

MIT