ruby-fst

Ruby bindings for the fst crate by Andrew Gallant. Provides finite state transducer backed ordered maps and sets with fast lookup, range queries, and fuzzy search.

Requirements

  • Ruby >= 3.0
  • Rust toolchain (for compilation)

Installation

gem 'ruby-fst'

Usage

Map

Ordered map from byte string keys to unsigned 64-bit integer values. Keys must be inserted in lexicographic order.

require 'ruby_fst'

builder = RubyFst::MapBuilder.new
builder.insert('bar', 2)
builder.insert('baz', 3)
builder.insert('foo', 1)
map = RubyFst::Map.new(builder.finish)

map['foo']        # => 1
map.get('missing') # => nil
map.contains?('bar') # => true
map.length        # => 3

map.each { |key, value| puts "#{key}: #{value}" }

Set

Ordered set of byte string keys.

builder = RubyFst::SetBuilder.new
builder.insert('bar')
builder.insert('baz')
builder.insert('foo')
set = RubyFst::Set.new(builder.finish)

set.contains?('foo') # => true
set.length           # => 3

set.each { |key| puts key }

Floor and ceiling lookups

get_le returns the greatest key less than or equal to the query. get_ge returns the smallest key greater than or equal to the query. Both return [key, value] or nil.

builder = RubyFst::MapBuilder.new
builder.insert('bar', 1)
builder.insert('foo', 2)
builder.insert('qux', 3)
map = RubyFst::Map.new(builder.finish)

map.get_le('dog') # => ["bar", 1]
map.get_ge('dog') # => ["foo", 2]
map.get_le('aaa') # => nil

This is useful for IP range lookups. Encode range starts as 4-byte big-endian keys and use get_le to find which range an IP falls into:

builder = RubyFst::MapBuilder.new
builder.insert([167_772_160].pack('N'), 1)  # 10.0.0.0
builder.insert([3_232_235_520].pack('N'), 2) # 192.168.0.0
map = RubyFst::Map.new(builder.finish)

ip = IPAddr.new('10.0.0.100').to_i
key, label_id = map.get_le([ip].pack('N'))

Find all keys within a given edit distance. The search runs as an automaton intersection with the FST, visiting only reachable states.

builder = RubyFst::MapBuilder.new
%w(bar baz cat foo fun).each_with_index { |w, i| builder.insert(w, i) }
map = RubyFst::Map.new(builder.finish)

map.search_levenshtein('far', 1) { |key, value| puts key }
# => bar

Works on sets too:

set.search_levenshtein('university', 2) { |key| puts key }

Serialization

# To/from bytes
bytes = map.to_bytes
map = RubyFst::Map.new(bytes)

# To/from file
map.save('/path/to/file.fst')
map = RubyFst::Map.from_path('/path/to/file.fst')

Development

bundle install
bundle exec rake compile test

License

MIT. See LICENSE.txt for details.

This gem wraps the fst crate by Andrew Gallant, also MIT licensed.