philiprehberger-bloom_filter

Tests Gem Version Last updated

Space-efficient probabilistic set with configurable false positive rate

Requirements

  • Ruby >= 3.1

Installation

Add to your Gemfile:

gem "philiprehberger-bloom_filter"

Or install directly:

gem install philiprehberger-bloom_filter

Usage

require "philiprehberger/bloom_filter"

filter = Philiprehberger::BloomFilter.new(expected_items: 10_000, false_positive_rate: 0.01)
filter.add('hello')
filter.include?('hello')  # => true
filter.include?('world')  # => false

Merge Filters

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.add('beta')
a.merge(b)
a.include?('beta')  # => true

Bulk Add

filter = Philiprehberger::BloomFilter.new(expected_items: 10_000)
filter.bulk_add(%w[alpha beta gamma delta])
filter.include?('beta')  # => true

Cardinality Estimation

filter.count_estimate  # => ~4.0 (estimated unique items)

Intersection

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.bulk_add(%w[shared only-a])
b.bulk_add(%w[shared only-b])

result = a.intersection(b)
result.include?('shared')  # => true
result.include?('only-a')  # => false

Fill Rate

filter.fill_rate  # => 0.023 (proportion of set bits)

Equality and Copying

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('hello')
b.add('hello')
a == b  # => true

clone = a.copy
clone.add('world')
a.include?('world')  # => false (independent copy)

False Positive Rate

filter = Philiprehberger::BloomFilter.new(expected_items: 1000, false_positive_rate: 0.01)
100.times { |i| filter.add("item-#{i}") }
filter.false_positive_rate  # => ~0.0001 (actual rate based on current fill)

Superset Check

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.bulk_add(%w[alpha beta gamma])
b.add('alpha')
a.superset?(b)  # => true

Empty Check

filter = Philiprehberger::BloomFilter.new(expected_items: 1000)
filter.empty?  # => true
filter.add('hello')
filter.empty?  # => false

Union (Non-Mutating)

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.add('beta')

result = a.union(b)
result.include?('alpha')  # => true
result.include?('beta')   # => true
a.include?('beta')        # => false (a is unchanged)

Subset Check

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.bulk_add(%w[alpha beta gamma])
a.subset?(b)  # => true

Operator Aliases

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.add('beta')

union = a | b              # same as a.union(b)
intersection = a & b       # same as a.intersection(b)

Compatibility Check

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.compatible?(b)  # => true, safe to merge / intersect / union

Saturation Check

filter = Philiprehberger::BloomFilter.new(expected_items: 100)
filter.saturated?  # => false
200.times { |i| filter.add("item-#{i}") }
filter.saturated?(threshold: 0.5)  # => true

Serialization

data = filter.serialize
restored = Philiprehberger::BloomFilter.deserialize(data)
restored.include?('hello')  # => true

JSON Serialization

json = filter.to_json
restored = Philiprehberger::BloomFilter.from_json(json)
restored.include?('hello')  # => true

API

Method Description
.new(expected_items:, false_positive_rate:) Create a new bloom filter
#add(item) Add an item to the filter
#include?(item) Check if an item might be in the filter
#merge(other) Merge another compatible filter into this one
#clear Reset the filter
#count Number of items added
#memory_usage Bit array size in bytes
#serialize Serialize to a hash
#bulk_add(items) Add all items from an enumerable
#count_estimate Estimate unique item count from fill rate
#intersection(other) Create filter matching items in both
#fill_rate Proportion of set bits (0.0 to 1.0)
#==(other) Structural equality comparison
#copy Create an independent deep clone
#false_positive_rate Actual FP rate based on current fill
#to_json Serialize to JSON string
#superset?(other) Check if self contains all bits of other
#empty? Check if no items have been added
#union(other) Non-mutating OR returning a new filter
#subset?(other) Check if every set bit in self is also set in other
#compatible?(other) Check structural compatibility with another filter
#saturated?(threshold:) True if fill rate is at/above threshold
`#\ (other)`
#&(other) Operator alias for #intersection
#hash / #eql? Hash key support consistent with #==
#inspect Human-readable representation
.deserialize(data) Restore a filter from serialized data
.from_json(str) Restore a filter from a JSON string

Development

bundle install
bundle exec rspec
bundle exec rubocop

Support

If you find this project useful:

Star the repo

🐛 Report issues

💡 Suggest features

❤️ Sponsor development

🌐 All Open Source Projects

💻 GitHub Profile

🔗 LinkedIn Profile

License

MIT