Module: Philiprehberger::BloomFilter
- Defined in:
- lib/philiprehberger/bloom_filter.rb,
lib/philiprehberger/bloom_filter/filter.rb,
lib/philiprehberger/bloom_filter/version.rb
Defined Under Namespace
Constant Summary collapse
- VERSION =
'0.6.0'
Class Method Summary collapse
-
.deserialize(data) ⇒ Filter
Deserialize a bloom filter from a hash.
-
.from_json(str) ⇒ Filter
Deserialize a bloom filter from a JSON string.
-
.new(expected_items:, false_positive_rate: 0.01) ⇒ Filter
Create a new Bloom filter.
-
.optimal_hash_count(size:, expected_items:) ⇒ Integer
Compute the optimal number of hash functions for a given bit-array size.
-
.optimal_size(expected_items:, false_positive_rate: 0.01) ⇒ Integer
Compute the optimal bit-array size for a target false positive rate.
Class Method Details
.deserialize(data) ⇒ Filter
Deserialize a bloom filter from a hash.
23 24 25 |
# File 'lib/philiprehberger/bloom_filter.rb', line 23 def self.deserialize(data) Filter.deserialize(data) end |
.from_json(str) ⇒ Filter
Deserialize a bloom filter from a JSON string.
31 32 33 |
# File 'lib/philiprehberger/bloom_filter.rb', line 31 def self.from_json(str) Filter.from_json(str) end |
.new(expected_items:, false_positive_rate: 0.01) ⇒ Filter
Create a new Bloom filter.
15 16 17 |
# File 'lib/philiprehberger/bloom_filter.rb', line 15 def self.new(expected_items:, false_positive_rate: 0.01) Filter.new(expected_items: expected_items, false_positive_rate: false_positive_rate) end |
.optimal_hash_count(size:, expected_items:) ⇒ Integer
Compute the optimal number of hash functions for a given bit-array size.
Uses the formula: k = (m / n) * ln(2) where m = size and n = expected_items. Returns at least 1.
66 67 68 69 70 71 72 73 |
# File 'lib/philiprehberger/bloom_filter.rb', line 66 def self.optimal_hash_count(size:, expected_items:) raise ArgumentError, 'size must be a positive Integer' unless size.is_a?(Integer) && size.positive? unless expected_items.is_a?(Integer) && expected_items.positive? raise ArgumentError, 'expected_items must be a positive Integer' end [(size.to_f / expected_items * Math.log(2)).ceil, 1].max end |
.optimal_size(expected_items:, false_positive_rate: 0.01) ⇒ Integer
Compute the optimal bit-array size for a target false positive rate.
Uses the formula: m = -(n * ln(p)) / (ln(2)^2) where n = expected_items and p = false_positive_rate.
45 46 47 48 49 50 51 52 53 54 |
# File 'lib/philiprehberger/bloom_filter.rb', line 45 def self.optimal_size(expected_items:, false_positive_rate: 0.01) unless expected_items.is_a?(Integer) && expected_items.positive? raise ArgumentError, 'expected_items must be a positive Integer' end unless false_positive_rate.is_a?(Numeric) && false_positive_rate > 0.0 && false_positive_rate < 1.0 raise ArgumentError, 'false_positive_rate must be between 0 and 1 (exclusive)' end (-(expected_items * Math.log(false_positive_rate)) / (Math.log(2)**2)).ceil end |