Class: Kotoshu::DataStructures::BloomFilter
- Inherits:
-
Object
- Object
- Kotoshu::DataStructures::BloomFilter
- Defined in:
- lib/kotoshu/data_structures/bloom_filter.rb
Overview
Bloom filter - probabilistic data structure for fast membership testing.
A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not.
Constant Summary collapse
- DEFAULT_FALSE_POSITIVE_RATE =
Default false positive rate (1%)
0.01- DEFAULT_EXPECTED_SIZE =
Default expected number of elements
10_000
Instance Attribute Summary collapse
-
#hash_count ⇒ Integer
readonly
Number of hash functions.
-
#item_count ⇒ Integer
readonly
Number of items added.
-
#size ⇒ Integer
readonly
Size of the bit array.
Instance Method Summary collapse
-
#add(item) ⇒ self
Add an element to the filter.
-
#clear ⇒ self
Clear all elements from the filter.
-
#initialize(expected_size: DEFAULT_EXPECTED_SIZE, false_positive_rate: DEFAULT_FALSE_POSITIVE_RATE, case_sensitive: false) ⇒ BloomFilter
constructor
Create a new Bloom filter.
-
#merge(other) ⇒ self
Merge another bloom filter into this one.
-
#stats ⇒ Hash
Get filter statistics.
Constructor Details
#initialize(expected_size: DEFAULT_EXPECTED_SIZE, false_positive_rate: DEFAULT_FALSE_POSITIVE_RATE, case_sensitive: false) ⇒ BloomFilter
Create a new Bloom filter.
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 |
# File 'lib/kotoshu/data_structures/bloom_filter.rb', line 39 def initialize(expected_size: DEFAULT_EXPECTED_SIZE, false_positive_rate: DEFAULT_FALSE_POSITIVE_RATE, case_sensitive: false) @case_sensitive = case_sensitive @item_count = 0 # Calculate optimal size and hash count # m = -n * ln(p) / (ln(2)^2) # k = (m/n) * ln(2) @size = calculate_size(expected_size, false_positive_rate) @hash_count = calculate_hash_count(@size, expected_size) # Initialize bit array @bits = Array.new(@size, false) end |
Instance Attribute Details
#hash_count ⇒ Integer (readonly)
Returns Number of hash functions.
29 30 31 |
# File 'lib/kotoshu/data_structures/bloom_filter.rb', line 29 def hash_count @hash_count end |
#item_count ⇒ Integer (readonly)
Returns Number of items added.
32 33 34 |
# File 'lib/kotoshu/data_structures/bloom_filter.rb', line 32 def item_count @item_count end |
#size ⇒ Integer (readonly)
Returns Size of the bit array.
26 27 28 |
# File 'lib/kotoshu/data_structures/bloom_filter.rb', line 26 def size @size end |
Instance Method Details
#add(item) ⇒ self
Add an element to the filter.
59 60 61 62 63 64 65 66 67 68 69 |
# File 'lib/kotoshu/data_structures/bloom_filter.rb', line 59 def add(item) normalized_item = normalize_item(item) @hash_count.times do |i| index = hash_index(normalized_item, i) @bits[index] = true end @item_count += 1 self end |
#clear ⇒ self
Clear all elements from the filter.
110 111 112 113 114 |
# File 'lib/kotoshu/data_structures/bloom_filter.rb', line 110 def clear @bits = Array.new(@size, false) @item_count = 0 self end |
#merge(other) ⇒ self
Merge another bloom filter into this one.
95 96 97 98 99 100 101 102 103 104 105 |
# File 'lib/kotoshu/data_structures/bloom_filter.rb', line 95 def merge(other) raise ArgumentError, "Cannot merge filters with different sizes" unless other.size == @size raise ArgumentError, "Cannot merge filters with different hash counts" unless other.hash_count == @hash_count @size.times do |i| @bits[i] = @bits[i] || other.instance_variable_get(:@bits)[i] end @item_count += other.item_count self end |
#stats ⇒ Hash
Get filter statistics.
119 120 121 122 123 124 125 |
# File 'lib/kotoshu/data_structures/bloom_filter.rb', line 119 def stats { size: @size, hash_count: @hash_count, item_count: @item_count } end |