Module: Philiprehberger::BloomFilter

Defined in:
lib/philiprehberger/bloom_filter.rb,
lib/philiprehberger/bloom_filter/filter.rb,
lib/philiprehberger/bloom_filter/version.rb

Defined Under Namespace

Classes: Error, Filter

Constant Summary collapse

VERSION =
'0.6.0'

Class Method Summary collapse

Class Method Details

.deserialize(data) ⇒ Filter

Deserialize a bloom filter from a hash.

Parameters:

  • data (Hash)

    serialized bloom filter data

Returns:

  • (Filter)

    a restored bloom filter instance



23
24
25
# File 'lib/philiprehberger/bloom_filter.rb', line 23

def self.deserialize(data)
  Filter.deserialize(data)
end

.from_json(str) ⇒ Filter

Deserialize a bloom filter from a JSON string.

Parameters:

  • str (String)

    JSON string

Returns:

  • (Filter)

    a restored bloom filter instance



31
32
33
# File 'lib/philiprehberger/bloom_filter.rb', line 31

def self.from_json(str)
  Filter.from_json(str)
end

.new(expected_items:, false_positive_rate: 0.01) ⇒ Filter

Create a new Bloom filter.

Parameters:

  • expected_items (Integer)

    expected number of items

  • false_positive_rate (Float) (defaults to: 0.01)

    desired false positive rate (0.0 to 1.0)

Returns:

  • (Filter)

    a new bloom filter instance



15
16
17
# File 'lib/philiprehberger/bloom_filter.rb', line 15

def self.new(expected_items:, false_positive_rate: 0.01)
  Filter.new(expected_items: expected_items, false_positive_rate: false_positive_rate)
end

.optimal_hash_count(size:, expected_items:) ⇒ Integer

Compute the optimal number of hash functions for a given bit-array size.

Uses the formula: k = (m / n) * ln(2) where m = size and n = expected_items. Returns at least 1.

Parameters:

  • size (Integer)

    size of the bit array in bits (must be positive)

  • expected_items (Integer)

    expected number of items (must be positive)

Returns:

  • (Integer)

    optimal hash function count (minimum 1)

Raises:

  • (ArgumentError)

    if size is not a positive Integer

  • (ArgumentError)

    if expected_items is not a positive Integer



66
67
68
69
70
71
72
73
# File 'lib/philiprehberger/bloom_filter.rb', line 66

def self.optimal_hash_count(size:, expected_items:)
  raise ArgumentError, 'size must be a positive Integer' unless size.is_a?(Integer) && size.positive?
  unless expected_items.is_a?(Integer) && expected_items.positive?
    raise ArgumentError, 'expected_items must be a positive Integer'
  end

  [(size.to_f / expected_items * Math.log(2)).ceil, 1].max
end

.optimal_size(expected_items:, false_positive_rate: 0.01) ⇒ Integer

Compute the optimal bit-array size for a target false positive rate.

Uses the formula: m = -(n * ln(p)) / (ln(2)^2) where n = expected_items and p = false_positive_rate.

Parameters:

  • expected_items (Integer)

    expected number of items (must be positive)

  • false_positive_rate (Float) (defaults to: 0.01)

    desired false positive rate in (0.0, 1.0) exclusive

Returns:

  • (Integer)

    optimal number of bits

Raises:

  • (ArgumentError)

    if expected_items is not a positive Integer

  • (ArgumentError)

    if false_positive_rate is not within (0.0, 1.0) exclusive



45
46
47
48
49
50
51
52
53
54
# File 'lib/philiprehberger/bloom_filter.rb', line 45

def self.optimal_size(expected_items:, false_positive_rate: 0.01)
  unless expected_items.is_a?(Integer) && expected_items.positive?
    raise ArgumentError, 'expected_items must be a positive Integer'
  end
  unless false_positive_rate.is_a?(Numeric) && false_positive_rate > 0.0 && false_positive_rate < 1.0
    raise ArgumentError, 'false_positive_rate must be between 0 and 1 (exclusive)'
  end

  (-(expected_items * Math.log(false_positive_rate)) / (Math.log(2)**2)).ceil
end