bloom filter

  • a space efficient, probabilistic data structure that is used to test whether an element is a member of a set in O(1) time
    • space efficient meaning fixed O(1) memory regardless of set size
    • probabilistic meaning there may be some false positive results, but never false negative
  • bloom filters are used everywhere. databases, web crawlers, cdn-caching, packet routing, searching through chemical structures
  • bloom filters are more space efficient than dictionaries. instead of storing actual keys, which is what forces a hash table to to consume memory proportional to the number and size of it’s elements, bloom filters consume only a $n$-bit array and store $k$-bits per element
  • limitations of the bloom filter include
    • some false positives
    • cannot delete from bloom filter
    • cannot enumerate the number of elements in bloom filter
    • though there are many variants of the bloom filter that help address some of these