source: http://llimllib.github.io/bloomfilter-tutorial/
Operations
The basic bloom filter supports two operations: test and add.Test is used to check whether a given element is in the set or not. If it returns:
- false then the element is definitely not in the set.
- true then the element is probably in the set. The false positive rate is a function of the bloom filter's size and the number and independence of the hash functions used.
Applications
The classic example is using bloom filters to reduce expensive disk (or network) lookups for non-existent keys. If the element is not in the bloom filter, then we know for sure we don't need to perform the expensive lookup. On the other hand, if it is in the bloom filter, we perform the lookup, and we can expect it to fail some proportion of the time (the false positive rate).source: https://www.jasondavies.com/bloomfilter/ including a bloom filter implementation in JavaScript
Wikipedia
A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed (though this can be addressed with a "counting" filter); the more elements that are added to the set, the larger the probability of false positives.The following topics are also covered:
- Algorithm description
- Space and time advantages
- Probability of false positives
- Optimal number of hash functions
- Approximating the number of items in a Bloom filter
- The union and intersection of sets
- Interesting properties
Extensions and applications
- Cache filtering
- Counting filters
- Decentralized aggregation
- Data synchronization
- Bloomier filters
- Compact approximators
- Stable Bloom filters
- Scalable Bloom filters
- Layered Bloom filters
- Attenuated Bloom filters
- Chemical structure searching
Examples
- Akamai's web servers use Bloom filters to prevent "one-hit-wonders" from being stored in its disk caches. One-hit-wonders are web objects requested by users just once, something that Akamai found applied to nearly three-quarters of their caching infrastructure. Using a Bloom filter to detect the second request for a web object and caching that object only on its second request prevents one-hit wonders from entering the disk cache.
- Google Bigtable, Apache HBase and Apache Cassandra, and Postgresql use Bloom filters to reduce the disk lookups for non-existent rows or columns. Avoiding costly disk lookups considerably increases the performance of a database query operation.
- The Google Chrome web browser used to use a Bloom filter to identify malicious URLs.
- Medium uses Bloom filters to avoid recommending articles a user has previously read.
No comments:
Post a Comment