Detecting Banned Users From Scratch Using A Bloom Filter
Illustrated guides to become better engineers
Let’s say we have a list of banned usernames in a game. Like
banned_usernames = [‘TheHxxx’, ‘DWiz’, ‘JDE0x‘]
We want to check if the username ‘Duk’ is found in the set to enable sign in for example. We want to derive the logic from scratch and not use any available language constructs. We can use a bloom filter to assert categorically that the username is not in the list of banned usernames.
A bloom filter is useful when we want to ensure something is not in a set. It is not useful to check if something is in a set.
A bloom filter
A bloom filter has an array of bits set to 0. We also have indices from 0 to 9.
Adding to the bloom filter
Let’s add our banned usernames to the bloom filter. Let’s assume we are using sha256 as the hash function to better follow along. Our aim is to have an integer which we then modulo with the array size. We set the resulting index bit to 1.
We do the same for the next username.
And again for the next username.
Now we’ve added the 3 names.
Testing for non-existence
Now, let’s test if Duk exists in the filter. It’s simple. If even one of the resulting indices bits is 0, it does not exist in the list.
Duk is not in the list of banned usernames as at least one of the bits is a zero.
Implementation notes
In real life, we don’t use cryptographic functions because of speed among others.
The size of the filter is referred to as m and the number of hashes is defined as k.
The number of elements (here usernames) we can fit in the filter is defined as n. It is often hard-coded or approximated as in the below formula (taken from Wikipedia)
where n* is an estimate of the number of items in the filter, m is the length (size) of the filter, k is the number of hash functions, and X is the number of bits set to one.
This blog has a nice bloom filter Python implementation using cryptographic functions.
Use in caching
Bloom filters are used in caching to cache elements from the 2nd hit onward.
Conclusion
Bloom filters have many variations such as counting filters to enable deletion of elements. They are an incredible way to know when something is not present.
Note: The illustrations don’t convey actual results, like they are for illustration purposes. Some long number 79546348576…4735378456%10 will be 6 even if you don’t know the number
Discussions
@thesecip asks:
Isn't a bloom filter probabilistic? So you would have a certain chance of getting a false positive and a false negative. Letting a banned user access the service? How is this better than a hash map with perfect hashing? P.S. just generating discussion. Nice post.