Lately we experienced so-called bloom filter bloat.
The JVM heap kept filling up, and we regularly had to give the JVM more and more memory.
The last JVM heap setting we used was 14 GB.
We spent 2 days researching different ways to tune the JVM garbage collector, but it turned out that those difficult optimizations have little effect on Cassandra. Soon we realized that our main problem was that Cassandra was running out of memory... again.
We discussed the problem on IRC, and someone suggested that the cause was Cassandra's bloom filters. Cassandra uses them to avoid disk access when possible.
The first and very important thing about them is that a bloom filter stores its data in the JVM heap. This will change in Cassandra 1.2, which will use mmap() for it.
The second most important thing is that the more rows you have in a column family, the bigger the bloom filter is.
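To make the dependence on row count concrete, here is a minimal sketch based on the textbook sizing formula for an optimally configured bloom filter: m = -n * ln(p) / (ln 2)^2 bits for n keys and false-positive chance p. This is only the general formula, not Cassandra's actual implementation, which may round and allocate differently, so treat the results as rough estimates. The function names are made up for illustration.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Textbook bits per key for an optimally sized bloom filter."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_size_bytes(num_keys: int, fp_chance: float) -> float:
    """Estimated filter size in bytes; grows linearly with the key count."""
    return num_keys * bloom_bits_per_key(fp_chance) / 8


if __name__ == "__main__":
    # Hypothetical key counts, only to show how the size scales.
    for num_keys in (1_000_000, 2_000_000):
        for fp_chance in (0.01, 0.1):
            size_mb = bloom_size_bytes(num_keys, fp_chance) / (1024 * 1024)
            print(f"keys={num_keys} fp_chance={fp_chance} ~{size_mb:.2f} MB")
```

Under this formula, doubling the number of keys doubles the filter, while raising the false-positive chance from 0.01 to 0.1 roughly halves the bits needed per key.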
The bloom filter problem can be fixed by adding new nodes, because the data is then spread over more machines.
The second way is to tune the bloom_filter_fp_chance parameter.
It is set per column family, and after changing it you need to run "scrub" so that the existing SSTables are rewritten with the new setting. Instead of "scrub" you may run "cleanup", "compact" or "repair".
We first tried "cleanup" because we thought it would be faster, but we were wrong. "scrub" is the fastest method.
The parameter is set as follows:
use my_keyspace;
update column family users with bloom_filter_fp_chance = 0.1;
The question now is what value to set bloom_filter_fp_chance to.
We ran a small experiment on a 1.4 GB column family with ~8 M keys.
We changed the bloom_filter_fp_chance parameter, then checked the size of the "*-Filter.db" files.
|bloom_filter_fp_chance||bloom filter size|
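The measurement itself is easy to script. The following is a minimal sketch that sums the sizes of all "*-Filter.db" files under a data directory. The function name and the directory passed on the command line are assumptions for illustration; point it at wherever your node keeps its SSTables.

```python
import sys
from pathlib import Path


def total_filter_bytes(data_dir: str) -> int:
    """Sum the sizes of all *-Filter.db files below data_dir (recursive)."""
    return sum(
        path.stat().st_size
        for path in Path(data_dir).rglob("*-Filter.db")
        if path.is_file()
    )


if __name__ == "__main__":
    if len(sys.argv) == 2:
        size_mb = total_filter_bytes(sys.argv[1]) / (1024 * 1024)
        print(f"bloom filter files: {size_mb:.1f} MB")
    else:
        print("usage: python filter_size.py <cassandra-data-dir>")
```

Running it before and after each change of bloom_filter_fp_chance (once the SSTables have been rewritten) gives one row of the comparison.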
We decided to go with bloom_filter_fp_chance = 0.1 as the default, and to change it if we see performance degradation.
The scrub process finished around 25.11.2012, i.e. in about 3 days.
During the scrub Cassandra did not slow down at all.
Once the scrub finished, memory consumption dropped; each node now uses about 8 GB of RAM.
The bloom filter files on disk (*-Filter.db) shrank from ~4 GB to ~1 GB per node.