Caching in Big Data Processing : Last Decade
There are four fundamental components leveraged for processing bigdata
Architects play with these ingredients to balance the operational requirements and the cost per byte parameter. The result - Numerous hybrid architectural models which are majorly determined based on the nature of application in context.
One such application pattern in bigdata world is to fetch large volume of data for analysis. To perform interactive business intelligence, the data is to be served from cache layer. An effective cache layer means more cache hit. And that is determined by analysts behavior. Is most recently analyzed dataset requested again? In most cases it happens, as during an analysis, analysts keep playing with a subset of data to discover patterns. In these scenarios, a cache layer serves wonder by serving hot data from memory, saving the overhead of going to disk and looking up the inode table as well as seeking on the cylinders.
Last decade memcache served this layer with wonders in a distributed system setup. However stretching this aspect started building stresses on the network layers. And the first attempt at Facebook is to short circuit the packet layer by running Memcache with UDP and the result - A phenomenal increase in throughput and performance. However, the fundamental problem of network saturation still remained un addressed with this solution. With more and more distributed components hammering the network layer, it continues to reach the saturation point.
Future Bigdata Cache Layer
As these distributed systems were being evolving, new memory hardware, SSD, was making it's way to enterprises as well as retail laptops with mass production. The maturity of SSD along with price drop created the component viable for architects to play with.
So the OLD model of DRAM + NETWORK has been challenged with new generation caches on DRAM + SSD.
To me RocksDB is the first step on this direction. Going forward, more and more vendors and products will explore on this area.