Friday, May 2, 2014

RocksDB Installation on CentOS 6.1 with JNI

RocksDB is a way to leverage SSD hardware optimally, and a way to decongest the network. However, its single-digit-microsecond performance comes from simple C++ GET and SET calls on a key-value structure; any more complex data operation requires custom logic.
This blog is all about connecting to RocksDB from a Java application. It can also be done using a Thrift API.
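Once the JNI jar built in the steps below is on the classpath, the Java side is small. A minimal sketch, assuming the bindings expose the `org.rocksdb` API; class and method names (e.g. `close()` vs. older `dispose()`) may differ slightly in the rocksdbjni build you produce:

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class RocksDemo {
    // Opens (or creates) a database directory, writes one key, reads it back.
    public static String roundTrip(String dir, String key, String value) throws RocksDBException {
        RocksDB.loadLibrary();                 // pulls in the JNI shared library
        Options options = new Options().setCreateIfMissing(true);
        RocksDB db = RocksDB.open(options, dir);
        try {
            db.put(key.getBytes(), value.getBytes());   // SET
            return new String(db.get(key.getBytes()));  // GET
        } finally {
            db.close();
        }
    }

    public static void main(String[] args) throws RocksDBException {
        System.out.println(roundTrip("/tmp/rocksdb-demo", "user:1", "alice"));
    }
}
```

Note that this is exactly the simple GET/SET surface mentioned above; anything richer has to be built on top of it.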

Download the installation packages

Set the repository location and enable the devtools C++ repo.
cd /etc/yum.repos.d
yum --enablerepo=testing-1.1-devtools-6 install devtoolset-1.1-gcc devtoolset-1.1-gcc-c++
export CC=/opt/centos/devtoolset-1.1/root/usr/bin/gcc 
export CPP=/opt/centos/devtoolset-1.1/root/usr/bin/cpp
export CXX=/opt/centos/devtoolset-1.1/root/usr/bin/c++

Set the RocksDB home, download the RocksDB package from GitHub, and unzip it.

export ROCKSDB_HOME=/media/ephemeral0/rocksdb
export JAVA_HOME=/usr/java/jdk1.7.0_51
ls $JAVA_HOME/lib/tools.jar

cd /tmp
unzip master
mv rocksdb-master $ROCKSDB_HOME
cd $ROCKSDB_HOME  ; pwd ; ls

Install gflags 2.0

tar -xzvf gflags-2.0-no-svn-files.tar.gz
cd gflags-2.0
./configure && make && sudo make install

Set up Snappy compression

cd ..
tar -xzvf snappy-1.1.1.tar.gz
cd snappy-1.1.1
./configure && make && sudo make install
cd ..

Install the other compression libraries, zlib and bzip2
sudo yum install zlib
sudo yum install zlib-devel

sudo yum install bzip2
sudo yum install bzip2-devel


export LD_LIBRARY_PATH=/usr/local/lib/
make clean; make
make check
make librocksdb.a

Build RocksDB JNI

wget -O
gzip -d apache-maven-3.1.1-bin.tar.gz
tar -xf apache-maven-3.1.1-bin.tar

export SNAPPY_HOME=$ROCKSDB_HOME/snappy-1.1.1; ls -alt $SNAPPY_HOME

make librocksdb.a

mkdir ${ROCKSDB_HOME}/dist/
cp ${ROCKSDB_HOME}/dist/

$ROCKSDB_HOME/apache-maven-3.1.1/bin/mvn clean install
cd rocksdbjni-linux64
$ROCKSDB_HOME/apache-maven-3.1.1/bin/mvn install

Distribute the JAR File.

cp rocksdbjni/target/*.jar ${ROCKSDB_HOME}/dist/


Next-Generation Caches Will Move from DRAM+Network to DRAM+SSD

Caching in Big Data Processing: The Last Decade

There are four fundamental components leveraged for processing bigdata
  1. CPU
  2. Memory
  3. Disk
  4. Network
Architects play with these ingredients to balance operational requirements against the cost-per-byte parameter. The result: numerous hybrid architectural models, determined largely by the nature of the application in context.
One such application pattern in the bigdata world is fetching large volumes of data for analysis. For interactive business intelligence, the data must be served from a cache layer. An effective cache layer means more cache hits, and that is determined by analyst behavior: is the most recently analyzed dataset requested again? In most cases it is, because during an analysis, analysts keep playing with a subset of the data to discover patterns. In these scenarios a cache layer works wonders, serving hot data from memory and saving the overhead of going to disk, looking up the inode table, and seeking across cylinders.
Over the last decade, Memcache served this layer wonderfully in distributed system setups. However, stretching it began to stress the network layer. Facebook's first attempt was to short-circuit the packet path by running Memcache over UDP, and the result was a phenomenal increase in throughput and performance. Yet the fundamental problem of network saturation remained unaddressed: with more and more distributed components hammering the network layer, it continues to approach the saturation point.
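The recency behavior described above, where the most recently analyzed subset is requested again, is exactly what an LRU (least-recently-used) eviction policy exploits. A minimal in-process sketch using only the JDK's LinkedHashMap, illustrative rather than tied to any particular cache product:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Tiny LRU cache: with accessOrder=true, LinkedHashMap keeps entries in
// least-to-most recently accessed order, so the eldest entry is the LRU one.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true);   // accessOrder=true
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict once over capacity
    }
}
```

With capacity 2, after put(a), put(b), get(a), put(c), the cache evicts b, not a: the recently re-read entry survives, mirroring an analyst re-querying a hot dataset.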

Future Bigdata Cache Layer

As these distributed systems were evolving, a new memory hardware component, the SSD, was making its way into enterprises as well as retail laptops through mass production. The maturity of SSDs, along with their price drop, made the component viable for architects to play with.

So the old model of DRAM + network is being challenged by a new generation of caches on DRAM + SSD.
To me, RocksDB is the first step in this direction. Going forward, more and more vendors and products will explore this area.


Real-Time Big Data Architecture Patterns

Bigdata Architecture Pattern 1 – Cache to Shelter, an answer to high write rates.

  1. The app server writes to a shared in-memory cluster with a TTL, and to a queue dispatcher.
  2. The queue dispatcher asynchronously writes to the persistent bigdata database (without TTL).
  3. Reads first look in the in-memory cache; on a miss, they fall back to the persistent bigdata database.
Cache to Shelter Architecture
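The three steps above can be sketched in-process with JDK stand-ins: a ConcurrentHashMap plays the shared cache, a BlockingQueue the queue dispatcher, and a second map the persistent bigdata database. In a real deployment each would be a networked service:

```java
import java.util.Map;
import java.util.concurrent.*;

public class CacheToShelter {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> bigdataDb = new ConcurrentHashMap<>();
    private final BlockingQueue<String[]> queue = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService ttlTimer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);     // don't keep the JVM alive for TTL expiries
            return t;
        });

    public void write(String key, String value, long ttlMillis) {
        cache.put(key, value);                              // 1. cache write, with a TTL...
        ttlTimer.schedule(() -> cache.remove(key), ttlMillis, TimeUnit.MILLISECONDS);
        queue.add(new String[]{key, value});                //    ...plus the queue dispatcher
    }

    public void drainToDb() {                               // 2. dispatcher persists async (no TTL)
        String[] kv;
        while ((kv = queue.poll()) != null) {
            bigdataDb.put(kv[0], kv[1]);
        }
    }

    public String read(String key) {                        // 3. cache first, then the database
        String v = cache.get(key);
        return v != null ? v : bigdataDb.get(key);
    }
}
```

The write path never blocks on the persistent store, which is what absorbs the high write rate.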

Bigdata Architecture Pattern 2 – Distributed Merge & Update

  1. The read-modify-write step happens in a single RPC.
  2. This saves data flow over the network.
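The difference shows even in-process: a naive client does GET then SET (two calls, with the value crossing the network both ways), whereas a merge-style update ships only the delta and lets the store apply it atomically in one call; RocksDB's merge operator and HBase's Increment follow this idea. A JDK sketch using Map.merge as the stand-in:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MergeUpdate {
    // Naive client-side read-modify-write: two store operations per update.
    public static long incrementNaive(Map<String, Long> store, String key) {
        Long current = store.get(key);                    // "RPC" 1: read
        long next = (current == null ? 0L : current) + 1;
        store.put(key, next);                             // "RPC" 2: write
        return next;
    }

    // Merge-style update: one operation; the store combines old and new values.
    public static long incrementViaMerge(Map<String, Long> store, String key) {
        return store.merge(key, 1L, Long::sum);           // single "RPC"
    }
}
```

Besides halving the calls, the merge form is atomic, so two clients incrementing concurrently cannot lose an update the way the get-then-put form can.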


Bigdata Architecture Pattern 3 – Embedded Database as Local Cache

  1. Good for write-once-read-many workloads.
  2. An embedded database (with TTL) acts as a local in-process cache.
  3. The embedded database is used as a cache and designed to use memory and SSD efficiently.
  4. The bigdata database handles distributed writes, sharding, and failover.
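Point 2 can be sketched as an in-process cache with a per-entry TTL, the role the embedded database plays here, using only the JDK; a real embedded store like RocksDB would additionally spill cold entries to SSD:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-process cache with per-entry TTL: entries carry an expiry timestamp
// and are treated as misses (and removed) once it passes.
public class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAt;
        Entry(T value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();

    public void put(K key, V value, long ttlMillis) {
        map.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() > e.expiresAt) {  // expired: lazy eviction
            map.remove(key);
            return null;
        }
        return e.value;
    }
}
```

On a miss (absent or expired), the caller falls through to the bigdata database of point 4.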


Thursday, May 1, 2014

Details of 10 of the World's Big HBase Database Clusters

For the last couple of years there have been many conference sessions on bigdata databases, and HBase has emerged as the database most closely integrated with the Hadoop eco-system.

Especially at Facebook, hundreds of terabytes of data flow into HBase clusters every month. I have compiled these sessions to analyze the similarities among the various implementations and configurations, and to apply the learnings to productionizing HBase.


      Facebook Adobe Ebay Groupon OCLC Gap Inc Pinterest Magnify causata Experian
Use Case Social Media
Web Traffic
Search Index
Business use cases Facebook Message Infrastructure
(SMSes, Messages, Im/Chat, Email)
Web Traffic
Business Events
User Interactions
Infra Data
> Data storage for listed item drives eBay > Search Indexing
Data storage for ranking data in the future
Deal Personalization System
User clicks
Service Logs
Delivers single-search-box access to 943 million items.
It hosts all these contents
Serving Apparel Catalog from HBase for Live Website

Inventory Updates
Storing Pins Internet Memory Research Customer Experience Management (Real-time Offer)  
Data Volume Data in cluster (Compression as specified) Records as well as Disk size > 300TB/Month ( Compressed, Unreplicated)
> Search indexes (Extra)
1.9B Records
Billion Rows
1.8 Billion Ownerships
663Million Articles
      1 Billion rows
15 bytes/each compressed Events (Type, timestamp, identifier, attributes)
500+ TB
Why HBase Motivation Architectural constraints High write throughput
Horizontal scalability
Automatic Failover
Atomic read-modify-write operations
Bulk importing
                Limitation of SQL
Schema Flexibility
Key lookup
Cluster Servers Components used in the solution User Directory Service
Application Server
    Email System
Online System
    Scrapers, Parsers, Cleansers, Validators, pyres, HBase, Python web servers      
Hardware Data Center               Amazon
h1.4xlarge + SSD Backed
  # Racks   5+ (Per Capsule) 1     3          
  Network           10GB Interconnect         10GigE 
  #Slaves   15 7 225   44 16 10 to 50     20
  Slaves CPU   16 Cores 16 Cores 24 cores (HT) 24 Virtual Core 8 CPU     Dual Core CPU 2*6 core
Intel X5650
24 core 
  Slaves Memory   48GB 32 GB 72GB RAM 96GB RAM 32 GB RAM 8-16GB RAM   8GB RAM 48GB RAM  
  HBase Memory   24GB   15GB 25GB            
  Slaves Disk   12 * 1 TB 12 2TB  12 * 2TB 8 * 2TB Disks 8TB Disk     15TB/Node 4 x 10K SAS Disk  
  # Masters (NN+HM+ZK)   5 ZK
2 (NN + Backup)
2 (HM + Backup)
1 JT
  5 ZK   6
3 Controls and 3 Edges
3 3 ZK      
  Masters CPU                      
  Masters Memory                      
  Masters Disk                      
Configuration OS OS Flavour             Ubuntu      
    File System             ext4/nodiratime/nodealloc/lazy_itable_init      
    OS Scheduling             noop      
  Master JVM                      
  Region Server JVM XX:MaxNewSize    
  XX:NewSize     100m              
  MSLAB           Enabled        
  CMSInitiatingOccupancyFraction           Yes        
  XX:MaxNewSize     512m              
  HBase Settings Pre Split Tables       Yes            
  hbase.regionserver.handler.count     50            300000 Increased            
  hbase.hregion.max.filesize     53687091200 10GB            
  hbase.hregion.majorcompaction     0              
  hfile.block.cache.size     0.65     0.6      0.1            0.09              
  hbase.client.scanner.caching     200              
  hbase.hregion.memstore.block.multiplier     4 4            
  hbase.rpc.timeout     600000 Increased   Less        
  hbase.client.pause     3000              
  hbase.hregion.memstore.flush.size       134217728            
  hbase.hstore.blockingStoreFiles       100            
  Zookeeper Settings     5000              
  zookeeper.session.timeout           Less        
  hbase.zookeeper.blockingStoreFiles       False ( 0.94.2)            
  HDFS  Setting dfs.block.size     134217728              
  dfs.heartbeat.recheck.interval           Modified        
  dfs.datanode.max.xcievers     131072              
  Jobtracker     8              
  mapred.tasktracker.reduce.tasks.maximum     6              
Administrative Backup   Uses Scribe
Double Logging
(Application Server and Region Server)
          Synced to S3 with s3-apt plugin
S3 + HBase snapshot + Export Snapshot
  Manage Splitting               w/presplit tables Custom    
  Compaction               Manual Rolling      
  Reverse DNS               Yes on EC2      

Monday, April 28, 2014

Data Fracking

In 2006, Clive Humby drew an analogy between crude oil and data in a blog piece titled "Data is the new Oil", which has since captured the imagination of several commentators on big data. No one doubts the value of these 'resources'; they vary only in the effort required to extract them. During a discussion, the CIO of a billion-dollar company indicated that there is a lot of data, but asked: can you make it "analyzable"?

Perhaps he was referring to the challenges of dealing with unstructured data in a company's communication and information systems, besides the structured data silos that are also teeming with data. In our work with a few analytics companies, we found validation of this premise. Data in log files, PDFs, images, etc. is one part of it. There is also the deep web, the part of the data not readily accessible by googling, or, as this Howstuffworks article refers to it, 'hidden from plain sight.'

Bizosys's HSearch is a Hadoop-based search and analytics engine that has been adapted to deal with this challenge faced by data analysts, commonly referred to as data preparation or data harvesting. If finding value in data indeed poses these challenges, then Clive's analogy to crude oil is valid. If today shale gas extraction represents the next frontier in oil extraction, employing a process known as hydraulic fracturing, or fracking, then our take on that is 'data fracking': a process of making data accessible.