Friday, May 2, 2014

RocksDB Installation on CentOS 6.1 with JNI


RocksDB is a way to leverage SSD hardware optimally, and a way to decongest the network. However, its single-digit microsecond performance comes from simple C++ GET and PUT calls on a key-value structure; any more complex data operation requires a custom logic implementation.
 
This blog is all about connecting to RocksDB from a Java application. It can also be done using the Thrift API.

Download the installation packages

Set the repository location and enable the devtools (C++) repo.
 
cd /etc/yum.repos.d
wget http://people.centos.org/tru/devtools-1.1/devtools-1.1.repo
yum --enablerepo=testing-1.1-devtools-6 install devtoolset-1.1-gcc devtoolset-1.1-gcc-c++
export CC=/opt/centos/devtoolset-1.1/root/usr/bin/gcc 
export CPP=/opt/centos/devtoolset-1.1/root/usr/bin/cpp
export CXX=/opt/centos/devtoolset-1.1/root/usr/bin/c++

Set the RocksDB home, download the RocksDB package from GitHub, and unzip the package.

export ROCKSDB_HOME=/media/ephemeral0/rocksdb
export JAVA_HOME=/usr/java/jdk1.7.0_51
ls $JAVA_HOME/lib/tools.jar


cd /tmp
wget https://github.com/facebook/rocksdb/archive/master.zip
unzip master
mv rocksdb-master $ROCKSDB_HOME
cd $ROCKSDB_HOME  ; pwd ; ls


Install gflags 2.0

wget https://gflags.googlecode.com/files/gflags-2.0-no-svn-files.tar.gz
tar -xzvf gflags-2.0-no-svn-files.tar.gz
cd gflags-2.0
./configure && make && sudo make install

Set up Snappy compression

cd ..
wget https://snappy.googlecode.com/files/snappy-1.1.1.tar.gz
tar -xzvf snappy-1.1.1.tar.gz
cd snappy-1.1.1
./configure && make && sudo make install
cd ..


Install the other compression libraries, zlib and bzip2
sudo yum install zlib
sudo yum install zlib-devel

sudo yum install bzip2
sudo yum install bzip2-devel


Build RocksDB

export LD_LIBRARY_PATH=/usr/local/lib/
make clean; make
make check
make librocksdb.so
make librocksdb.a


Build RocksDB JNI

wget -O rocksdbjni.zip https://github.com/fusesource/rocksdbjni/archive/master.zip
unzip rocksdbjni.zip
wget http://apache.tradebit.com/pub/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz
gzip -d apache-maven-3.1.1-bin.tar.gz
tar -xf apache-maven-3.1.1-bin.tar

export SNAPPY_HOME=$ROCKSDB_HOME/snappy-1.1.1; ls -alt $SNAPPY_HOME
export ROCKSDBJNI_HOME=$ROCKSDB_HOME/rocksdbjni-master; ls $ROCKSDBJNI_HOME
export LIBRARY_PATH=${SNAPPY_HOME}
export C_INCLUDE_PATH=${LIBRARY_PATH}
export CPLUS_INCLUDE_PATH=${LIBRARY_PATH}


cd ${ROCKSDB_HOME}
make librocksdb.a

mkdir ${ROCKSDB_HOME}/dist/
cp librocksdb.so ${ROCKSDB_HOME}/dist/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${ROCKSDB_HOME}/dist/

cd ${ROCKSDBJNI_HOME}
$ROCKSDB_HOME/apache-maven-3.1.1/bin/mvn clean install
cd rocksdbjni-linux64
$ROCKSDB_HOME/apache-maven-3.1.1/bin/mvn install

Distribute the JAR file.

cd ${ROCKSDBJNI_HOME}
cp rocksdbjni/target/*.jar ${ROCKSDB_HOME}/dist/
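With librocksdb.so and the JNI JAR in ${ROCKSDB_HOME}/dist/, a quick smoke test from Java confirms the binding works. The sketch below assumes the org.rocksdb classes of the official RocksJava binding; the fusesource rocksdbjni JAR built above exposes a leveldbjni-style API with different class names, so treat this as an illustration of the put/get call pattern rather than the exact classes in that JAR. Run it with -Djava.library.path pointing at the dist directory so the native library resolves.

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class RocksDbSmokeTest {
    public static void main(String[] args) throws RocksDBException {
        // Load the JNI native library (assumes -Djava.library.path=$ROCKSDB_HOME/dist).
        RocksDB.loadLibrary();

        Options options = new Options().setCreateIfMissing(true);
        RocksDB db = RocksDB.open(options, "/tmp/rocksdb-smoke-test"); // illustrative path
        try {
            db.put("hello".getBytes(), "world".getBytes());
            byte[] value = db.get("hello".getBytes());
            System.out.println(new String(value)); // expect "world"
        } finally {
            db.close();
        }
    }
}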

 

The Next Generation Cache Will Move from DRAM+NETWORK to SSD+DRAM

Caching in Big Data Processing: The Last Decade

There are four fundamental components leveraged for processing big data:
  1. CPU
  2. Memory
  3. Disk
  4. Network
Architects play with these ingredients to balance operational requirements against the cost-per-byte parameter. The result is numerous hybrid architectural models, largely determined by the nature of the application in context.
 
One such application pattern in the big data world is fetching large volumes of data for analysis. To support interactive business intelligence, the data has to be served from a cache layer. An effective cache layer means more cache hits, and that is determined by analyst behavior: is the most recently analyzed dataset requested again? In most cases it is, because during an analysis, analysts keep playing with a subset of the data to discover patterns. In these scenarios, a cache layer works wonders by serving hot data from memory, saving the overhead of going to disk, looking up the inode table, and seeking across cylinders.
 
Over the last decade, memcached served this layer wonderfully in distributed system setups. However, stretching this approach started building stress on the network layer. The first attempt at Facebook was to short-circuit the packet overhead by running memcached over UDP, and the result was a phenomenal increase in throughput and performance. However, the fundamental problem of network saturation remained unaddressed: with more and more distributed components hammering the network layer, it keeps approaching its saturation point.
 

Future Bigdata Cache Layer

As these distributed systems were evolving, a new class of memory hardware, the SSD, was making its way into enterprises as well as retail laptops through mass production. The maturity of SSDs, along with the price drop, made the component viable for architects to play with.

So the old model of DRAM + NETWORK is being challenged by new-generation caches built on DRAM + SSD.
To me, RocksDB is the first step in this direction. Going forward, more and more vendors and products will explore this area.

  

Real-Time Big Data Architecture Patterns


Big Data Architecture Pattern 1 – Cache to Shelter, an answer to high write rates.

  1. The app server writes to a shared in-memory cluster with a TTL, and to a queue dispatcher.
  2. The queue dispatcher asynchronously writes to the persistent big data database (without a TTL).
  3. Reads first look in the in-memory cache database; on a miss, they fall back to the persistent big data database (a read-path sketch follows below).
Cache to Shelter Architecture
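Below is a minimal sketch of the read path in step 3, assuming two hypothetical interfaces: CacheStore stands in for the shared in-memory cluster and ShelterStore for the persistent big data database. The method names and the TTL value are illustrative, not tied to any particular product.

// Hypothetical interfaces for the cache-to-shelter read path.
interface CacheStore {
    byte[] get(String key);                           // returns null on a cache miss
    void put(String key, byte[] value, int ttlSeconds);
}

interface ShelterStore {
    byte[] get(String key);                           // returns null if the key is absent
}

class CacheToShelterReader {
    private static final int TTL_SECONDS = 300;       // illustrative TTL

    private final CacheStore cache;
    private final ShelterStore shelter;

    CacheToShelterReader(CacheStore cache, ShelterStore shelter) {
        this.cache = cache;
        this.shelter = shelter;
    }

    // Look in the in-memory cache first; on a miss, fall back to the
    // persistent store and re-populate the cache with a TTL.
    byte[] read(String key) {
        byte[] cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        byte[] persisted = shelter.get(key);
        if (persisted != null) {
            cache.put(key, persisted, TTL_SECONDS);
        }
        return persisted;
    }
}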


Big Data Architecture Pattern 2 – Distributed Merge & Update

  1. The read-modify-write step happens in a single RPC (sketched below).
  2. This saves data flow over the network.
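As one illustration of a read-modify-write collapsed into a single RPC, the sketch below uses the HBase client's server-side increment; the table and column names are hypothetical, and the same idea applies to any store offering atomic server-side merge or increment operations. The region server reads, updates, and writes the counter locally, so only the delta and the result cross the network.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleRpcIncrement {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table counters = connection.getTable(TableName.valueOf("counters"))) {
            // One RPC: the region server reads the current value, adds 1,
            // and writes it back atomically.
            long views = counters.incrementColumnValue(
                    Bytes.toBytes("page#home"),   // row key (hypothetical)
                    Bytes.toBytes("m"),           // column family (hypothetical)
                    Bytes.toBytes("views"),       // qualifier (hypothetical)
                    1L);
            System.out.println("views = " + views);
        }
    }
}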

 

Big Data Architecture Pattern 3 – Embedded Cache with a Big Data Database

  1. Good for write-once, read-many workloads.
  2. An embedded database (with a TTL) serves as a local, in-process cache (see the sketch below).
  3. The embedded database is used as a cache and is designed to use memory and SSD efficiently.
  4. The big data database handles distributed writes, sharding, and failover.
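A minimal sketch of the embedded-cache idea, assuming RocksJava's TtlDB class (org.rocksdb.TtlDB); the path and TTL are illustrative. Entries older than the TTL are dropped during compaction, so the in-process cache bounds itself while RocksDB spreads hot and cold data across memory and SSD.

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.TtlDB;

public class EmbeddedTtlCache {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();

        Options options = new Options().setCreateIfMissing(true);
        int ttlSeconds = 3600; // entries become eligible for deletion after one hour
        TtlDB cache = TtlDB.open(options, "/tmp/local-embedded-cache", ttlSeconds, false);
        try {
            cache.put("user#42".getBytes(), "profile-blob".getBytes());
            byte[] hit = cache.get("user#42".getBytes());
            System.out.println(hit == null ? "miss" : new String(hit));
        } finally {
            cache.close();
        }
    }
}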


Thursday, May 1, 2014

World's 10 Big HBase Database Cluster Details


For the last couple of years there have been many conferences on big data databases. HBase has emerged as the database most closely integrated with Hadoop in the ecosystem.

Especially at Facebook, hundreds of terabytes of data flow into HBase clusters month after month. I have compiled these conference sessions to analyze the similarities among the various implementations and configurations, and to apply the learning to productionizing HBase.

 

      Facebook Adobe Ebay Groupon OCLC Gap Inc Pinterest Magnify causata Experian
Use Case Social Media
Web Traffic
Search Index
Business use cases Facebook Message Infrastructure
(SMSes, Messages, Im/Chat, Email)
Web Traffic
Business Events
User Interactions
Infra Data
> Data storage for listed item drives eBay > Search Indexing
Data storage for ranking data in the future
Deal Personalization System
User clicks
Emails
Service Logs
Delivers single-search-box access to 943 million items.
It hosts all these contents
Serving Apparel Catalog from HBase for Live Website

Inventory Updates
String PINs Internet Memory Research Customer Experience Management (Real-time Offer)  
Data Volume Data in cluster (Compression as specified) Records as well as Disk size > 300TB/Month ( Compressed, Unreplicated)
> Search indexes (Extra)
80GB,
1.9B Records
  TBs
Billion Rows
20TB
1.8 Billion Ownerships
663Million Articles
++
      1 Billion rows
15 bytes/each compressed Events (Type, timestamp, identifier, attributes)
500+ TB
Why HBase Motivation Architectural constraints High write throughput
Horizontal scalability
Automatic Failover
Atomic read-modify-write operations
Bulk importing
MapReduce
                Limitation of SQL
Schema Flexibility
Key lookup
Architecture
Components
Cluster Servers Components used in the solution User Directory Service
Application Server
HBase
HDFS
ZK
HayStack
    Email System
Online System
HBase
    Scrapers, Parsers, Cleansers, Validators, pyres, HBase, Python web servers      
Hardware Data Center               Amazon
h1.4xlarge + SSD Backed
     
  # Racks   5+ (Per Capsule) 1     3          
  Network           10GB Interconnect         10GigE 
  #Slaves   15 7 225   44 16 10 to 50     20
  Slaves CPU   16 Cores 16 Cores 24 cores (HT) 24 Virtual Core 8 CPU     Dual Core CPU 2*6 core
Intel X5650
24 core 
  Slaves Memory   48GB 32 GB 72GB RAM 96GB RAM 32 GB RAM 8-16GB RAM   8GB RAM 48GB RAM  
  HBase Memory   24GB   15GB 25GB
  Slaves Disk   12 * 1 TB 12 2TB  12 * 2TB 8 * 2TB Disks 8TB Disk     15TB/Node 4 x 10K SAS Disk  
  # Masters (NN+HM+ZK)   5 ZK
2 (NN + Backup)
2 (HM + Backup)
1 JT
  5 ZK   6
3 Controls and 3 Edges
3 3 ZK      
  Masters CPU                      
  Masters Memory                      
  Masters Disk                      
Configuration OS OS Flavour             Ubuntu      
    File System             ext4/nodiratime/nodealloc/lazy_itable_init      
    OS Scheduling             noop      
  Master JVM                      
  Region Server JVM XX:MaxNewSize    
150m
             
  XX:NewSize     100m              
  MSLAB           Enabled        
  CMSInitiatingOccupancyFactor           Yes
  XX:MaxNewSize     512m              
  HBase Settings Pre Split Tables       Yes            
  hbase.regionserver.handler.count     50              
  hbase.regionserver.lease.period     300000 Increased            
  hbase.hregion.max.filesize     53687091200 10GB            
  hbase.hregion.majorcompaction     0              
  hfile.block.cache.size     0.65     0.6        
  hbase.regionserver.global.memstore.upperLimit     0.1              
  hbase.regionserver.global.memstore.lowerLimit     0.09              
  hbase.client.scanner.caching     200              
  hbase.hregion.memstore.block.multiplier     4 4            
  hbase.rpc.timeout     600000 Increased   Less        
  hbase.client.pause     3000              
  hbase.hregion.memstore.flush.size       134217728            
  hbase.hstore.blockingStoreFiles       100            
  Zookeeper Settings hbase.zookeeper.property.maxClientCnxns     5000              
  zookeeper.session.timeout           Less        
  hbase.zookeeper.blockingStoreFiles       False ( 0.94.2)            
  HDFS  Setting dfs.block.size     134217728              
  dfs.heartbeat.recheck.interval           Modified        
  dfs.datanode.max.xcievers     131072              
  Jobtracker mapred.tasktracker.map.tasks.maximum     8              
  mapred.tasktracker.reduce.tasks.maximum     6              
Administrative Backup   Uses Scribe
Double Logging
(Application Server and Region Server)
          Synced to S3 with s3-apt plugin
S3 + HBase snapshot + Export Snapshot
     
  Manage Splitting               w/presplit tables Custom    
  Compaction               Manual Rolling      
  Reverse DNS               Yes on EC2      
Source     http://www.youtube.com/watch?v=UaGINWPK068

http://borthakur.com/ftp/SIGMODRealtimeHadoopPresentation.pdf
http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe
http://files.meetup.com/1350427/EBAY-HBase-Ops.pptx
http://www.slideshare.net/cloudera/case-studies-session-3b
http://www.slideshare.net/cloudera/10-h-base-for-the-worlds-libraries-ron-buckley-ocld-final
http://www.slideshare.net/cloudera/1-serving-apparel-catalog-from-h-base-suraj-varma-gap-inc-finalupdated-last-minute
http://www.slideshare.net/cloudera/operations-session-1
http://www.slideshare.net/cloudera/4-mignify-stanislov-barton-internet-memory-research-final-updated-last-minute
http://www.slideshare.net/cloudera/causata-h-basecon-2013-2
http://www.slideshare.net/cloudera/hbasecon-2013

Monday, April 28, 2014

Data Fracking

In 2006, Clive Humby drew the analogy between crude oil and data in a blog piece titled "Data is the new Oil," which has since captured the imagination of several commentators on big data. No one doubts the value of these 'resources'; they vary only in the effort required to extract them. During a discussion with the CIO of a billion-dollar company, he indicated that there is a lot of data, but the question is whether you can make it "analyzable."

Perhaps he was referring to the challenges of dealing with unstructured data in a company's communication and information systems, besides the structured data silos that are also teeming with data. In our work with a few analytics companies, we found validation of this premise. Data in log files, PDFs, images, etc. is one part of it. There is also the deep web, the part of the data not readily accessible by googling, or, as this HowStuffWorks article refers to it, 'hidden from plain site.'

Bizosys's HSearch is a Hadoop-based search and analytics engine that has been adapted to deal with this challenge faced by data analysts, commonly referred to as data preparation or data harvesting. If finding value in data indeed poses these challenges, then Clive's analogy to crude oil is valid. Take a look at our take on this: if today shale gas extraction represents the next frontier in oil extraction, employing a process known as hydraulic fracturing, or fracking, then our take on that is 'data fracking' as a process of making data accessible.


Tuesday, January 22, 2013

The Origins of Big Data

While sharing our thoughts on big data with our communications team, we were storytellers. The story around big data was impromptu! We realized that the oft-quoted Volume, Variety and Velocity can actually be mapped to Transactions, Interactions and Actions. I have represented it using a Visual.ly infographic background.


Here is a summary -

“The trend we observe is that the problems around big data are increasingly being spoken about more in business terms viz. Transactions, Interactions, Actions and less in technology terms viz. Volume, Variety, Velocity. These two represent complementary aspects and from a big data perspective promise better business-IT alignment in 2013, as business gets hungrier still for more actionable information.”

Volume - Transactions

More interestingly, as in a story, it flowed along time, and we realized that big data appears on the scene as an IT challenge to grapple with when the Volume happens. That volume comes either from a growing rate of transactions, sometimes several hundred per second, or from billions of database records that must be processed in a single batch, where the volume is multiplied by newer, more sophisticated models being applied, as in the case of risk analysis. Big data shows up as a serious IT challenge, and a project, to deal with the associated issues around scale and performance of large volumes of data. Typically, these are operational in nature and internal facing.

These large volumes are often dealt with by relying on a public cloud infrastructure such as Amazon, Rackspace or Azure, or on more sophisticated solutions involving 'big data appliances' that combine terabyte-scale RAM at the hardware level with in-memory processing software from large companies such as HP, Oracle and SAP.

Variety - Interactions

The next level of big data problems surfaces when dealing with external-facing information arising out of Interactions with customers and other stakeholders. Here one is confronted with a huge variety of information, mostly textual, captured from customers' interactions with call centers and emails, or metadata from these, including videos, logs, etc. The challenge is the semantic analysis of huge volumes of text to determine user intent or sentiment, project brand reputation, and so on. However, despite the ability to process this volume and variety, getting a reasonably accurate measurement that is 'good enough' still remains a daunting challenge.

Value - Transactions + Interactions

The third level of big data appears where some try to make sense of all the data that is available - structured and unstructured, transactions and interactions, current and historical - to enrich the information: pollinating the raw text by extracting business entities, linking them to richer structured data and to yet other sources of external information, to triangulate and derive a better view of the data for more Value.

Velocity - Actions

And finally, we deal with the Velocity of information as it happens. This could be for obvious aspects like fraud detection, but also to determine actionable insights before the information goes stale. It requires all aspects of big data to be addressed as the data flows, and within a highly crunched time frame. For example, an equity analyst or broker would like to be informed about trading anomalies or patterns detected as intraday trades happen.

Friday, December 21, 2012

The business impact of Bigdata

As a company engaged in big data before the term became as common as it is today, we are constantly having conversations around solutions where there is a big data problem. Naturally, a lot of the talk ranges around Hadoop, NoSQL and other such technologies.

But what we notice is a pattern in how this is impacting business. There is a client company that caters to researchers and deals with petabytes of data; we helped implement our HSearch real-time big data search engine for Hadoop there. Before this intervention, the norm was to wait, at times up to 3 days, to receive a report for a query spanning the petabytes of distributed information, characterized by huge volume and a lot of variety. Today, the norm has changed with the big data solution, and it is about sub-second response times.

Similarly, in a conversation with a telecom industry veteran, we were told that the health of a telecom network has always been monitored across a large number of transmission towers, which together generate over 1 terabyte of data each day as machine logs, sensor data, etc. The norm here was to receive health reports compiled at a weekly frequency. Now, some players are not satisfied with that and want to receive these reports on a daily basis, and possibly hourly or even in real time.

Not stopping at reporting as it happens, or in near real time, the next question business is asking is: if you can tell so fast, can you predict that it will happen? This is especially relevant in the world of monitoring IT systems and machine-generated data. We will leave prediction around human-generated data analytics (read: social feed analysis) out of the story for the moment. Predictive analysis could mean predicting that a certain purple-shade, large-size polo neck is soon going to run out of stock for a certain market region, given other events. Or it could mean, more feasibly, that a machine serving a rising number of visitors to a site is likely to go down soon because its current sensor data matches a historical pattern; therefore, alert the administrator, or better still, bring up a new node on demand and keep it warm and ready.

So it seems the value of big data lies in its degree of freshness and actionability, and at the most basic level, in simply getting the analysis or insight out faster by a manifold factor!