Sunday, November 27, 2011

J2EE application server deployment for HBase File Locality


“The data node that shares the same physical host has a copy of all data the region server requires. If you are running a scan or get or any other use-case you can be sure to get the best performance.” (Reference: http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html)
Now let's consider a mail server application which stores all the mails of its users.
Usually, to handle a large concurrent user base, a user who logs on is routed to one of the available servers. This routing logic could be round robin or load-based. Once the session is created on the allocated machine, the user starts accessing his mails from HBase, but the data may not be served from the local machine. This is because the user might have logged on to a different machine earlier, and there is a high chance that his records were written through the region server co-located with that machine. Since the machine handling the request is not fixed, the information has to travel over the wire before it is served.
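To see where a given user's data actually lives, one can ask HBase which region server currently hosts that user's mailbox row and compare it with the local host. The sketch below uses the RegionLocator API of the newer HBase Java client (the client available in 2011 exposed the same information through HTable.getRegionLocation); the table name "mails" and the user-id row key are assumptions made purely for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;
import java.net.InetAddress;

public class MailLocality {

    // Returns the hostname of the region server that currently serves
    // this user's row in the (assumed) "mails" table.
    public static String regionServerFor(Connection connection, String userId) throws Exception {
        try (RegionLocator locator = connection.getRegionLocator(TableName.valueOf("mails"))) {
            HRegionLocation location = locator.getRegionLocation(Bytes.toBytes(userId));
            return location.getHostname();
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf)) {
            String regionHost = regionServerFor(connection, "user42@example.com");
            String localHost = InetAddress.getLocalHost().getHostName();
            // If the two hostnames differ, every get/scan for this user's
            // mails crosses the network instead of being served locally.
            System.out.println("Region server: " + regionHost + ", local host: " + localHost);
        }
    }
}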


Co-locating the client and the original region server would minimize this network trip. This is possible by sticky-routing the user to the same machine again and again, across days, as long as that node is available. It ensures local data access: the same region server reads from the same data node, which reads from local disk. But most load balancers are not designed that way; in reality they route based on the number of active connections. That model works well for balancing CPU and memory. A hybrid model that balances CPU, memory and network together will work best.
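A hypothetical hybrid routing rule could look like the sketch below: prefer the application server that shares a host with the user's region server (found, for example, with the lookup shown earlier), and fall back to classic least-connections balancing when that node is not in rotation. The class and method names are illustrative only and do not correspond to any particular load-balancer product.

import java.util.Comparator;
import java.util.Map;
import java.util.Set;

// Illustrative sticky, locality-aware routing logic.
public class LocalityAwareRouter {

    private final Set<String> liveAppServers;              // hostnames currently in rotation
    private final Map<String, Integer> activeConnections;  // hostname -> open connections

    public LocalityAwareRouter(Set<String> liveAppServers, Map<String, Integer> activeConnections) {
        this.liveAppServers = liveAppServers;
        this.activeConnections = activeConnections;
    }

    // regionServerHost is the host serving the user's row in HBase.
    public String route(String regionServerHost) {
        if (regionServerHost != null && liveAppServers.contains(regionServerHost)) {
            return regionServerHost;  // sticky, locality-preserving choice
        }
        // Fallback: route to the app server with the fewest active connections.
        return liveAppServers.stream()
                .min(Comparator.comparingInt((String h) -> activeConnections.getOrDefault(h, 0)))
                .orElseThrow(() -> new IllegalStateException("no app servers available"));
    }
}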
This way of co-locating the application server, the HBase region server and the HDFS data node may pose a security risk for credit-card transactional systems. Such systems may want one more firewall between the database and the application server, and under high traffic that firewall will be the first thing to choke the network. In the best interest of both security and scalability, information architects need to separate the application's sensitive data (e.g. credit card information) from the low-risk data while building the threat model. Based on that split, a dedicated remote HBase cluster behind a firewall could be created for serving the sensitive information.
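In a J2EE application, one way to express that split is to hold two separate HBase connections: one to the co-located cluster for low-risk data and one to the remote, firewalled cluster for sensitive data. The sketch below relies only on the standard hbase.zookeeper.quorum property; the quorum hostnames are placeholders, not real endpoints.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SplitClusterConnections {

    public static void main(String[] args) throws Exception {
        // Co-located cluster: region servers share hosts with the app servers,
        // so mail bodies and other low-risk data are read from local disks.
        Configuration lowRiskConf = HBaseConfiguration.create();
        lowRiskConf.set("hbase.zookeeper.quorum", "zk1.local,zk2.local,zk3.local"); // placeholder hosts

        // Dedicated remote cluster behind the firewall for sensitive data
        // such as credit card information.
        Configuration sensitiveConf = HBaseConfiguration.create();
        sensitiveConf.set("hbase.zookeeper.quorum", "zk1.secure,zk2.secure,zk3.secure"); // placeholder hosts

        try (Connection lowRisk = ConnectionFactory.createConnection(lowRiskConf);
             Connection sensitive = ConnectionFactory.createConnection(sensitiveConf)) {
            // The application picks a connection based on the data classification
            // decided in the threat model.
        }
    }
}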
