Thursday, December 29, 2011

Starting the HBase Server from Eclipse

Why start the HBase server inside Eclipse?
  1. HBase custom filters are a powerful feature that helps move processing close to the data. However, deploying a custom filter requires compiling the filter's dependent classes, packaging them in a jar, and making the jar available to the region server. Any change to the filter code requires repeating the complete cycle: stopping the server, packaging a new jar, copying it to the hbase lib folder, and restarting the server.
  2. There is no easy way to debug the code written inside these custom filters by setting a breakpoint.
  3. To see the code execution status, one has to look at the region server log file, which means running Eclipse + Cygwin + Notepad just to view the log.
    Because of these shortfalls, I decided to run HBase from Eclipse, using Run as a Java Application or Debug as a Java Application and setting breakpoints in my filter classes to see the execution path along with the stacks.

    Steps to configure HBase for Eclipse
    What we need (all are included in the deployment package - Download)
    1. chmod.exe program (32 bit is included)
    2. favicon.gif  
    3. HBaseLuncher.class
    4. An unzipped HBase release

    Setting things up
    1. Unzip the attached eclipse project folder.
    2. Add the jars in the hbase/lib/ folder to the project build path libraries.
    3. Add the hbase/conf folder to the project build path libraries.
    4. Add hbase.jar to the project build path libraries.
    5. Add your project to the Required projects in the Java build path.

    Starting HBase in debug mode
    1. Run HBaseLuncher in debug/run mode.
    2. In the Windows tray, you will see an HBase tray icon.
    3. Right click on the tray icon to start/stop the HBase server. 

    For any issues, please write to me:- abinash [at] bizosys [dot] com

    Tuesday, December 27, 2011

    Why Code

    On the Interaction Design Association's LinkedIn Group (IXDA), a member posted this query: "Do Designers need to be able to code?". Some very good points have been raised and debated on how knowledge of HTML, CSS, even XAML, etc. can help designers understand how their designs are being translated into production, and how it helps improve designer-developer communication. The counter argument is that a designer brings in multiple perspectives (read generalist) and hence should not get bogged down in code level details, but instead remain focused on the overall design of the user experience.

    Without taking sides, my view is that there is a larger question here - "Why code at all".

    Why code is not restricted to designers alone. In the world of desktop apps, for every formal business application, there are a large number of informal Excel spreadsheet based applications out there. There is some level of code in there too, put in place by analysts and others. There was Coghead, a company with the grand vision of enabling drag-and-drop building of entire applications with databases for "tech savvy businesses" (mostly SMB). To quote from its Crunchbase profile: "Coghead is a WYSIWYG database driven application service aimed at enabling non-developers to solve problems traditionally requiring programming knowledge".

    We are entering a paradigm where code is likely to get pushed deeper below the hood, simultaneously with the rise of a 'tech savvy' non-developer lot who use powerful browser based, drag-and-drop convenience to cobble together apps - the UI, and the server side backend using easy API calls. Why Code is a pertinent question for these consumers.

    At the same time, this is not a magic solution for everything, but may apply to a narrow class of applications that are mostly stand alone, with limited integration to multiple systems. Yet it could provide a replacement for many of those local spreadsheet based apps that are used to save and organize data locally. If you have an idea for such creative uses of browser based online app building, do share your comments.

    So today HTML, CSS, JavaScript, etc. may appeal for the ability to craft, but tomorrow these may be replaced by the need to know what APIs are available and how to use them. With a proliferation of great tools for non-developers, the argument Why code is not so much about replacing the need to master yet another skill or art, but about using the limited time available to focus on what needs building - the user experience, better visualization of all that data one is collecting, etc. With the advent of PaaS (platform as a service) players such as Microsoft Azure, Google App Engine, Salesforce, etc., more tools that allow 'tech savvy' non-developers to play with APIs, make server side calls, etc. may make it more convenient and easier for them to create their own personal apps. I welcome views and comments on what that future could be.

    Wednesday, December 7, 2011

    Three Reasons for another Prototyping Tool

    I share the view of Gartner analyst Mark McDonald and many other analysts that there is a business-IT gap in the typical software development process, especially at the requirements capture stage, which "is the basis for business and IT conversation". The fact is that, as a project progresses from requirements to detailed design and coding, team members often change and project locations shift. It is not that there is a dearth of PMI best practices or CMM models to oversee governance. Yet projects slip!

    I presume there could be two reasons here. The obvious one is requirements rigor: design specs are outlined in less time than desirable. Secondly, information is captured and shared at differing granularity across the team and over the project life cycle using a variety of artifacts, with conversations loosely bound together by long chains of emails and telecons. Still, the various relevant perspectives of the analysts, the sponsor, developers, and UI designers don't come together into a single big picture that addresses the original project charter. Requirements lie scattered across documents, Visio flows, PowerPoint slides, UI prototypes (requires coding), UI wireframes and sketches (differing fidelity), technical requirements documents, etc., and each follows different conventions and representation styles.

    10Screens was conceptualized to make it easy to create a consistent, high fidelity view within a short span of time using a drag-and-drop interface. It allows teams to illustrate highly finished looking screen designs and process flow charts in a single space, bringing all stakeholders on to the same page and letting them comment in page. Being entirely online, 10Screens facilitates sharing and collaboration.

    Another thing about most prototypes is that they remain as specs and almost never make it to production code. The third reason for another prototyping tool is the opportunity to use the iterated prototype as the final, finished, in-production UI! The impact of this is on overall effort, as it saves precious time for developers, who need not code the UI and can instead focus only on business logic and putting together the server side - making calls to the database, retrieving and serving client requests, etc. We have been trying this with simple apps and it seems to work. We are excited about the possibilities and promise to keep you posted out here. We plan to launch a 'Backend as a Service' to which the prototype can connect directly. Of course, all this is Cloud based.

    10Screens - UPDATE

    We are happy to announce that 10Screens - PowerPoint for Prototyping - is now free!

    If you are one of our registered users, you must have noticed the 'regular changes' we keep making to the website. Please rest assured that the actual tool (launches in a new window) and your saved work is intact. We also thank many of our users worldwide for taking time out to share views and suggestions since we launched 10Screens in March 2011. We plan to take up your suggestions and improve the product soon. Please keep checking here for updates. Follow us on Twitter @bizosys where we also share announcements and updates.

    Meanwhile, happy prototyping with 10Screens!

    Tuesday, December 6, 2011

    HBase Backup to Amazon S3

    HSearch is our open source, NoSQL, distributed, real-time search engine built on Hadoop and HBase. You can find more about it on

    We have evaluated various options to back up the data inside HBase and built a solution. This post explains the options and also provides the solution for anyone to download and implement for their own HBase installations.

    1. Back up the Hadoop DFS
       - Block data files are backed up quickly.
       - Even if there is no visible external load on HBase, internal HBase processes such as region balancing and compaction go on updating the HDFS blocks, so a raw copy may result in an inconsistent state.
       - Secondly, HBase as well as Hadoop HDFS keep data in memory and flush it at periodic intervals, so a raw copy may result in an inconsistent state.
    2. HBase Import and Export tool
       - The Map-Reduce job downloads data to the given output path.
       - Providing a path like s3://backupbucket/ fails the program with exceptions like: Jets3tFileSystemStore failed with AWSCredentials.
    3. HBase Table Copy tools
       - Provides another parallel, replicated setup to switch to.
       - Requires a huge investment to keep another parallel environment running to replicate production data.

    After considering these options, we developed a simple tool which backs up data to Amazon S3 and restores it when needed. Another requirement was to take a full backup over the weekend and a daily incremental backup.

    In case of failures, it should first initiate a clean environment with all tables created and populated with latest full backup data and then apply all incremental backups sequentially. However, in this method deletes are not captured which may lead to some unnecessary data in tables. This is a known disadvantage of this method of backup and restore.
    This backup program internally uses the HBase Import and Export tools to execute as Map-Reduce jobs.

    Top 10 Features of the backup tool
    1. Export complete data for the given set of tables to S3 bucket.
    2. Export incremental data for the given set of tables to S3 bucket.
    3. List all complete as well as incremental backup repositories.
    4. Restore a table from backup based on the given backup repository.
    5. Runs as Map-Reduce jobs.
    6. In case of connection failure, retries with increasing delays
    7. Handles special characters like _ which create problems in the export and import activities.
    8. Enhances the existing Export and Import tools with detailed logging to report a failure, rather than just exiting with a program status of 1.
    9. Works with a human readable time format for taking, listing and restoring backups, rather than system tick time or Unix epoch time (time represented as a number rather than in a readable format such as YYYY.MM.DD 24HH:MINUTE:SECOND:MILLSECOND TIMEZONE).
    10. All parameters are taken from the command line, which allows a cron job to run this at regular intervals.
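
The retry behaviour in feature 6 can be pictured with a small shell wrapper. This is only a sketch of the idea, not the tool's own code (which is Java and ships in the download): the function name retry_with_backoff, the doubling of the delay, and the attempt limit are all assumptions made for illustration.

```shell
# Hypothetical helper, not part of the backup tool.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the wait between attempts (doubling is an assumption).
retry_with_backoff() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while true; do
    # Success: stop retrying.
    "$@" && return 0
    # Out of attempts: report failure to the caller.
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example (commented out): up to 5 attempts, waiting 2, 4, 8 and 16 seconds.
# retry_with_backoff 5 2 some_upload_command
```

Here some_upload_command stands for whatever step talks to S3; it is a placeholder, not a real command.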

    Setting up the tool

    Step # 1 : Download the package from
    This package includes the necessary jar files and the source code.

    Step # 2 : Set up a configuration file. Download the hbase-site.xml file.
    Add to it the fs.s3.awsAccessKeyId, fs.s3.awsSecretAccessKey, fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey properties.
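
For orientation, each of the four properties goes into its own property element inside the configuration element of hbase-site.xml. The sketch below simply prints such a fragment. The helper name s3_properties and the MY_ACCESS_KEY_ID / MY_SECRET_KEY placeholders are made up for illustration, and using the same key pair for the s3 and s3n entries is an assumption.

```shell
# Hypothetical helper: print the four S3 credential properties as
# hbase-site.xml <property> elements.
#   $1 = AWS access key id, $2 = AWS secret access key (placeholders)
s3_properties() {
  for scheme in s3 s3n; do
    printf '  <property>\n    <name>fs.%s.awsAccessKeyId</name>\n    <value>%s</value>\n  </property>\n' "$scheme" "$1"
    printf '  <property>\n    <name>fs.%s.awsSecretAccessKey</name>\n    <value>%s</value>\n  </property>\n' "$scheme" "$2"
  done
}

# Paste the output between <configuration> and </configuration>.
s3_properties MY_ACCESS_KEY_ID MY_SECRET_KEY
```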

    Step # 3 : Set up the class path with all the jars inside the hbase/lib directory, the hbase.jar file, java-xmlbuilder-0.4.jar, jets3t-0.8.1a.jar and the hbackup-1.0-core.jar file bundled inside the downloaded hbackup.install.tar. Make sure hbackup-1.0-core.jar is at the beginning of the classpath. In addition, add the configuration directory that holds the hbase-site.xml file to CLASSPATH.

    Running the tool

    Usage: It runs in 4 modes: [backup.full], [backup.incremental], [backup.history] and [restore]
    mode=backup.full tables="comma separated tables" backup.folder=S3-Path  date="YYYY.MM.DD 24HH:MINUTE:SECOND:MILLSECOND TIMEZONE"

    Ex. mode=backup.full tables=tab1,tab2,tab3 backup.folder=s3://S3BucketABC/ date="2011.12.01 17:03:38:546 IST"
    Ex. If date is omitted, the default time is now:
    mode=backup.full tables=tab1,tab2,tab3 backup.folder=s3://S3BucketABC/


    mode=backup.incremental tables="comma separated tables" backup.folder=S3-Path duration.mins=In Minutes

    Ex. mode=backup.incremental backup.folder=s3://S3BucketABC/ duration.mins=30 tables=tab1,tab2,tab3
    This will back up the changes that happened in the last 30 mins.


    mode=backup.history backup.folder=S3-Path

    Ex. mode=backup.history backup.folder=s3://S3BucketABC/
    This will list all past archives. Incremental ones end with .incr


    mode=restore backup.folder=S3-Path/ArchiveDate tables="comma separated tables"

    Ex. mode=restore backup.folder=s3://S3-Path/DAY_MON_HH_MI_SS_SSS_ZZZ_YYYY tables=tab1,tab2,tab3
    This will add the rows archived during that date. First apply a full backup and then apply the incremental backups.
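
The restore order described above (latest full backup first, then the incrementals taken after it) can be sketched as a small filter over a list of archive names. This is an illustration only: the function name plan_restore is made up, and it assumes the names arrive oldest first, one per line, with incremental archives ending in .incr. The output format of backup.history is not reproduced here, so its listing may need reordering before being fed in.

```shell
# Hypothetical helper: read archive names on stdin, oldest first (assumed),
# and print the latest full backup followed by the incrementals after it.
# Fails with status 1 if the list contains no full backup.
plan_restore() {
  awk '
    NF == 0   { next }                    # skip blank lines
    /\.incr$/ { incr[++n] = $0; next }    # incremental: queue it
              { full = $0; n = 0 }        # full backup: start a fresh queue
    END {
      if (full == "") exit 1              # no full backup to start from
      print full
      for (i = 1; i <= n; i++) print incr[i]
    }'
}
```

Each name printed would then be passed, in order, as the ArchiveDate part of backup.folder in a mode=restore run.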


    Some sample scripts to run the backup tool.

    $ cat
    for file in `ls /mnt/hbase/lib`
    do
      export CLASSPATH=$CLASSPATH:/mnt/hbase/lib/$file
    done

    export CLASSPATH=/mnt/hbase/hbase-0.90.4.jar:$CLASSPATH

    export CLASSPATH=/mnt/hbackup/hbackup-1.0-core.jar:/mnt/hbackup/java-xmlbuilder-0.4.jar:/mnt/hbackup/jets3t-0.8.1a.jar:/mnt/hbackup/conf:$CLASSPATH

    $ cat
    . /mnt/hbackup/bin/

    dd=`date "+%Y.%m.%d %H:%M:%S:000 %Z"`
    echo Backing up for date $dd
    # Back up one table at a time, pausing between runs
    for table in table1 table2 table3
    do
      /usr/lib/jdk/bin/java com.bizosys.oneline.maintenance.HBaseBackup mode=backup.full backup.folder=s3://mybucket/ tables=$table "date=$dd"
      sleep 10
    done

    $ cat
    . /mnt/hbackup/bin/
    /usr/lib/jdk/bin/java com.bizosys.oneline.maintenance.HBaseBackup mode=backup.history backup.folder=s3://mybucket
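
The samples above cover the full backup and history modes. An incremental run could follow the same pattern. The sketch below assumes CLASSPATH has already been set up as in the first script; the function name incremental_backup and the JAVA override variable are illustrative additions, not part of the package.

```shell
# Path to the java binary; override JAVA to point elsewhere (illustrative).
JAVA=${JAVA:-/usr/lib/jdk/bin/java}

# Hypothetical wrapper around mode=backup.incremental.
# Usage: incremental_backup S3_FOLDER DURATION_MINS TABLE [TABLE...]
# Backs up one table per run and stops at the first failure.
incremental_backup() {
  bucket=$1
  mins=$2
  shift 2
  for table in "$@"; do
    "$JAVA" com.bizosys.oneline.maintenance.HBaseBackup mode=backup.incremental \
      backup.folder="$bucket" duration.mins="$mins" tables="$table" || return 1
  done
}

# Example (commented out): changes from the last 30 minutes for three tables.
# incremental_backup s3://mybucket/ 30 table1 table2 table3
```

Run from cron every 30 minutes with duration.mins=30, this would pick up the changes made since the previous run, give or take scheduling jitter.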