Backup the Hadoop DFS
Copying the raw HDFS block data files is quick, but it has two problems.
First, even if there is no visible external load on HBase, internal processes such as region balancing and compaction keep updating the HDFS blocks, so a raw copy may result in an inconsistent state.
Second, both HBase and HDFS keep data in memory and flush to disk only at periodic intervals, so a raw copy may miss unflushed data and again capture an inconsistent state.
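As an illustration, a raw copy of the HBase root directory could be attempted with a distcp command like the one below; the source path and bucket name are placeholders, and /hbase is only the default HBase root directory.

hadoop distcp /hbase s3://backupbucket/hbase-raw-copy

Because regions keep changing while the copy runs, the files that land in the bucket may not represent any consistent point-in-time view of the tables.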
HBase Import and Export tool
The Export Map-Reduce job writes the table data to the given output path.
However, providing an S3 path such as s3://backupbucket/ makes the program fail with exceptions like: Jets3tFileSystemStore failed with AWSCredentials.
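For reference, the stock Export tool is normally run against an HDFS output path; the table name and output directory below are only examples.

hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backup/mytable

Pointing the same command at an s3:// output path is what triggers the credential failure described above.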
HBase Table Copy tools
It requires another parallel, replicated cluster to switch over to.
Keeping such a parallel environment running just to replicate production data is a huge investment.
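For context, the CopyTable tool copies a table to a peer cluster identified by its ZooKeeper quorum; the quorum addresses and table name below are placeholders.

hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=backup-zk1,backup-zk2,backup-zk3:2181:/hbase mytable

The peer cluster must already be up and reachable, which is exactly the parallel-environment cost described above.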
- Export complete data for the given set of tables to an S3 bucket.
- Export incremental data for the given set of tables to an S3 bucket.
- List all complete as well as incremental backup repositories.
- Restore a table from backup based on the given backup repository.
- Runs as a Map-Reduce job.
- In case of connection failure, retries with increasing delays.
- Handles special characters such as _ that otherwise break the export and import activities.
- Enhances the existing Export and Import tools with detailed logging that reports the cause of a failure, rather than just exiting with a program status of 1.
- Uses a human-readable time format (YYYY.MM.DD 24HH:MINUTE:SECOND:MILLISECOND TIMEZONE) for taking, listing and restoring backups, instead of representing time as a number such as system tick time or Unix epoch time.
- All parameters are taken from the command line, which allows a cron job to run the backup at regular intervals.
Some sample commands to run the backup tool:
/usr/lib/jdk/bin/java com.bizosys.oneline.maintenance.HBaseBackup mode=backup.history backup.folder=s3://mybucket
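Because every parameter is passed on the command line, the tool can be scheduled directly from cron. The entry below is only a sketch: it reuses the command from the sample above as a template, and the schedule and log path are hypothetical.

# Run the backup tool every night at 2 AM and append output to a log file
0 2 * * * /usr/lib/jdk/bin/java com.bizosys.oneline.maintenance.HBaseBackup mode=backup.history backup.folder=s3://mybucket >> /var/log/hbase-backup.log 2>&1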