English | July 2015 | ISBN: 178328904X | 206 Pages | EPUB/MOBI/PDF (True) | 8.46 MB
If you are a Hadoop administrator and you want to get a good grounding in how to back up large amounts of data and manage Hadoop clusters, then this book is for you.
Learn the best strategies for data recovery from Hadoop backup clusters and troubleshoot problems
About This Book
Learn the fundamentals of Hadoop’s backup needs, recovery strategy, and troubleshooting
Determine common failure points, get acquainted with HBase, and explore different backup techniques to resolve failures
Explore common issues and their solutions using in-depth knowledge of Hadoop
What You Will Learn
Familiarize yourself with HDFS and daemons
Determine backup areas, disaster recovery principles, and backup needs
Understand the necessity for Hive metadata backup
Discover HBase to explore different backup styles, such as snapshot, replication, copy table, the HTable API, and manual backup
Learn the key considerations of a recovery strategy and restore data in the event of accidental deletion
Tune the performance of a Hadoop cluster and handle recovery scenarios involving failover, corruption, working drives, and NameNodes
Monitor node health, and explore various techniques for checks, including HDFS checks and MapReduce checks
Identify common hardware failure points and discover mitigation techniques
In Detail
Hadoop offers distributed processing of large datasets across clusters and is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. It enables computing solutions that are scalable, cost-effective, flexible, and fault tolerant, protecting very large datasets against hardware failures.
Starting off with the basics of Hadoop administration, this book moves on to the best strategies for backing up distributed storage databases.
You will gradually learn about backup and recovery principles, discover the common failure points in Hadoop, and find out how to back up Hive metadata. A deep dive into Apache HBase will show you different ways of backing up data and compare them. Going forward, you’ll learn how to define recovery strategies for various causes of failure, including failover recoveries, corruption, working drives, and metadata. Also covered are the concepts of Hadoop metrics and MapReduce. Finally, you’ll explore troubleshooting strategies and techniques to resolve failures.