...

What is Disaster Recovery?

Disaster recovery (DR) involves a set of policies and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster. Disaster recovery focuses on the IT or technology systems supporting critical business functions.

A full description of Disaster Recovery is available on Wikipedia.

...

JARs and custom plug-ins are stored in <Datameer X Home>/etc. Back up all files and subfolders in this directory.
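Here is a minimal Python sketch of this step, assuming Python 3.9+ and a backup directory supplied by the caller. It uses the standard library's shutil.make_archive to pack the whole etc directory, including subfolders, into a timestamped .tar.gz archive. The function name archive_etc and the archive naming scheme are illustrative and are not part of Datameer X.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def archive_etc(datameer_home: Path, backup_dir: Path) -> Path:
    """Archive <Datameer X Home>/etc, including all subfolders, as .tar.gz.

    Hypothetical helper for illustration; not a Datameer X tool.
    """
    etc_dir = datameer_home / "etc"
    if not etc_dir.is_dir():
        raise FileNotFoundError(f"{etc_dir} does not exist")
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    base_name = backup_dir / f"datameer-etc-{stamp}"
    # make_archive appends ".tar.gz" and stores entries relative to root_dir,
    # so the archive contains paths of the form etc/...
    archive = shutil.make_archive(str(base_name), "gztar",
                                  root_dir=str(datameer_home), base_dir="etc")
    return Path(archive)
```

For example, archive_etc(Path("/opt/datameer"), Path("/backup/datameer")) would write a datameer-etc-<timestamp>.tar.gz file under /backup/datameer. Both paths are placeholders: substitute your own <Datameer X Home> and backup location.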

Datameer X DAP database

A full backup of the application database allows you to recover from unexpected software or hardware failures that could otherwise cause the loss of large amounts of Datameer X metadata. A full backup is also a prerequisite for upgrading or moving the Datameer X installation.

Default installations of Datameer X include a MySQL DAP database. This database contains all metadata relating to Datameer X artifacts. Without it, the product can't function.
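mysqldump is a common way to take a full logical backup of a MySQL database. The Python sketch below wraps it. The database name dap, the user dap, and the backup directory are assumptions; replace them with the values from your installation. Credentials are expected to come from a MySQL option file such as ~/.my.cnf rather than the command line. The --single-transaction flag gives a consistent dump of InnoDB tables without locking them.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def build_mysqldump_command(database: str, user: str, result_file: Path,
                            host: str = "localhost") -> list[str]:
    """Return the mysqldump argument list for a full logical backup."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        # Consistent snapshot of InnoDB tables without locking them.
        "--single-transaction",
        # Include stored procedures and functions, if any exist.
        "--routines",
        f"--result-file={result_file}",
        database,
    ]


def backup_dap_database(database: str = "dap", user: str = "dap",
                        backup_dir: Path = Path("/backup/datameer")) -> Path:
    """Dump the DAP database to a timestamped .sql file and return its path.

    The defaults are hypothetical; check your installation's settings.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    result_file = backup_dir / f"{database}-{stamp}.sql"
    # check=True raises CalledProcessError if mysqldump exits non-zero.
    subprocess.run(build_mysqldump_command(database, user, result_file),
                   check=True)
    return result_file
```

To restore, load the dump file back with the mysql client, for example mysql dap < dap-<timestamp>.sql.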

Local execution

When a job runs with the custom property das.execution-framework=Local, Datameer X stores its artifacts in the local filesystem. This is appropriate for certain usage scenarios, but these files must then be backed up as well.

...

  • Back up <Datameer X Home>/das-data
  • Back up <Datameer X Home>/data

Datameer X 6.x:

  • Back up <Datameer X Home>/data

Datameer X 7, 7.x, 10.x, and 11.x:

  • Back up <Datameer X Home>/data
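The directories in these lists could be archived with a sketch like the following, assuming Python 3.9+. It takes the subdirectory names as a parameter rather than hard-coding a version mapping. Pass ["das-data", "data"] for the release covered by the first list and ["data"] for 6.x and later. Missing directories are skipped, and the return value reports which directories were archived. The function name and archive location are hypothetical.

```python
import tarfile
from pathlib import Path


def archive_local_execution_data(datameer_home: Path, subdirs: list[str],
                                 archive_path: Path) -> list[str]:
    """Write the given <Datameer X Home> subdirectories into a .tar.gz file.

    Returns the subdirectories that were found and archived.
    """
    archived = []
    archive_path.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in subdirs:
            source = datameer_home / name
            if source.is_dir():
                # arcname keeps paths relative to <Datameer X Home>;
                # tar.add recurses into the directory by default.
                tar.add(source, arcname=name)
                archived.append(name)
    return archived
```

As a general precaution rather than a documented Datameer X requirement, run the archive while no local-execution jobs are running so the copy is consistent.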

...

Apache documentation: DistCp Version 2

HDFS Snapshots

HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system. Some common use cases of snapshots are data backup, protection against user errors and disaster recovery.

Apache documentation: HDFS Snapshots
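Here is a minimal sketch of a snapshot rotation, assuming Python 3.9+ and the hdfs command-line client on the PATH. It uses the standard HDFS shell commands hdfs dfsadmin -allowSnapshot (a one-time superuser step that makes a directory snapshottable), hdfs dfs -createSnapshot, and hdfs dfs -deleteSnapshot. The dr- naming scheme, the retention count, and the function names are assumptions. The caller supplies the existing snapshot names, which can be read by listing the directory's .snapshot subdirectory.

```python
import subprocess
from datetime import datetime, timezone


def allow_snapshot_cmd(path: str) -> list[str]:
    # One-time admin step: make the directory snapshottable (HDFS superuser).
    return ["hdfs", "dfsadmin", "-allowSnapshot", path]


def create_snapshot_cmd(path: str, name: str) -> list[str]:
    return ["hdfs", "dfs", "-createSnapshot", path, name]


def delete_snapshot_cmd(path: str, name: str) -> list[str]:
    return ["hdfs", "dfs", "-deleteSnapshot", path, name]


def snapshot_name(now: datetime) -> str:
    # Hypothetical naming scheme: these timestamps sort chronologically as strings.
    return now.strftime("dr-%Y%m%dT%H%M%SZ")


def snapshots_to_expire(existing: list[str], keep: int) -> list[str]:
    """Return the oldest 'dr-' snapshots beyond the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ours = sorted(n for n in existing if n.startswith("dr-"))
    return ours[:-keep]


def take_snapshot(path: str, existing: list[str], keep: int = 7) -> str:
    """Create a new snapshot of `path`, then delete expired 'dr-' snapshots."""
    name = snapshot_name(datetime.now(timezone.utc))
    subprocess.run(create_snapshot_cmd(path, name), check=True)
    for old in snapshots_to_expire(existing + [name], keep):
        subprocess.run(delete_snapshot_cmd(path, old), check=True)
    return name
```

Only snapshots with the dr- prefix are rotated, so manually created snapshots on the same directory are left alone.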

Cloudera

Cloudera Manager

Cloudera Manager provides an integrated, easy-to-use management solution for enabling data protection on the Hadoop platform. Cloudera Manager enables you to replicate data across datacenters for disaster recovery scenarios. Replications can include data stored in HDFS, data stored in Hive tables, Hive metastore data, and Impala metadata (catalog server metadata) associated with Impala tables registered in the Hive metastore.

Cloudera documentation: Backup and Disaster Recovery

Cloudera Manager Snapshots

Cloudera Manager enables the creation of snapshot policies that define the directories or tables to be snapshotted, the intervals at which snapshots should be taken, and the number of snapshots that should be kept for each snapshot interval. You can also create HBase and HDFS snapshots using Cloudera Manager or by using the command line.

Cloudera documentation: Cloudera Manager Snapshot Policies

Cloudera Enterprise BDR

Cloudera Enterprise Backup and Disaster Recovery (BDR) uses replication schedules to copy data from one cluster to another, enabling the second cluster to provide a backup for the first. If data is lost, the second cluster (the backup) can be used to restore data to production.

Cloudera tutorials: BDR Tutorials

Hortonworks

Mirroring Data with Falcon

You can mirror data between on-premises clusters, or between an on-premises HDFS cluster and a cluster in the cloud using Microsoft Azure or Amazon S3. Mirroring data produces an exact copy of the data and keeps both copies synchronized. You can use Falcon to mirror HDFS directories, Hive tables, and snapshots.

Hortonworks documentation: Mirroring Data with Falcon

Replicating Data with Falcon

Falcon can replicate data across multiple clusters using DistCp. A replication feed allows you to set a retention policy and to replicate data at the frequency you specify in the feed entity. Falcon uses a pull-based replication mechanism: on every target cluster, a coordinator is scheduled for each source cluster, and that coordinator pulls the data from the source cluster using DistCp.

Hortonworks documentation: Replicating Data with Falcon
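The pull-based model can be pictured with a small Python sketch, assuming Python 3.9+. This is a conceptual illustration, not Falcon's API or feed syntax. For each target cluster and each source cluster, it plans one job that runs on the target and pulls the feed path from the source with distcp -update. The PullJob type, the cluster names, and the NameNode URIs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullJob:
    runs_on: str                  # target cluster that schedules the coordinator
    source: str                   # source cluster the data is pulled from
    distcp_args: tuple[str, ...]  # arguments for a DistCp run on the target


def plan_pull_replication(feed_path: str, sources: dict[str, str],
                          targets: dict[str, str]) -> list[PullJob]:
    """Plan one pull job per (target, source) pair.

    `sources` and `targets` map cluster names to NameNode URIs.
    """
    jobs = []
    for target_name, target_uri in targets.items():
        for source_name, source_uri in sources.items():
            jobs.append(PullJob(
                runs_on=target_name,
                source=source_name,
                distcp_args=("-update", f"{source_uri}{feed_path}",
                             f"{target_uri}{feed_path}"),
            ))
    return jobs
```

In this model every job is scheduled on a target. Adding another backup cluster therefore adds jobs only on that cluster, and nothing new has to be scheduled on the source.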

Incremental backup of data using Falcon for Disaster Recovery and Burst Capacity

Apache Falcon simplifies the configuration of data motion with replication, lifecycle management, and lineage and traceability. This provides consistent data governance across Hadoop components. This tutorial walks through a scenario where email data is processed on multiple HDP clusters around the country and then backed up hourly to a cloud-hosted cluster.

Hortonworks tutorial: Incremental backup of data from HDP to Azure using Falcon for Disaster Recovery and Burst Capacity

Managing Hadoop DR with 'distcp' and 'snapshots'

A traditional 'distcp' from one directory to another, or from cluster to cluster, has limitations when applying updates. These limitations can lead to incorrect or incomplete updates. The article explores using HDFS snapshots with distcp to eliminate this problem.

Hortonworks article: Managing Hadoop DR with 'distcp' and 'snapshots'
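The snapshot-based approach can be sketched as a sequence of commands. The sketch assumes Python 3.9+, HDFS on both clusters, and source and target directories that have already been made snapshottable. DistCp's -diff option must be combined with -update. It applies the difference between two source snapshots to the target, and it requires the target to hold a snapshot with the older name and to be unchanged since that snapshot was taken. The cluster URIs, snapshot names, and function names are hypothetical.

```python
import subprocess


def initial_sync_plan(src: str, dst: str, first: str) -> list[list[str]]:
    """Full copy from a source snapshot, then mark the same point on the target."""
    return [
        ["hdfs", "dfs", "-createSnapshot", src, first],
        # With -update, the contents of the snapshot are copied into dst.
        ["hadoop", "distcp", "-update", f"{src}/.snapshot/{first}", dst],
        ["hdfs", "dfs", "-createSnapshot", dst, first],
    ]


def incremental_sync_plan(src: str, dst: str, old: str, new: str) -> list[list[str]]:
    """Copy only what changed on src between snapshots `old` and `new`."""
    return [
        ["hdfs", "dfs", "-createSnapshot", src, new],
        # -diff requires -update; dst must still match its snapshot `old`.
        ["hadoop", "distcp", "-update", "-diff", old, new, src, dst],
        # Record the new baseline on the target for the next cycle.
        ["hdfs", "dfs", "-createSnapshot", dst, new],
    ]


def run_plan(plan: list[list[str]]) -> None:
    for cmd in plan:
        # Stop at the first failing step so the target baseline is not advanced.
        subprocess.run(cmd, check=True)


# Example (hypothetical URIs):
# run_plan(initial_sync_plan("hdfs://prod-nn:8020/data", "hdfs://dr-nn:8020/data", "s1"))
# run_plan(incremental_sync_plan("hdfs://prod-nn:8020/data", "hdfs://dr-nn:8020/data", "s1", "s2"))
```

Each cycle ends by creating the new snapshot on the target, so the next cycle can use it as its older snapshot.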

Disaster recovery and Backup best practices in a typical Hadoop Cluster

A disaster recovery plan, or business process contingency plan, is a set of well-defined processes or procedures that must be executed so that the effects of a disaster are minimized and the organization is able to either maintain or quickly resume mission-critical operations.

Hortonworks articles: Series 1, Series 2

MapR

Disaster Recovery

The MapR Converged Data Platform includes backup and mirroring capabilities to protect against data loss after a site-wide disaster. MapR is the only big data platform that provides built-in, enterprise-grade DR for files, databases, and events. MapR was built to address real-world DR scenarios where lost data and downtime result in lost revenue, lost productivity, and/or failed opportunities.

MapR documentation: Disaster Recovery

MapR Snapshots

The ability to create and manage snapshots is an essential feature of enterprise-grade storage systems, and it is increasingly seen as critical for big data systems as well. A snapshot captures the state of the storage system at an exact point in time and can be used to fully recover data in the event of data loss.

MapR documentation: MapR Snapshots

Scenario: Disaster Recovery

A severe natural disaster can cripple an entire datacenter, leading to permanent data loss unless a disaster plan is in place.

Solution: Mirroring to another cluster

...