Bare Metal Recovery for Disaster Recovery
Uptime is money, and when mission-critical systems are
offline, every second counts. Despite this obvious fact,
many companies fail to execute server restores quickly
because they do not appreciate the complexity of the
task until the time comes. Too often companies spin
their wheels simply trying to get systems to the point
where restores can begin. Precious time is wasted installing
operating systems and configuring hardware, or chasing
down the people skilled enough to perform these complicated
tasks, or both. Because server restores can be very
stressful for the individuals involved, mistakes are
inevitably made, jeopardizing the integrity of the restore
or even forcing the process to be restarted. As the
number of servers grows, these problems multiply.
Fully Automated System Recovery: With a single
command, server recovery steps are executed automatically
with almost no user intervention. Bare Metal Restore
works the same way regardless of the platform, so there
is no longer a need for multiple tools per operating
system.
Dissimilar System Restore for Windows: An integrated
feature enables recovery to target Windows systems with
completely different hardware configurations, including
different network interface adapters, mass storage devices,
video adapters, motherboards, and CPU quantities and
types. It also supports migration to systems from a
different hardware vendor.
Data Replication for Disaster Recovery
IT organizations consider several criteria when evaluating
remote data replication architectures during disaster
recovery (DR) planning efforts. These include application
performance, usability, reliability, effectiveness with
respect to recovery time objective (RTO) and recovery
point objective (RPO) criteria, and cost. Such an analysis,
however, is complicated because DR objectives are constrained
by a variety of direct and indirect influences, such
as available network bandwidth, application write patterns,
network behavior (stability, protocol, reliability,
and latency), processing resources, and storage subsystem
performance.
When evaluating long-distance remote data replication
strategies, most IT organizations quickly discover that
the primary challenge is introducing the new remote
data replication capability unobtrusively—in a
way that is transparent to application users and ongoing
operations. Unfortunately, as IT organizations attempt
to extend their existing synchronous data replication
approaches over long distances, application performance
suffers, because every write must wait for the remote
copy to complete.
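To see why distance alone hurts synchronous writes, a rough
estimate helps. The sketch below is illustrative only. It
assumes light travels through optical fiber at roughly
200,000 km/s and ignores switching, protocol, and storage
delays, so real figures will be worse. The function names
are hypothetical.

```python
# Rough lower bound on the delay that distance adds to a synchronous write.
# Assumption: signals travel through optical fiber at about 200,000 km/s
# (roughly two thirds of the speed of light in a vacuum). Switching,
# protocol, and storage delays are ignored, so real links are slower.

FIBER_KM_PER_MS = 200.0  # ~200,000 km/s expressed as km per millisecond


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip propagation delay over a fiber path."""
    return 2 * distance_km / FIBER_KM_PER_MS


def max_serial_writes_per_second(distance_km: float) -> float:
    """Upper bound on strictly serialized synchronous writes per second."""
    return 1000.0 / min_round_trip_ms(distance_km)


if __name__ == "__main__":
    for km in (10, 100, 1000):
        print(f"{km:>5} km: at least {min_round_trip_ms(km):.1f} ms per write")
```

Under these assumptions, an application that issues its
writes one after another cannot complete more writes per
second than max_serial_writes_per_second allows, however
fast the storage at either end may be.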
A long-standing, popular alternative technique for
long-distance remote replication is asynchronous data
replication. The advantage of this method is that it
provides long-distance remote data replication unobtrusively,
preserving application performance while providing DR
data protection.
One important asynchronous data replication factor
is the often-overlooked asynchronous replication buffer,
where data waits at a primary site pending replication
to a secondary remote location. While the concept of
a replication buffer is common to all asynchronous or
periodic replication technologies, the mechanism for
achieving it is not. How a solution handles the asynchronous
replication buffer has a significant effect on a number
of critical DR measures.
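One way to reason about the buffer is to estimate how much
data waits in it. The sketch below is a simplified model,
not any product's sizing method. It assumes constant write
and link rates in MB/s, and the function names are
hypothetical. Whatever is still in the buffer when the
primary site fails has not reached the secondary site, so
the backlog relates directly to the recovery point.

```python
# Simplified model of an asynchronous replication buffer backlog.
# Assumptions: constant application write rate and constant link
# throughput, both in MB/s. Real workloads are bursty and real links vary.


def backlog_mb(write_mb_per_s: float, link_mb_per_s: float,
               burst_seconds: float) -> float:
    """Data left waiting at the primary site after a write burst."""
    return max(0.0, (write_mb_per_s - link_mb_per_s) * burst_seconds)


def drain_seconds(backlog: float, link_mb_per_s: float,
                  steady_write_mb_per_s: float = 0.0) -> float:
    """Time to empty the backlog using the link capacity left over
    once steady-state writes are accounted for."""
    spare = link_mb_per_s - steady_write_mb_per_s
    if spare <= 0:
        return float("inf")  # the buffer never catches up
    return backlog / spare
```

For example, a 60-second burst at 50 MB/s over a 20 MB/s
link leaves backlog_mb(50, 20, 60) megabytes waiting at the
primary site. Until that backlog drains, it is the data
that would be lost in a site failure.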
Synchronous Replication
Writing data in synchronous mode assures applications
that data writes are completed before they receive a
write-completion indication. In RAID 1 synchronous mode
mirroring environments, this means that both primary
and secondary mirror writes are completed before such
an indication. This requirement allows two or more mirrored
data volumes to reflect a current data copy and a mutually
consistent data state.
Typically, synchronous-mode data mirroring within
a data center occurs over local, high-performance links
and involves a primary data copy and one or more secondary
recovery copies. With synchronous-mode data mirroring,
a delay occurs because applications must wait for the
slowest mirror, whether the primary or any secondary,
to complete its write before the operation is posted
as completed to the application.
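The rule that the slowest mirror sets the pace can be
sketched as follows. This is a toy in-memory model, not a
real storage driver. The Mirror class, its fixed latency
values, and the assumption that mirror writes are issued
in parallel are all illustrative.

```python
# Toy model of a synchronous mirrored write. Each mirror has a fixed,
# assumed latency; writes are assumed to be issued to all mirrors in
# parallel, so the application waits for the slowest one.


class Mirror:
    def __init__(self, name: str, latency_ms: float):
        self.name = name
        self.latency_ms = latency_ms
        self.blocks = {}

    def write(self, block_id: str, data: bytes) -> float:
        """Store the block and report how long the write took."""
        self.blocks[block_id] = data
        return self.latency_ms


def synchronous_write(mirrors, block_id: str, data: bytes) -> float:
    """Write to every mirror and return the delay the application sees.

    Completion is signalled only after all mirrors have written, so the
    observed delay equals the latency of the slowest mirror.
    """
    if not mirrors:
        raise ValueError("at least one mirror is required")
    return max(m.write(block_id, data) for m in mirrors)
```

With a 2 ms local primary and a 12 ms remote secondary, the
application would see 12 ms for every write, and both
copies would hold the same data once the write returns.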
Asynchronous Replication
Asynchronous replication mode offers performance benefits
over synchronous mode by removing from the application's
write path the replication latency associated with
increased distance and network hops.
This allows organizations to replicate data over virtually
any distance with little or no application performance
degradation. The catch is that when remote data replication
services finally commit the scheduled asynchronous writes
to disk with a media transfer operation, the writes
must occur in the precise order that applications issued
them. Otherwise, data corruption can render replicated
data sets useless for recovery purposes. Also note that
any asynchronous write operation harbors a persistent
risk that some unforeseen event, such as a sudden power
failure, may prevent actual data transfer. The benefits
of asynchronous replication solutions must be weighed
against the risks of delayed write commitment—primarily
potential data loss and corruption.
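The write-ordering requirement and the data-loss exposure
can both be seen in a minimal sketch. The class below is a
hypothetical in-memory illustration, not the mechanism of
any particular replication product. It acknowledges writes
immediately, queues them in issue order, and applies them
to the remote copy first-in, first-out.

```python
from collections import deque

# Minimal in-memory sketch of an ordered asynchronous replication buffer.
# Assumption: the remote copy is modelled as a plain dict of block -> data.


class AsyncReplicationBuffer:
    def __init__(self):
        self._pending = deque()  # queued writes, oldest first
        self._next_seq = 0

    def write(self, block_id, data) -> int:
        """Acknowledge at once and queue the write for later replication."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending.append((seq, block_id, data))
        return seq

    def drain(self, remote: dict, max_writes=None) -> int:
        """Apply queued writes to the remote copy in the order issued."""
        applied = 0
        while self._pending and (max_writes is None or applied < max_writes):
            _seq, block_id, data = self._pending.popleft()
            remote[block_id] = data
            applied += 1
        return applied

    def at_risk(self) -> int:
        """Writes acknowledged locally but not yet on the remote copy."""
        return len(self._pending)
```

Because writes leave the queue in the order they arrived,
a later write to the same block always overwrites an
earlier one on the remote copy, never the reverse. The
value returned by at_risk is the number of acknowledged
writes that would be lost if the primary site failed at
that moment.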
Site Failover Disaster Recovery
The outcome is different in the scenario that both
replicates the data and automatically brings up the
mission-critical applications at the disaster recovery
site. The data is replicated as it was in the first
scenario. The difference is that the mission-critical
applications are brought up automatically, in the correct
sequence, at the disaster recovery site, and the DNS
changes that users need in order to reach the applications
transparently (without changing how they access them)
are applied as well. The only real human intervention
in this scenario is the initial declaration of the
disaster. Once that has been done and the business
decision to move operations to the disaster recovery
site has been made, everything from this point forward
can be done automatically.
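The automated sequence described above can be sketched as
a small runbook. This is an assumption-laden illustration,
not a real failover product. The service names, the
depends_on mapping, and the start and repoint_dns callbacks
are all hypothetical. It uses TopologicalSorter from the
Python standard library (Python 3.9 or later) to derive a
startup order in which each application starts after the
services it depends on.

```python
from graphlib import TopologicalSorter

# Sketch of an automated site failover runbook.
# Assumptions: depends_on maps each service to the set of services that
# must be running before it; start and repoint_dns are caller-supplied
# callbacks that perform the real work.


def startup_order(depends_on: dict) -> list:
    """Return services ordered so that dependencies start first."""
    return list(TopologicalSorter(depends_on).static_order())


def fail_over(depends_on: dict, start, repoint_dns,
              disaster_declared: bool) -> list:
    """Run the failover and return the steps taken, in order.

    Nothing happens until a person has declared the disaster; after
    that, every step runs without further intervention.
    """
    if not disaster_declared:
        return []
    steps = []
    for service in startup_order(depends_on):
        start(service)
        steps.append(f"start {service}")
    repoint_dns()  # users keep using the same names to reach applications
    steps.append("repoint DNS")
    return steps
```

Keeping the declaration as an explicit argument mirrors the
text: the decision to fail over stays with people, while
the sequencing and DNS changes do not depend on anyone
remembering the correct order under pressure.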
This is a very strong argument for automating the
disaster recovery of technology-based assets. In a time
of disaster, there is a tremendous amount of pressure
and stress to get everything back up and running and
available to users. In the manual process, mistakes
will be made for a variety of reasons. Maybe the documentation
for the procedures is out-of-date, poorly written, or
incomplete, or it cannot be found or is not available
because it was online at the primary data center. Maybe
there has been some configuration drift between the
primary and disaster recovery data centers. Having an
automated disaster recovery capability would eliminate
many of these risk factors. In addition, the same disaster
recovery infrastructure used in the event of a disaster
can (and should) be used on a regular, frequent basis
to "stress-test" recoverability of the replicated
data, the server environment, and the application environment.
The previous examples, one using only the backup/recovery
environment and one using only replication as the underlying
disaster recovery technology, make the case for automating
technology recovery.
Does this mean that the backup/recovery environment
is no longer needed? Absolutely not. It is still an
industry-wide best practice that all data be backed
up in a secure and reliable manner with the knowledge
that anything from the most trivial file to the most
complicated data warehouse can be restored at will.
Automated recovery may, however, affect whether duplicate
tapes are made. Alternatively, does this mean that replication
is of no value? Again, absolutely not. It is vital that
mission-critical data, application binaries, configuration
files, and user files be “copied” to one or more
alternate sites in a manner that is consistent with
business requirements.