- June 13, 2014
- by David Lohnes
In this installment of our continuing series on SharePoint disaster recovery planning, we will consider Microsoft’s second DR option: Warm Standby.
Microsoft describes a Warm Standby DR procedure as follows: “Virtualization provides a workable and cost effective option for a warm standby recovery solution. . . . You can create virtual images of the production servers and ship these images to the standby data center.”
Server virtualization is widely recognized to be a powerful technology that enables server cloning and migration between distant locations. As hypervisor technology continues to increase in sophistication and feature set (such as the improvements seen in Hyper-V 2012 over previous versions), server snapshots are becoming an increasingly common component in DR strategies for a range of applications.
A Warm Standby SharePoint DR strategy utilizes server virtualization by taking VM snapshots of the SharePoint servers, shipping them to a secondary site, and then in the event of a disaster bringing them online there. In essence, rather than forcing the creation of a new farm, VM technology (in theory) allows the exact original farm servers to spring from the ashes in a new location.
VM snapshot technology is not without difficulty, however, when applied to SharePoint, and Microsoft itself is not always consistent about whether or not a VM-snapshot-based Warm Standby approach is actually recommended for SharePoint DR.
The difficulty with VM snapshots in relation to SharePoint is the same one we discussed at the beginning of this series with regard to restoring the SharePoint Config DB from backup. Because of the dependency between the Config DB and the farm servers, restoring your farm servers from snapshots can break your farm just as restoring the Config DB from backup can. VM snapshots of the SharePoint servers don’t capture the configuration database, and they don’t capture network traffic. Improperly restoring SharePoint servers to a previous point in time can have the same negative effects as improperly restoring the configuration database to a previous point in time. To repeat what was said in the previous post, “Things may initially look fine, but the lack of synchronization will cause unpredictable problems; in Microsoft’s words, ‘users may experience various random errors.’”
The solution to this difficulty is twofold. To make VM snapshots work as a part of a SharePoint DR strategy, you must:
- Shut down all farm servers together before you take your snapshots, and take the snapshots while they’re all offline. This step ensures that no farm configuration or process data is hiding in network packets between the servers on the farm, that all active processes on the servers have stopped, and that all interactions with the configuration database have stopped. The farm is in a completely static state.
- Take a backup of the SharePoint configuration database at the same time, while the farm servers are still down. Together with the server snapshots from the first step, this ensures that you have a complete picture of the farm at a single point in time.
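The ordering constraint behind these two steps can be sketched in Python. This is a minimal illustration under stated assumptions, not a working orchestration tool: the action names (`shutdown`, `snapshot_vm`, `backup_config_db`, `start`), both functions, and the database name `SharePoint_Config` are hypothetical placeholders for whatever hypervisor and SQL Server tooling you actually use.

```python
def plan_consistent_capture(servers, config_db="SharePoint_Config"):
    """Build an ordered list of (action, target) steps for one capture run.

    The action names are hypothetical labels, not real hypervisor or SQL commands.
    """
    steps = [("shutdown", s) for s in servers]       # 1. stop every farm server first
    steps += [("snapshot_vm", s) for s in servers]   # 2. snapshot while all are offline
    steps.append(("backup_config_db", config_db))    # 3. back up the config DB, farm still down
    steps += [("start", s) for s in servers]         # 4. only now bring the farm back online
    return steps


def is_consistent(steps, servers):
    """Return True if every snapshot and config DB backup happens while all servers are down."""
    running = set(servers)  # assume the farm starts out fully online
    for action, target in steps:
        if action == "shutdown":
            running.discard(target)
        elif action == "start":
            running.add(target)
        elif action in ("snapshot_vm", "backup_config_db") and running:
            return False  # something was captured while part of the farm was still live
    return True
```

Calling `is_consistent(plan_consistent_capture(farm), farm)` on a list of server names returns `True`, while a plan that snapshots one server before the others have shut down is rejected. A check like this is only a sanity test on the sequence; it does not verify that the servers really stopped.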
Provided those necessary (and potentially time-consuming and burdensome) prerequisite steps are taken, the warm standby restore process is fairly straightforward. To restore a SharePoint farm from VM snapshots, begin by restoring the backup of the configuration database that was taken with your VM snapshots. Then, spin up the VM snapshots and bring them online. When that is complete, you will essentially have your SharePoint farm operating again in the exact same state of configuration that it was in when you took the snapshots. After this step, you would have to restore any content or service application configuration changes that had happened since the VM snapshots and configuration database backup were taken, much as you would after creating a new farm in a Cold Standby scenario. You should also check that the restored farm integrates properly with the rest of the infrastructure.
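The restore sequence described above might be sketched as follows. Again, this is only an illustration: the function, its action names, the `(timestamp, description)` shape of the change records, and the database name `SharePoint_Config` are assumptions made for the example, not part of any Microsoft tool.

```python
from datetime import datetime


def plan_restore(servers, captured_at, changes, config_db="SharePoint_Config"):
    """Order the restore steps and list the changes to reapply afterward.

    `changes` is a list of (timestamp, description) pairs; that shape and the
    action names are hypothetical, chosen only for this sketch.
    """
    steps = [("restore_config_db", config_db)]                  # config DB backup first
    steps += [("start_vm_from_snapshot", s) for s in servers]   # then the farm servers
    # Anything changed after the capture is missing from the restored farm.
    to_replay = sorted(c for c in changes if c[0] > captured_at)
    steps += [("reapply_change", desc) for _, desc in to_replay]
    steps.append(("verify_integration", "infrastructure"))      # final checks against the environment
    return steps
```

Given a capture time of `datetime(2014, 6, 1)`, a change recorded on June 3 would appear in the plan as a `reapply_change` step, and one recorded on May 20 would not, since it is already contained in the snapshots.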
Pros:
- Significantly cheaper than Hot Standby, since no expensive replication tools or second set of licenses are needed
- Should be significantly faster than Cold Standby, since no farm rebuild is needed
- Potentially offers near-exact conformity between the original farm and the recovered farm
- Offers much better “fail back” potential than Cold Standby. Simply snapshot again, ship the VM images back to the original location, and repeat the process.
Cons:
- The Microsoft-approved DR strategy about which Microsoft is least enthusiastic
- Requires very careful management of the snapshotting process to work properly
- Probably carries the highest level of DR-readiness maintenance obligations (regularly, perhaps weekly, shutting down the entire farm and taking snapshots)
- Probably has a higher possibility of failure than the other Microsoft-approved DR options
- The recovery process may require more coordination between teams (VM management rights are required) than the other options
 For a good explanation of the downsides of using VM snapshots as a part of SharePoint DR, see this Microsoft PFE blog: http://blogs.msdn.com/b/mossbiz/archive/2013/01/14/sharepoint-vs-snapshots.aspx and the follow-up post. The writer does not pull any punches: “SharePoint HATES snapshots. They go against everything that SharePoint does to manage itself. They create inconsistency and conflict. They change the rules behind SharePoint’s back. They are, simply, evil.”
 As one Microsoft PFE put it in a series of e-mail exchanges: “VM Snapshots are not an option for backups/DR,” followed in the next e-mail by, “With conflicting information, Technet is the final reference. Since the Technet article ‘Plan for high availability and disaster recovery for SharePoint 2013’ defines the use of a ‘Virtual Warm Stand By Environment’ as an ‘acceptable’ option then it is a supported option.”