In my Session on SQL Server Remote Availability (you can still signup and watch the video or presentation up to a few months after the presentation was given), I mentioned that I’d provide some additional links and resources in the form of a blog post.
Here those resources and links are.
Remote Availability Basics
In my session on Remote Availability (RA), I covered how it was similar to High Availability – but much more akin to Disaster Recovery in terms of scope or purpose (even though RA typically uses or leverages many of the same tools and techniques as HA solutions).
To that end, I wanted to link to two great articles on Remote Availability – as typically discussed or implied within IT in general. Both come from the same site, TechRepublic and are by the same Author (Mike Talon):
The Need for Remote Availability
In case anyone was wondering: Remote Availability IS something that every organization needs – to one degree or another. And that’s simply because it’s impossible to depend upon data in any regard and not have some sort of contingency for how to preserve or recover that data in the case of an outage – even in cases when the outage may be more than just the loss of a component or system – such as when when the outage is at the data-center level.
Michael Otey (blog) recently blogged about the Price of High Availability – and while he was specifically talking about the cost benefits of implementing expensive redundancy and failover processes, the same thing can be said of the cost required to establish some sort of remote availability. Meaning that the price of setting up Remote Availability Solutions may look expensive at first blush – but these costs are nothing in comparison with the COST you’d incur without some sort of viable contingency plan.
And, to that end, one thing that I mentioned in my presentation is that you can get a quick sense for how much it would cost you to lose your data center by talking with the Financial folks in your organization to get a sense for what monthly revenues are. Then just break those values down by day, hour, and minute as needed to get a VERY ROUGH sense of how much an outage would cost you per hour and so on. And with that overly-simplified approach to estimating the cost of potential down-time, you can then get a feel for what kinds of budgets you may need to start looking at in order to defray the costs of actually LOSING operational capacity for any significant stretch of time.
And, once you have a sense for the potential business costs for outages and what kinds of potential budget you might have, then you’re ready to start dealing with RTOs and SLAs.
Recovery Time Objectives (RTOs) and Service Level Agreements (SLAs)
In my estimation tons of Small-to-Medium (SMBs) still run IT departments (or at least SQL Server deployments) without any form of SLAs in place. In my presentation on Remote Availability I covered how one of the BIG ways in which RA differs from HA is in the sense that RA must include much better interaction with management if RA solutions are going to be successful. That’s NOT to imply that HA or DR solutions don’t need communication with management. But, in my experience, even in organizations where there are no SLAs or RTOs, HA and DR are typically addressed (dutifully) by the IT team and ‘implicitly’ understood to be in place, to some degree, by management. Again, I don’t think that’s a recipe for success. I’m just saying that when it comes to address off-site contingencies for disaster recovery (i.e., Remote Availability solutions), you simply can NOT succeed in any way, shape, or form unless management is a willing partner right out of the gate.
Paul Randal (blog | twitter) has done a great job of highlighting the benefit of SLAs – and his expression of how the existence of SLAs show that IT and management are working in unison is a perfect example of what I’m trying to communicate in this regard – especially when it comes to RA solutions.
Likewise, when it comes to RTOs, the big thing to remember is that the key term there is ‘Objective’. Meaning that RTOs are based upon stated objectives – and therefore serve as the basis of helping define SLAs (by serving as core components in establishing MTTRs). Consequently, if you’d like a decent (high-level) overview of what RTOs mean (decoupled from all the business-speak), then I’d recommend the following article by Pierre Dorion as a decent place to start:
SQL Server and Remote Availability
I only spent about 10 minutes or so of my presentation specifically covering how to address RA solutions via SQL Server. (The first 10 minutes or so were on the ‘basics’ and importance of RA, the ‘middle’ 10 minutes were on options and techniques to use with SQL Server to achieve RA solutions, and the ‘last’ 10 minutes or so were on best practices, pitfalls, and things to consider (based on practical experience) when it comes to setting up RA solutions.)
To that end though there were a couple of things I wanted to provide links for.
For starters, there’s the licensing question. If there’s one thing that I typically do NOT touch during presentations it’s Licensing – because that can quickly get ugly in a hurry. However, given the special/awesome nature of how SQL Server supports warm failover servers via licensing, I wanted to make sure that was addressed. To that end I’ve also linked to a Microsoft FAQ that covers this in a bit more detail. And actually, since the FAQ doesn’t provide good anchors, I’m just copy/pasting the text from that FAQ in my blog. (So, if you find this anything after oh, say, 3 days after this post was published, you’ll want to verify that this policy hasn’t changed.)
At any rate, here’s the relevant snippet:
Q. How does licensing work for computers that run SQL Server 2005 in failover scenarios?
A.Failover support, where servers are clustered together and set to pick up processing duties if one computer should fail, is now available in Standard and Enterprise editions of SQL Server 2005. Under each of these editions, keeping a passive server for failover purposes does not require a license as long as the passive server has the same or fewer processors than the active server (under the per processor scenario). For details on which failover methods are available under each edition, visit the SQL Server 2005 Features Comparison page.
More info on licensing can be found here as well.
Otherwise, in my presentation on RA I recommended AGAINST using Replication and Mirroring as options for Remote Availability – given their difficulty and some of the overhead involved with managing these solutions. That said, there are ways in which you might want to address how Replication and Log shipping interact if you’re trying to achieve RA on a Replicated solution by adding Log Shipping into the mix. To that end:
Similarly, even though I don’t think that Mirroring is a good idea (given complexity), if there’s one thing that’s true about SQL Server, it’s that you can find a way to do just about anything. To that end, here’s an article on ‘distance’ mirroring:
And, with stretch clustering in mind, here are some additional links as well on that topic as well.
Otherwise, Log Shipping is actually pretty tame/easy to set up. (To the point where I really need to do a video series on it – to cover the ins/outs, pros/cons, and the step-by-step how-to details of how to set it up and configure/monitor/etc it.)
Confusing Increased Availability with Full Coverage from Disasters
Sadly, I just didn’t have enough time in my presentation to call out something that I like to address when ever talking about High Availability – which is that it’s essential that DBAs and IT Pros don’t confuse High Availability Solutions with Disaster Recovery. Because RA is really just a form of ‘stretch’ or ‘remote’ Disaster Recovery, this concept doesn’t need as much coverage as it normally would when discussing High Availability. None-the-less, if you’ve implemented a Remote Availability Solution, don’t make the mistake of thinking that it means you’re covered. Data Corruption and Loss can come from different sources. RA solutions are essentially geared towards dealing with SYSTEM failure or outages – especially at the site level. But even the best RA solution won’t do you any good if the corruption of your data was caused by a software glitch within one of your applications or if all of your Order Details records were deleted by an end-users ‘by accident’ (or even maliciously).
Consequently, defense in depth is the approach you need to take if your data is important. And I’ve outlined some additional thoughts on ways to protect your data in my article for SQL Server Magazine: Confusing High Availability with Disaster Preparedness.
UPDATE: Woops. I meant to include attribution for some of the photos used in my slides. Rather than cramming in attributions next to each photo in the slide (where no one could really read them), I wanted to list them in my follow-up post. Only, I totally forgot to do that yesterday when I pushed the post. So, without further ado (and to make sure I’m not a full-on cretin), here they are (and thanks to the great photogs who were willing to share these photos for use by hosers like me):
Slide / Photo Attributions
Slide 3 – Session Overview
Slide 6 – Do You Really Need RA?
Slide 6 – Do You Really Need RA?
Slide 6 – Do You Really Need RA?
Slide 6 – Do You Really Need RA?
Slide 8 – Primary RA Considerations (Juggling Swords and Chainsaws)
Slide 9 – Constraints: Infrastructure (Bing’s Data center of all places)
Slide 10 – Bandwidth Constraints
Slide 18 – Practical Advice (I’ve actually been in/around Aix-en-Provence)