If you are new to the world of disaster recovery (and let’s face it, many people are), it can be a bit overwhelming. Disaster recovery isn’t most people’s favorite topic, mostly because it has tended to take a lot of time and effort for something you may never even use.
The rise of ransomware has changed this. Ransomware is a disaster, and the best thing you can have at your disposal is a fully tested, up-to-date disaster recovery plan.
Is there more to it than that? Of course there is, but a solid disaster recovery plan is the bare minimum when it comes to being able to recover.
Before you go too far down this path, there are a couple of things to know. If you are new to the world of DR, you’re going to see a lot of acronyms floating around that will probably confuse you.
Don’t worry, I’ve been doing disaster recovery for a very long time, and I have you covered. We are going to take a look at some very common disaster recovery terms, and what they mean in plain language.
Popular Disaster Recovery Terms
Here are some of the most popular terms used when talking about disaster recovery, and what they mean.
Disaster Recovery – DR
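Disaster recovery is the process of restoring applications, systems, and data after a disaster, whether that disaster is a hardware failure, a natural event, or a ransomware attack.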
This can include anything from restoring data to rebuilding infrastructure.
There are many different disaster recovery strategies, and the right strategy for your business will depend on your specific needs.
It is a term that is used very generally, but it requires very specific planning.
Business Impact Analysis – BIA
The first step in any disaster recovery planning is to understand the business impact of not being able to operate. This is called a Business Impact Analysis, or BIA for short.
The BIA will help you understand how long your business can be down, and what the financial impact will be. It is important to remember that the goal of disaster recovery is to get the business up and running as quickly as possible, with as little impact to the bottom line as possible.
The information we get from the BIA will carry into our disaster recovery planning. It is very important that this information is accurate, and that it was gathered through cross-functional collaboration within the organization.
Every application owner may tell you their application is the most critical; the BIA is about finding the data to back up those statements.
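One way to turn the BIA into data rather than opinions is to estimate what an hour of downtime costs each application and then rank the applications. The Python sketch below is minimal and assumes you already have a per-hour cost estimate for each application. The application names and dollar figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AppImpact:
    name: str
    cost_per_hour: float  # estimated loss per hour of downtime (hypothetical figures)


def rank_by_impact(apps: list[AppImpact], outage_hours: float) -> list[tuple[str, float]]:
    """Estimate the loss from an outage of `outage_hours` and rank apps, highest loss first."""
    losses = [(app.name, app.cost_per_hour * outage_hours) for app in apps]
    return sorted(losses, key=lambda pair: pair[1], reverse=True)


# Hypothetical applications and per-hour costs gathered during the BIA.
apps = [
    AppImpact("intranet-wiki", 200),
    AppImpact("order-entry", 20_000),
    AppImpact("payroll", 5_000),
]

for name, loss in rank_by_impact(apps, outage_hours=4):
    print(f"{name}: ${loss:,.0f} lost in a 4-hour outage")
```

A flat cost per hour is a simplification. Many impacts, such as missed deadlines, regulatory penalties, and reputational damage, are not linear. That is exactly why the BIA needs input from across the business.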
Recovery Time Objective – RTO
The Recovery Time Objective (RTO) is the target amount of time within which an application or system must be restored after a disaster, chosen so the outage does not significantly impact the bottom line. I also like to think of it as the time it takes to recover.
For example, if your business can’t be down for more than 4 hours, then your RTO would be 4 hours.
The RTO is a very important number, because it drives a lot of the decisions that are made about disaster recovery. The RTO is a determining factor in how you choose to protect your data so you can recover it in the required time.
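Because the RTO drives how you protect and recover an application, a quick sanity check is to add up how long each recovery step is expected to take and compare the total to the RTO. The Python sketch below is minimal. The step names and durations are hypothetical, and it assumes the steps run one after another.

```python
from datetime import timedelta


def recovery_time(steps: dict[str, timedelta]) -> timedelta:
    """Total recovery time, assuming steps run one after another (no parallelism)."""
    return sum(steps.values(), timedelta())


def meets_rto(steps: dict[str, timedelta], rto: timedelta) -> bool:
    return recovery_time(steps) <= rto


# Hypothetical step estimates for one application.
steps = {
    "declare disaster and assemble team": timedelta(minutes=30),
    "restore VMs from backup": timedelta(hours=2, minutes=30),
    "validate the application": timedelta(minutes=45),
}

rto = timedelta(hours=4)
print(recovery_time(steps))   # 3:45:00
print(meets_rto(steps, rto))  # True
```

If the total does not fit, you need a faster recovery method, or you need to revisit the RTO with the business.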
Recovery Point Objective – RPO
The Recovery Point Objective (RPO) is the amount of data that a business can lose before it starts to have a significant impact on the bottom line.
For example, if your business can’t afford to lose more than 1 hour of data, then your RPO would be 1 hour.
Like the RTO, the RPO is a very important number, because it drives a lot of the decisions that are made about disaster recovery.
The biggest decision it drives is how frequently you protect your data. There are many different ways to protect data or an application, and the RPO will determine which one you choose.
You may hear RPO and RTO referred to collectively as “Recovery Objectives,” which keeps things a bit simpler and ditches some of the acronyms.
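Here is a simplified way to see how the RPO drives protection frequency. With periodic backups or snapshots, the worst-case data loss is roughly the time between recovery points. The Python sketch below is minimal and works under that assumption. `copy_lag` is a hypothetical parameter for the time a copy takes to reach safe storage.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, copy_lag: timedelta = timedelta()) -> timedelta:
    """Worst case: disaster strikes just before the next recovery point is created.

    Everything written since the previous recovery point is lost, plus any time
    that point spent in transit before it was safely stored (simplified model).
    """
    return interval + copy_lag


def meets_rpo(interval: timedelta, rpo: timedelta, copy_lag: timedelta = timedelta()) -> bool:
    return worst_case_data_loss(interval, copy_lag) <= rpo


rpo = timedelta(hours=1)
print(meets_rpo(timedelta(hours=24), rpo))    # nightly backups: False
print(meets_rpo(timedelta(minutes=15), rpo))  # 15-minute snapshots: True
```
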
This is another way to refer to RTO or RPO, or recovery objectives. I see this acronym quite a bit, but it is not my favorite.
Disaster Recovery Plan – DRP
A Disaster Recovery Plan (DRP) is a document that outlines how a business will recover from a disaster.
The DRP should include everything from the BIA to the RTO to the RPO. It should also include a step-by-step plan for how recovery will happen.
The DRP should be tested on a regular basis, to make sure that it will work when it is needed.
Anyone with a minimum level of technical skill should be able to follow the DR plan to recover. It should be clear and easy to follow. In the best-case scenario, the creation, updates, testing, and execution of disaster recovery plans are automated.
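Automating DR plans is easier when the plan exists as something a program can run and time, not only as a document. The Python sketch below illustrates that idea. It assumes each recovery step can be wrapped in a function that returns True on success. The `DRPlan` class, step names, and placeholder lambdas are hypothetical and do not represent any particular product's API.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# A step returns True on success. Real steps would call your backup,
# orchestration, or infrastructure tooling; these are placeholders.
Step = tuple[str, Callable[[], bool]]


@dataclass
class DRPlan:
    application: str
    rto_seconds: float
    rpo_seconds: float  # recorded so the plan carries its BIA-derived objectives
    steps: list[Step] = field(default_factory=list)

    def run_test(self) -> dict:
        """Run every step in order, stop at the first failure, and compare elapsed time to the RTO."""
        start = time.monotonic()
        completed: list[str] = []
        for name, step in self.steps:
            if not step():
                return {"ok": False, "failed_step": name,
                        "completed": completed, "elapsed": time.monotonic() - start}
            completed.append(name)
        elapsed = time.monotonic() - start
        return {"ok": elapsed <= self.rto_seconds, "failed_step": None,
                "completed": completed, "elapsed": elapsed}


plan = DRPlan(
    application="order-entry",  # hypothetical application
    rto_seconds=4 * 3600,
    rpo_seconds=3600,
    steps=[
        ("restore database from latest recovery point", lambda: True),
        ("start application servers", lambda: True),
        ("run smoke test", lambda: True),
    ],
)
result = plan.run_test()
print(result["ok"], result["completed"])
```

Running the same structure on a schedule gives you a regular test and a record of how long recovery actually took.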
Business Continuity – BC
Business continuity is the ability of a business to continue to operate, even in the face of a disaster.
There are many different ways to achieve business continuity, and it is closely tied to disaster recovery. BC and DR do have some differences, though. Disaster recovery plans are much more tactical, and often technical. Depending on how they are written, they will detail how an application or system is to be recovered.
At the BC plan level, the focus is on the business as a whole, not the specifics of each application. Sometimes these terms are used interchangeably, which isn’t accurate.
Stay tuned for a deep dive into these two critical plans.
Maximum Tolerable Downtime – MTD
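The Maximum Tolerable Downtime (MTD) is the longest a business can be down before the impact becomes unacceptable, the point where the damage to the bottom line (or to the business itself) can no longer be absorbed. It is the ceiling that your RTO must fit under.
For example, if your business can’t afford to be down for more than 4 hours, then your MTD would be 4 hours, and your RTO would need to be 4 hours or less.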
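The relationship between the two numbers can be written as a simple check: the RTO must not exceed the MTD. Any gap between them is your margin for error. Below is a minimal Python sketch, and `rto_margin` is a hypothetical helper name.

```python
from datetime import timedelta


def rto_margin(rto: timedelta, mtd: timedelta) -> timedelta:
    """Return how much headroom the RTO leaves under the MTD; fail if there is none."""
    if rto > mtd:
        raise ValueError(f"RTO of {rto} exceeds MTD of {mtd}; recovery would take too long")
    return mtd - rto


print(rto_margin(timedelta(hours=4), timedelta(hours=4)))  # 0:00:00, no room for error
print(rto_margin(timedelta(hours=2), timedelta(hours=4)))  # 2:00:00
```
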
Service Level Agreement – SLA
A Service Level Agreement (SLA) is a contract between a business and a service provider.
The SLA should outline the expectations of both parties, and what will happen if those expectations are not met.
For example, if you have an SLA with your disaster recovery provider, you expect them to meet certain standards, and they expect to be paid for their service. If they do not meet those standards, the SLA typically defines the remedy, such as service credits.
You will often see the availability target in an SLA expressed in “nines.” For example, an SLA of 99.9% is “three nines” and allows roughly 8 hours and 46 minutes of unplanned downtime per year.
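The arithmetic behind the nines is simple: allowed downtime is the fraction of the year not covered by the availability target. Below is a small Python sketch. The result shifts by a few seconds depending on whether you assume 365 or 365.25 days per year, which is why published figures vary slightly.

```python
def annual_downtime_seconds(availability_pct: float, days_per_year: float = 365.25) -> float:
    """Downtime per year allowed by an availability percentage."""
    seconds_per_year = days_per_year * 24 * 60 * 60
    return seconds_per_year * (1 - availability_pct / 100)


def as_hms(seconds: float) -> str:
    total = round(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


for nines, pct in [(2, 99.0), (3, 99.9), (4, 99.99), (5, 99.999)]:
    print(f"{nines} nines ({pct}%): {as_hms(annual_downtime_seconds(pct))} per year")
```
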
RPO 0
This is a huge one. You will hear people refer to RPO 0 or an RPO of 0. What they mean is that there is NO DATA LOSS in the event of a disaster.
This can be achieved, but it is usually very costly. Many times people will start with an RPO of 0 without business data to back it up, and then be shocked when they see how much a solution costs.
Use caution when you hear this term, and listen carefully. This is usually a signal that the BIA was not done correctly, or not done at all.
However, there are absolutely times when an RPO of 0 is a requirement.
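RPO 0 is typically achieved with synchronous replication, where a write is not acknowledged until it is safely stored at the second site. That extra round trip on every write, along with the bandwidth and distance constraints it brings, is a large part of why zero data loss is so expensive. The toy Python model below uses a hypothetical class, not a real storage API. It shows why asynchronous replication cannot promise an RPO of 0.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary volume copying writes to a DR site.

    Writes reach the replica in the same order they were acknowledged, so the
    replica always holds a prefix of the acknowledged writes.
    """

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.acknowledged: list[str] = []  # writes the application was told succeeded
        self.replica: list[str] = []  # writes that exist at the DR site
        self._pending: deque[str] = deque()  # async writes not yet shipped

    def write(self, data: str) -> None:
        if self.synchronous:
            # Sync: the DR copy has the write before the application gets an ack.
            self.replica.append(data)
        else:
            # Async: ack right away and ship the write later.
            self._pending.append(data)
        self.acknowledged.append(data)

    def ship(self, count: int) -> None:
        """Simulate the async replication link catching up by `count` writes."""
        for _ in range(min(count, len(self._pending))):
            self.replica.append(self._pending.popleft())

    def lost_if_primary_fails_now(self) -> list[str]:
        """Acknowledged writes that never made it to the DR site."""
        return self.acknowledged[len(self.replica):]


for mode in (True, False):
    vol = ReplicatedVolume(synchronous=mode)
    for i in range(5):
        vol.write(f"txn-{i}")
    vol.ship(3)  # the async link has only caught up on three writes
    label = "sync" if mode else "async"
    print(label, vol.lost_if_primary_fails_now())
```
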
Disaster Recovery Basics
This is a lot if you are new to disaster recovery, but it only covers the very basics. Disaster recovery is often seen as complex and time-consuming, which, of course, it can be.
It can also be much simpler than people think.
Lately, I have been talking about “ransomware and disaster recovery”, because quite frankly the two are starting to blend together.
If you have a good disaster recovery plan, you don’t need a net new ransomware recovery plan or cyber recovery plan.
The big difference with ransomware recovery is where you are recovering to. I see many people say they will just recover in place after a ransomware attack, but you simply do not know whether that will be possible.
What if law enforcement quarantines your production environment? What if you are hit by ransomware that targets VMware and can’t use your VMware environment?
The key to successful ransomware recovery is having multiple places to recover to, so you can pick the best one for how the attack unfolded.
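One way to make "pick the best place to recover to" concrete is to list each recovery target with the facts that matter after an attack, then filter on those facts. The Python sketch below is minimal. The target names, fields, and selection rule are hypothetical and deliberately simple, and a real decision involves far more than three fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveryTarget:
    name: str
    available: bool  # False if, for example, it has been quarantined for investigation
    shares_platform_with_prod: bool  # same hypervisor, credentials, or management plane
    est_recovery_hours: float


def choose_target(
    targets: list[RecoveryTarget], rto_hours: float, avoid_shared_platform: bool
) -> Optional[RecoveryTarget]:
    """Pick the fastest usable target that still meets the RTO, or None if there is none."""
    candidates = [
        t
        for t in targets
        if t.available
        and t.est_recovery_hours <= rto_hours
        and not (avoid_shared_platform and t.shares_platform_with_prod)
    ]
    return min(candidates, key=lambda t: t.est_recovery_hours, default=None)


# Hypothetical options for one application after a ransomware attack.
targets = [
    RecoveryTarget("production (in place)", available=False,
                   shares_platform_with_prod=True, est_recovery_hours=2),
    RecoveryTarget("secondary data center", available=True,
                   shares_platform_with_prod=True, est_recovery_hours=3),
    RecoveryTarget("public cloud", available=True,
                   shares_platform_with_prod=False, est_recovery_hours=6),
]

# The attack hit the hypervisor layer, so avoid anything that shares it.
print(choose_target(targets, rto_hours=8, avoid_shared_platform=True).name)   # public cloud
print(choose_target(targets, rto_hours=8, avoid_shared_platform=False).name)  # secondary data center
```

If no target qualifies, the function returns None. That is a useful signal that the plan needs another recovery location before an attack happens, not during one.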
Existing DR plans then need to be augmented to account for this, and tested.
Ransomware is a disaster, so stay tuned as we dive deeper into ransomware and disaster recovery.
Melissa is an Independent Technology Analyst & Content Creator, focused on IT infrastructure and information security. She is a VMware Certified Design Expert (VCDX-236) and has spent her career focused on the full IT infrastructure stack.