R is for Recovery, RPO and RTO Explained

melissa • January 05, 2017

You’ve probably heard the terms RPO and RTO thrown around a lot, but what do they really mean?  RPO stands for Recovery Point Objective, and RTO stands for Recovery Time Objective.  It can be easy to mix up these two terms, especially if you’re just getting used to them or just learning the concepts behind them.  These two three-letter terms are a hot topic if you’re looking to take the VMware Certified Advanced Professional – Design Exam (formerly known as the VCAP-DCD).

Breaking It Down

Your first step is to memorize what the letters mean.  The R is Recovery and the O is Objective; that’s the easy part.  P is Point and T is Time.  Use those two words to help you remember which is which.

RPO is the POINT you need to recover to.  So, if I have a business continuity event (like zombies taking out my main datacenter), what POINT am I restoring my data to?  Five minutes ago, an hour ago, a day ago?  Chances are different parts of your environment will have different RPOs, but more on that later.

RTO is the TIME you will take to recover to that POINT.  I think the use of time is what may confuse people, since RTO and RPO are both generally measured in units of time.  How long will it take you to recover from an event?  How much TIME will it take?
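If it helps to see the two side by side, here is a minimal sketch of a single event on a timeline.  The function name, the timestamps, and the 1 hour RPO and 4 hour RTO targets are all made up for illustration; the only idea taken from the definitions above is that RPO looks backward from the event (how much data did I lose?) while RTO looks forward from it (how long was I down?).

```python
from datetime import datetime, timedelta

def recovery_metrics(last_recovery_point, event_time, service_restored):
    """Return (data_loss_window, downtime) for one event.

    data_loss_window is what you compare against the RPO.
    downtime is what you compare against the RTO.
    """
    data_loss_window = event_time - last_recovery_point
    downtime = service_restored - event_time
    return data_loss_window, downtime

# Hypothetical timeline for one bad morning
last_backup = datetime(2017, 1, 5, 9, 0)    # the POINT we can restore to
event = datetime(2017, 1, 5, 9, 45)         # zombies arrive
restored = datetime(2017, 1, 5, 12, 15)     # service is back

# Hypothetical business targets
rpo = timedelta(hours=1)
rto = timedelta(hours=4)

loss, downtime = recovery_metrics(last_backup, event, restored)
print("Data loss window:", loss, "| within RPO:", loss <= rpo)
print("Downtime:", downtime, "| within RTO:", downtime <= rto)
```

In this made-up case the data loss window is 45 minutes and the downtime is 2 hours 30 minutes, so both targets are met.  Both numbers are durations, which is exactly why the two terms are so easy to confuse.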

If you find yourself sitting in front of the exam and can’t remember which is which, break it down.  Write out the terms and what they mean.  Focus on P and what that stands for, since T is the one which may confuse you.  Remember, P is the POINT you want to recover your data to.

As I mentioned, one of the most confusing aspects of recoverability may be both RPO and RTO being measured in time.  You may see them measured in minutes, hours, days, or even weeks.  Remember, there is no one-size-fits-all approach for RTO and RPO; they are highly dependent on the unique business requirements of a particular organization.  You may hear people talk about RPO 0, which is the paragon of business continuity.  This means you will recover from an event with 0 data loss.  It is often reserved for only the most critical applications, as it can be quite expensive.  Does it have its use cases?  Absolutely, but you may not see RPO 0 in every environment you walk into.

When you’re designing any component of an infrastructure, RPO and RTO are important aspects.  RPO 0 is going to be architected much differently than a 4-hour RPO, and the cost difference between the two could be immense as well.  The tricky part about RTO and RPO is that they vary within your infrastructure, so you will need to plan accordingly to satisfy them all.  Remember, you may have the “best” design in the world, but if it doesn’t suit your customer’s business requirements, it won’t be the best to your customer.
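To make the "they vary within your infrastructure" point concrete, here is a rough sketch.  The application names and their RPO and RTO values are hypothetical; real numbers come from the business.  It also leans on a simplifying assumption: if you protect data with periodic copies (backups, snapshots, or scheduled replication), the worst-case data loss is roughly one interval between copies.

```python
from datetime import timedelta

# Hypothetical requirements per application
requirements = {
    "payments-db": {"rpo": timedelta(0), "rto": timedelta(minutes=15)},
    "file-share": {"rpo": timedelta(hours=4), "rto": timedelta(hours=8)},
    "dev-lab": {"rpo": timedelta(days=1), "rto": timedelta(days=3)},
}

def meets_rpo(protection_interval, rpo):
    """Assumes worst-case data loss is about one interval between copies."""
    return protection_interval <= rpo

hourly = timedelta(hours=1)
satisfied = [
    name for name, req in requirements.items()
    if meets_rpo(hourly, req["rpo"])
]
print(satisfied)  # ['file-share', 'dev-lab']
```

Notice that no periodic interval, however short, satisfies the RPO 0 application in this model.  That is one reason RPO 0 tends to be designed so differently (typically with some form of synchronous replication) and costs so much more.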

This is just the tip of the iceberg.  Once you’re comfortable with what RTO and RPO are, and how they can impact an environment, it is time to start thinking about the bigger picture.  After all, your RTO and RPO don’t really matter if you can’t meet your SLA with them, do they?  Your SLA will probably be represented by some series of nines such as 99% or 99.9% or even 99.999%!  Once again our friend time will come into play, since the nines translate to the amount of unplanned downtime allowed in the environment.  We’re not done with our buddy time yet, since you need to determine when these nines are applicable.
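The arithmetic behind the nines is simple enough to sketch.  This assumes the SLA is measured over a full 365-day year of 24x7 operation; the function name and the `period_minutes` parameter are just illustrative, and your actual SLA may define a different measurement window.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years

def allowed_downtime_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Unplanned downtime allowed over the period for a given availability."""
    return period_minutes * (1 - availability_percent / 100)

for sla in (99.0, 99.9, 99.999):
    minutes = allowed_downtime_minutes(sla)
    print(f"{sla}% allows about {minutes:.1f} minutes of downtime per year")
```

Under those assumptions, 99% works out to roughly 3.65 days a year, 99.9% to about 8.76 hours, and 99.999% to a little over 5 minutes.  If the nines only apply during certain hours, pass a smaller `period_minutes` and the allowance shrinks with it.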

Now once you’re ready to start diving into these concepts, I highly recommend reading Rene Van Den Bedem’s post on VCDX – Recoverability impacting Availability Explained.  Rene spells everything out in a way which really makes sense, while using visual aids to illustrate this concept across an infrastructure environment.
