Backup and disaster recovery (DR) are two very important aspects when companies implement hyperconverged IT infrastructures (HCI). In this blog series, I would like to share some of my experiences from my work as an architect and consultant. The series focuses mainly on hyperconverged environments based on NUTANIX together with the corresponding hypervisors VMware, Hyper-V and AHV, as well as on concepts to consider when planning backup and disaster recovery for these infrastructures.

The world of “next-gen backup”

Over the last five years, technologies in the IT industry have changed more rapidly than ever before. This also applies to data-center infrastructures. As virtualization continues to grow in importance and business requirements demand greater agility and flexibility, many of the classic multi-tier architectures are reaching their limits. Converged, hyperconverged and cloud architectures are becoming increasingly common.

In this ever-changing virtualized IT world, different approaches to backup and disaster recovery are required. Yet many companies still define their backup and DR processes for the “old world”.

These processes and concepts often cannot keep up with a highly virtualized and flexible IT infrastructure based on hyperconverged or cloud technologies. Furthermore, it is often necessary to ensure very low RPO/RTO times (see definition of RPO/RTO).

Modern DR and backup tools already focus on these new requirements, and their feature sets are designed to offer the best possible integration into hyperconverged and cloud infrastructures.

General considerations for backup and DR in a hyperconverged IT infrastructure

A backup and/or disaster recovery concept for a hyperconverged infrastructure differs in many aspects and requirements from one for classic 3-tier environments.

Some of these aspects and requirements are:

  • “All software-defined” (compute, storage, network)

  • Very high consolidation ratio of applications

  • Many different workloads

  • Different hypervisors

  • Container technologies

  • Specific API and programming interfaces

  • Highly segmented networks with high complexity

  • Very high scalability

Why is this important?

First of all, it is very important to always keep “the big picture” in mind. Backup and DR processes must be aligned with the business requirements and the resulting technical requirements.
Furthermore, it is important to understand how hyperconverged infrastructures change or influence these processes. This can be a complex task in the first phase. However, the more clearly and thoroughly the business requirements are worked out, and the more closely the concept is aligned with the technical requirements, the better the implemented tools and processes will fit.

Below are a few questions about business and technical requirements which need to be addressed and discussed:

  • Which users or user groups will use the new HCI environment? (All or as many as possible should be identified.)

  • Do the “stakeholders” know about HCI systems?

  • Can all application administrators define their backup and DR requirements? (This is a very important point. After all, the application is often the focus of a company’s departments.)

  • Who defines backup and DR requirements within the company? (Responsibilities, people, processes)

  • Are RPO and RTO values already defined?

  • Are existing backup and DR documents available?

  • Are SLAs already defined?

  • Who is responsible for the documentation?

  • Is there access to the documents?

  • What are the strategic business drivers for the hyperconverged infrastructure?

  • Are there any limitations or assumptions that need to be considered in the concept?

Definition of RPO/RTO and SLA

RPO/RTO and SLA definitions are very important aspects to consider when planning backup and DR processes. I would like to explain the meaning of these three terms in more detail below:

RTO: The Recovery Time Objective (RTO) is the maximum time that may elapse after a computer or application crash or a network interruption before the system resumes normal operation and allows access to the data again. The RTO value is expressed in a unit of time: seconds, minutes, hours, or days.

RPO: The Recovery Point Objective (RPO) is the point in time to which an IT system or IT infrastructure can be restored after a failure. In other words, it describes the maximum acceptable age of the most recent recoverable data, and therefore how much data may be lost. Like the RTO, it is expressed in a unit of time, and it should always be considered together with the Recovery Time Objective (RTO).
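To make the two recovery objectives more tangible before turning to SLAs, here is a minimal Python sketch. It assumes that the time of the last recovery point, the time of the failure and the time at which service was restored are known. The class `RecoveryObjectives`, the function names and all timestamps are hypothetical and serve only as an illustration; they are not part of any backup product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    """Hypothetical per-application targets agreed with the business."""
    rpo: timedelta  # maximum tolerated age of the newest recoverable data
    rto: timedelta  # maximum tolerated time until service is restored


def achieved_rpo(last_recovery_point: datetime, failure: datetime) -> timedelta:
    """Data written between the last recovery point and the failure is lost."""
    return failure - last_recovery_point


def achieved_rto(failure: datetime, service_restored: datetime) -> timedelta:
    """Time during which the application was unavailable."""
    return service_restored - failure


def meets_objectives(
    objectives: RecoveryObjectives,
    last_recovery_point: datetime,
    failure: datetime,
    service_restored: datetime,
) -> bool:
    """True only if both the RPO and the RTO target were kept."""
    return (
        achieved_rpo(last_recovery_point, failure) <= objectives.rpo
        and achieved_rto(failure, service_restored) <= objectives.rto
    )


# Illustrative values only: 1 hour RPO, 4 hours RTO.
objectives = RecoveryObjectives(rpo=timedelta(hours=1), rto=timedelta(hours=4))
last_snapshot = datetime(2024, 1, 10, 9, 30)
failure = datetime(2024, 1, 10, 10, 15)
restored = datetime(2024, 1, 10, 13, 0)

print(achieved_rpo(last_snapshot, failure))   # 0:45:00
print(achieved_rto(failure, restored))        # 2:45:00
print(meets_objectives(objectives, last_snapshot, failure, restored))  # True
```

In this example, 45 minutes of data are lost and the application is unavailable for 2 hours and 45 minutes, so both targets are met. With an RPO of 15 minutes, the same incident would violate the objective, which is why the interval between recovery points has to be derived from the RPO.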

SLA: A Service Level Agreement (SLA) is a contract between a service provider and a customer. The contract usually specifies the scope of the service and the quality of service (service level). Many Internet Service Providers (ISPs) offer their customers an SLA. IT departments in large companies often also set up service level agreements to make their services more transparent to customers (the users in the company’s departments). In this way, the services can be justified, measured and compared with external offers.

SLAs can include the following definitions:

  • The percentage availability of services
  • The number of users that can be served simultaneously
  • Performance benchmarks for regular comparison with current values
  • The timeframe for early notification of any service restriction
  • The response time of the helpdesk for various classified problems
  • The availability of access
  • The provision of usage statistics
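The first item in the list, percentage availability, is easier to discuss once it has been translated into permitted downtime. The following Python sketch shows this conversion. The function name `allowed_downtime` and the chosen availability targets are assumptions for illustration only and are not taken from any specific SLA.

```python
from datetime import timedelta


def allowed_downtime(availability_percent: float, period: timedelta) -> timedelta:
    """Return the downtime that an availability target permits within a period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    # The unavailable share of the period is (100 - availability) percent.
    return period * (1 - availability_percent / 100)


# Illustrative targets, evaluated over a year of 365 days.
year = timedelta(days=365)
for target in (99.0, 99.9, 99.99):
    print(f"{target}% availability -> {allowed_downtime(target, year)} downtime per year")
```

An availability of 99.9 % over 365 days, for example, permits roughly 8.76 hours of downtime per year. Whether planned maintenance windows count towards this budget is something the SLA itself has to define.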

A backup and DR concept should primarily be aligned with these specifications to support business processes in the best possible way.

Conclusion

A well-developed backup and DR concept is essential to support the availability of core applications in an enterprise, even in the new “cloud”-based world.

The second part of this article series deals with the technical functions and features of the NUTANIX platform and how they can be used within the backup and DR concept.

  • For information about Disaster Recovery, visit our IDR solution which provides a stand-alone application that makes it easy to be Disaster Ready. (link)
  • For information about ensuring the data quality of your tape library, visit our Tape Audit Tool solution, which provides automated auditing of the quality of your backup data and ensures the data can be read when needed. (link)

The original version of this article by Patrick Huber (SVA GmbH) can be found under https://focus.sva.de/next-gen-backup/.

SVA Software, Inc. provides solutions to secure, monitor, improve and troubleshoot the data and performance of your IT infrastructure. Get in contact with us for more information.

  • General IT infrastructure automated monitoring: check out more about BVQ which provides transparency on the status and communication of your entire infrastructure from the compute to the Storage and SAN layers. (link)
  • Mainframe performance optimization: visit our Mainframe Service platform that provides solutions from reporting up to automated dynamic capping and maintenance. (link)
  • VMware License Management: visit our GetVMware solution which helps you manage and decrease the licensing cost of your VMware infrastructure using different dashboards and tables in Splunk. (link)