You are browsing the archive for #msoms #sysctr.

OMS overview: chapter 2 – disaster recovery

7:23 am in #msoms, #sysctr by Jan Van Meirvenne

OMS Blog Series Index

Since there is much to say about each of the Operations Management Suite aka OMS services I will break up this post in a blog series:

Chapter 1: Introduction to OMS
Chapter 2: Disaster Recovery with OMS (this post)
Chapter 3: Backup with OMS
Chapter 4: Automation with OMS
Chapter 5: Monitoring and Analysis with OMS
Chapter 6: Conclusion and additional resources overview

This series is actually a recap of a live presentation I gave on the last SCUG event which was shared with another session on SCOM – SCCM better together scenario’s presented by SCUG colleagues Tim and Dieter. You can find my slides here, demo content that is shareable (the ones I don’t need to pay for Smile) will be made available in the applicable chapters.

Disaster Recovery is one of the bigger money sinks of IT. First you need yourself a big secondary datacenter able to maintain your business in case your primary one bites the dust. Not only you need to throw money at something you might never need in your entire career, but even have to invest in setting up DR plans for every service you are planning to protect. This usually requires both additional design and implementation work, and separate tooling that allows you to perform the DR scenario you envisioned. Especially in the world of hybrid cloud, protecting services across platform boundaries might seem like a complex thing to do: needing to integrate multiple platforms in preferably one single DR solution.

Meet Azure Site Recovery

Azure Site Recovery or ASR provides 2 types of DR capabilities:

  • Allowing the replication and failover orchestration of services between 2 physical sites of your own
  • Allowing the replication and failover orchestration of services between a main site that you own and the Azure IaaS platform 

Bear in mind that while the platform is advertised as a DR solution, it is also possible to use it as a migration tool to move workloads to Azure or other on-premise sites.

The big advantage here is that the solution is platform-agnostic, providing scenario’s to protect virtually any type of IT infrastructure platform you use. The DR site can be a secondary VMWare or Hyper-V (with SCVMM) Cloud, or Azure. Vendor-lock-in is becoming a non-issue this way!

Supported Scenario’s

Here is a full overview of the supported flows:

Infrastructure

To Azure

To an own DR-site

Application

  • SQL Always-On
  • Other application types must be orchestrated by using the recovery plan feature or by doing a side-by-side migration / failover

Architecture

Basically, there are 2 major ‘streams’ within ASR to facilitate DR operations, but in any case you’ll always need a Recovery Vault. The recovery vault is an encrypted container that sits on top on a (selectable) storage account. If the target DR site is Azure, the vault will store the replicated data and use it to deploy Azure IaaS VM’s in case of a failover. If the target DR site is another on-premise site, the vault will only store the metadata needed for ASR to protect the main site. The vault is accessed by the on-premise systems using downloadable vault encryption keys. These keys are used during the setup and are accompanied by a passphrase the user must enter and securely store on-premise. Without this passphrase the vault becomes inaccessible should systems be (re-)attached to the vault, so it is very important to double-, no triple-backup this key!

All communication between the different sites is also encrypted using SSL.

In the case that Azure is the target DR site, you must specify an Azure size for each VM you want to protect along with an Azure virtual network to connect it to. This allows you to control the cost impact.

The Microsoft Azure Recovery Services Agent (MARS) and the Microsoft Azure Site Recovery Provider (MASR)

This setup is applicable to any scenario where Hyper-V (with or without SCVMM) is the source site in the DR plan.

The MARS agent needs to be installed on every Hyper-V server which will take part in the DR-scenario (both source and target). This agent will facilitate the replication of the actual VM data from the source Hyper-V servers to the target site (Azure or other Hyper-V server).

The MASR agent needs to be installed on the SCVMM server(s) or in case of a Hyper-V site to Azure scenario it needs to be co-located on the source Hyper-V server together with the MARS agent. The MASR agent is responsible for orchestrating the replication and failover execution and primarily sync meta-data to ASR (actual replication data is done by the MARS agent).

image

image

The Process, Master Target and Configuration Server

This setup is used for any scenario with a VMWare (with some additional components described later on), Cloud (Azure or other) or Physical site as a source. These are the components which facilitate the Replication and DR Orchestration. Note: they are all IaaS level components.

Process Server

This component is placed in the source site and is responsible for pushing the mobility service to the protected servers, and to collect replication-data from the same servers. The process server will store, compress, encrypt and forward the data to the Master Target Server running in Azure.

Mobility Service

This is a helper-agent installed on all systems (Windows or Linux) to be protected. It leverages VSS (on Windows) to capture application-consistent snapshots and upload them to the process server. The initial sync is snapshot-based, but the subsequent replication is done by storing writes in-memory and mirroring them to the process server.

Master Target Server

The Master Target Server is an Azure-based system that receives replication-data from the source site’s process server and stores it in Azure Blob Storage. As a failover will incur heavy resource demands on this system (rollout of the replica’s into Azure IaaS VMs) it is important to choose the correct sizing in regards of storage (standard or premium) to ensure a service can failover within the established RTO.

Configuration Server

This is another Azure-based component that integrates with the other components (Master Target, Mobility Service, Process Server) to both setup and coordinate failover operations.

Failback to VMWare (or even failover to a DR VMWare site instead) is possible with this topology with some additional components. It is nice to see that Microsoft is really upping the ante in regards of providing a truly heterogeneous DR solution in the cloud!

image

Orchestrating workload failover/migration using the Recovery Plan feature

Of course, while you can protect you entire on-prem environment in one go, this is not an application-aware setup. If you want to make sure your services are failed over with respect of their topology (backend -> middleware -> application layer -> front-end) you need to use the recovery plan feature of ASR.

Recovery Plans allows you to define an ordered chain of VMs along with actions needing to be taken at source site shutdown (pre and post) and target site startup (pre and post). Such an action can be the execution of an automation runbook hosted by Azure Automation, or a manual action to be performed by an operator (the failover will actually halt until the action is marked as completed).


Source: https://azure.microsoft.com/en-us/documentation/articles/site-recovery-runbook-automation/

Failing Over

When in the end you want to perform an actual failover operation you can perform 3 types of actions:

– Test Failover: this keeps the source service/system online while booting the replica so you can validate it. Keep in mind that you should take possible resource conflicts (DNS, Network, connected systems) into account.

– Planned Failover: this makes sure that the replica is fully in-sync with the source service/system before shutting it down and then boots the replica. This ensures no data loss occurs. This action can be done when migrating workloads or protecting against a foreseen disaster (storm, flood,…) the protected service will be offline during the failover

– Unplanned Failover: this type only brings online the replica from the last sync. Data loss will be present as a gap between the failure moment and the last sync. This is only for instances where the disaster already occurred and you need to bring the service online at the DR ASAP.

A failover can be executed on the per-VM level or via a recovery plan.

Caveats and gotcha’s

Although the ASR service is production-ready and covers a lot of ground in terms of features, there are some limitations to take into account:

Here are some of the bigger limits:

When using Azure as a DR site

– Azure IaaS uses the VHD format as storage disk format, limiting the protectable size of the VHD or VHDX (conversion is done automatically) to 1024Gb. Larger sizes are not supported
– the amount of per-VM resources (CPU Cores, RAM, Disks) is limited by the supported resources provided by the largest Azure IaaS sizing (eg if you have 64 attached disks in your on-prem you might not be able to protect it if Azure’s maximum is 32)

Overall Restrictions

– Attached Storage setups like Fiber Channel, Pass-through Disks or iSCSI are not supported
– Gen2 Linux VMs are not yet supported

This looks nice! But how much does it cost?

The nice thing about using Azure as a DR-site is that you only pay a basic fee for the service, including storage and WAN traffic, but only pay the full price for IaaS compute resources when an actual failover occurs. This embodies the concept of ‘Pay what you use’ that is one of the big benefits of public cloud. Even better: you only start paying the basic fee after 31 days. So in case you would use ASR as a migration tool (moving workloads to the cloud or another site) you will have a pretty cost-effective solution! Bear in mind that used storage and WAN traffic is always billed.

I won’t bother to list the pricing here as it is as volatile in nature as the service itself. You can use the Azure pricing calculator to figure out the costs.

image

If you have one or more System Center licenses, check out the OMS suite pricing calculator instead to asses if you can benefit from the bundle-pricing.

Ok, I’ll bite, but how do I get started?

The service for now is only accessible from the ‘old’ Azure Portal on https://manage.windowsazure.com

Log in with an account that is associated with an Azure subscription, and click the ‘new’-button in the bottom-left corner.

image

Choose ‘Data Services’ -> ‘Recovery Services’ -> ‘Site Recovery Vault’

image

Click ‘Quick Create’ and then enter a unique name and choose the applicable region where you want to host the service. Then, click ‘Create Vault’.

image

This will create the vault from where you can start the DR setup

image

When the creation is done, go to ‘Recovery Services’ in the left-side Azure Service bar and then click on the vault you created.

image

The first thing you must do is to pick the appropriate scenario you want to execute

image

This will actually provide you with a tutorial to set up the chosen scenario!

image

To re-visit or change this tutorial during operational mode, just click the ‘cloud icon’ in the ASR interface

image

I won’t cover the further steps needed as the tutorials provided by Azure are exhaustive enough. I might add specific tutorials later on in a dedicated post in case that I encounter some advanced subjects.

 

Final Thoughts on ASR

While I am surely not a data protection guy, setting this puppy up was a breeze to me! This service, which is now part of OMS, embodies the core advantages of cloud: immediate value, low complexity and cross-platform. I have already seen several implementations, confirming that this solution is here to stay and will likely be a go-to option for companies looking for a cost-effective DR platform.

Thanks for the long read! And see you next time when we will touch ASR’s sister service Azure Backup! Jan out.

OMS overview: chapter 1– introduction

10:49 pm in Uncategorized by Jan Van Meirvenne

Hi all!

Before I dabble you under in the realm of Microsoft cloud platform management, I first want to bother you with some personal announcements!

It has been a long time time since I posted consistently for a while now, but I have some perfect excuses for this:

First off, I got married all the way back in April to my now wonderful wife Julie!

image

And if that was not enough of a life achievement, meet my son Julian, born in August (yes, Julie is the mother Winking smile)! This is one of the rare pictures where he smiles because he likes to keep things very serious generally.

image

And to complete this combo-high score, we will soon be on the lookout for our own home where we can develop ourselves as a family and live happily ever after!

Despite all of this I am still dedicated to acquire, produce and share knowledge regarding the Microsoft cloud technologies and while new balances will of course need to be sought, I pledge to continue this passion both on- and offline, whether at customer sites or community events! So let’s kick the tires again and start off with an introduction to the newborn cloud management platform, Operations Management Suite!

OMS Blog Series Index

Since there is much to say about each of the Operations Management Suite aka OMS I will break up this post in a blog series:

Chapter 1: Introduction to OMS (this post)
Chapter 2: Disaster Recovery with OMS
Chapter 3: Backup with OMS
Chapter 4: Automation with OMS
Chapter 5: Monitoring and Analysis with OMS
Chapter 6: Conclusion and additional resources overview

This series is actually a recap of a live presentation I gave on the last SCUG event which was shared with another session on SCOM – SCCM better together scenario’s presented by SCUG colleagues Tim and Dieter. You can find my slides here, demo content that is shareable (the ones I don’t need to pay for Smile) will be made available in the applicable chapters later on.

Chapter 1: Introduction

So what is OMS? Well, it is the management-tool answer to the hybrid cloud scenario

image

The hybrid cloud scenario entails a synergy between both your on-premise platforms, Microsoft cloud technologies like Azure and O365 and any 3rd party cloud platforms you might consume like Amazon for example.

This kind of ‘cloud of clouds’ is emerging everywhere and there are many companies that are already using cloud-based services today. This scenario provides a highly elastic and flexible way of working: quickly spin up additional business app instances in the cloud on-demand or have a full DR site ready to go with the push of a button, all with just the swipe of a credit card, and this are just 2 examples! However, a nice quote I read somewhere comes into play: ‘As the thing become easier on the front-end, as hard do they become on the back-end’. Essentially, things are easy when they are just contained on one platform and in one place. The hybrid cloud smashes this ideal by dictating that services should be deliverable through any platform from anywhere. This raises the important question: how do I distribute my services across all these platforms running in various locations while still being able to have my single pane of glass?

If you answer ‘System Center’ you are right, but partially, due to the following reasons:

– The System Center tools were born for on-prem management and although they can interface with cloud and cross-platform technology, their core platform is and will for now remain Windows Server, an on-prem platform.

– One of the goals of the hybrid cloud is to achieve hyper-scale: being able to spin up service instances in a matter of seconds. The kind of data that needs to be managed might be overwhelming for the current System Center tools. Have you checked your SCOM DW size and performance recently? Or did performance tests on your Orchestrator runbooks on high-volume demands? Or tried managing both Azure and Hyper-V VMs as a single unit? Don’t get me wrong, they can cope, but as these platforms were designed for on-prem scenario’s they can not always provide the hyper-elasticity and agility that their targets impose nowadays.

‘So what are you trying to say? That System Center is becoming an obsolete relic?’ Hell no! But just as the managed platforms evolve, so must the management ones do! This is why OMS has been developed, to close the gap, and not replace but extend the System Center story into the cloud.

Actually OMS is nothing new under the sun. It is just like its older brother EMS (cloud-based workplace management) a competitively priced bundle of Azure-based services which form a single management platform together. When you open the OMS site (www.microsoft.com/oms, one of the easiest URLs ever) for the first time you’ll see this:

image

Pretty abstract, I agree, but in fact the technology hiding behind these concepts are very simple and straightforward:

image

MAPPING

Concept Technology
Backup & Recovery Azure Backup / Azure Site Recovery
IT Automation Azure Automation
Log Analytics OMS (previously known as Microsoft Operational Insights)
Security & Compliance OMS (previously known as Microsoft Operational Insights)

 

As you see, both well and lesser known Azure services which are already in existence for quite some time power the platform. What I do like about bundling this in one suite is that the services are placed in a broader, general concept of management and are priced and presented in a much more coherent way! Just like System Center is a suite of separate software platforms, so are their Azure counterparts now!

In the next posts to come I will attempt to provide a thorough overview on each of these services and provide some scenario’s where they might be positioned perfectly in your environment.

Thanks for giving this post a read and I hope to catch you later! And should you have questions or remarks, don’t hesitate to provide me feedback!

Jan out.