Using Veeam for Exchange 2013 snapshots

Does the thought of VMware snapshots for Exchange 2013 make you cringe? If so, we are much alike, and your concern probably stems from unpleasant experiences you’ve had in the past. Exchange 2010 SP1 began the “healing” process between Exchange and VMware; however, much of the stigma remains implanted in the heads of Exchange administrators.

Fear not! As the two technologies converge, they grow more compatible and can thrive together with a small amount of work and monitoring. First, let’s talk about the problems Veeam can cause; you will likely see these errors:

FailoverClustering – Event ID: 1135 – Cluster node ‘SERVER’ was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster.

MSExchangeRepl – Event ID: 4087 – Failed to move active database ‘NAME’ from server ‘SERVER’. Move comment: None specified.

Error: An error occurred while attempting a cluster operation. Error: Cluster API failed: “ClusterRegSetValue() failed with 0x6be. Error: The remote procedure call failed”

The problem lies in the way Veeam operates: it must briefly “freeze” the guest node to complete the snapshot. This isn’t to say that Veeam isn’t following Microsoft best practices. Veeam DOES in fact invoke VSS to take an Exchange-aware snapshot, so all is well with the backups and the logs are correctly truncated. However, during the snapshot the cluster will detect a short outage and attempt to fail over the databases, which can set off other failures such as those shown above.

What we have to do is change the cluster settings to be more forgiving of these short “freezes”. The end result is an error-free backup and failover detection that is a little more tolerant of slight network outages or slower server responses. From any server in the DAG, open a Command Prompt and enter the following commands:

cluster /cluster:<DAGNAME> /prop SameSubnetDelay=2000:DWORD
cluster /cluster:<DAGNAME> /prop CrossSubnetDelay=4000:DWORD
cluster /cluster:<DAGNAME> /prop CrossSubnetThreshold=10:DWORD
cluster /cluster:<DAGNAME> /prop SameSubnetThreshold=10:DWORD

These settings are recommended by Veeam to make the cluster tolerate longer gaps in its heartbeat. SameSubnetDelay and CrossSubnetDelay set the interval between heartbeats in milliseconds, raising it from the default of 1,000 to 2,000 within a subnet and 4,000 across subnets. The numbers look high but they really aren’t. 4,000 milliseconds is 4 seconds, and for most companies a 4-second heartbeat interval will probably be just fine. I personally think the default of 1,000 milliseconds is probably too low anyway. The other settings, SameSubnetThreshold and CrossSubnetThreshold, set the number of missed heartbeats the cluster tolerates before a failover automatically occurs. The default is 5, so by doubling it we decrease the potential for an unplanned failover due to network “glitches” or short “freezes” like those caused by products like Veeam.
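To put rough numbers on the values above, here is a minimal Python sketch of the arithmetic. It assumes the simplified model that a node is dropped once the threshold number of consecutive heartbeats, each one delay interval apart, has been missed, so the tolerated outage is roughly delay multiplied by threshold. The class name, the helper function and the 8-second freeze are hypothetical and only there for illustration; the real cluster service may behave differently in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatSettings:
    """One Delay/Threshold pair (hypothetical helper, not a cluster API)."""
    delay_ms: int   # interval between heartbeats, in milliseconds
    threshold: int  # missed heartbeats tolerated before the node is dropped

    def tolerance_seconds(self) -> float:
        # Simplified model: tolerated outage is roughly delay x threshold.
        return self.delay_ms * self.threshold / 1000


def survives_freeze(settings: HeartbeatSettings, freeze_seconds: float) -> bool:
    """True if a freeze of this length stays inside the tolerance window."""
    return freeze_seconds < settings.tolerance_seconds()


# Defaults as stated in this post, then the values set by the commands above.
DEFAULT = HeartbeatSettings(delay_ms=1000, threshold=5)
SAME_SUBNET = HeartbeatSettings(delay_ms=2000, threshold=10)
CROSS_SUBNET = HeartbeatSettings(delay_ms=4000, threshold=10)

if __name__ == "__main__":
    freeze = 8.0  # hypothetical snapshot freeze, in seconds
    for name, s in [("default", DEFAULT),
                    ("same subnet", SAME_SUBNET),
                    ("cross subnet", CROSS_SUBNET)]:
        print(f"{name}: tolerates ~{s.tolerance_seconds():.0f}s, "
              f"survives {freeze:.0f}s freeze: {survives_freeze(s, freeze)}")
```

Under this model the default settings tolerate about 5 seconds of silence, while the tuned values allow about 20 seconds within a subnet and 40 seconds across subnets. That is plenty for a short snapshot pause, but it also means a genuinely dead node takes longer to be noticed.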

These changes should instantly quieten down the Event Logs on your servers if Veeam or another product forces your Exchange servers to “pause” momentarily for whatever reason. Moreover, if your Exchange environment is not being closely monitored, you may want to make these changes anyway to keep your DAG more stable during short, unplanned problems with the network or the host machines.

Alive and Kicking!

Microsoft Exchange Server is alive and well and Office 365 is here to stay.

Roughly 70% of my customers and partners are EXCELLENT candidates for Office 365 and I am very vocal about my support for the move. How many of us have witnessed a crashed database due to excessive logs? How many times have you fought with the load balancer to figure out why connections are being dropped? How many days a year do you spend debating, fighting, monitoring or discussing Exchange backups and disaster recovery? With thinning IT departments and greater messaging loads, it is more difficult and costly than ever to maintain a healthy Exchange environment. BUT, if BPOS was re-branded to “Office 365” in 2011 and it was the 3rd iteration of Microsoft’s hosted messaging, then why are 80% of the mailboxes still On-Premises in 2015? Why is there such a gap between Hosted and Local mail populations?

The Radicati Group (www.radicati.com) has indicated that in 2014, Office 365 accounted for less than 20% of Exchange mailboxes worldwide (“Microsoft Office 365, Exchange Server and Outlook Market Analysis, 2014 – 2018”). The paper goes on to say that Microsoft’s On-Premises market share will increase from 64% to 76% by 2018 “as it continues to gain market share away from its competitors”.

Confused yet? The explanation is pretty easy when you remember that we are human and not machines.

We are in a transitional gap right now with Microsoft Exchange. The industry wants us to be in the cloud, Microsoft wants us in the cloud and most of us want to be there but we all move at different paces. It took me three years to ramp up on Office 365 due to my subdued interest and the complete lack of interest by most of my larger customers. I have since made the transition but very few of my larger customers have made the same mental shift. THIS is the reason for the gap and the reason Microsoft will continue offering the On-Premises version of Exchange until 2018 or even later.

We are creatures of habit and most of us resist change. Fears about security, privacy and resilience have slowed the adoption of Office 365, but eventually we will all be there. When the On-Premises mailbox population decreases to a number Microsoft is willing to sacrifice, the Exchange Server product will be retired for good. Until then, I will continue to close the gap with bridges or catapults depending on the need.