The Dread Pirate Re-Seed – Part 1
Among the most common questions I get from clients about the new data-protection features in Exchange 2007 (and the soon-to-be-released Exchange 2010) is, “What is a re-seed and why does it happen?” This mostly falls into the category of “fear of the unknown” since the technology is new, and documentation on how it works is somewhat scarce.
Re-seeds are a common source of confusion in most protection methods, though they go by different names and methodologies. In a solution like Double-Take (see disclaimer below), they’re called re-mirrors or re-synchronization operations, and they typically copy differences only. In a tape-backup solution it’s a restore operation, and might be everything, incremental pieces, or some combination thereof. In Exchange 2007 these operations are called re-seeds: basically the copying of data from a server that has a “correct” copy to one that does not. We see these operations in Exchange 2007 Local Continuous Replication (LCR), Cluster Continuous Replication (CCR) and Standby Continuous Replication (SCR). Today, we’ll talk about CCR, and address LCR/SCR next week.
CCR allows the active node of a 2-node Active/Passive Exchange Cluster to replicate a copy of its data to the passive node. This allows the passive node to take over with a nearly-current copy of the data if the production system fails due to hardware or software failure. There is a log replay lag to be considered, but it’s only 50 logs that need to be applied to the passive node during a rollover event, and that does give you some measure of protection against corruption if you catch it fast enough. Otherwise, the system behaves much like a traditional Single Copy Cluster (formerly Shared Disk Cluster), and is controlled with a combination of Windows cluster tools and PowerShell.
Whenever a log file is closed and a new current log (usually E00) is created, the closed log is copied over to the passive node via an SMB share. There it is held until it falls outside the 50-log replay lag and is then replayed into the database, or until a rollover occurs and all held logs are replayed immediately. Exchange 2010 will move away from the SMB share, but will use a similar methodology overall, if the beta is to be taken at face value.
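To make the replay lag a little more concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not how Exchange implements it: the class name PassiveNode and its methods are invented for illustration, and the 50-log figure is simply treated as a constant.

```python
from collections import deque

REPLAY_LAG_LOGS = 50  # lag figure quoted in this post, treated as a plain constant


class PassiveNode:
    """Hypothetical model of the passive copy: holds shipped logs and
    replays the oldest ones once they fall outside the lag window."""

    def __init__(self, replay_lag=REPLAY_LAG_LOGS):
        self.replay_lag = replay_lag
        self.pending = deque()  # logs copied over but not yet replayed
        self.replayed = []      # logs already applied to the passive database

    def receive(self, log_generation):
        """A closed log arrives from the active node."""
        self.pending.append(log_generation)
        # Replay only the logs that exceed the lag window, oldest first.
        while len(self.pending) > self.replay_lag:
            self.replayed.append(self.pending.popleft())

    def rollover(self):
        """On a rollover, every log still being held is replayed at once."""
        while self.pending:
            self.replayed.append(self.pending.popleft())


node = PassiveNode()
for generation in range(1, 61):  # the active node closes and ships 60 logs
    node.receive(generation)

print(len(node.replayed), len(node.pending))  # 10 50
node.rollover()
print(len(node.replayed), len(node.pending))  # 60 0
```

The point of the model is just that the passive database trails the active one by a fixed window of logs, and that a rollover has to drain that window before the passive node can take over.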
In order to get the passive node in sync with the active data, the CCR system starts with a re-seed operation. All data from the database is copied from the active node to the passive node, as well as any non-truncated logs. From then on, only log files are copied, as they are committed on the active node. If all goes well, this will probably be the only re-seed you see unless you have a rollover.
If you do flip nodes – let’s say from Node A to Node B – then Node B will re-seed back to Node A if Node A becomes divergent, that is, if Exchange cannot determine which logs still exist on Node A, or if the logs are inconsistent or some are missing. A graceful rollover will not cause a re-seed, but most emergency rollovers will require one.
The same will happen if you haven’t rolled over, but instead Node B was offline for some other reason. When Node B comes back online, Node A will see if all the required logs are on both machines, and then either just continue CCR protection or else initiate a re-seed to copy the data over again if anything is amiss. The only issue here is if your backup tools purge logs while Node B is still offline. In that case the servers will be considered divergent and need a re-seed to get back up and running properly.
Finally, if a cluster is restored from a backup (tape or otherwise) to the active node, then a re-seed must be manually initiated to re-sync the nodes properly. You will see errors telling you to do this after the restore is complete and you bring Node A back online.
One other condition exists, but it is a manually created condition. If you perform Offline Defragmentation of the database, you will trigger a re-seed operation when Node A is brought back online. As long as the first Exchange log is still present (which it should be) then this will happen automatically. Otherwise, it will need to be initiated manually.
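The conditions above can be collapsed into a simple checklist. The Python sketch below is only a restatement of this post's rules in code form; the class, field and function names are hypothetical, and nothing here queries a real Exchange server.

```python
from dataclasses import dataclass


@dataclass
class CopyState:
    """Hypothetical snapshot of what we know about the two nodes."""
    logs_missing_or_inconsistent: bool = False  # divergence after a rollover
    logs_purged_while_offline: bool = False     # backup truncated logs with a node down
    restored_from_backup: bool = False          # active node restored from tape or other backup
    offline_defrag_performed: bool = False      # database was defragmented offline


def reseed_reasons(state):
    """Return the reasons, if any, that a full re-seed would be expected."""
    reasons = []
    if state.logs_missing_or_inconsistent:
        reasons.append("nodes are divergent: logs missing or inconsistent")
    if state.logs_purged_while_offline:
        reasons.append("logs were purged while the other node was offline")
    if state.restored_from_backup:
        reasons.append("active node was restored from backup")
    if state.offline_defrag_performed:
        reasons.append("offline defragmentation was performed")
    return reasons


# A graceful rollover with all logs intact: no re-seed expected.
print(reseed_reasons(CopyState()))

# An emergency rollover that left Node A with missing logs.
print(reseed_reasons(CopyState(logs_missing_or_inconsistent=True)))
```

An empty list means replication should simply continue; anything else means the whole database is going to be copied again.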
So, why is this an issue? Normally, it’s not, but keep in mind that re-seed operations are *full* copies of the entire database. So if you have relatively small databases and only a few of them, this isn’t a problem. But let’s say you have over 1 terabyte of data in your Exchange cluster. Re-seeding that much data locally will be time- and resource-consuming, and doing it over a WAN (for distributed failover clustering) could be problematic, to say the least. So you want to avoid re-seed operations wherever possible, which means treating the CCR cluster very carefully, and following all the best practices from Microsoft on Exchange 2007 Clustering in general.
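To put rough numbers on that, here is a back-of-the-envelope estimate in Python. The function name, the link speeds and the 70% efficiency factor are all assumptions chosen for illustration; real throughput depends on disk speed, latency and whatever else is sharing the link.

```python
def reseed_hours(database_gb, link_mbps, efficiency=0.7):
    """Rough hours needed to copy database_gb (decimal gigabytes) over a
    link of link_mbps megabits per second, usable at the given efficiency."""
    if database_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("sizes and speeds must be positive, efficiency in (0, 1]")
    total_bits = database_gb * 8 * 1000**3
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return total_bits / usable_bits_per_second / 3600


# 1 TB over a gigabit LAN versus a 45 Mbps WAN link (both assumed figures).
print(f"LAN: {reseed_hours(1000, 1000):.1f} hours")  # LAN: 3.2 hours
print(f"WAN: {reseed_hours(1000, 45):.1f} hours")    # WAN: 70.5 hours
```

Even with generous assumptions, the WAN case is measured in days rather than hours, and the passive copy is not protecting you while it runs.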
For information on when re-seeds occur, take a look at this TechNet article. They’re not an everyday occurrence, but you will need to be sure you know when and why they will happen to avoid confusion and frustration.