Designing High Availability with Microsoft Exchange Server 2010
- 7/15/2010
Availability Planning for Transport Servers
Within the Exchange organization, it is important to deploy multiple transport servers to provide message path redundancy. Deploying multiple Hub Transport servers in each Active Directory site automatically provides redundancy and load balancing for message delivery. Deploying multiple Edge Transport servers also provides incoming and outgoing SMTP redundancy.
Shadow Redundancy
Exchange Server 2010 includes the shadow redundancy feature, which provides redundancy for messages for the entire time they are in transit. This is in addition to the transport dumpster. With one form of shadow redundancy, the message deletion from the transport queue is delayed until the transport server verifies that all of the next hops for that message have completed delivery. If any of the next hops fail before reporting successful delivery, the transport server resubmits the message for delivery to that next hop. If the next hop server does not support shadow redundancy, the message will be sent to the next hop and a shadow copy of the message will not be retained.
Shadow redundancy provides the following benefits:
It eliminates the reliance on the state of the transport server queues. If redundant message paths exist, the state of any transport server isn’t relevant. If a transport server fails, you can simply remove it from production without worrying about emptying its queues or losing messages currently in transit.
If maintenance needs to be performed on a transport server, the server can be brought offline without the risk of losing messages in transit.
It reduces the need for hardware redundancy for transport servers for messages in transit.
It consumes less bandwidth than other forms of redundancy that create duplicate copies of messages on multiple servers. With shadow redundancy, the only added network traffic is the discard status communicated between transport servers.
It provides resilience and simplifies recovery from a transport server failure because messages still in transit within the Exchange organization are protected by the previous Exchange 2010 transport server.
One form of shadow redundancy is implemented by extending the SMTP protocol. These service extensions allow SMTP hosts to negotiate shadow redundancy support and communicate the discard status for shadowed messages.
The protocol implementation of shadow redundancy works between Exchange 2010 transport servers. In the following scenario, a message is sent from an Exchange 2010 mailbox out to the Internet from a Hub Transport through an Edge Transport server, as shown in Figure 11-15. In this case the message flow follows these stages:
Hub delivers the message to Edge1:
Hub opens an SMTP session with Edge1.
Edge1 advertises shadow redundancy support.
Hub notifies Edge1 to track discard status.
Hub submits the message to Edge1.
Edge1 acknowledges receipt of the message and registers Hub to receive discard information for the message.
Hub moves the message to the shadow queue for Edge1 and marks Edge1 as the primary server. Hub becomes the shadow server.
Edge1 delivers the message to the next hop:
Edge1 submits the message to a third-party e-mail server.
The third-party e-mail server acknowledges the message’s receipt.
Edge1 updates the discard status for the message as delivery complete.
If the message is delivered successfully, Hub queries Edge1 for discard status and removes its shadow copy:
At the end of each SMTP session with Edge1, Hub queries Edge1 for the discard status of messages previously sent. If Hub has not sent any other messages to Edge1, it opens an SMTP session with Edge1 after five minutes to query for the discard status, and it fails over after three failed attempts, or 15 minutes. The interval can be configured by running Set-TransportConfig with the ShadowHeartbeatTimeoutInterval parameter, and the number of retries by running Set-TransportConfig with the ShadowHeartbeatRetryCount parameter.
Edge1 checks the local discard status, sends back the list of messages registered to Hub that have been delivered, and then removes the discard information.
Hub deletes the delivered messages from its shadow queue.
If the message delivery fails, then Hub queries Edge1 for discard status and resubmits the message:
If Hub cannot contact Edge1, Hub resumes the primary role and resubmits the messages in the shadow queue to another available transport server, Edge2.
The resubmitted messages are delivered to Edge2, and the workflow starts from step 1.
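The stages above can be sketched as a small in-memory model. This is only an illustration of the bookkeeping described in the text, not how Exchange implements it: the class names, method names, and message IDs are hypothetical, and the SMTP exchange itself is reduced to direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class PrimaryServer:
    """Hypothetical model of the next hop (Edge1 or Edge2 in the scenario)."""
    name: str
    # message ID -> name of the shadow server registered for discard status
    registered: dict = field(default_factory=dict)
    # message IDs delivered to the next hop but not yet reported
    delivered: set = field(default_factory=set)

    def accept(self, message_id: str, shadow_name: str) -> None:
        # Acknowledge receipt and register the sender for discard information.
        self.registered[message_id] = shadow_name

    def deliver(self, message_id: str) -> None:
        # The next hop acknowledged receipt, so mark delivery as complete.
        self.delivered.add(message_id)

    def discard_status(self, shadow_name: str) -> list:
        # Report delivered messages registered to this shadow server,
        # then remove the discard information for them.
        done = sorted(m for m in self.delivered
                      if self.registered.get(m) == shadow_name)
        for m in done:
            self.delivered.discard(m)
            del self.registered[m]
        return done


@dataclass
class ShadowServer:
    """Hypothetical model of the sending server (Hub in the scenario)."""
    name: str
    # message ID -> name of the primary server that currently owns it
    shadow_queue: dict = field(default_factory=dict)

    def send(self, message_id: str, primary: PrimaryServer) -> None:
        # Submit the message, then keep a shadow copy until it is discarded.
        primary.accept(message_id, self.name)
        self.shadow_queue[message_id] = primary.name

    def query_discard_status(self, primary: PrimaryServer) -> None:
        # Delete shadow copies of messages the primary reports as delivered.
        for message_id in primary.discard_status(self.name):
            self.shadow_queue.pop(message_id, None)

    def resubmit(self, failed: PrimaryServer, alternate: PrimaryServer) -> list:
        # The primary is unreachable: resubmit its shadow messages elsewhere.
        pending = sorted(m for m, p in self.shadow_queue.items()
                         if p == failed.name)
        for m in pending:
            self.send(m, alternate)
        return pending
```

In this model, a message that Edge1 delivered disappears from Hub's shadow queue on the next discard status query, while a message that was still pending when Edge1 became unreachable is resubmitted to Edge2 and tracked against that server instead.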
FIGURE 11-15 Transport shadow redundancy
The Shadow Redundancy Manager (SRM) is the core component of a transport server responsible for managing shadow redundancy. The SRM keeps track of the shadow server for each of its primary messages. The SRM is also responsible for the following tasks:
Determining when the shadow server should take ownership of shadow messages, thus making it the primary server
Maintaining the list and checking primary server availability for each shadow message
Processing discard notifications from primary servers
Removing the shadow messages from the database after receiving the discard notification
Sending the discard status to the shadow servers
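The first of these tasks, deciding when the shadow server takes ownership, follows from the heartbeat interval and retry count described earlier. The sketch below is a simplified model under stated assumptions: the class and method names are hypothetical, the defaults are the five-minute interval and three retries quoted in the text, and the real SRM logic is certainly more involved.

```python
from datetime import timedelta


class HeartbeatTracker:
    """Hypothetical tracker for one primary server's heartbeat results."""

    def __init__(self, timeout_interval=timedelta(minutes=5), retry_count=3):
        # Defaults mirror the values quoted in the text for
        # ShadowHeartbeatTimeoutInterval and ShadowHeartbeatRetryCount.
        self.timeout_interval = timeout_interval
        self.retry_count = retry_count
        self.failures = 0

    def record(self, reachable: bool) -> bool:
        """Record one heartbeat attempt.

        Returns True when the shadow server should take ownership of the
        shadow messages and resubmit them.
        """
        if reachable:
            self.failures = 0  # any successful contact resets the count
        else:
            self.failures += 1
        return self.failures >= self.retry_count

    def worst_case_delay(self) -> timedelta:
        # Time spent waiting before failover if every attempt fails.
        return self.timeout_interval * self.retry_count
```

With the default values, `worst_case_delay()` returns 15 minutes, which matches the "three failures or 15 minutes" figure given for the heartbeat.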
Shadow redundancy does not require any configuration. When multiple transport servers are deployed, they automatically negotiate the use of shadow redundancy. When multiple Hub Transport servers are deployed in each Active Directory site, each e-mail message exists in two places while in transit. Because each message exists in two locations, you may consider deploying Hub Transport servers without RAID-protected disks: the in-transit e-mail messages also exist on another server and do not need to be recovered.
It is not always advantageous to deploy transport servers without redundant storage for the message queue, however, because shadow redundancy does not protect e-mail messages in the transport dumpster. In configurations that consistently maintain a number of e-mail messages in the transport dumpster because of transaction log replication latency, such as a multi-site DAG, you should store the message queue on redundant storage to reduce the probability of losing transport dumpster data. You can determine the number of items in the transport dumpster by viewing the Dumpster Item Count counter on the MSExchangeTransport Dumpster performance object in Performance Monitor, or by trending this counter with a solution such as Microsoft System Center Operations Manager.
To reduce the likelihood of a server failure causing a loss of e-mail, the Mailbox Submission service on a DAG member first attempts to load-balance submission requests across other Hub Transport servers in the same Active Directory site. If the Hub Transport role is installed on the DAG member and it cannot submit messages to any other Hub Transport server in the site, it will fall back to the local Hub Transport server.
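The selection order described in this paragraph can be sketched as follows. The function name, the server names, and the `is_available` callback are hypothetical, and picking a random remote server is only an assumed stand-in for whatever load-balancing method the Mailbox Submission service actually uses.

```python
import random


def pick_submission_target(local_server, site_hubs, is_available, rng=random):
    """Choose a Hub Transport server for a submission request.

    Prefers any available Hub Transport server other than the local one,
    and falls back to the local server only when no other one is usable.
    Returns None if no server in the site can accept the message.
    """
    remote = [h for h in site_hubs if h != local_server and is_available(h)]
    if remote:
        # Assumption: random choice approximates load balancing.
        return rng.choice(remote)
    if local_server in site_hubs and is_available(local_server):
        return local_server
    return None
```

Preferring a remote server means that a message submitted from a DAG member does not sit only on the machine that also hosts the mailbox database, so a single server failure is less likely to lose both.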
Inbound E-mail Redundancy
Another form of shadow redundancy, called delayed acknowledgement, is used when a transport server receives a message from a mail server that doesn’t support shadow redundancy. Rather than immediately confirming receipt of the message to the submitting server, the transport server delays sending an acknowledgement until it has confirmed that the message has been successfully delivered.
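As a rough illustration of the idea, the sketch below returns the acknowledgement decision only after every next hop has confirmed delivery. The function and parameter names are hypothetical. The text does not say what happens when delivery cannot be confirmed, so this sketch simply withholds the acknowledgement in that case, which is an assumption.

```python
def receive_with_delayed_ack(message, next_hops, deliver):
    """Decide whether to acknowledge a message received from a sender
    that does not support shadow redundancy.

    `deliver(message, hop)` is a caller-supplied function that returns
    True once the given next hop has confirmed delivery.
    """
    for hop in next_hops:
        if not deliver(message, hop):
            # Assumption: without confirmation, no acknowledgement is sent,
            # so the submitting server still holds the message.
            return False
    # Every next hop confirmed delivery; it is now safe to acknowledge.
    return True
```

The point of the delay is that the submitting server keeps its own copy until it receives the acknowledgement, so that copy stays in place while the message is in transit through the Exchange organization.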
For inbound e-mail delivery with Edge or Hub Transport servers, the typical way to provide redundancy is to use an MX record for each of the e-mail servers accessible for e-mail delivery. MX records are weighted records in DNS that point to the e-mail servers responsible for receiving mail for a domain. MX records with a lower weighting (the preference value) are attempted before higher-weighted records. Records that have the same weight are load balanced. Using MX records to provide this redundancy is part of the way SMTP was designed, so this configuration is often sufficient. In some instances where large numbers of SMTP servers are deployed, you may choose to use network load balancing to have more control over the inbound SMTP traffic, but load balancing should never be used inside the Exchange organization or against the Default Receive Connector on each Hub Transport server. Load balancing and redundancy are built into the transport service.
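The MX ordering described above can be sketched as a short function that a sending server might apply to the records it retrieves. The function name and host names are hypothetical, and shuffling equal-weight records is one common way to spread load, not a description of any particular mail server.

```python
import random
from itertools import groupby


def mx_attempt_order(records, rng=random):
    """Return host names in the order a sender would try them.

    `records` is a list of (preference, host) tuples. Lower preference
    values are tried first; hosts that share a value are shuffled so
    that deliveries are spread across them.
    """
    ordered = []
    by_preference = sorted(records, key=lambda r: r[0])
    for _, group in groupby(by_preference, key=lambda r: r[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)  # equal weight: spread load between hosts
        ordered.extend(hosts)
    return ordered


# Hypothetical records: two equally weighted Edge servers and one backup.
records = [
    (20, "backup.example.com"),
    (10, "edge1.example.com"),
    (10, "edge2.example.com"),
]
order = mx_attempt_order(records)
```

Here the two servers with preference 10 always come first, in either order, and the server with preference 20 is tried only if both of them fail.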