Ran into a strange problem recently where an Exchange 2016 server could not send mail to Office 365 via hybrid mail flow. What made this situation particularly strange is that other Exchange servers in the environment had no problem sending messages over the hybrid connection. On the problem server, messages would get stuck in the queue and eventually time out.
The queues were filled with retries such as this one.
451 4.4.0 Primary target IP address responded with: "421 4.2.1 Unable to connect." Attempted failover to alternate host, but that did not succeed. Either there are no alternate hosts, or delivery failed to all alternate hosts.
This message tells us that the server was unable to connect to Office 365. Unfortunately, it does not give us much detail beyond that. For that level of detail we need to enable logging on the SMTP send connector used to send mail to Office 365.
Turn up logging on the SMTP Send Connector
To enable logging on a send connector, log in to the Exchange Admin Center (EAC) and select the Mail Flow tab, then the Send Connectors sub tab. Double-click the send connector named Outbound to Office 365 and, under the General tab, set the protocol logging level to Verbose. Click Save.
To perform this same action through the Exchange Management Shell (EMS) type the following command.
C:\> Set-SendConnector -Identity "Outbound to Office 365" -ProtocolLoggingLevel Verbose
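To confirm the change took effect and to see where the logs will be written, something like the following can be run from the EMS. This is only a sketch: EX01 is a hypothetical name standing in for the problem server, and the connector name is the one used above.

```powershell
# Confirm the connector is now logging verbosely.
Get-SendConnector -Identity "Outbound to Office 365" |
    Format-List Name, ProtocolLoggingLevel

# Show where the transport service writes its send protocol logs.
# "EX01" is a placeholder; substitute the name of the problem server.
Get-TransportService -Identity "EX01" |
    Format-List Name, SendProtocolLogPath
```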
While we waited for logging to generate some entries, we also confirmed that we could make a connection from the problem server to Office 365. To do this, we used telnet to connect to Office 365 over port 25 and send an email message. This confirmed two things: first, that the server was not being blocked on outbound port 25, and second, that it could resolve and reach the Office 365 servers.
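If telnet is not installed, a rough equivalent of the connectivity half of that test can be done with Test-NetConnection (available on Windows Server 2012 R2 and later). This is a sketch only: the host name below is a hypothetical placeholder, not the endpoint from our environment, and the check covers name resolution and the TCP connection but not the SMTP conversation itself.

```powershell
# Hypothetical Exchange Online Protection endpoint. Substitute the smart host
# or MX record that your own outbound connector actually targets.
$target = "contoso-com.mail.protection.outlook.com"

# Resolve the name, then attempt a TCP connection on port 25.
$result = Test-NetConnection -ComputerName $target -Port 25

# TcpTestSucceeded is True when the TCP handshake completes.
$result | Format-List ComputerName, RemoteAddress, RemotePort, TcpTestSucceeded
```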
Analyzing the SMTP logs
Once we had retried a few of the messages in our queue, we examined the SMTP logs. As with our telnet test, we could see Exchange making the initial connection to Office 365. By line 14 we could see our server transmitting its SSL certificate. However, this was quickly followed by the error TLS negotiation failed with error NoCredentials. This indicated an issue with our certificate, even though at face value everything on the certificate looked correct.
Outbound to Office 365,<,"220 BL3FEO12EC057.mail.protection.outlook.com Microsoft ESMTP MAIL Service ready at Thu, 28 Feb 2017 00:00:00 +0000",
Outbound to Office 365,>,EHLO exchangeservergeek.com,
Outbound to Office 365,<,250-BL3FEO12EC057.mail.protection.outlook.com Hello [18.104.22.168],
Outbound to Office 365,<,250-SIZE 157286400,
Outbound to Office 365,<,250-PIPELINING,
Outbound to Office 365,<,250-DSN,
Outbound to Office 365,<,250-ENHANCEDSTATUSCODES,
Outbound to Office 365,<,250-STARTTLS,
Outbound to Office 365,<,250-8BITMIME,
Outbound to Office 365,<,250 BINARYMIME,
Outbound to Office 365,>,STARTTLS,
Outbound to Office 365,<,220 2.0.0 SMTP server ready,
Outbound to Office 365,*,,Sending certificate
Outbound to Office 365,*,"CN=webmail.exchangeservergeek.com, OU=Exchange Server Geek, O=SuperTekBoy, L=Cincinnati, S=Ohio, C=US",Certificate subject
Outbound to Office 365,*,"CN=DigiCert SHA2 Secure Server CA, O=DigiCert Inc, C=US",Certificate issuer name
Outbound to Office 365,*,07A7C02B0DF809CAB510F58A270A7DE9,Certificate serial number
Outbound to Office 365,*,CE3A3D779940A6855B53E2F69EF2DA4BC374D3EE,Certificate thumbprint
Outbound to Office 365,*,autodiscover.exchangeservergeek.com,Certificate alternate names
Outbound to Office 365,*,,TLS negotiation failed with error NoCredentials
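Rather than reading the logs by eye, the thumbprint entries can be pulled out with a small helper. The function below is a hypothetical sketch, not something built into Exchange. It assumes the log lines carry a 40-character hex thumbprint immediately followed by the Certificate thumbprint label, as in the excerpt above, and the log path in the comment is the default location as far as we know, so adjust it to your install.

```powershell
# Hypothetical helper: extract certificate thumbprints from send protocol log lines.
function Get-LoggedThumbprint {
    param([string[]]$Line)

    foreach ($entry in $Line) {
        # A thumbprint is 40 hex characters, followed by its label field.
        if ($entry -match ',([0-9A-Fa-f]{40}),Certificate thumbprint') {
            $Matches[1].ToUpper()
        }
    }
}

# Example against the log folder (path is an assumption, adjust as needed):
# $logs = Join-Path $env:ExchangeInstallPath 'TransportRoles\Logs\Hub\ProtocolLog\SmtpSend\*.log'
# Get-LoggedThumbprint -Line (Get-Content $logs) | Sort-Object -Unique
```

Sorting with -Unique keeps the output short when the same certificate is offered on every retry.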
We then attempted to match the thumbprint of the certificate (the Certificate thumbprint entry in the log) against the certificate on the working servers. It did not match. We then compared it against the certificate on the problem server. It did not match either. To compound the issue, the thumbprint from the logs could not be found in either the EAC or the EMS. In fact, the thumbprint shown in the Exchange Admin Center was the same across all servers, and all servers had that same certificate bound to SMTP. Yet somehow the thumbprint in the logs did not match.
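The EMS side of that comparison can be sketched roughly as follows. EX01 is again a hypothetical server name, and the thumbprint is the one copied from the log excerpt. An empty result from the last command would be consistent with what we saw.

```powershell
# Thumbprint copied from the "Certificate thumbprint" entry in the SMTP log.
$logged = "CE3A3D779940A6855B53E2F69EF2DA4BC374D3EE"

# "EX01" is a placeholder; repeat for each server you want to compare.
$certs = Get-ExchangeCertificate -Server "EX01"

# List the certificates Exchange reports, including their assigned services.
$certs | Format-List Thumbprint, Subject, Services, NotAfter

# Check whether the logged thumbprint is among them.
$certs | Where-Object { $_.Thumbprint -eq $logged }
```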
Fixing TLS negotiation failed with error NoCredentials
What we found was that the problem server had two seemingly identical certificates with the same common name: one with the correct thumbprint and one whose thumbprint matched the SMTP logs. We discovered this by loading the Certificates MMC snap-in on the problem server.
To do this, click Start and type MMC.
From the MMC console, select the File menu followed by Add/Remove Snap-in. Select Certificates and click Add. From the wizard, select Computer account > Local computer > Finish. Click OK.
Back on the console, expand Certificates (Local Computer) > Personal and select Certificates. Your duplicate is most likely here.
Determine which certificate is incorrect from its thumbprint. Right-click the duplicate and select Delete from the context menu. Click Yes to confirm.
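The same check can be approximated from PowerShell using the certificate provider. The helper below is a hypothetical sketch: it groups certificates by subject and returns only the subjects that appear more than once. The removal line is commented out and uses -WhatIf so that nothing is deleted until you have confirmed the thumbprint.

```powershell
# Hypothetical helper: return groups of certificates that share a subject.
function Find-DuplicateSubject {
    param([object[]]$Certificate)

    $Certificate |
        Group-Object -Property Subject |
        Where-Object { $_.Count -gt 1 }
}

# On the server: inspect the local computer's Personal store.
# Find-DuplicateSubject -Certificate (Get-ChildItem Cert:\LocalMachine\My) |
#     ForEach-Object { $_.Group | Format-List Subject, Thumbprint, NotAfter }

# Preview the removal of the unwanted certificate (thumbprint from the SMTP log).
# Remove-Item Cert:\LocalMachine\My\CE3A3D779940A6855B53E2F69EF2DA4BC374D3EE -WhatIf
```

Grouping on Subject mirrors how we spotted the problem in the MMC. Two certificates with the same subject are not necessarily a fault, so compare thumbprints and expiry dates before deleting anything.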
As soon as we removed the incorrect duplicate certificate from the problem server (and retried the messages in the queue), mail started to flow. It's uncertain why the certificate is matched on its name alone rather than on its thumbprint. Hopefully we will see this change in a future update.
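The retry step mentioned above can also be done from the EMS. A minimal sketch, assuming it is run on the problem server itself (otherwise add the -Server parameter to both commands):

```powershell
# List queues currently in a Retry state, with the last error each one recorded.
Get-Queue -Filter "Status -eq 'Retry'" |
    Format-List Identity, NextHopDomain, MessageCount, LastError

# Force an immediate connection attempt for those queues.
Retry-Queue -Filter "Status -eq 'Retry'"
```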
Have you run into this error? What was your solution? Drop a comment below or come join the conversation on Twitter @SuperTekBoy.