Post-mortem: Emails from eBay
Scheduled on 2022-08-02 18:58:00
Estimated finish 2022-08-02 19:06:00
Recently we deployed fully in-house DNS resolvers to reduce our reliance on third parties and further limit theoretical attack vectors against our customers. After this change went live, emails from eBay began to fail, with errors like the following in our logs:
2022-07-31 00:01:53 SSL_write: (from mxphxpool2080.ebay.com [18.104.22.168]) syscall: Connection reset by peer
2022-07-31 00:01:53 SMTP connection from mxphxpool2080.ebay.com [22.214.171.124] lost while reading message data (header)
The cause remained unclear throughout an extended investigation. Even running Exim in debug mode did not produce data that pointed at the problem as clearly as the eventual fix did. After we reverted the deployment of our in-house DNS resolvers, emails from eBay began flowing normally again. Our working theory is that Exim could not resolve a DNS query quickly enough during the SMTP transaction, and that eBay's side reset the connection while waiting, which would match the "Connection reset by peer" errors above. We could not reproduce this behavior manually, but the fact that the revert fixed the problem consistently still paints a fairly clear picture.
It seems that future changes to DNS will need more extensive testing, under higher load than our current staging environment provides. Remember: It's not DNS. There's no way it's DNS. It was DNS.
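One way to exercise a resolver under something closer to production load is to fire many lookups at it concurrently and count how many are slow or fail. The Python sketch below is an illustration under stated assumptions, not the tooling used in this incident: `load_test`, `reverse_lookup`, the 50-worker default and the 1-second `slow_after` threshold are all hypothetical. It defaults to reverse (PTR) lookups because the hostnames in the Exim log lines above most likely come from a reverse lookup of the connecting IP, and it goes through whatever resolver the host's system configuration points at.

```python
from __future__ import annotations

import socket
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def reverse_lookup(ip: str) -> str:
    """Resolve an IP to a hostname (PTR lookup) via the system resolver."""
    return socket.gethostbyaddr(ip)[0]


def timed(resolve: Callable[[str], str], query: str) -> tuple[str, float, bool]:
    """Run one lookup and return (query, elapsed seconds, succeeded)."""
    start = time.monotonic()
    try:
        resolve(query)
        ok = True
    except OSError:  # socket.herror, socket.gaierror and timeouts are OSErrors
        ok = False
    return query, time.monotonic() - start, ok


def load_test(
    queries: Iterable[str],
    resolve: Callable[[str], str] = reverse_lookup,
    workers: int = 50,  # hypothetical concurrency level
    slow_after: float = 1.0,  # hypothetical "too slow" threshold in seconds
) -> dict:
    """Fire lookups concurrently and summarise how many were slow or failed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda q: timed(resolve, q), queries))
    latencies = sorted(elapsed for _, elapsed, _ in results)
    return {
        "total": len(results),
        "failed": sum(1 for _, _, ok in results if not ok),
        "slow": sum(1 for _, elapsed, _ in results if elapsed > slow_after),
        "max_seconds": latencies[-1] if latencies else 0.0,
    }
```

To aim it at a resolver under test, it would be run on a host whose system resolver configuration (for example `/etc/resolv.conf` on Linux) uses that resolver, fed with IPs taken from real mail logs. A thread pool is used because `socket.gethostbyaddr` blocks, and the resolver function is injectable so the summary logic can be checked without network access.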