Thursday, April 22, 2021

TCP Black Hole

Recently I was trying to troubleshoot a Domain Controller at a branch office, connected over an IPSec site-to-site VPN on ADSL, that was refusing to replicate with the PDC Emulator at the Datacentre. Connectivity appeared good (I was RDP'd to it from the Datacentre DC), and DNS appeared good. But still it refused. I spent ages scratching my head, going over and over time sync, DNS, ACLs, local firewalls and perimeter firewalls.

Then I started troubleshooting MTUs.

If I did a ping -l 1500 -f I got the expected "need to fragment" response. Same at 1490, 1480, 1470. Somewhere around 1450 I started hitting "request timed out" instead, and that continued until I dropped the test pings to 1410, when responses started coming back.
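Stepping down by hand in tens only brackets the limit, so the real ceiling could sit anywhere between 1410 and the last size that timed out. A minimal sketch of automating the search is below. It assumes the Windows ping flags (-n, -f, -l, -w), assumes a successful reply line contains "TTL=", and assumes that every size below the limit gets through. The function names are my own, not from any tool.

```python
import subprocess
from typing import Callable


def windows_ping_ok(host: str, payload: int) -> bool:
    """Send one don't-fragment ping with Windows ping; True only on a real reply.

    Assumption: a successful reply line contains "TTL=" (English Windows output).
    """
    result = subprocess.run(
        ["ping", "-n", "1", "-w", "2000", "-f", "-l", str(payload), host],
        capture_output=True,
        text=True,
    )
    return "TTL=" in result.stdout


def largest_payload(probe: Callable[[int], bool], low: int = 1200, high: int = 1472) -> int:
    """Binary search for the largest payload size for which probe() succeeds.

    Assumes probe(low) succeeds and that success is monotonic: everything at or
    below the limit works, everything above it fails.
    """
    if not probe(low):
        raise ValueError(f"even a {low}-byte payload did not get a reply")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid       # mid works, the limit is at or above mid
        else:
            high = mid - 1  # mid fails, the limit is below mid
    return low
```

Something like largest_payload(lambda n: windows_ping_ok("branch-dc.example.local", n)) would run it against a real host (that hostname is a placeholder). The upper bound of 1472 is 1500 minus the 28 bytes of IP and ICMP headers.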

I dutifully set my problem DC MTU to 1438 bytes (the 1410-byte ping payload plus 28 bytes of IP and ICMP headers), and it has now been replicating for 3 days solid.
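The arithmetic behind that number, as a small sketch. It assumes plain IPv4 with no IP options (a 20-byte header) and the 8-byte ICMP echo header; the function names are just for illustration.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def mtu_from_ping_payload(payload: int) -> int:
    """MTU implied by the largest ping payload that got through with DF set."""
    return payload + IPV4_HEADER_BYTES + ICMP_HEADER_BYTES


def max_ping_payload(mtu: int) -> int:
    """Largest ping payload that fits in one unfragmented packet at this MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


print(mtu_from_ping_payload(1410))  # 1438
print(max_ping_payload(1500))       # 1472
```

The second result is why a ping -l 1500 -f can never succeed on a standard 1500-byte Ethernet MTU: the payload alone already fills the packet before the headers are added.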

What I'm trying to figure out, though, is what would cause the "need to fragment" message to be replaced by "request timed out" before I eventually found the sweet spot of 1410?

My guess is that the "need to fragment" ICMP response is coming from my remote office router, and once I drop low enough to get past that, the packets might be just getting dropped silently in our ISP's network?
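To keep the three behaviours straight while testing, a rough sketch that sorts ping output into them is below. It assumes English-language Windows ping output, where the fragmentation message reads "Packet needs to be fragmented but DF set." and a timeout reads "Request timed out."; the labels and the function name are my own.

```python
def classify_ping_output(output: str) -> str:
    """Sort one ping result into the three zones seen during the MTU tests.

    Assumption: English Windows ping wording. Other locales or tools differ.
    """
    text = output.lower()
    if "needs to be fragmented" in text:
        # Something (the local stack or a router) reported the size limit.
        return "frag-needed"
    if "ttl=" in text:
        # A genuine echo reply came back.
        return "reply"
    if "timed out" in text:
        # Nothing came back at all: the black-hole zone.
        return "silent-drop"
    return "unknown"
```

The "silent-drop" zone is the interesting one: the packet is too big for some hop, but no ICMP message makes it back to say so.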

Also, is 1410 an unusually low MTU?

I mean, I think I've fixed this issue, but I need to understand it better. I think this may be affecting multiple sites, so it may be something I need to fix properly, ideally with a fix in one place rather than getting all users' laptops and desktops reset to a lower MTU (very few of our sites have domain controllers).


