Tuesday, February 11, 2020

Anyone else having intermittent 802.1x issues with Windows 10 clients?

I've been losing years off my life over this mess. We're a full NAC (purple) shop, and all edge ports have multi-auth enabled. The authentication hierarchy is 802.1x -> MAC auth -> unregistered black hole. Not unlike a precocious child, end systems all over the place will intermittently lose their 1x sessions and drop network access until the interface is reset. I'm 100% certain this behavior is on the client end, but I'll be damned if I can find exactly what's causing it.

Typical setup is a VoIP phone (Cisco) with a PC daisy-chained to it; however, this behavior persists on direct connections too. Basically, it breaks down like this:

Two sessions are established when someone logs into a PC: a 1x session, which takes priority, and a MAC session tied to the NIC, which gets thrown into unregistered hellban. Multi-auth has to be on because of the phones, so a full setup will show a 1x session to the PC, a MAC session to the phone with voice policy, and a MAC session to the PC unregistered. This behavior with the sessions is typical and hasn't caused any problems before. All that being said, all endpoints have been pushed to Windows 10, and around a thousand PCs were replaced with newer hardware along with the OS upgrade.
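
To make the session layout concrete, here's a rough Python model of how the effective policy for an endpoint falls out of the 802.1x -> MAC auth -> unregistered hierarchy. This is only a sketch: the names (Session, effective_policy) and the policy strings are made up for illustration and are not anything from the NAC's actual API.

```python
from dataclasses import dataclass

# Lower number wins; mirrors the 802.1x -> MAC auth ordering.
# (Hypothetical labels, not vendor terminology.)
PRIORITY = {"dot1x": 0, "mac": 1}

@dataclass
class Session:
    mac: str        # endpoint MAC address
    auth_type: str  # "dot1x" or "mac"
    policy: str     # policy assigned to this session

def effective_policy(sessions, mac):
    """Return the policy of the highest-priority session for one MAC,
    or "unregistered" if that MAC has no session at all."""
    mine = [s for s in sessions if s.mac == mac]
    if not mine:
        return "unregistered"
    return min(mine, key=lambda s: PRIORITY[s.auth_type]).policy

port = [
    Session("aa:aa:aa:aa:aa:01", "dot1x", "corp-data"),   # PC, 1x session
    Session("aa:aa:aa:aa:aa:01", "mac", "unregistered"),  # PC, MAC session
    Session("bb:bb:bb:bb:bb:02", "mac", "voice"),         # phone, MAC session
]
print(effective_policy(port, "aa:aa:aa:aa:aa:01"))

# Drop the 1x session and the PC falls back to its MAC session.
port = [s for s in port if s.auth_type != "dot1x"]
print(effective_policy(port, "aa:aa:aa:aa:aa:01"))
```

With the 1x session present the PC gets its real policy; once that session is removed, the only thing left for the PC's MAC is the unregistered MAC session, while the phone's voice session is unaffected.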

At seemingly random intervals the 1x auth session drops, which reverts the port back to unregistered and kills the PC's network traffic until the client interface has a state change. I can see clearly in the logs that the heartbeat between the NAC and client eventually fails from the client side. In simpler terms, the NAC asks the PC "are you still there" at a steady interval, but for reasons I cannot figure out, the PC stops answering. As designed, the NAC drops that 1x session after the PC stops answering. The PCs don't seem to want to re-authenticate after this happens, and they sit in purgatory until the NIC changes state.
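
For anyone who wants to find the same pattern in their own logs or captures, here's a minimal Python sketch of the check. It assumes you've already boiled the data down to a time-ordered list of (timestamp, kind) tuples, where kind is "request" (NAC asking) or "response" (PC answering). The function name, the event format, and the threshold of 3 missed requests are all my own placeholders; the real retry count depends on how the switch and NAC are configured.

```python
def first_unanswered(events, max_missed=3):
    """Find where the client went quiet.

    events: time-ordered list of (timestamp_seconds, kind) tuples,
            kind is "request" (NAC -> PC) or "response" (PC -> NAC).
    Returns the timestamp of the first request in a run of max_missed
    consecutive requests with no response in between, or None.
    """
    missed = 0
    run_start = None
    for ts, kind in events:
        if kind == "response":
            # Client answered; reset the run of missed requests.
            missed = 0
            run_start = None
        elif kind == "request":
            if missed == 0:
                run_start = ts
            missed += 1
            if missed >= max_missed:
                return run_start
    return None

events = [
    (0, "request"), (1, "response"),
    (30, "request"), (31, "response"),
    (60, "request"), (90, "request"), (120, "request"),  # PC stops answering
]
print(first_unanswered(events))  # 60
```

The returned timestamp is the first request that never got an answer, which is the moment worth lining up against the client's event logs (sleep states, driver resets, and so on).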

I've done packet captures from the PC port, the uplink port on the switch, and the interface from the NAC, and can prove that this isn't any kind of network failure. I can't figure out for the life of me why these PCs stop answering NAC challenges. GTAC swears it is either OS power management configuration or drivers that need to be updated. I'm pushing the driver angle hard since most of the machines I have seen have drivers from Microsoft and not Intel. Manually installing drivers straight from Intel seems to lower the occurrence but doesn't fully cure the problem.

Any ideas?


