Friday, December 7, 2018

Losing my mind over Apple Push Notification delays that appear network-related.

I'm really at the end of my rope trying to troubleshoot Apple Push Notification service (APNs) delays on our network.

Situation: iPhones (iOS 12; the model appears to be irrelevant) connected to Cisco lightweight access points, with traffic going out to the internet through a Meraki firewall. These phones are Wi-Fi only (no cellular service).

Problem: if an iPhone is idle for 5 minutes, push notifications stop reaching it and are delayed by several minutes or never arrive at all. This is easy to reproduce with iMessage, and it's very consistent: join the phone to Wi-Fi > send an iMessage > the message is received immediately > wait 5 minutes (the phone can be awake or asleep) > send another iMessage > the phone does not receive the push.

This immediately sounds like either a wireless connection problem or an iPhone problem. However, it never happens when the phone is connected to a home network. Three developers have tried it, and so have I. On a home network, push notifications arrive reliably, 100% of the time, within a second or two of being sent. But if I bring my home Wi-Fi router into work, plug it into the network here, and connect the phone to it, the delays come back.

What I've tried (some of these are nonsensical, but I'm out of logical options):

  • Logged all messages coming in and out of the wireless LAN controller, looking for evidence that the device was somehow disassociating from the AP during the idle period. That is definitely not happening; everything looks healthy.
  • Set up different authentication methods on the WLANs (PSK and open/no auth instead of our standard 802.1X). No change.
  • Tried connecting devices to an autonomous AP instead of the lightweight APs. No change.
  • Ran constant pings to the phones from both the firewall and other network devices. The goal was both to keep the radio(s) awake and to make sure all the relevant address tables stayed populated (God knows why that would affect this problem, but I'm desperate here). No change.
  • Ran a packet capture at the firewall to see what the traffic looks like. I didn't expect to find much here, and I didn't: for a "good" notification, the sending device sends traffic up to the 17.0.0.0/8 network and the recipient device then receives traffic from that network; for a missed notification, the sending device sends its traffic and then... nothing arrives at the recipient device.
  • Modified various timers to see if I could change that 5-minute idle threshold: I changed the user idle timeout and the ARP timeout on the WLC, and the MAC address aging time on the switch(es).
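To check in the firewall capture whether the recipient phone's traffic to and from Apple had gone quiet before a missed push, the capture can be reduced to (timestamp, source, destination) rows and filtered for 17.0.0.0/8. Below is a minimal Python sketch. It assumes the capture has already been exported to that hypothetical three-field format. The `Packet` type, the phone address, and the example rows are illustrative only and do not come from the actual capture.

```python
import ipaddress
from typing import Iterable, List, NamedTuple, Tuple

# The 17.0.0.0/8 block seen on the Apple side of the capture.
APPLE_NET = ipaddress.ip_network("17.0.0.0/8")


class Packet(NamedTuple):
    ts: float   # seconds since capture start (hypothetical export format)
    src: str
    dst: str


def is_apple(addr: str) -> bool:
    """True if addr falls inside 17.0.0.0/8 (IPv6 addresses never match)."""
    return ipaddress.ip_address(addr) in APPLE_NET


def apple_traffic_for(device: str, packets: Iterable[Packet]) -> List[Packet]:
    """Packets between `device` and any 17.0.0.0/8 host, in capture order."""
    return [
        p for p in packets
        if (p.src == device and is_apple(p.dst))
        or (p.dst == device and is_apple(p.src))
    ]


def idle_gaps(device: str, packets: Iterable[Packet],
              threshold: float = 300.0) -> List[Tuple[float, float]]:
    """(start, end) timestamps of silences with Apple longer than `threshold`
    seconds; 300 s matches the 5-minute idle period in the repro."""
    flows = apple_traffic_for(device, packets)
    return [(a.ts, b.ts) for a, b in zip(flows, flows[1:])
            if b.ts - a.ts > threshold]


if __name__ == "__main__":
    phone = "10.0.0.5"  # hypothetical recipient phone address
    capture = [
        Packet(0, phone, "17.0.0.10"),
        Packet(1, "17.0.0.10", phone),
        Packet(50, phone, "8.8.8.8"),     # non-Apple traffic is ignored
        Packet(400, "17.0.0.10", phone),
    ]
    print(idle_gaps(phone, capture))
```

A silence of 300 s or more right before a missed push would be consistent with an idle-timeout theory. On its own, though, it doesn't show where along the path the flow was dropped.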


I'm honestly confused here. I'm struggling to understand why these devices never lose their APNs connection at home, yet the connection simply disappears on the work network. I'm wondering whether the firewall's NAT is dropping the translation for the idle connection, but unfortunately our Meraki firewall is rather opaque about how it handles NAT.
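One way to test the NAT theory without the phones in the loop is to hold an idle TCP connection open through the firewall and check whether traffic from the outside can still reach it after 5 minutes. Below is a minimal Python sketch. It assumes you have a host with a public address outside the Meraki for the server half, and a laptop on the work network stands in for the phone. The script, its command-line usage, the port you pick, and the `push` payload are all hypothetical.

```python
import socket
import sys
import threading
import time

PAYLOAD = b"push\n"  # hypothetical marker; any short message works


def serve_once(srv: socket.socket, idle_seconds: float) -> None:
    """Run on a host OUTSIDE the firewall: accept one connection, stay
    silent for idle_seconds, then send one line down that connection."""
    with srv:
        conn, _ = srv.accept()
        with conn:
            time.sleep(idle_seconds)
            try:
                conn.sendall(PAYLOAD)
            except OSError:
                pass  # the connection may already have been reset


def probe(host: str, port: int, idle_seconds: float, grace: float = 10.0) -> bool:
    """Run on a client INSIDE the network: open a TCP connection, send
    nothing, and report whether the server's delayed line arrives."""
    with socket.create_connection((host, port)) as sock:
        sock.settimeout(idle_seconds + grace)  # applies to each recv()
        data = b""
        try:
            while not data.endswith(b"\n"):
                chunk = sock.recv(16)
                if not chunk:  # connection closed without the payload
                    break
                data += chunk
        except (socket.timeout, ConnectionError):
            return False  # nothing arrived: the idle flow was likely dropped
        return data == PAYLOAD


if __name__ == "__main__" and len(sys.argv) == 5:
    # usage: server <bind-addr> <port> <idle-s> | client <server-addr> <port> <idle-s>
    role, addr, port, idle = sys.argv[1], sys.argv[2], int(sys.argv[3]), float(sys.argv[4])
    if role == "server":
        serve_once(socket.create_server((addr, port)), idle)
    else:
        print("delivered" if probe(addr, port, idle) else "LOST")
```

As an example, you could compare an idle period of 240 s with one of 360 s, running each test once from the work network and once from a home connection. Suppose the work network reports LOST only for the longer idle while home delivers both. That would point to an idle timeout somewhere on the path rather than the Wi-Fi. Running the server on the same destination port the phones use toward 17.0.0.0/8 in the capture keeps the test closer to the real traffic, because a firewall may apply different timeouts per protocol or port.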

I'd appreciate any insights/pointers that anyone has here. Thanks!


