Friday, April 23, 2021

[Update] to Comcast finger-pointing: Comcast has a show-stopping firmware bug in some of their coax gateways if you use static IPs

Super TL;DR: A static-IP customer with daily outages had the problem solved by a different model of modem

I have been fighting this with Comcast for almost a month now and was about ready to scream at them over the phone. The sheer incompetence and utter idiocy present in their "business" support is enough to make you lose all hope in humanity.

Anyway, on with the technical details. You may remember this thread in which I was trying to prove it's Comcast's fault. Here's what the customer has:

  • Comcast "Business" Internet 150/20 (Real-world speeds 185/25)
  • Comcast Business Internet Gateway. Unfortunately, I no longer have the gateway, so I don't know WHICH one it was, but it was one of these XB3-family models: DPC3939, DPC3941T, or TG1682G.
  • The gateway has a /29 of our static IPs provisioned on it, leaving five usable IPs for us
  • Gateway is NOT in bridge mode because Comcast does not support that with static IPs. It's in a hybrid passthrough mode that allows devices connected to it to use the public IP addresses
  • In this case, the gateway ran through a couple of switches on a special VLAN to present itself to the firewalls, a pair of HA pfSense units
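
For anyone wondering why a /29 only gives five usable addresses, here is a minimal Python sketch of the arithmetic. The block is a hypothetical one from the 203.0.113.0/24 documentation range, not the customer's real block. It also assumes the gateway claims one host address for itself, and which one it takes (the last host here) is an assumption for illustration.

```python
import ipaddress

# Hypothetical /29 from the RFC 5737 documentation range; the real block is not given.
block = ipaddress.ip_network("203.0.113.8/29")

total = block.num_addresses   # 8 addresses in a /29
hosts = list(block.hosts())   # excludes the network and broadcast addresses -> 6 left

# Assumption: the gateway keeps one host address for itself (here, the last one).
gateway_ip = hosts[-1]
usable = [ip for ip in hosts if ip != gateway_ip]

print(f"total={total} hosts={len(hosts)} usable={len(usable)}")
print("gateway:", gateway_ip)
print("usable:", ", ".join(str(ip) for ip in usable))
```

So 8 addresses, minus network and broadcast, minus the one the gateway keeps, leaves the five we get to assign.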

Much of the troubleshooting and technical details are in the linked thread above. However, there have been new developments.

After we continued to see regular drops and had to have the customer keep resetting the gateway, we hired an electrician to come inspect the cabling and make absolutely sure that everything was properly grounded. He tested the grounding thoroughly and said that it was correctly done, so it's not a grounding issue.

I plugged a laptop directly into the gateway and assigned its NIC one of our public IP addresses. I installed EMCO Ping Monitor on it and had it monitor two external IP addresses and the gateway's public IP address. Meanwhile, pfSense is logging gateway quality and availability. Whenever there is an outage (roughly every 12-24 hours, but inconsistent), the following happens:

  • T0 - The gateway IP address stops responding to pings
  • T0 - The external IP addresses stop responding to pings and Internet connectivity stops working
  • T+2 minutes - The gateway IP address begins responding to pings again
  • T+??? - Anywhere between 2 and 6 hours later, the gateway suddenly begins passing traffic again. OR you can power cycle it, and it begins working again as soon as it reboots
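
The useful part of having the laptop watch both the gateway IP and external IPs is that it separates two failure states: the gateway being unreachable, and the gateway answering pings while passing no traffic. Here is a minimal Python sketch of that classification over ping samples. The `Sample` type, the function names, and the sample data are all hypothetical and purely illustrative; they are not EMCO's log format and not our real measurements.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    minute: int        # minutes since monitoring started (hypothetical units)
    gateway_ok: bool   # the gateway's public IP answered a ping
    external_ok: bool  # at least one external IP answered a ping


def classify(s: Sample) -> str:
    """Label one sample with the failure state it represents."""
    if s.gateway_ok and s.external_ok:
        return "up"
    if not s.gateway_ok:
        return "gateway unreachable"
    return "gateway up, not passing traffic"


def stuck_intervals(samples):
    """Return (start, end) minute pairs where the gateway answered but traffic did not pass."""
    intervals = []
    start = None
    for s in samples:
        stuck = classify(s) == "gateway up, not passing traffic"
        if stuck and start is None:
            start = s.minute
        elif not stuck and start is not None:
            intervals.append((start, s.minute))
            start = None
    if start is not None:  # still stuck at the end of the samples
        intervals.append((start, samples[-1].minute))
    return intervals


# Synthetic samples shaped like the pattern above (compressed timescale, not real data).
samples = [
    Sample(0, True, True),    # normal
    Sample(1, False, False),  # T0: gateway and external IPs stop answering
    Sample(2, False, False),
    Sample(3, True, False),   # gateway answers again, traffic still dead
    Sample(4, True, False),
    Sample(5, True, True),    # traffic resumes (or someone power cycled it)
]

for s in samples:
    print(s.minute, classify(s))
print(stuck_intervals(samples))
```

The long "gateway up, not passing traffic" stretch is the signature that pointed at the gateway itself rather than the cable plant.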

Meanwhile the customer is about ready to throw things since they keep losing their connection over and over again.

I got back on the phone with Comcast for what seems like the 100th time and explained what was happening to the Level 1 guys. Unhelpful as usual: "Sir, please reboot your gateway" and all that. I politely demanded an escalation. They agreed. Four hours later, "Level 2 troubleshooting" calls me back. I explain the whole situation again.

Level 2 troubleshooting says, "Let's replace your modem again with the same model."

At this point, I simply declined that troubleshooting option. I told the tech that it's not an acceptable fix because we've already been through two modems of this same model, and we're not willing to eat another hour-long outage during the day to have a tech do the swap-out.

Instead, I ask if there is another model of modem that would work. He says there is: the gateway they use for Gigabit service. After I press him on it, he agrees to schedule a tech to come onsite and swap it in. The only window they have is 8-10am, during the business day. Customer begrudgingly agrees.

Tech comes onsite. When he arrives, he calls me and tells me that he's going to check signal levels (oh great, thanks, that's been done literally 6 times at this point) and that he CANNOT swap in the alternate modem. I almost fell out of my chair. He says that this modem is in short supply and they're only allowed to use it for gigabit customers. I try to explain the issues we've been having, but he's not having it. I ask for a supervisor's number. I then call the supervisor, who is surprisingly cool. I explain the whole situation to him, and he's actually pretty embarrassed. He authorizes the different modem.

Tech unplugs old modem then swaps in new one. The new modem is defective. Won't power up. Holy shit.

He goes to his truck and gets the ONLY other new modem in the city, apparently. Plugs it in. It boots. He spends about 30 minutes on the phone getting the static IP block transferred. Once he's done, it reboots a few times for firmware updates, then settles in. Customer has been down 2 hours and is pissed but what're you gonna do?

It has been 5 days since then. There have been 0 outages. We are showing 99.987% successful ping response.
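
To put 99.987% in perspective, here is a quick back-of-the-envelope calculation in Python. The post doesn't say what ping interval was used, so one ping per second is purely an assumption; the point is only the order of magnitude.

```python
# Assumption: one ping per second; the actual monitoring interval isn't stated.
PING_INTERVAL_S = 1
DAYS = 5
SUCCESS_RATE = 0.99987

total_pings = DAYS * 24 * 60 * 60 // PING_INTERVAL_S
lost_pings = round(total_pings * (1 - SUCCESS_RATE))
lost_seconds = lost_pings * PING_INTERVAL_S

print(f"{total_pings} pings, about {lost_pings} lost (~{lost_seconds} s total)")
```

Under that assumption, it works out to about 56 missed replies over 5 days, i.e. under a minute even if they all happened back to back, versus the multi-hour outages before the swap.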

It was never an issue with the cable plant. It was never an issue with the grounding. It was never an issue with VLANs, or switching, or fiber, or anything that we were doing wrong. There appears to be a catastrophic firmware flaw with static IP address blocks on the XB3-model modems. And Comcast would never own it, but it seems pretty clear that that is the case. I do not know if this issue is specific to our area, or actually network-wide for Comcast.

I hate Comcast coax service. I hate it with a passion. I yearn for the day that a decent fiber carrier arrives in the area willing to sell 1000/1000 for less than a mortgage payment. This customer is going to pay WELL over $1000 per month for 1000/1000 beginning in a couple months. We (MSP) burned literally dozens of hours and phone calls and site visits and equipment on this issue and it truly WAS their fault.


