Friday, November 29, 2019

Our ISPs never believe our SD-WAN

One of the coolest parts of SD-WAN is that it constantly monitors the health of our point-to-point tunnels, including loss, latency, jitter, out-of-order packet percentage, and MOS.
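
For anyone wondering how MOS relates to the other numbers, here is a minimal sketch of one widely circulated simplification of the E-model that turns latency, jitter, and loss into a rough score. This is an assumption for illustration only: I don't know what formula any particular SD-WAN vendor actually uses, and the function name `estimate_mos` and its constants are not taken from any product.

```python
def estimate_mos(latency_ms: float, jitter_ms: float, loss_pct: float) -> float:
    """Rough MOS estimate (1.0 to 4.5) from a simplified E-model approximation.

    Assumption: this is a common rule-of-thumb formula, not a vendor's method.
    """
    # Jitter is weighted double and a fixed 10 ms is added as a codec allowance.
    effective_latency = latency_ms + 2.0 * jitter_ms + 10.0

    # Latency penalty gets steeper once effective latency passes 160 ms.
    if effective_latency < 160.0:
        r = 93.2 - effective_latency / 40.0
    else:
        r = 93.2 - (effective_latency - 120.0) / 10.0

    # Each percent of packet loss costs 2.5 points of R-factor.
    r -= 2.5 * loss_pct
    r = max(0.0, min(100.0, r))

    # Standard R-factor to MOS conversion curve.
    mos = 1.0 + 0.035 * r + 7.0e-6 * r * (r - 60.0) * (100.0 - r)
    return round(max(1.0, min(4.5, mos)), 2)


if __name__ == "__main__":
    print("healthy link:", estimate_mos(latency_ms=20, jitter_ms=2, loss_pct=0))
    print("30% loss:    ", estimate_mos(latency_ms=20, jitter_ms=2, loss_pct=30))
```

The point is only that loss drags the score down fast: with this approximation, a link that looks fine on latency and jitter still lands near the bottom of the scale at 30% loss.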

This is awesome because at the click of a button we can see a branch office is experiencing 30% packet loss outbound or having unacceptable jitter, etc. And it’s displayed in cool graphs and charts that are also easy for management to digest.

The problem is none of our ISPs ever believe a shred of it. It gets almost comically bad at times because, like clockwork, as soon as we mention the SD-WAN they immediately get argumentative.

One example: our SD-WAN starts showing consistent packet loss in excess of 30% between a single branch office and our data center, in only one direction. Both locations have DIA fiber from the same provider, in the same city. Pretty clear indication that something is wrong, right?

We’re asked to provide some evidence of the problem we’re experiencing, so we send a screenshot of the little graph in our SD-WAN orchestrator showing consistent one-way packet loss between these two sites. Immediately: “nuh uh that’s wrong. That doesn’t mean anything.” As soon as SD-WAN is even mentioned, they shut down and get unhelpful. At one point we were even told “we’re an ISP we don’t drop packets.” And they try to tell us the circuit passed a test, so that proves the problem isn’t them. We tell them no, that circuit is talking to 100 other sites with under 1% loss; the problem is only when these two sites talk directly to each other. We even run traceroutes and reverse lookups and tell them “Look at Router AGG-RTR-XXX01.” They assure us “No, that’s impossible the problem is on your end and that’s that. Please open a ticket with your SD-WAN vendor.”

Fast forward about a month and the loss magically disappears and the ticket gets stealth closed with no updates. Yeah, sure, they definitely didn’t find something on their end and fix it. (rolls eyes)

In another case, a bigger problem showed many down tunnels and huge loss all over the place, and after investigation it looked like every problem occurred when a specific ISP A was trying to talk to a specific ISP B. We take our findings to the provider and it’s the same old story: “you might want to call your SD-WAN vendor, because we don’t have anything like that going on.” Fast forward multiple escalations later and magically our ticket was linked to another Master Ticket, and they’re bouncing ports and cleaning fiber at some NNI, and all of a sudden everything goes back to normal after they resolve the Master Ticket.
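
The "it's always ISP A talking to ISP B" conclusion is the kind of thing that falls out quickly once you group tunnel stats by provider pair. Here is a minimal sketch of that grouping, assuming you can export per-tunnel loss from your orchestrator into a list of records. The field names (`src_isp`, `dst_isp`, `loss_pct`), the 5% threshold, and the sample rows are all made up for illustration and do not come from any real product or from our actual data.

```python
from collections import defaultdict


def bad_tunnels_by_isp_pair(tunnels, threshold_pct=5.0):
    """Count unhealthy tunnels per (source ISP, destination ISP) pair.

    Pairs are kept ordered because loss is often one-directional.
    Returns {(src_isp, dst_isp): (bad_count, total_count)}.
    """
    counts = defaultdict(lambda: [0, 0])  # pair -> [bad, total]
    for tunnel in tunnels:
        pair = (tunnel["src_isp"], tunnel["dst_isp"])
        counts[pair][1] += 1
        if tunnel["loss_pct"] >= threshold_pct:
            counts[pair][0] += 1
    return {pair: (bad, total) for pair, (bad, total) in counts.items()}


if __name__ == "__main__":
    # Hypothetical export; every value here is invented for the example.
    sample = [
        {"src_isp": "ISP A", "dst_isp": "ISP B", "loss_pct": 28.0},
        {"src_isp": "ISP A", "dst_isp": "ISP B", "loss_pct": 35.5},
        {"src_isp": "ISP A", "dst_isp": "ISP C", "loss_pct": 0.2},
        {"src_isp": "ISP C", "dst_isp": "ISP B", "loss_pct": 0.4},
    ]
    for pair, (bad, total) in sorted(bad_tunnels_by_isp_pair(sample).items()):
        print(f"{pair[0]} -> {pair[1]}: {bad}/{total} tunnels over threshold")
```

A table like that is easier to hand to a provider than a wall of red tunnels, since it shows the healthy pairs alongside the broken one.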

I wonder why it is met with so much skepticism despite being battle-tested? I mean, a lot of these ISPs are offering their own SD-WAN solutions as a managed service too, so they must believe in them?

Anyway, my advice to anyone on here doing SD-WAN who has to open a ticket with a provider: don’t mention SD-WAN. As soon as you do, they will stop taking you seriously. If you can, re-create the issue with other traditional tools and present that to them instead.
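
If you don't have a traditional tool handy that measures loss in one direction, a few lines of Python can stand in. The following is a minimal sketch, assuming you can run a script at both sites and that a UDP port of your choosing is allowed between them. The function names, the 4-byte sequence-number payload, and the defaults are my own assumptions, not any standard tool's behavior. Start the receiver first; it stops after a few seconds of silence.

```python
import socket
import struct
import time

SEQ = struct.Struct("!I")  # 4-byte sequence number, network byte order


def send_probes(host, port, count=1000, interval_s=0.01):
    """Send `count` sequence-numbered UDP packets to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            sock.sendto(SEQ.pack(seq), (host, port))
            time.sleep(interval_s)


def receive_probes(port, idle_timeout_s=5.0):
    """Collect sequence numbers until no packet arrives for idle_timeout_s."""
    seqs = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        sock.settimeout(idle_timeout_s)
        while True:
            try:
                data, _addr = sock.recvfrom(2048)
            except socket.timeout:
                break
            if len(data) >= SEQ.size:
                seqs.append(SEQ.unpack_from(data)[0])
    return seqs


def summarize(sent_count, received_seqs):
    """Return one-way loss and out-of-order figures for a probe run."""
    unique = set(received_seqs)
    lost = max(sent_count - len(unique), 0)
    # A packet counts as out of order if it arrives after a higher sequence.
    highest, out_of_order = -1, 0
    for seq in received_seqs:
        if seq < highest:
            out_of_order += 1
        else:
            highest = seq
    received = len(received_seqs)
    return {
        "sent": sent_count,
        "received_unique": len(unique),
        "lost": lost,
        "loss_pct": round(100.0 * lost / sent_count, 2) if sent_count else 0.0,
        "out_of_order": out_of_order,
        "out_of_order_pct": round(100.0 * out_of_order / received, 2) if received else 0.0,
    }
```

Run `receive_probes(port)` at the far site, then `send_probes(host, port, count)` at the near site, and pass the same `count` plus the received list to `summarize`. Swap the roles to test the other direction. Because the packets only travel one way, the result speaks to a single direction, which a plain ping can't do.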

Ok I’m done ranting for now!


