So I have an idea of what's going on with the issue I'm about to describe, but I'm out of my element on how to go about getting it resolved.
We have 2 carriers we peer with, and we advertise a /22.
AT&T is one of the carriers.
Last night we turned up a new internet circuit with CenturyLink and shut off our US Signal circuit.
Everything seemed fine following the cut: all our hosted sites were publicly accessible, and the various services we use seemed fine as well. Come this morning, though, we can't access Webex. No other reports have come in.
Running some traceroutes, the last response we got came from Level 3.
We turned the US Signal circuit back up, and we were then able to access Webex again.
Running another traceroute, we take the same path, and the hop where we had stopped receiving responses turns out to be what looks like a peering point between Level 3 and Cisco Systems.
For whatever reason, we prefer a path out through the AT&T connection rather than CenturyLink, which seems odd since CenturyLink owns Level 3 now, but who knows what the state of peering and path selection looks like there.
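For outbound traffic, a BGP router picks one best path per prefix by walking an ordered list of attributes, and local preference is compared before AS-path length. So a path through AT&T can win even when the path through CenturyLink is shorter, if the AT&T session carries a higher local preference. The sketch below models only a simplified subset of that decision process (local preference, AS-path length, MED, then lowest router ID). The routes, the local-preference values, and ASN 64500 (a placeholder for the Webex origin) are hypothetical; real implementations have more steps (vendor-specific weight, origin type, eBGP over iBGP, IGP metric) and by default compare MED only between routes from the same neighbor AS.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    neighbor: str               # label for the upstream session (illustrative)
    local_pref: int             # higher wins
    as_path: list[int]          # fewer ASNs wins
    med: int = 0                # lower wins (simplified: compared across all neighbors)
    router_id: str = "0.0.0.0"  # lowest wins, final tie-breaker


def best_path(routes: list[Route]) -> Route:
    """Pick a best route using a simplified subset of BGP best-path selection."""
    def rank(r: Route):
        return (
            -r.local_pref,                            # 1. highest local preference
            len(r.as_path),                           # 2. shortest AS path
            r.med,                                    # 3. lowest MED
            int(ipaddress.IPv4Address(r.router_id)),  # 4. lowest router ID
        )
    return min(routes, key=rank)


# Hypothetical routes toward a Webex prefix (ASN 64500 is a placeholder).
via_att = Route("AT&T", 200, [7018, 3356, 64500], router_id="192.0.2.1")
via_ctl = Route("CenturyLink", 100, [3356, 64500], router_id="192.0.2.5")

print(best_path([via_att, via_ctl]).neighbor)  # AT&T: local pref beats the longer AS path

via_att.local_pref = 100
print(best_path([via_att, via_ctl]).neighbor)  # CenturyLink: tie on local pref, shorter AS path
```

If the routers only receive default routes from each carrier, AS-path length never comes into play, which is another common reason the "closer" carrier is not the one used.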
Anyway, it appears to me that the Cisco Systems device does not have a valid path back to our prefix. I obviously have no way to see what their return path looks like either (unless they have a looking glass I'm not aware of).
Management doesn't want me to take us down to just AT&T or just CenturyLink for testing during business hours, in case it causes more service disruptions.
Right now, we are running on all 3 carriers so that Webex works.
I'm going to get a call in with CenturyLink, but it seems like who we really need looking at this is the owner of that Cisco Systems/Level 3 hop.
Do I approach Cisco and say, "Hey, your Webex service seems to be having issues routing to our prefix; can you see what it looks like on your end?" Or would I just call Webex support?
I'm just not sure whom to contact to get this looked at properly and resolved. I've never had to troubleshoot a specific service like this before, with multiple carriers involved. I'm not even sure how to go about finding out which carrier the return traffic may be trying to come back on.
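One way to see which carrier return traffic is likely to use is to look at the AS path for the /22 as seen from a network near the far end, for example in a public looking glass or a route collector such as RouteViews or RIPE RIS. The AS immediately to the left of your own ASN in that path is the upstream the traffic gets handed to. The helper below is a minimal sketch of that reading; the AS-path strings and ASN 64512 (a placeholder for your own ASN) are hypothetical, and the only real carrier ASNs used are AT&T (7018) and Level 3 (3356).

```python
from typing import Optional

# Illustrative mapping; confirm which ASNs your own circuits actually peer with.
CARRIERS = {7018: "AT&T", 3356: "Level 3 / CenturyLink"}


def upstream_for_origin(as_path: str, origin_asn: int) -> Optional[int]:
    """Return the AS that hands traffic to origin_asn, i.e. the ASN immediately
    left of the origin in a space-separated AS_PATH. AS_SETs ({...}) are not handled."""
    collapsed: list = []
    for token in as_path.split():
        asn = int(token)
        # Collapse prepends such as "7018 64512 64512 64512" into one entry.
        if not collapsed or collapsed[-1] != asn:
            collapsed.append(asn)
    if len(collapsed) < 2 or collapsed[-1] != origin_asn:
        return None  # path does not end at our ASN, or has no upstream in it
    return collapsed[-2]


# Hypothetical paths as a looking glass might show them (64512 = our ASN, placeholder).
for path in ["3356 7018 64512", "3356 64512 64512 64512"]:
    upstream = upstream_for_origin(path, 64512)
    print(path, "->", CARRIERS.get(upstream, upstream))
```

If the far end shows no path at all for the /22, that points back at how the prefix is being advertised or filtered rather than at which carrier is preferred.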
Edit:
I forgot to mention that when we were "down", I could reach the destination if the ping/traceroute was sourced from the IP address we peer with AT&T or CenturyLink on, but not when sourcing from our prefix. So it would seem to be specific to how our prefix is advertised.
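That symptom fits a missing return route: the peering-link address falls inside a carrier's own prefix, which the far end can reach, while nothing in the far end's table covers the /22. Routers forward on the longest (most specific) matching prefix, and if no prefix matches and there is no default route, the reply is dropped. The sketch below models that lookup with Python's `ipaddress` module; the routing table and all addresses are hypothetical placeholders from reserved ranges (192.0.2.0/24 for a carrier block, 198.18.4.0/22 standing in for the advertised /22).

```python
import ipaddress
from typing import Iterable, Optional

Network = ipaddress.IPv4Network


def covering_route(table: Iterable[Network], address: str) -> Optional[Network]:
    """Longest-prefix match: the most specific prefix in `table` containing
    `address`, or None when nothing covers it (no default route is modeled)."""
    addr = ipaddress.ip_address(address)
    matches = [prefix for prefix in table if addr in prefix]
    return max(matches, key=lambda p: p.prefixlen, default=None)


# Hypothetical far-end table: it has the carrier's block but not our /22.
far_end_table = [ipaddress.ip_network("192.0.2.0/24")]

print(covering_route(far_end_table, "192.0.2.2"))    # peering-link source: covered
print(covering_route(far_end_table, "198.18.5.10"))  # source inside our /22: None

# Once the /22 is learned, replies sourced from it have a route back.
far_end_table.append(ipaddress.ip_network("198.18.4.0/22"))
print(covering_route(far_end_table, "198.18.5.10"))
```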
Here is a traceroute, truncated to take out identifying information.
When it fails:
8 5 ms 4 ms 4 ms 12.119.139.13
9 14 ms 14 ms 15 ms 12.123.35.130
10 15 ms 15 ms 14 ms cr1.cgcil.ip.att.net [12.122.152.37]
11 15 ms 15 ms 14 ms cgcil402igs.ip.att.net [12.122.133.161]
12 * * * Request timed out.
13 12 ms 12 ms 12 ms ae-2-3601.edge4.Chicago3.Level3.net [4.69.203.230]
14 * * * Request timed out.
When successful (the extra hop is the router that terminates the other circuits, hopping to the AT&T terminating router):
9 13 ms 15 ms 14 ms 12.123.35.130
10 15 ms 21 ms 14 ms cr1.cgcil.ip.att.net [12.122.152.37]
11 14 ms 14 ms 14 ms cgcil402igs.ip.att.net [12.122.133.161]
12 * * * Request timed out.
13 12 ms 13 ms 12 ms ae-2-3601.edge4.Chicago3.Level3.net [4.69.203.230]
14 12 ms 12 ms 12 ms CISCO-SYSTE.edge4.Chicago3.Level3.net [4.53.98.74]
15 13 ms 13 ms 12 ms ord10-wxbb-crt01-bu60.webex.com [64.68.115.20]
16 33 ms 33 ms 33 ms iad02-wxbb-crt02-te0-6-0-1.webex.com [173.243.4.58]
17 30 ms 30 ms 30 ms iad02-wxbb-pe02-bu12.webex.com [64.68.117.194]
18 29 ms 29 ms 30 ms 64.68.118.55
19 33 ms 32 ms 33 ms iad02-wxp00-csw01-vl101.webex.com [64.68.115.101]
20 29 ms 29 ms 29 ms iad02-nebulaaa9.webex.com [64.68.105.103]
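Comparing the two traces by address rather than by hop number (the working trace includes the extra internal hop) shows exactly where replies stop. The sketch below parses Windows `tracert`-style lines like the ones above, finds the last hop that answered in the failing trace, and reports the next responding address in the working trace. The regexes are assumptions fitted to this output format, not a general-purpose tracert parser.

```python
import re
from typing import Dict, Optional

HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
# Dotted quad at the end of the line, optionally in brackets after a hostname.
ADDR_RE = re.compile(r"\[?(\d{1,3}(?:\.\d{1,3}){3})\]?\s*$")


def parse_tracert(text: str) -> Dict[int, Optional[str]]:
    """Map hop number -> responding IPv4 address (None for '* * *' timeouts)."""
    hops: Dict[int, Optional[str]] = {}
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # skip headers and blank lines
        hop, rest = int(m.group(1)), m.group(2)
        match = None if "Request timed out" in rest else ADDR_RE.search(rest)
        hops[hop] = match.group(1) if match else None
    return hops


def last_responding_hop(hops: Dict[int, Optional[str]]) -> Optional[int]:
    answered = [h for h, a in hops.items() if a]
    return max(answered) if answered else None


def next_hop_after_last_reply(failing, working) -> Optional[str]:
    """Address that answers in `working` right after the failing trace's last reply."""
    last_hop = last_responding_hop(failing)
    if last_hop is None:
        return None
    ordered = [working[h] for h in sorted(working)]
    last_addr = failing[last_hop]
    if last_addr not in ordered:
        return None
    for addr in ordered[ordered.index(last_addr) + 1:]:
        if addr:
            return addr
    return None


FAILING = """\
8 5 ms 4 ms 4 ms 12.119.139.13
9 14 ms 14 ms 15 ms 12.123.35.130
10 15 ms 15 ms 14 ms cr1.cgcil.ip.att.net [12.122.152.37]
11 15 ms 15 ms 14 ms cgcil402igs.ip.att.net [12.122.133.161]
12 * * * Request timed out.
13 12 ms 12 ms 12 ms ae-2-3601.edge4.Chicago3.Level3.net [4.69.203.230]
14 * * * Request timed out.
"""

WORKING = """\
9 13 ms 15 ms 14 ms 12.123.35.130
10 15 ms 21 ms 14 ms cr1.cgcil.ip.att.net [12.122.152.37]
11 14 ms 14 ms 14 ms cgcil402igs.ip.att.net [12.122.133.161]
12 * * * Request timed out.
13 12 ms 13 ms 12 ms ae-2-3601.edge4.Chicago3.Level3.net [4.69.203.230]
14 12 ms 12 ms 12 ms CISCO-SYSTE.edge4.Chicago3.Level3.net [4.53.98.74]
15 13 ms 13 ms 12 ms ord10-wxbb-crt01-bu60.webex.com [64.68.115.20]
16 33 ms 33 ms 33 ms iad02-wxbb-crt02-te0-6-0-1.webex.com [173.243.4.58]
17 30 ms 30 ms 30 ms iad02-wxbb-pe02-bu12.webex.com [64.68.117.194]
18 29 ms 29 ms 30 ms 64.68.118.55
19 33 ms 32 ms 33 ms iad02-wxp00-csw01-vl101.webex.com [64.68.115.101]
20 29 ms 29 ms 29 ms iad02-nebulaaa9.webex.com [64.68.105.103]
"""

failing, working = parse_tracert(FAILING), parse_tracert(WORKING)
print(last_responding_hop(failing))                 # 13
print(next_hop_after_last_reply(failing, working))  # 4.53.98.74, the CISCO-SYSTE Level 3 hop
```

Since the outbound path is identical in both traces, the missing replies from 4.53.98.74 onward are consistent with the return leg, not the forward leg, being what changed.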