Wednesday, July 3, 2019

It looks like an ASIC issue ... And now I want to borrow an SFP module in Boston.

I've got four switches connected in a double-ended MLAG like this (not my drawing). Two switches from vendor A (A1 and A2) and two switches from vendor B (B1 and B2).

It's been running for many months without issue.

Recently, some flows have begun failing to traverse some legs of the aggregation. I zeroed in on one problem flow and found the specific link carrying it. SPAN on the sending side shows the frames leaving. SPAN on the receiving side doesn't show the frames arriving. Other flows between the same IP pair (using different ports) are unaffected. Other flows traversing the same link are unaffected.

Error counters are not incrementing.

If I down the suspect LAG member, the problem flow hashes elsewhere and gets delivered just fine. When I re-enable the link, the problem flow lands on it again and doesn't survive the trip.
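The behavior above (one flow sticks to one LAG member, moves when that member goes down, and comes back when it's re-enabled, while other port pairs between the same hosts land elsewhere) is what hash-based member selection produces. Here's a minimal sketch of that idea, assuming a CRC32 hash over the 5-tuple taken modulo the number of active members. The `Flow` type, `pick_member` function and link names are hypothetical; neither the Trident2+ nor the Trident+ necessarily hashes this way, and real switches use their own configurable hash fields and functions.

```python
import zlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """A flow identified by its 5-tuple (hypothetical model, not a vendor API)."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def pick_member(flow: Flow, members: Sequence[str]) -> str:
    """Choose a LAG member for a flow.

    Assumption: CRC32 over the 5-tuple, modulo the number of active members.
    The same flow always maps to the same member for a given member list,
    so a single "bad" link keeps eating the same flow.
    """
    key = f"{flow.src_ip}|{flow.dst_ip}|{flow.proto}|{flow.src_port}|{flow.dst_port}"
    return members[zlib.crc32(key.encode()) % len(members)]


if __name__ == "__main__":
    members = ["link1", "link2", "link3", "link4"]  # hypothetical LAG members
    flow = Flow("10.0.0.1", "10.0.0.2", 6, 40000, 443)

    suspect = pick_member(flow, members)
    print("flow hashes to", suspect)

    # Downing the suspect member forces the flow onto one of the others.
    remaining = [m for m in members if m != suspect]
    print("with", suspect, "down:", pick_member(flow, remaining))

    # Re-enabling it puts the flow right back where it was.
    print("after re-enable:", pick_member(flow, members))
```

Because this sketch uses a plain modulo, removing a member can also move flows that were never on it; some switches offer "resilient hashing" to limit that churn.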

Both ends are Broadcom based: Trident2+ from vendor A, Trident+ from vendor B.

Because the two SPAN results don't agree, I'm leaning toward putting a tap on the link to get an independent opinion.

BUT... The links are made with CX-1 cables, so I can't tap 'em.

Ideas?

I've got SR transceivers I could use on one end, but I need some HPE 455883-B21 modules for the other end.

Anybody happen to have some of these at 50 Innerbelt in Somerville MA?


