Thursday, June 27, 2019

Redundant ISP Network Design Sanity Check

Hi Everyone,

I am hoping to get a sanity check on a redundant internet design I am putting in place. I have been going back and forth on the best approach and decided to ask for feedback from others. Let me preface this by saying my current role is not 100% network engineering (and hasn't been for about 5 years), so I may be overlooking some things.

The internal hardware is 2x Nexus 3Ks as "cores" and 2x Palo Alto firewalls in an HA pair. Yesterday I added 2x Catalyst 2960s as "internet" switches, which I explain further down. A diagram of the current design is linked at the end of the post.

The N3Ks are a vPC pair, so I can span the member ports of a port channel across the two cores. My initial design was to connect the primary ISP link to the first core on VLAN 100 and the backup ISP link to the second core on VLAN 200. That way I could come out of the N3Ks to the firewalls with a bunch of redundant LACP links (configured as LACP aggregate groups on the PA firewalls). I was skeptical about connecting the ISP links to the core switches, even though they would be layer 2 only and technically have no access to anything else. The cores hold all the layer 3 HSRP SVIs and the port channels to all the internal switches, so it felt like a security risk that would raise eyebrows if anyone audited it later on.

So I redesigned things a bit, and the end result is the diagram linked at the bottom. I added a pair of C2960s as "internet" switches. These are trunked together, not stacked, because I am trying to keep things as independent as possible to reduce single points of failure. I have had stacks seize up entirely in the past (I don't know if that really happens anymore). Since they aren't stacked, I can't "properly" span the member ports of a port channel across the two C2960s, so they are currently connected to the PAs as non-LACP aggregate groups.

Everything is working fine, but I am getting a lot of port flapping on the C2960s, as the aggregate groups seem to bounce between the two internet switches. There is no packet loss, and the sessions don't seem to have any issues. I have also simulated failures by disconnecting links, and it all "works." I did try single-link LACP port channels between the C2960s and the PA firewalls. This also "worked," but the aggregate groups on the PAs would never be "fully up," since a firewall can't negotiate one LACP aggregate group with two different physical switches that have different system IDs. Failover worked too, but LACP took quite a while to renegotiate after a failure, so this isn't ideal.
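For anyone who wants the LACP problem spelled out, here is a rough Python sketch of the rule as I understand it: links can only join the same aggregate when the device at the far end advertises the same system ID and key on every member link. This is a simplified model, not the real LACP selection logic, and the port names, MAC addresses, and key values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PartnerInfo:
    """What the far-end switch advertises in its LACPDUs (simplified)."""
    system_id: str  # hypothetical system MAC of the partner switch
    key: int        # hypothetical operational key of the partner port


def bundled_links(partners: Dict[str, PartnerInfo]) -> List[str]:
    """Return the local ports that can join one aggregate.

    Simplified rule: the first port (in insertion order) sets the
    reference partner; other ports join only if they see the same
    partner system ID and key.
    """
    if not partners:
        return []
    ports = list(partners)
    reference = partners[ports[0]]
    return [port for port in ports if partners[port] == reference]


# Two independent (unstacked) switches: each advertises its own system ID.
independent = {
    "ethernet1/1": PartnerInfo("00:11:22:33:44:01", 10),
    "ethernet1/2": PartnerInfo("00:11:22:33:44:02", 10),
}

# A stack or vPC pair: both links see one shared system ID.
shared = {
    "ethernet1/1": PartnerInfo("00:11:22:33:44:aa", 10),
    "ethernet1/2": PartnerInfo("00:11:22:33:44:aa", 10),
}

print(bundled_links(independent))  # ['ethernet1/1']
print(bundled_links(shared))       # ['ethernet1/1', 'ethernet1/2']
```

In the first case only one link can be active in the aggregate, which matches the group never showing as "fully up." In the second case both links bundle, which is what a stack or a vPC pair provides by presenting a single system ID.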

Long story short, the current design is working, but the port flapping is bothering me. Am I being overly cautious by not connecting the ISPs to the cores at layer 2? From a technical perspective, that design works just the way I want. Should I just stack the C2960s, build proper LACP port channels, and accept the risk of a possible stack failure? Am I overlooking something obvious? Any feedback would certainly be appreciated.

Current Design: https://imgur.com/EMU5Mtq


