Monday, September 23, 2019

Routing not working as expected over VPC

I've hit a strange issue with routing in one part of my DC and I'm struggling to see what I've done wrong.

I can ping a device consistently with no issues, but if I try to SSH to it from the same source it does not work. SSH from a different location that doesn't rely on the same path works fine.

I apologise in advance for how crude my diagrams are; I have no access to Reddit other than on my phone.

The network in this part of the DC is not what I would like, because it is connected to an existing environment to allow for a migration that will take place soon. I had to fit the new network around the existing one, as no major changes could be made to it. Otherwise everything is working beautifully. I have a spine-and-leaf network using 9Ks and eBGP, following RFC 7938 for guidance: https://tools.ietf.org/html/rfc7938#section-5.2 The section of the network I'm describing is intended for external connections, which could be presented in a number of different ways. I designed the external environment to be as redundant as possible but also flexible.

Topologies

Layer 2 connection to existing network

ASR1002X-1 --VPC-- NEXUS3K-1 --VPC-- 4500X-VSS

ASR1002X-2 --VPC-- NEXUS3K-2 --VPC-- 4500X-VSS

Layer 2 connection to new network.

ASR1002X-1 --VPC-- NEXUS3K-1--P2P--Palo Alto FW

ASR1002X-2 --VPC-- NEXUS3K-2--P2P--Palo Alto FW

Each ASR has one cable to each 3K, bundled in a port-channel. On the 3K side these are configured as a VPC port-channel. Between the 3Ks and the 4500X there is another VPC port-channel. The 3Ks are connected to each other with two cables. Each 3K has a routed P2P link to each FW. The FWs then connect to the rest of the new network.
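To make the cabling easier to reason about, here is a rough sketch of the physical links as a small adjacency map. It is only an illustration of the description above: the labels `FW-1` and `FW-2` are hypothetical (the post does not say how many firewalls there are; two are assumed), and the two peer-link cables between the 3Ks are collapsed into one logical link.

```python
from collections import defaultdict

# Physical links as described in the post. Device names follow the diagrams.
# ASSUMPTION: "FW-1" and "FW-2" are made-up labels; the firewall count is not stated.
# The two cables between the 3Ks are modelled as a single logical peer link.
LINKS = [
    ("ASR1002X-1", "NEXUS3K-1", "VPC member"),
    ("ASR1002X-1", "NEXUS3K-2", "VPC member"),
    ("ASR1002X-2", "NEXUS3K-1", "VPC member"),
    ("ASR1002X-2", "NEXUS3K-2", "VPC member"),
    ("NEXUS3K-1", "4500X-VSS", "VPC member"),
    ("NEXUS3K-2", "4500X-VSS", "VPC member"),
    ("NEXUS3K-1", "NEXUS3K-2", "VPC peer link"),
    ("NEXUS3K-1", "FW-1", "routed P2P"),
    ("NEXUS3K-1", "FW-2", "routed P2P"),
    ("NEXUS3K-2", "FW-1", "routed P2P"),
    ("NEXUS3K-2", "FW-2", "routed P2P"),
]


def build_adjacency(links):
    """Return {device: {neighbour: link_type}} for an undirected link list."""
    adjacency = defaultdict(dict)
    for a, b, kind in links:
        adjacency[a][b] = kind
        adjacency[b][a] = kind
    return adjacency


topology = build_adjacency(LINKS)

# List everything directly attached to the switch that cannot be reached over SSH.
for neighbour, kind in sorted(topology["NEXUS3K-2"].items()):
    print(f"NEXUS3K-2 <-> {neighbour}: {kind}")
```

Listing the neighbours of one switch this way shows that the only direct path between the two 3Ks is the peer link; every other path between them goes through an ASR, the 4500X, or a firewall.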

Layer 3

ASR1002X --OSPF-- 4500X-VSS

I would have preferred BGP here but I couldn't implement it.

ASR1002X --BGP-- NEXUS3K --BGP-- FW

Each ASR peers with the 4500X using OSPF. The 4500X advertises the existing network's routes to the ASRs, which redistribute them into BGP and advertise them to the 3Ks, which in turn advertise them to the FW.

The problem

When I ping the 3Ks from a device connected to the 4500X I have no issues. However, if I SSH to one of the 3Ks (number 2 on the diagram), the request times out, and with a debug running I can see that it never reaches the switch.

The other 3K has no issues and I can reach both from the new network without any problems.

When I run a trace from the 4500X to the 3K that isn't working, the path is not what I expect, nor what the routing table says it should be.

Path seen in trace: 4500X -- ASR-1 -- 3K-1 -- FW -- 3K-2

But it should take this path, and the routing table agrees with me: 4500X -- ASR-1 -- 3K-2
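For anyone wanting to compare the two paths mechanically, a minimal sketch follows. It only finds the first hop at which the traced path departs from the expected one; the function name `first_divergence` is made up for illustration, and the hop labels are the shorthand used above rather than real hostnames or addresses.

```python
from itertools import zip_longest


def first_divergence(observed, expected):
    """Return (index, observed_hop, expected_hop) at the first mismatch, or None."""
    for index, (seen, wanted) in enumerate(zip_longest(observed, expected)):
        if seen != wanted:
            return index, seen, wanted
    return None


# Hop lists copied from the two paths in the post.
observed = ["4500X", "ASR-1", "3K-1", "FW", "3K-2"]
expected = ["4500X", "ASR-1", "3K-2"]

print(first_divergence(observed, expected))  # (2, '3K-1', '3K-2')
```

The mismatch is at the hop straight after ASR-1, so the forwarding decision worth inspecting first is the one ASR-1 makes (or the one made on the port-channel it sends the traffic out of).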

I'm a bit lost at this point and am hoping someone with more VPC experience can help me out.

I was dropped into the deep end a bit with the new network build; it's much, much bigger than anything I've ever been responsible for. I'm pretty happy with how most of it turned out, but this one issue is really bugging me.

The one thing I am thinking of trying next is to implement a Layer 3 connection between the 3Ks and peer them with iBGP. The issue is that I currently have only one set of cables running between them, which is used for the VPC peer link, and Cisco documentation says not to run Layer 3 over it.

https://www.cisco.com/c/en/us/support/docs/ip/ip-routing/118997-technote-nexus-00.html


