Monday, June 14, 2021

Weird as heck ping behaviour - VRFs, IOS-XR

I just ran into the weirdest situation, and I'm wondering if anyone has seen this or can offer any insight into a next step for troubleshooting.

I have two VRFs spread across two ASR-9901 routers, let's call them VRF CITY and VRF SCHOOL.

In VRF SCHOOL I have a bunch of networks - loopbacks, point to points, etc.

A bunch of stuff in 10.0.0.0/8

A bunch of stuff in 172.30.0.0/16

And a bunch of stuff in public - but for the sake of sanitization, let's call it: 172.16.0.0/16.

Everything used to be in the same VRF, but some of the subnets within 172.16.0.0/16 have now been allocated to city customers, so those interfaces have been moved to VRF CITY.

All of the network gateways are on the ASRs. The ASRs are directly connected. MPLS LDP is running, sync'd to OSPF, etc. Like, from a routing perspective everything works great.

In the BGP process, configured under the SCHOOL VRF, on both ASRs, 172.16.0.0/16 is summarized.

aggregate-address 172.16.0.0/16 summary-only
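
For context, that line sits roughly here in the IOS-XR BGP hierarchy. This is only a sketch: the AS number 64512 is a placeholder, and the rest of the VRF's BGP config (RD, redistribution, and so on) is left out.

```
router bgp 64512
 vrf SCHOOL
  address-family ipv4 unicast
   ! Advertise only the /16; summary-only suppresses the more specific routes
   aggregate-address 172.16.0.0/16 summary-only
```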

Everything in 172.16.0.0/16 is in the SCHOOL VRF and there is OSPF running with a sham link between the ASRs - so all of the specific routes exist properly within the VRF.

All of this is pretty basic VRF stuff, right? Like, it's nothing fancy, redistribution is working correctly, routing is working correctly, label switching seems to be fine, etc.

If you source a ping from ASR 1 VRF CITY to ASR 2 VRF SCHOOL 10.0.0.0/8 space, everything works.

If you source a ping from ASR 1 VRF CITY to ASR 2 VRF SCHOOL 172.16.0.0/16 space, things do NOT work.

If you source a ping from ASR 1 VRF CITY to ASR *1* VRF SCHOOL 172.16.0.0/16 space, things DO work.

So if you looked at that and said "OK so for whatever reason, the summary route is breaking something in MPLS between routers" I would completely agree with you EXCEPT:

If you source a ping from a test host, with its default gateway on ASR 1 in VRF CITY, to ASR 2 VRF SCHOOL 172.16.0.0/16 space, THINGS WORK PERFECTLY.
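
To make the router-sourced tests concrete, they would look something like this from ASR 1. All of the addresses are made-up placeholders (10.2.0.1 and 172.16.2.1 standing in for SCHOOL addresses on ASR 2, 172.16.1.1 for a SCHOOL address on ASR 1), and no explicit source is set, so the router picks the source address itself.

```
! Run on ASR 1

! CITY -> SCHOOL 10.0.0.0/8 space on ASR 2: works
ping vrf CITY 10.2.0.1

! CITY -> SCHOOL 172.16.0.0/16 space on ASR 2: fails
ping vrf CITY 172.16.2.1

! CITY -> SCHOOL 172.16.0.0/16 space on ASR 1 itself: works
ping vrf CITY 172.16.1.1
```

Adding `source <address>` to the failing ping and trying different CITY interface addresses would at least show whether the result depends on which source IP the router uses.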

Does anyone know what might be going on here?

I don't think I've ever seen ICMP sourced from the default gateway's own IP fail while hosts behind that gateway still have full connectivity. There are no ACLs. Nothing funky or special.
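
One possible next step, sketched here rather than actually run, is to compare how each VRF resolves the destination and the return path. The addresses are placeholders: 172.16.2.1 for the failing SCHOOL destination on ASR 2, and 172.16.100.1 for the CITY gateway address the ping is assumed to be sourced from.

```
! On ASR 1: how does VRF CITY resolve the failing destination?
show route vrf CITY 172.16.2.1
show cef vrf CITY 172.16.2.1 detail

! On ASR 2: how does VRF SCHOOL resolve the way back to the ping's source?
show route vrf SCHOOL 172.16.100.1
show cef vrf SCHOOL 172.16.100.1 detail

! On both: what does BGP hold for the aggregate in VRF SCHOOL?
show bgp vrf SCHOOL 172.16.0.0/16
```

Running the same lookups for the test host's address, which does work, would give a known-good baseline to compare against.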

TLDR: GW to GW inter-router inter-VRF reachability broken, but HOST to GW inter-router inter-VRF reachability is not. ???


