Sunday, June 16, 2019

IP SLA convergence for eBGP route-map failover

Hello, I have an ASR 1001-X running eBGP. We're in a remote area and occasionally get outages upstream of us, on our ISPs' infrastructure, so BFD isn't really a solution, nor do our ISPs even offer it.

For starters, the primary ISP is much faster and the backup is for failover only, so the backup has slight AS-path prepending and a lower local preference. For regular traffic everything works as desired and goes through the primary. We receive only a default route from both ISPs.
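For illustration, a backup-ISP policy of this kind usually looks something like the sketch below. The AS numbers (64512 for us, 64496 for the backup ISP), the neighbor address 192.0.2.2, the route-map names ISP_BACKUP_IN and ISP_BACKUP_OUT, and the local-preference value are all placeholders, not the real config.

```
! Placeholders: AS 64512/64496, neighbor 192.0.2.2, route-map names
! Inbound: prefer the backup's default route less than the primary's
! (the BGP default local preference is 100)
route-map ISP_BACKUP_IN permit 10
 set local-preference 90
!
! Outbound: make our prefixes look slightly longer via the backup
route-map ISP_BACKUP_OUT permit 10
 set as-path prepend 64512 64512
!
router bgp 64512
 neighbor 192.0.2.2 remote-as 64496
 address-family ipv4
  neighbor 192.0.2.2 activate
  neighbor 192.0.2.2 route-map ISP_BACKUP_IN in
  neighbor 192.0.2.2 route-map ISP_BACKUP_OUT out
```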

I created IP SLA probes that ping 5 destinations, tracked as a list with a 20% threshold for down. I also created a secondary route-map for the primary ISP with heavy AS-path prepending and a low local preference. If I apply it manually under router bgp, it works as desired and traffic starts going through the backup ISP.
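Roughly, the tracking and failover policy have this shape. The track and SLA numbers (10 to 15) and the route-map names match the logs and applet further down. Everything else is an assumption made for the sketch: the probe target, source interface, frequency, the "up" percentage, the local-preference value, the prepend count and AS 64512.

```
! One probe shown; probes 12-15 follow the same pattern.
! Target 192.0.2.10, the interface and frequency 5 are placeholders.
ip sla 11
 icmp-echo 192.0.2.10 source-interface GigabitEthernet0/0/0
 frequency 5
ip sla schedule 11 life forever start-time now
!
track 11 ip sla 11 reachability
!
! Threshold list over the five per-probe track objects.
! "down 20" is the 20% mentioned above; "up 40" is an assumed value.
track 10 list threshold percentage
 object 11
 object 12
 object 13
 object 14
 object 15
 threshold percentage down 20 up 40
!
! Failover policy for the primary neighbor (values are placeholders)
route-map ISP_FAILOVER_IN permit 10
 set local-preference 50
!
route-map ISP_FAILOVER_OUT permit 10
 set as-path prepend 64512 64512 64512 64512 64512
```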

However, when the SLA goes down and EEM is triggered, I'm still experiencing an outage, and it's as if I'm just waiting for the BGP timers to take over.

Here's an example of an outage from today:

Jun 16 2019 11:37:48.735 EDT: %TRACK-6-STATE: 12 ip sla 12 reachability Up -> Down
Jun 16 2019 11:37:48.735 EDT: %TRACK-6-STATE: 15 ip sla 15 reachability Up -> Down
Jun 16 2019 11:37:50.734 EDT: %TRACK-6-STATE: 11 ip sla 11 reachability Up -> Down
Jun 16 2019 11:37:50.734 EDT: %TRACK-6-STATE: 13 ip sla 13 reachability Up -> Down
Jun 16 2019 11:37:50.734 EDT: %TRACK-6-STATE: 14 ip sla 14 reachability Up -> Down
Jun 16 2019 11:37:51.330 EDT: %TRACK-6-STATE: 10 list threshold percentage Up -> Down
Jun 16 2019 11:37:52.017 EDT: %HA_EM-6-LOG: CIRCUIT-DOWN: Primary circuit outage - altering route-map to favour backup
Jun 16 2019 11:37:52.019 EDT: %SYS-5-CONFIG_I: Configured from console by on vty9 (EEM:CIRCUIT-DOWN)

This is what the EEM does:

event manager applet CIRCUIT-DOWN
 event track 10 state down ratelimit 600
 action 1.0 cli command "enable"
 action 1.1 cli command "config t"
 action 2.0 cli command "router bgp ####"
 action 3.0 cli command "address-family ipv4"
 action 4.0 cli command "neighbor 1.1.1.1 route-map ISP_FAILOVER_IN in"
 action 4.1 cli command "neighbor 1.1.1.1 route-map ISP_FAILOVER_OUT out"
 action 5.0 syslog msg "Primary circuit outage - altering route-map to favour backup"
 action 6.0 cli command "end"
 action 7.0 cli command "clear ip bgp 1.1.1.1 soft"
 action 8.0 cli command "exit"
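To compare what the applet does during an event with the manual change, standard show and debug commands along these lines could be run while the track is down. This is only a suggested checklist, not output from the router. It assumes the track and SLA numbers from the logs above.

```
! One-line state summary for every track object
show track brief
! Detail for the list object, including change count and last change
show track 10
! Latest results for one of the underlying probes
show ip sla statistics 11
! When each applet was triggered and how it exited
show event manager history events
! The BGP paths for the default route and which one is best
show ip bgp 0.0.0.0 0.0.0.0
! Optional: log each CLI command the applet sends and the response
debug event manager action cli
```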

Any ideas on where the issue may be? As mentioned, if I run the same commands as the EEM applet manually there is no outage, and a traceroute shows the new path within a couple of seconds. Would it be better to do this with a default route instead?


