Tuesday, February 12, 2019

BGP Help!

Hello! I'm managing a customer network that uses 4 BGP peers across 2 different WAN links. The 4 peers connect to 2 different data center devices. Recently we lost several sites (maybe 5-10 out of 300) on one of the peers. All of the affected sites peer with the same data center router across the second WAN link (CradlePoint). The primary WAN1 connection to that same data center router is just fine.

I've done some debugging and compared the configs as best I could, but honestly I'm not seeing any issues. Can someone take a look and tell me what I'm missing? (IPs have been changed for security reasons.)

I think the issue could be this "tcb is not available" message:

BGP config (at the spoke site):

router bgp 12345
 bgp router-id 10.0.99.99
 bgp log-neighbor-changes
 neighbor 10.0.100.251 remote-as 100
 neighbor 10.0.100.251 local-as 100
 neighbor 10.0.100.251 fall-over bfd multi-hop
 neighbor 10.0.100.253 remote-as 100
 neighbor 10.0.100.253 local-as 100
 neighbor 10.0.100.253 fall-over bfd multi-hop
 neighbor 10.0.200.251 remote-as 101
 neighbor 10.0.200.251 local-as 101
 neighbor 10.0.200.253 remote-as 101
 neighbor 10.0.200.253 local-as 101
 !
 address-family ipv4
  network 10.12.252.0 mask 255.255.252.0
  network 172.31.94.128 mask 255.255.255.224
  neighbor 10.0.100.251 activate
  neighbor 10.0.100.251 weight 100
  neighbor 10.0.100.251 soft-reconfiguration inbound
  neighbor 10.0.100.253 activate
  neighbor 10.0.100.253 weight 95
  neighbor 10.0.100.253 soft-reconfiguration inbound
  neighbor 10.0.200.251 activate
  neighbor 10.0.200.251 weight 90
  neighbor 10.0.200.251 soft-reconfiguration inbound
  neighbor 10.0.200.253 activate
  neighbor 10.0.200.253 weight 85
  neighbor 10.0.200.253 soft-reconfiguration inbound
  distribute-list prefix default-route in
 exit-address-family
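The four `weight` statements give this spoke a strict preference order among its peers. The following is a minimal Python sketch of that ordering, assuming every peer advertises the same prefix (the contents of the `default-route` prefix-list are not shown, so this is an assumption) and looking only at weight, which Cisco compares first in best-path selection, with the highest value winning. The `Neighbor` class, the helper functions and the session states are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Neighbor:
    address: str
    weight: int        # value from "neighbor <address> weight <n>"
    established: bool  # whether the BGP session is currently up (hypothetical)


# Weights copied from the spoke config above; session states are hypothetical,
# with 10.0.200.251 down as in the debug output.
NEIGHBORS = [
    Neighbor("10.0.100.251", 100, True),
    Neighbor("10.0.100.253", 95, True),
    Neighbor("10.0.200.251", 90, False),
    Neighbor("10.0.200.253", 85, True),
]


def preferred_neighbor(neighbors: List[Neighbor]) -> Optional[Neighbor]:
    """Pick the peer whose copy of the (same) prefix wins on weight alone.

    Simplification: every established peer is assumed to advertise the same
    prefix, and all best-path steps after weight are ignored.
    """
    up = [n for n in neighbors if n.established]
    return max(up, key=lambda n: n.weight) if up else None


def preference_order(neighbors: List[Neighbor]) -> List[str]:
    """Addresses from most to least preferred, ignoring session state."""
    return [n.address for n in sorted(neighbors, key=lambda n: n.weight, reverse=True)]


if __name__ == "__main__":
    print(preferred_neighbor(NEIGHBORS).address)  # 10.0.100.251
    print(preference_order(NEIGHBORS))
```

In this simplified model, losing 10.0.200.251 (weight 90) still leaves 10.0.100.251 as the preferred peer on the spoke. Weight is only locally significant, so this ordering says nothing about how the data center routers choose the return path.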

Debugs: debug ip ipv4 and debug ip nat translation

TCP special event debugging is on
router#term mon
router#
Feb 12 16:25:59 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:25:59 PST: Released port 13562 in Transport Port Agent for TCP IP type 1 delay 240000
Feb 12 16:25:59 PST: TCP0: state was SYNSENT -> CLOSED [13562 -> 10.0.200.251(179)]
Feb 12 16:25:59 PST: TCB 0x3F7ADE1C destroyed
Feb 12 16:25:59 PST: BGP: 10.0.200.251 open failed: Connection timed out; remote host not responding
Feb 12 16:25:59 PST: BGP: 10.0.200.251 Active open failed - tcb is not available, open active delayed 10240ms (35000ms max, 60% jitter)
Feb 12 16:25:59 PST: BGP: ses global 10.0.200.251 (0x3F46606C:0) act Reset (Active open failed).
router#
Feb 12 16:25:59 PST: BGP: 10.0.200.251 active went from Active to Idle
Feb 12 16:25:59 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
Feb 12 16:25:59 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
router#

Feb 12 16:26:09 PST: BGP: 10.0.200.251 active went from Idle to Active
Feb 12 16:26:09 PST: BGP: 10.0.200.251 open active, local address 10.0.146.245
Feb 12 16:26:09 PST: tcp_uniqueport: using ephemeral max 65535
Feb 12 16:26:09 PST: Reserved port 45519 in Transport Port Agent for TCP IP type 1
Feb 12 16:26:09 PST: TCB4101600C getting property TCP_STRICT_ADDR_BIND (19)
Feb 12 16:26:09 PST: TCP0: Connection to 10.0.200.251:179, advertising MSS 1336
Feb 12 16:26:09 PST: TCP0: state was CLOSED -> SYNSENT [45519 -> 10.0.200.251(179)]
router#
Feb 12 16:26:11 PST: BGP: topo global:IPv4 Unicast:base Scanning routing tables
Feb 12 16:26:11 PST: BGP: topo global:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:26:11 PST: BGP: topo att-mpls:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:26:11 PST: BGP: topo inet:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:26:11 PST: BGP: topo global:IPv4 Multicast:base Scanning routing tables
Feb 12 16:26:11 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:26:11 PST: 10.0.146.245:45519 <---> 10.0.200.251:179 congestion window changes
Feb 12 16:26:11 PST: cwnd from 1336 to 1336, ssthresh from 65535 to 2672
router#
Feb 12 16:26:11 PST: TCP0: timeout #1 - timeout is 4000 ms, seq 3478527092
Feb 12 16:26:11 PST: TCP: (45519) -> 10.0.200.251(179)
router#
Feb 12 16:26:15 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:26:15 PST: TCP0: timeout #2 - timeout is 8000 ms, seq 3478527092
Feb 12 16:26:15 PST: TCP: (45519) -> 10.0.200.251(179)
router#
Feb 12 16:26:16 PST: %FW-6-DROP_PKT: Dropping udp session 10.12.252.132:137 172.16.135.87:137 on zone-pair storenet-bcfcorpnet class class-default due to DROP action found in policy-map with ip ident 56898
router#
Feb 12 16:26:23 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:26:23 PST: TCP0: timeout #3 - timeout is 16000 ms, seq 3478527092
Feb 12 16:26:23 PST: TCP: (45519) -> 10.0.200.251(179)
router#
Feb 12 16:26:39 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:26:39 PST: Released port 45519 in Transport Port Agent for TCP IP type 1 delay 240000
Feb 12 16:26:39 PST: TCP0: state was SYNSENT -> CLOSED [45519 -> 10.0.200.251(179)]
Feb 12 16:26:39 PST: TCB 0x4101600C destroyed
Feb 12 16:26:39 PST: BGP: 10.0.200.251 open failed: Connection timed out; remote host not responding
Feb 12 16:26:39 PST: BGP: 10.0.200.251 Active open failed - tcb is not available, open active delayed 14336ms (35000ms max, 60% jitter)
Feb 12 16:26:39 PST: BGP: ses global 10.0.200.251 (0x401249A4:0) act Reset (Active open failed).
router#
Feb 12 16:26:39 PST: BGP: 10.0.200.251 active went from Active to Idle
Feb 12 16:26:39 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
Feb 12 16:26:39 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
router#

Feb 12 16:26:53 PST: BGP: 10.0.200.251 active went from Idle to Active
Feb 12 16:26:53 PST: BGP: 10.0.200.251 open active, local address 10.0.146.245
Feb 12 16:26:53 PST: tcp_uniqueport: using ephemeral max 65535
Feb 12 16:26:53 PST: Reserved port 11563 in Transport Port Agent for TCP IP type 1
Feb 12 16:26:53 PST: TCB3A067D18 getting property TCP_STRICT_ADDR_BIND (19)
Feb 12 16:26:53 PST: TCP0: Connection to 10.0.200.251:179, advertising MSS 1336
Feb 12 16:26:53 PST: TCP0: state was CLOSED -> SYNSENT [11563 -> 10.0.200.251(179)]
router#
Feb 12 16:26:55 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:26:55 PST: 10.0.146.245:11563 <---> 10.0.200.251:179 congestion window changes
Feb 12 16:26:55 PST: cwnd from 1336 to 1336, ssthresh from 65535 to 2672
Feb 12 16:26:55 PST: TCP0: timeout #1 - timeout is 4000 ms, seq 650223779
Feb 12 16:26:55 PST: TCP: (11563) -> 10.0.200.251(179)
router#
Feb 12 16:26:59 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:26:59 PST: TCP0: timeout #2 - timeout is 8000 ms, seq 650223779
Feb 12 16:26:59 PST: TCP: (11563) -> 10.0.200.251(179)
router#
Feb 12 16:27:07 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:27:07 PST: TCP0: timeout #3 - timeout is 16000 ms, seq 650223779
Feb 12 16:27:07 PST: TCP: (11563) -> 10.0.200.251(179)
router#
Feb 12 16:27:11 PST: BGP: Sched timer-wheel running slow by 1 ticks
Feb 12 16:27:11 PST: BGP: topo global:IPv4 Unicast:base Scanning routing tables
Feb 12 16:27:11 PST: BGP: topo global:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:27:11 PST: BGP: topo att-mpls:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:27:11 PST: BGP: topo inet:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:27:11 PST: BGP: topo global:IPv4 Multicast:base Scanning routing tables
router#
Feb 12 16:27:21 PST: %FW-6-DROP_PKT: Dropping udp session 10.12.252.132:137 172.16.135.87:137 on zone-pair storenet-bcfcorpnet class class-default due to DROP action found in policy-map with ip ident 56901
router#
Feb 12 16:27:23 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:27:23 PST: Released port 11563 in Transport Port Agent for TCP IP type 1 delay 240000
Feb 12 16:27:23 PST: TCP0: state was SYNSENT -> CLOSED [11563 -> 10.0.200.251(179)]
Feb 12 16:27:23 PST: TCB 0x3A067D18 destroyed
Feb 12 16:27:23 PST: BGP: 10.0.200.251 open failed: Connection timed out; remote host not responding
Feb 12 16:27:23 PST: BGP: 10.0.200.251 Active open failed - tcb is not available, open active delayed 12288ms (35000ms max, 60% jitter)
Feb 12 16:27:23 PST: BGP: ses global 10.0.200.251 (0x214AC9F0:0) act Reset (Active open failed).
router#
Feb 12 16:27:23 PST: BGP: 10.0.200.251 active went from Active to Idle
Feb 12 16:27:23 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
Feb 12 16:27:23 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
router#

Feb 12 16:27:35 PST: BGP: 10.0.200.251 active went from Idle to Active
Feb 12 16:27:35 PST: BGP: 10.0.200.251 open active, local address 10.0.146.245
Feb 12 16:27:35 PST: tcp_uniqueport: using ephemeral max 65535
Feb 12 16:27:35 PST: Reserved port 42948 in Transport Port Agent for TCP IP type 1
Feb 12 16:27:35 PST: TCB42030CE8 getting property TCP_STRICT_ADDR_BIND (19)
Feb 12 16:27:35 PST: TCP0: Connection to 10.0.200.251:179, advertising MSS 1336
Feb 12 16:27:35 PST: TCP0: state was CLOSED -> SYNSENT [42948 -> 10.0.200.251(179)]
router#
Feb 12 16:27:37 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:27:37 PST: 10.0.146.245:42948 <---> 10.0.200.251:179 congestion window changes
Feb 12 16:27:37 PST: cwnd from 1336 to 1336, ssthresh from 65535 to 2672
Feb 12 16:27:37 PST: TCP0: timeout #1 - timeout is 4000 ms, seq 2064756798
Feb 12 16:27:37 PST: TCP: (42948) -> 10.0.200.251(179)
router#
Feb 12 16:27:41 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:27:41 PST: TCP0: timeout #2 - timeout is 8000 ms, seq 2064756798
Feb 12 16:27:41 PST: TCP: (42948) -> 10.0.200.251(179)
router#
Feb 12 16:27:49 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:27:49 PST: TCP0: timeout #3 - timeout is 16000 ms, seq 2064756798
Feb 12 16:27:49 PST: TCP: (42948) -> 10.0.200.251(179)
router#
Feb 12 16:28:05 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:28:05 PST: Released port 42948 in Transport Port Agent for TCP IP type 1 delay 240000
Feb 12 16:28:05 PST: TCP0: state was SYNSENT -> CLOSED [42948 -> 10.0.200.251(179)]
Feb 12 16:28:05 PST: TCB 0x42030CE8 destroyed
Feb 12 16:28:05 PST: BGP: 10.0.200.251 open failed: Connection timed out; remote host not responding
Feb 12 16:28:05 PST: BGP: 10.0.200.251 Active open failed - tcb is not available, open active delayed 7168ms (35000ms max, 60% jitter)
Feb 12 16:28:05 PST: BGP: ses global 10.0.200.251 (0x2145C8BC:0) act Reset (Active open failed).
router#
Feb 12 16:28:05 PST: BGP: 10.0.200.251 active went from Active to Idle
Feb 12 16:28:05 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
Feb 12 16:28:05 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
router#

Feb 12 16:28:11 PST: BGP: topo global:IPv4 Unicast:base Scanning routing tables
Feb 12 16:28:11 PST: BGP: topo global:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:28:11 PST: BGP: topo att-mpls:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:28:11 PST: BGP: topo inet:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:28:11 PST: BGP: topo global:IPv4 Multicast:base Scanning routing tables
router#
Feb 12 16:28:12 PST: BGP: 10.0.200.251 active went from Idle to Active
Feb 12 16:28:12 PST: BGP: 10.0.200.251 open active, local address 10.0.146.245
Feb 12 16:28:12 PST: tcp_uniqueport: using ephemeral max 65535
Feb 12 16:28:12 PST: Reserved port 64922 in Transport Port Agent for TCP IP type 1
Feb 12 16:28:12 PST: TCB3FCB5D98 getting property TCP_STRICT_ADDR_BIND (19)
Feb 12 16:28:12 PST: TCP0: Connection to 10.0.200.251:179, advertising MSS 1336
Feb 12 16:28:12 PST: TCP0: state was CLOSED -> SYNSENT [64922 -> 10.0.200.251(179)]
router#
Feb 12 16:28:14 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:28:14 PST: 10.0.146.245:64922 <---> 10.0.200.251:179 congestion window changes
Feb 12 16:28:14 PST: cwnd from 1336 to 1336, ssthresh from 65535 to 2672
Feb 12 16:28:14 PST: TCP0: timeout #1 - timeout is 4000 ms, seq 1770928857
Feb 12 16:28:14 PST: TCP: (64922) -> 10.0.200.251(179)
router#
Feb 12 16:28:18 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:28:18 PST: TCP0: timeout #2 - timeout is 8000 ms, seq 1770928857
Feb 12 16:28:18 PST: TCP: (64922) -> 10.0.200.251(179)
router#
Feb 12 16:28:26 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:28:26 PST: TCP0: timeout #3 - timeout is 16000 ms, seq 1770928857
Feb 12 16:28:26 PST: TCP: (64922) -> 10.0.200.251(179)
router#
Feb 12 16:28:42 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:28:42 PST: Released port 64922 in Transport Port Agent for TCP IP type 1 delay 240000
Feb 12 16:28:42 PST: TCP0: state was SYNSENT -> CLOSED [64922 -> 10.0.200.251(179)]
Feb 12 16:28:42 PST: TCB 0x3FCB5D98 destroyed
Feb 12 16:28:42 PST: BGP: 10.0.200.251 open failed: Connection timed out; remote host not responding
Feb 12 16:28:42 PST: BGP: 10.0.200.251 Active open failed - tcb is not available, open active delayed 12288ms (35000ms max, 60% jitter)
Feb 12 16:28:42 PST: BGP: ses global 10.0.200.251 (0x3A06B420:0) act Reset (Active open failed).
router#
Feb 12 16:28:42 PST: BGP: 10.0.200.251 active went from Active to Idle
Feb 12 16:28:42 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
Feb 12 16:28:42 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
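When comparing this output across several spokes, it can help to pair each outgoing SYN with the moment it was closed, and each "open active delayed" message with the next SYN. The following is a minimal Python sketch, assuming the log lines keep exactly the format pasted above (`TCP0:` prefix, PST timestamps, BGP on port 179). `summarize`, `SAMPLE` and the regular expressions are hypothetical helpers, not part of any Cisco tool, and `SAMPLE` is a short excerpt copied from the log above.

```python
import re
from datetime import datetime

# Line formats copied from the debug output above (IOS syslog stamps, PST).
STAMP = r"(?P<ts>[A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2}) PST: "
SYN_SENT = re.compile(STAMP + r"TCP0: state was CLOSED -> SYNSENT "
                      r"\[(?P<port>\d+) -> (?P<peer>[\d.]+)\(179\)\]")
SYN_CLOSED = re.compile(STAMP + r"TCP0: state was SYNSENT -> CLOSED "
                        r"\[(?P<port>\d+) -> (?P<peer>[\d.]+)\(179\)\]")
RETRY = re.compile(STAMP + r"BGP: (?P<peer>[\d.]+) Active open failed - "
                   r".*open active delayed (?P<ms>\d+)ms")


def _ts(match):
    # Syslog stamps carry no year; only differences between them are used.
    return datetime.strptime(match["ts"], "%b %d %H:%M:%S")


def summarize(log_text):
    """Parse a pasted debug log into (attempts, retries).

    attempts: one dict per BGP TCP open that went SYNSENT -> CLOSED, with the
              local port, the peer and how long the SYN waited for an answer.
    retries:  (announced retry delay, observed gap to the next SYN), in seconds.
    """
    syn_sent_at = {}    # local port -> time the SYN was sent
    retry_pending = {}  # peer -> (time of failure, announced delay in seconds)
    attempts, retries = [], []
    for line in log_text.splitlines():
        line = line.strip()
        if m := SYN_SENT.match(line):
            now = _ts(m)
            syn_sent_at[m["port"]] = now
            if m["peer"] in retry_pending:
                failed_at, announced = retry_pending.pop(m["peer"])
                retries.append((announced, (now - failed_at).total_seconds()))
        elif m := SYN_CLOSED.match(line):
            started = syn_sent_at.pop(m["port"], None)
            if started is not None:  # skip closes whose SYN predates the paste
                attempts.append({"port": int(m["port"]), "peer": m["peer"],
                                 "waited_s": (_ts(m) - started).total_seconds()})
        elif m := RETRY.match(line):
            retry_pending[m["peer"]] = (_ts(m), int(m["ms"]) / 1000)
    return attempts, retries


SAMPLE = """\
Feb 12 16:25:59 PST: BGP: 10.0.200.251 Active open failed - tcb is not available, open active delayed 10240ms (35000ms max, 60% jitter)
Feb 12 16:26:09 PST: TCP0: state was CLOSED -> SYNSENT [45519 -> 10.0.200.251(179)]
Feb 12 16:26:11 PST: TCP0: timeout #1 - timeout is 4000 ms, seq 3478527092
Feb 12 16:26:39 PST: TCP0: state was SYNSENT -> CLOSED [45519 -> 10.0.200.251(179)]
Feb 12 16:26:39 PST: BGP: 10.0.200.251 Active open failed - tcb is not available, open active delayed 14336ms (35000ms max, 60% jitter)
Feb 12 16:26:53 PST: TCP0: state was CLOSED -> SYNSENT [11563 -> 10.0.200.251(179)]
"""

if __name__ == "__main__":
    attempts, retries = summarize(SAMPLE)
    print(attempts)  # [{'port': 45519, 'peer': '10.0.200.251', 'waited_s': 30.0}]
    print(retries)   # [(10.24, 10.0), (14.336, 14.0)]
```

Run over the full paste above, the sketch reports four SYNs to 10.0.200.251 (ports 45519, 11563, 42948 and 64922), each closed after 30 seconds without a reply, and observed retry gaps that track the announced delays to within the one-second resolution of the timestamps.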


