Hello! I'm managing a customer network that uses 4 BGP peers across 2 different WAN links. The 4 peers connect to 2 different data center devices. We recently lost several sites (maybe 5-10 out of 300) on one of the peers. All the affected sites peer with the same data center router across the second WAN link (CradlePoint). The primary WAN1 connection to that same data center is fine.
I've tried some debugging and compared the config as best I could, but I'm honestly not seeing any issues. Can someone take a look and see what I'm missing? (IPs changed for security reasons.)
I think the issue could be this "tcb is not available" message.
BGP Config: (at the Spoke site)
router bgp 12345
 bgp router-id 10.0.99.99
 bgp log-neighbor-changes
 neighbor 10.0.100.251 remote-as 100
 neighbor 10.0.100.251 local-as 100
 neighbor 10.0.100.251 fall-over bfd multi-hop
 neighbor 10.0.100.253 remote-as 100
 neighbor 10.0.100.253 local-as 100
 neighbor 10.0.100.253 fall-over bfd multi-hop
 neighbor 10.0.200.251 remote-as 101
 neighbor 10.0.200.251 local-as 101
 neighbor 10.0.200.253 remote-as 101
 neighbor 10.0.200.253 local-as 101
 !
 address-family ipv4
  network 10.12.252.0 mask 255.255.252.0
  network 172.31.94.128 mask 255.255.255.224
  neighbor 10.0.100.251 activate
  neighbor 10.0.100.251 weight 100
  neighbor 10.0.100.251 soft-reconfiguration inbound
  neighbor 10.0.100.253 activate
  neighbor 10.0.100.253 weight 95
  neighbor 10.0.100.253 soft-reconfiguration inbound
  neighbor 10.0.200.251 activate
  neighbor 10.0.200.251 weight 90
  neighbor 10.0.200.251 soft-reconfiguration inbound
  neighbor 10.0.200.253 activate
  neighbor 10.0.200.253 weight 85
  neighbor 10.0.200.253 soft-reconfiguration inbound
  distribute-list prefix default-route in
 exit-address-family
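
Since comparing the neighbor statements by eye is error-prone, here is a minimal Python sketch that groups the options per neighbor and lists any option that some neighbors have and others lack. The helper names (neighbor_options, missing_options) are made up for illustration, and numeric arguments such as the AS number or weight are deliberately ignored. It only surfaces differences; it does not say whether a difference has anything to do with the outage.

```python
import re

# Neighbor statements copied from the spoke config above.
CONFIG = """\
router bgp 12345
 neighbor 10.0.100.251 remote-as 100
 neighbor 10.0.100.251 local-as 100
 neighbor 10.0.100.251 fall-over bfd multi-hop
 neighbor 10.0.100.253 remote-as 100
 neighbor 10.0.100.253 local-as 100
 neighbor 10.0.100.253 fall-over bfd multi-hop
 neighbor 10.0.200.251 remote-as 101
 neighbor 10.0.200.251 local-as 101
 neighbor 10.0.200.253 remote-as 101
 neighbor 10.0.200.253 local-as 101
 address-family ipv4
  neighbor 10.0.100.251 activate
  neighbor 10.0.100.251 weight 100
  neighbor 10.0.100.251 soft-reconfiguration inbound
  neighbor 10.0.100.253 activate
  neighbor 10.0.100.253 weight 95
  neighbor 10.0.100.253 soft-reconfiguration inbound
  neighbor 10.0.200.251 activate
  neighbor 10.0.200.251 weight 90
  neighbor 10.0.200.251 soft-reconfiguration inbound
  neighbor 10.0.200.253 activate
  neighbor 10.0.200.253 weight 85
  neighbor 10.0.200.253 soft-reconfiguration inbound
"""

NEIGHBOR_LINE = re.compile(r"^\s*neighbor\s+(\S+)\s+(.+?)\s*$")


def neighbor_options(config):
    """Map each neighbor IP to the set of option names configured for it."""
    options = {}
    for line in config.splitlines():
        match = NEIGHBOR_LINE.match(line)
        if not match:
            continue
        peer, option = match.groups()
        # Drop a trailing number (AS, weight) so only the option name is compared.
        name = re.sub(r"\s+\d+$", "", option)
        options.setdefault(peer, set()).add(name)
    return options


def missing_options(options):
    """Return {option name: [neighbors that lack it]} for options not set everywhere."""
    every_option = set().union(*options.values())
    result = {}
    for name in sorted(every_option):
        lacking = [peer for peer in sorted(options) if name not in options[peer]]
        if lacking:
            result[name] = lacking
    return result


if __name__ == "__main__":
    for name, peers in missing_options(neighbor_options(CONFIG)).items():
        print(f"{name}: not configured on {', '.join(peers)}")
```

Run against the config as posted, the only difference it reports is that "fall-over bfd multi-hop" is set on the 10.0.100.x neighbors but not on the 10.0.200.x ones. The same function can be pointed at a working site's config to compare it with a failing one.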
Debugs: debug ip ipv4 and debug ip nat translation
TCP special event debugging is on
router#term mon
router#
Feb 12 16:25:59 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:25:59 PST: Released port 13562 in Transport Port Agent for TCP IP type 1 delay 240000
Feb 12 16:25:59 PST: TCP0: state was SYNSENT -> CLOSED [13562 -> 10.0.200.251(179)]
Feb 12 16:25:59 PST: TCB 0x3F7ADE1C destroyed
Feb 12 16:25:59 PST: BGP: 10.0.200.251 open failed: Connection timed out; remote host not responding
Feb 12 16:25:59 PST: BGP: 10.0.200.251 Active open failed - tcb is not available, open active delayed 10240ms (35000ms max, 60% jitter)
Feb 12 16:25:59 PST: BGP: ses global 10.0.200.251 (0x3F46606C:0) act Reset (Active open failed).
router#
Feb 12 16:25:59 PST: BGP: 10.0.200.251 active went from Active to Idle
Feb 12 16:25:59 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
Feb 12 16:25:59 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
router#
Feb 12 16:26:09 PST: BGP: 10.0.200.251 active went from Idle to Active
Feb 12 16:26:09 PST: BGP: 10.0.200.251 open active, local address 10.0.146.245
Feb 12 16:26:09 PST: tcp_uniqueport: using ephemeral max 65535
Feb 12 16:26:09 PST: Reserved port 45519 in Transport Port Agent for TCP IP type 1
Feb 12 16:26:09 PST: TCB4101600C getting property TCP_STRICT_ADDR_BIND (19)
Feb 12 16:26:09 PST: TCP0: Connection to 10.0.200.251:179, advertising MSS 1336
Feb 12 16:26:09 PST: TCP0: state was CLOSED -> SYNSENT [45519 -> 10.0.200.251(179)]
router#
Feb 12 16:26:11 PST: BGP: topo global:IPv4 Unicast:base Scanning routing tables
Feb 12 16:26:11 PST: BGP: topo global:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:26:11 PST: BGP: topo att-mpls:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:26:11 PST: BGP: topo inet:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:26:11 PST: BGP: topo global:IPv4 Multicast:base Scanning routing tables
Feb 12 16:26:11 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:26:11 PST: 10.0.146.245:45519 <---> 10.0.200.251:179 congestion window changes
Feb 12 16:26:11 PST: cwnd from 1336 to 1336, ssthresh from 65535 to 2672
router#
Feb 12 16:26:11 PST: TCP0: timeout #1 - timeout is 4000 ms, seq 3478527092
Feb 12 16:26:11 PST: TCP: (45519) -> 10.0.200.251(179)
router#
Feb 12 16:26:15 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:26:15 PST: TCP0: timeout #2 - timeout is 8000 ms, seq 3478527092
Feb 12 16:26:15 PST: TCP: (45519) -> 10.0.200.251(179)
router#
Feb 12 16:26:16 PST: %FW-6-DROP_PKT: Dropping udp session 10.12.252.132:137 172.16.135.87:137 on zone-pair storenet-bcfcorpnet class class-default due to DROP action found in policy-map with ip ident 56898
router#
Feb 12 16:26:23 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:26:23 PST: TCP0: timeout #3 - timeout is 16000 ms, seq 3478527092
Feb 12 16:26:23 PST: TCP: (45519) -> 10.0.200.251(179)
router#
Feb 12 16:26:39 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:26:39 PST: Released port 45519 in Transport Port Agent for TCP IP type 1 delay 240000
Feb 12 16:26:39 PST: TCP0: state was SYNSENT -> CLOSED [45519 -> 10.0.200.251(179)]
Feb 12 16:26:39 PST: TCB 0x4101600C destroyed
Feb 12 16:26:39 PST: BGP: 10.0.200.251 open failed: Connection timed out; remote host not responding
Feb 12 16:26:39 PST: BGP: 10.0.200.251 Active open failed - tcb is not available, open active delayed 14336ms (35000ms max, 60% jitter)
Feb 12 16:26:39 PST: BGP: ses global 10.0.200.251 (0x401249A4:0) act Reset (Active open failed).
router#
Feb 12 16:26:39 PST: BGP: 10.0.200.251 active went from Active to Idle
Feb 12 16:26:39 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
Feb 12 16:26:39 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
router#
Feb 12 16:26:53 PST: BGP: 10.0.200.251 active went from Idle to Active
Feb 12 16:26:53 PST: BGP: 10.0.200.251 open active, local address 10.0.146.245
Feb 12 16:26:53 PST: tcp_uniqueport: using ephemeral max 65535
Feb 12 16:26:53 PST: Reserved port 11563 in Transport Port Agent for TCP IP type 1
Feb 12 16:26:53 PST: TCB3A067D18 getting property TCP_STRICT_ADDR_BIND (19)
Feb 12 16:26:53 PST: TCP0: Connection to 10.0.200.251:179, advertising MSS 1336
Feb 12 16:26:53 PST: TCP0: state was CLOSED -> SYNSENT [11563 -> 10.0.200.251(179)]
router#
Feb 12 16:26:55 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:26:55 PST: 10.0.146.245:11563 <---> 10.0.200.251:179 congestion window changes
Feb 12 16:26:55 PST: cwnd from 1336 to 1336, ssthresh from 65535 to 2672
Feb 12 16:26:55 PST: TCP0: timeout #1 - timeout is 4000 ms, seq 650223779
Feb 12 16:26:55 PST: TCP: (11563) -> 10.0.200.251(179)
router#
Feb 12 16:26:59 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:26:59 PST: TCP0: timeout #2 - timeout is 8000 ms, seq 650223779
Feb 12 16:26:59 PST: TCP: (11563) -> 10.0.200.251(179)
router#
Feb 12 16:27:07 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:27:07 PST: TCP0: timeout #3 - timeout is 16000 ms, seq 650223779
Feb 12 16:27:07 PST: TCP: (11563) -> 10.0.200.251(179)
router#
Feb 12 16:27:11 PST: BGP: Sched timer-wheel running slow by 1 ticks
Feb 12 16:27:11 PST: BGP: topo global:IPv4 Unicast:base Scanning routing tables
Feb 12 16:27:11 PST: BGP: topo global:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:27:11 PST: BGP: topo att-mpls:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:27:11 PST: BGP: topo inet:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:27:11 PST: BGP: topo global:IPv4 Multicast:base Scanning routing tables
router#
Feb 12 16:27:21 PST: %FW-6-DROP_PKT: Dropping udp session 10.12.252.132:137 172.16.135.87:137 on zone-pair storenet-bcfcorpnet class class-default due to DROP action found in policy-map with ip ident 56901
router#
Feb 12 16:27:23 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:27:23 PST: Released port 11563 in Transport Port Agent for TCP IP type 1 delay 240000
Feb 12 16:27:23 PST: TCP0: state was SYNSENT -> CLOSED [11563 -> 10.0.200.251(179)]
Feb 12 16:27:23 PST: TCB 0x3A067D18 destroyed
Feb 12 16:27:23 PST: BGP: 10.0.200.251 open failed: Connection timed out; remote host not responding
Feb 12 16:27:23 PST: BGP: 10.0.200.251 Active open failed - tcb is not available, open active delayed 12288ms (35000ms max, 60% jitter)
Feb 12 16:27:23 PST: BGP: ses global 10.0.200.251 (0x214AC9F0:0) act Reset (Active open failed).
router#
Feb 12 16:27:23 PST: BGP: 10.0.200.251 active went from Active to Idle
Feb 12 16:27:23 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
Feb 12 16:27:23 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
router#
Feb 12 16:27:35 PST: BGP: 10.0.200.251 active went from Idle to Active
Feb 12 16:27:35 PST: BGP: 10.0.200.251 open active, local address 10.0.146.245
Feb 12 16:27:35 PST: tcp_uniqueport: using ephemeral max 65535
Feb 12 16:27:35 PST: Reserved port 42948 in Transport Port Agent for TCP IP type 1
Feb 12 16:27:35 PST: TCB42030CE8 getting property TCP_STRICT_ADDR_BIND (19)
Feb 12 16:27:35 PST: TCP0: Connection to 10.0.200.251:179, advertising MSS 1336
Feb 12 16:27:35 PST: TCP0: state was CLOSED -> SYNSENT [42948 -> 10.0.200.251(179)]
router#
Feb 12 16:27:37 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:27:37 PST: 10.0.146.245:42948 <---> 10.0.200.251:179 congestion window changes
Feb 12 16:27:37 PST: cwnd from 1336 to 1336, ssthresh from 65535 to 2672
Feb 12 16:27:37 PST: TCP0: timeout #1 - timeout is 4000 ms, seq 2064756798
Feb 12 16:27:37 PST: TCP: (42948) -> 10.0.200.251(179)
router#
Feb 12 16:27:41 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:27:41 PST: TCP0: timeout #2 - timeout is 8000 ms, seq 2064756798
Feb 12 16:27:41 PST: TCP: (42948) -> 10.0.200.251(179)
router#
Feb 12 16:27:49 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:27:49 PST: TCP0: timeout #3 - timeout is 16000 ms, seq 2064756798
Feb 12 16:27:49 PST: TCP: (42948) -> 10.0.200.251(179)
router#
Feb 12 16:28:05 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:28:05 PST: Released port 42948 in Transport Port Agent for TCP IP type 1 delay 240000
Feb 12 16:28:05 PST: TCP0: state was SYNSENT -> CLOSED [42948 -> 10.0.200.251(179)]
Feb 12 16:28:05 PST: TCB 0x42030CE8 destroyed
Feb 12 16:28:05 PST: BGP: 10.0.200.251 open failed: Connection timed out; remote host not responding
Feb 12 16:28:05 PST: BGP: 10.0.200.251 Active open failed - tcb is not available, open active delayed 7168ms (35000ms max, 60% jitter)
Feb 12 16:28:05 PST: BGP: ses global 10.0.200.251 (0x2145C8BC:0) act Reset (Active open failed).
router#
Feb 12 16:28:05 PST: BGP: 10.0.200.251 active went from Active to Idle
Feb 12 16:28:05 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
Feb 12 16:28:05 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
router#
Feb 12 16:28:11 PST: BGP: topo global:IPv4 Unicast:base Scanning routing tables
Feb 12 16:28:11 PST: BGP: topo global:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:28:11 PST: BGP: topo att-mpls:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:28:11 PST: BGP: topo inet:VPNv4 Unicast:base Scanning routing tables
Feb 12 16:28:11 PST: BGP: topo global:IPv4 Multicast:base Scanning routing tables
router#
Feb 12 16:28:12 PST: BGP: 10.0.200.251 active went from Idle to Active
Feb 12 16:28:12 PST: BGP: 10.0.200.251 open active, local address 10.0.146.245
Feb 12 16:28:12 PST: tcp_uniqueport: using ephemeral max 65535
Feb 12 16:28:12 PST: Reserved port 64922 in Transport Port Agent for TCP IP type 1
Feb 12 16:28:12 PST: TCB3FCB5D98 getting property TCP_STRICT_ADDR_BIND (19)
Feb 12 16:28:12 PST: TCP0: Connection to 10.0.200.251:179, advertising MSS 1336
Feb 12 16:28:12 PST: TCP0: state was CLOSED -> SYNSENT [64922 -> 10.0.200.251(179)]
router#
Feb 12 16:28:14 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:28:14 PST: 10.0.146.245:64922 <---> 10.0.200.251:179 congestion window changes
Feb 12 16:28:14 PST: cwnd from 1336 to 1336, ssthresh from 65535 to 2672
Feb 12 16:28:14 PST: TCP0: timeout #1 - timeout is 4000 ms, seq 1770928857
Feb 12 16:28:14 PST: TCP: (64922) -> 10.0.200.251(179)
router#
Feb 12 16:28:18 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:28:18 PST: TCP0: timeout #2 - timeout is 8000 ms, seq 1770928857
Feb 12 16:28:18 PST: TCP: (64922) -> 10.0.200.251(179)
router#
Feb 12 16:28:26 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:28:26 PST: TCP0: timeout #3 - timeout is 16000 ms, seq 1770928857
Feb 12 16:28:26 PST: TCP: (64922) -> 10.0.200.251(179)
router#
Feb 12 16:28:42 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:28:42 PST: Released port 64922 in Transport Port Agent for TCP IP type 1 delay 240000
Feb 12 16:28:42 PST: TCP0: state was SYNSENT -> CLOSED [64922 -> 10.0.200.251(179)]
Feb 12 16:28:42 PST: TCB 0x3FCB5D98 destroyed
Feb 12 16:28:42 PST: BGP: 10.0.200.251 open failed: Connection timed out; remote host not responding
Feb 12 16:28:42 PST: BGP: 10.0.200.251 Active open failed - tcb is not available, open active delayed 12288ms (35000ms max, 60% jitter)
Feb 12 16:28:42 PST: BGP: ses global 10.0.200.251 (0x3A06B420:0) act Reset (Active open failed).
router#
Feb 12 16:28:42 PST: BGP: 10.0.200.251 active went from Active to Idle
Feb 12 16:28:42 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
Feb 12 16:28:42 PST: BGP: nbr global 10.0.200.251 Active open failed - open timer running
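
Every attempt in the output above follows the same pattern: the router sends a SYN to 10.0.200.251:179, retransmits it three times, and the connection goes from SYNSENT straight back to CLOSED without ever establishing. To make that easier to see in a long capture, here is a rough Python sketch that condenses the TCP lines into one record per attempt. The function name (summarize_attempts) and the record fields are made up for illustration; it assumes the timestamp format shown above and does not handle a capture that crosses midnight.

```python
import re
from datetime import datetime

# One attempt copied from the debug output above (unrelated lines trimmed).
LOG = """\
Feb 12 16:26:09 PST: TCP0: state was CLOSED -> SYNSENT [45519 -> 10.0.200.251(179)]
Feb 12 16:26:11 PST: TCP0: RETRANS timeout timer expired
Feb 12 16:26:11 PST: TCP0: timeout #1 - timeout is 4000 ms, seq 3478527092
Feb 12 16:26:15 PST: TCP0: timeout #2 - timeout is 8000 ms, seq 3478527092
Feb 12 16:26:23 PST: TCP0: timeout #3 - timeout is 16000 ms, seq 3478527092
Feb 12 16:26:39 PST: TCP0: state was SYNSENT -> CLOSED [45519 -> 10.0.200.251(179)]
Feb 12 16:26:39 PST: BGP: 10.0.200.251 open failed: Connection timed out; remote host not responding
"""

LINE = re.compile(r"^\w{3} +\d+ (\d\d:\d\d:\d\d) \w+: (.*)$")
STATE = re.compile(r"state was (\w+) -> (\w+) \[(\d+) -> ([\d.]+)\(\d+\)\]")
TIMEOUT = re.compile(r"timeout #\d+")


def summarize_attempts(log):
    """Return one dict per outbound TCP attempt found in the debug output."""
    attempts = []
    current = None
    for raw in log.splitlines():
        line = LINE.match(raw.strip())
        if not line:
            continue  # prompts such as "router#" and anything without a timestamp
        when = datetime.strptime(line.group(1), "%H:%M:%S")
        message = line.group(2)
        state = STATE.search(message)
        if state:
            old, new, port, peer = state.groups()
            if (old, new) == ("CLOSED", "SYNSENT"):
                # A new active open: the SYN has just been sent.
                current = {"peer": peer, "port": int(port), "start": when,
                           "retransmits": 0, "final_state": new, "seconds": None}
                attempts.append(current)
            elif current and int(port) == current["port"]:
                current["final_state"] = new
                current["seconds"] = (when - current["start"]).total_seconds()
        elif current and TIMEOUT.search(message):
            current["retransmits"] += 1  # one more unanswered SYN
    return attempts


if __name__ == "__main__":
    for a in summarize_attempts(LOG):
        print(f"{a['peer']} from port {a['port']}: {a['retransmits']} SYN "
              f"retransmits, ended in {a['final_state']} after {a['seconds']} s")
```

For the attempt from port 45519 this gives 3 retransmits and a CLOSED state after 30 seconds, which matches the "Connection timed out; remote host not responding" line that BGP logs at the same timestamp. A session that came up would instead show a transition out of SYNSENT into an established state.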