Hi All,
I'm currently having an issue where an IPsec tunnel keeps dropping for a few minutes, maybe once or twice a night, and it's causing the client to ring every day (the client is monitoring a device in a remote location and gets an alert when the device has not been contactable for 1 min).
The tunnel is established between two Cisco ISRs, configured in a hub-and-spoke topology. I believe the spoke side is the cause of the issue, because when we switch the spoke over to 4G/LTE (instead of fibre) it doesn't drop at all.
Here's where it gets more complicated than your basic IPsec setup (well, it is for me anyway): the spoke is behind another ISR, and that ISR is behind a Sophos firewall, as below:
SpokeISR > SiteISR > SophosFW(NAT) > - - Internet - - < HubISR
The Sophos is the only device NATing traffic; the SiteISR is just routing without any NAT (it has a private IP between the Sophos and itself).
I've tried tuning the NAT keepalives and the DPD settings on the SpokeISR, but it hasn't made any difference whatsoever. Not to mention, I don't believe it would do anything anyway, as the SpokeISR itself isn't NATing.
I have a feeling that the Sophos firewall is the issue, but I'm not sure how to prove it via logs or anything as yet. I was going to try extending the NAT timeout on the Sophos to see if that works, and also adding a static NAT entry, but the Sophos is already terminating IPsec tunnels of its own and I don't know if that would work.
I'm hoping someone with more understanding of IPsec tunnels and NAT has an idea of why this might be happening, or can point me in the right direction. I have posted the crypto section of each router's config below to show what they currently are, along with the logs from the hub when the dropout occurs. If there is any more information I can provide, please let me know :)
##Configs##
#HubISR#
crypto keyring <name>
 pre-shared-key address 0.0.0.0 0.0.0.0 key <password>
crypto isakmp policy 90
 encr aes 192
 hash sha256
 authentication pre-share
 group 14
crypto isakmp invalid-spi-recovery
crypto isakmp keepalive 10 5 periodic
crypto isakmp nat keepalive 20
crypto isakmp profile <name>
 description <name> for spoke routers
 keyring <name>
 match identity address 0.0.0.0
crypto ipsec transform-set rtpset esp-aes 256 esp-sha512-hmac
 mode tunnel
crypto dynamic-map dynmap 10
 set transform-set rtpset
 set isakmp-profile <name>
crypto map <name> 10 ipsec-isakmp dynamic dynmap
#SpokeISR#
crypto isakmp policy 90
 encryption aes 192
 hash sha256
 authentication pre-share
 group 14
crypto isakmp key <password> address <Static-IP of HUB>
crypto isakmp invalid-spi-recovery
crypto isakmp keepalive 10 5 periodic
crypto isakmp nat keepalive 20
crypto ipsec transform-set <name> esp-aes 256 esp-sha512-hmac
 mode tunnel
crypto map <name> 90 ipsec-isakmp
 set peer <Static-IP of HUB>
 set transform-set <name>
 match address <ACL-name>
### LOGS from HUB ###
Jul 3 00:23:04.894: [Ident 800001F3]: state = Check Install SA Declare Success
Jul 3 00:55:51.009: ISAKMP-ERROR: (1859):DPD incrementing error counter (1/5)
Jul 3 00:55:56.010: ISAKMP-ERROR: (1859):DPD incrementing error counter (2/5)
Jul 3 00:56:01.009: ISAKMP-ERROR: (1859):DPD incrementing error counter (3/5)
Jul 3 00:56:06.010: ISAKMP-ERROR: (1859):DPD incrementing error counter (4/5)
Jul 3 00:56:11.010: ISAKMP-ERROR: (1859):DPD incrementing error counter (5/5)
Jul 3 00:56:11.010: ISAKMP-ERROR: (1859):Peer <SpokeISR IP> not responding!
Jul 3 00:56:11.011: ISAKMP-ERROR: (1859):deleting SA reason "P1 errcounter exceeded (PEERS_ALIVE_TIMER)" state (R) QM_IDLE (peer <SpokeISR IP>)
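
To line a dropout up against the Sophos logs, it may help to pull the exact timings out of the hub log above. The following is only a rough Python sketch: the function names and the placeholder year are assumptions of mine (the log lines carry no year), it only understands the DPD lines shown above, and it expects English month abbreviations. It reports how many DPD probes went unanswered, the gap between them, and the time from the first miss to the last one, which is when the SA is deleted in this log.

```python
import re
from datetime import datetime

# DPD lines copied from the hub log in this post.
HUB_LOG = """\
Jul 3 00:55:51.009: ISAKMP-ERROR: (1859):DPD incrementing error counter (1/5)
Jul 3 00:55:56.010: ISAKMP-ERROR: (1859):DPD incrementing error counter (2/5)
Jul 3 00:56:01.009: ISAKMP-ERROR: (1859):DPD incrementing error counter (3/5)
Jul 3 00:56:06.010: ISAKMP-ERROR: (1859):DPD incrementing error counter (4/5)
Jul 3 00:56:11.010: ISAKMP-ERROR: (1859):DPD incrementing error counter (5/5)
"""

# Matches only the "DPD incrementing error counter (n/max)" lines.
DPD_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\.\d+): ISAKMP-ERROR: "
    r"\((?P<sa>\d+)\):DPD incrementing error counter \((?P<n>\d+)/(?P<max>\d+)\)"
)

# Assumption: the log has no year, so a placeholder is used.
# This is fine for time differences within a single night.
ASSUMED_YEAR = 2000


def dpd_events(log_text):
    """Return a list of (timestamp, miss_number, max_misses) tuples."""
    events = []
    for line in log_text.splitlines():
        m = DPD_LINE.match(line.strip())
        if m:
            ts = datetime.strptime(
                f"{ASSUMED_YEAR} {m['ts']}", "%Y %b %d %H:%M:%S.%f"
            )
            events.append((ts, int(m["n"]), int(m["max"])))
    return events


def retry_gaps(events):
    """Seconds between consecutive DPD misses."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]


def detection_window(events):
    """Seconds from the first DPD miss to the last one."""
    return (events[-1][0] - events[0][0]).total_seconds()


if __name__ == "__main__":
    ev = dpd_events(HUB_LOG)
    print("misses:", len(ev))
    print("gaps (s):", [round(g, 3) for g in retry_gaps(ev)])
    print("first miss to last miss (s):", round(detection_window(ev), 3))
```

On the sample above, the misses are about 5 seconds apart and the whole window is about 20 seconds, which matches the 5-second retry in "crypto isakmp keepalive 10 5 periodic". The first-miss timestamp (00:55:51 here) is the one worth searching for in the Sophos connection and NAT logs.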