Hey, this is a bit long-winded, but I'm looking for some insight into what might have happened here and whether anyone has run into this before.
On Sunday the SFR module on our Primary/Active 5545 died and our secondary took over. We left the two firewalls in an HA pair on Monday, and yesterday we received the RMA replacement from Cisco. Our plan was to remove the non-working ASA, change the working one to Primary, and insert the new RMA unit as Secondary.
We got as far as completely removing the broken one and making the existing one primary (failover lan unit primary) when we lost all internal-to-external access. Internal - internal still worked but internal - outside did not. This affected wireless, Ethernet, servers, and data (user) networks. All public-facing websites were down as well.
Our Setup:
Behind our firewalls we have two stacked Catalyst 9500s which act as our core switches. The access layer feeds off of these.
Connected to the 9500s are two vPC-linked Nexus 93240s for 10/40G connections to servers.
Finally, behind those sit the Nexus 9348s, which handle 1G connections and a lot of management access ports.
Here's a detailed list of what we did:
- 5:48 PM - Unhooked cabling from the broken firewall (broke the HA pair).
- 5:50 PM - Issued the command on the working firewall (failover lan unit primary).
- 5:50ish PM - Lost all internal - external internet access.
- 6 PM - Checked routes and verified that the existing firewall's configuration was complete and unchanged.
- The firewall could ping out (8.8.8.8). The 9500 couldn't. We could ping between the 9500 and the firewall. Something seemed off, so we checked routes again: nothing out of the ordinary or missing that we could tell.
- 6 PM - Started the phone call to TAC.
- 6:20-6:30 PM - On the phone with TAC. Started wondering if it was ARP related between the firewall and the 9500s. Shut the interfaces on the 9500s to the firewall; didn't seem to help. Rebooted the firewall; didn't seem to help. Still couldn't ping 8.8.8.8 from the 9500s.
- 7 PM - On the phone with TAC while they look into the switch side of things (9500/93240). We try pinging from the 9500 to 8.8.8.8 and it works, even though no changes were made. However, webpages hosted internally on servers that come off the 93240s cannot be reached from a 4G connection (external), so something is still wrong. TAC sees nothing wrong on the switch.
- 8 PM - We switch to a TAC firewall engineer. He sees nothing wrong. At this point we may have gone down several rabbit holes with TAC (switch and firewall side), but over the next 2-3 hours certain sites and services began to work again. In one case, a server that hosts two public-facing sites had issues: one site was reachable over 4G, the other was not. Each site has a different IP, but still.
- 11 PM - Everything is working. Internal - external is back up, and public-facing sites are now available.
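For reference, here is a sketch of the kind of checks that go with an ARP suspicion like the one in the 6:20 entry. It assumes standard ASA and IOS-XE CLI and is illustrative only, not a record of the exact commands run that night, and no output from our devices is shown.

```
! On the ASA: confirm the unit role/state and the MAC each interface is using
show failover
show interface | include MAC address
show arp

! On the Catalyst 9500 (IOS-XE): see which MAC was learned for the firewall
show ip arp
show mac address-table

! Force re-learning instead of waiting for entries to age out
! (on the ASA)
clear arp
! (on the Catalyst 9500)
clear arp-cache
```

If the MAC the 9500 holds for the firewall differs from what the ASA reports on that interface, a stale ARP entry would be a candidate. Anything upstream of the outside interface keeps its own ARP cache, which these commands can neither show nor clear.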
This morning everything is working fine - we plan to put the other firewall in place this Saturday.
Cisco TAC is as baffled as we are as to why things weren't working.