Monday, July 30, 2018

Cisco Firepower Rant

I started doing Cisco Firepower back in 2015 and after all those years I need to blow off some steam. IMO it was a clunky solution when there was only the ASA + Firepower Services option, an attempt to go to market as quickly as possible that felt weird since there was still ASA configuration via CLI/ASDM and Firepower configuration via FMC (or, for the very brave ones out there, Firepower via ASDM).

I wasn't a big fan of the solution and was quite excited when Cisco announced Firepower Threat Defense, which should have brought the ASA and Firepower technology into a single OS. I thought Cisco had finally got a grip and that my days of hour-long incremental upgrade procedures and a slow FMC UI were coming to an end with ASA and Firepower code merging into a single solution.

Looking back at that time, I was really naive to believe they would re-engineer it to finally have a viable, competitive solution against the Check Points and Palo Altos of the world. And holy shit was I disappointed when I got my hands on the platform during a migration from classic ASA to the new Firepower 4100 platform running FTD.

I will try to detail my criticism as accurately as possible, but please forgive any technical errors. Everything I learned about the platform was basically reverse engineered, since every time something was broken I had to dig deeper and deeper into this moloch of technologies.

FX-OS

Only four words are needed to describe the overall architecture... It is a mess.

First of all, you need to understand that when buying an FPR2100/4100/9300 firewall you end up with a variety of stitched-together technologies.

At the bottom you have FX-OS, the OS running beneath Firepower Threat Defense. FX-OS is basically Frankenstein's monster of the Cisco UCS platform. If you are familiar with Cisco's server platform: it uses Fabric Interconnects and FEX modules to integrate the servers more tightly into the network fabric and provides a centralized management platform using UCS Manager. FX-OS is exactly that. It is probably a fork of the software running on a UCS Mini (which integrates the Fabric Interconnect functionality into the FEXes of the blade chassis itself).

So apart from the firewall software you end up with another piece of software to maintain and update, that can have additional bugs and issues.

It isn't even a single technology, since it is running Cisco NX-OS and a forked UCS Manager (Firepower Chassis Manager), but at least there is only one upgrade package and updates mostly work.

Firepower Threat Defense

Firepower Threat Defense is the name of the "unified" image, the platform that should have made everything better, but imo it is a real disaster. Nothing really changed in comparison to ASA + Firepower Services. Back then you had ASA code running on the ASA 5500-X series with a little VM running the Firepower services on the same hardware. A service policy was used to flag traffic that should be forwarded to the Firepower software module, where the traffic was analyzed, flagged and sent back to the ASA to enforce an action. FTD is pretty much the same, but they got rid of the additional software/hardware module and just let the ASA code run directly within the Firepower Linux. Network traffic is pumped through a shared memory segment, with the good old ASA and Firepower code looking at that segment one after the other. The only real change was that they could get rid of the base operating system on which the ASA binary was running and dump ASA into the Firepower Linux.

When I first read about the "new" architecture I thought they were kidding. I couldn't believe that this was the result of years of engineering. It looked like a dirty hack to build an NGFW out of the technical debt of both Cisco and Sourcefire, with no useful integration between the two platforms. I was disappointed, but tbh I didn't really care as long as performance, reliability and security were good, so I moved on and hoped they had at least got the management part right.

Management

The management of the platform is the worst I have ever encountered with any firewall on the market. Back when FTD was released with 6.0, the only way to manage it was Firepower Management Center. Up until today (2018/06) their local device management solution, Firepower Device Manager, still doesn't have feature parity with FMC: it looks fancy but is not even capable of configuring High Availability for firewalls... so apart from very small deployments it is completely useless.

With FDM out of the way, let's focus on FMC. It is available both as a virtual appliance and a physical appliance and is the heart of every Firepower installation. Since FTD does not support any CLI configuration (apart from enabling/disabling features like HA and protocol inspection), everything must be done from FMC. And this is one of the major pain points I have with the solution.

If you lose connectivity between your firewall and FMC you can't make any changes. What is even worse is that you need to connect FTD to FMC using its management interface, so if you want to use it to manage branch offices that only have the FTD firewall and no other edge device capable of routing, you are screwed. To this day there is no (viable) supported way to do remote firewall deployments if you don't have another router. Your only choice is to connect your firewall management port directly to the internet or to stage your devices at HQ and ship them to the remote location. If you ever screw up the configuration pushed from FMC to FTD and connectivity is lost between the two devices, you are basically fucked, since you cannot revert the configuration.

Another big pain point is FMC performance. Believe me, it doesn't matter which appliance you use. I have seen it from FMCv up to the largest hardware appliance, FMC4500, which should be able to support hundreds of devices. It is just horrible. If you try to search your connection events, be ready for minute-long waiting times. The same goes for adding new devices, deploying firewall configuration and generating reports.

Deploying firewall configuration deserves its own paragraph since it is the most terrifying experience you will ever have with any firewall out there. After encountering > 15 different bugs, you will want to run away screaming. But why is it so bad?

First of all... it's ridiculously slow. Depending on your configuration size it will take between 2 and 15 minutes (this still applies even with 6.2.3.x, after Cisco proudly announced that performance is so much better now). It doesn't matter what change you make, FMC will generate the full firewall configuration and push it to the managed device. It is a real pain if you just add another interface and have to sit there for up to 15 minutes and wait for your changes to take effect. Just think about what that is like if anybody fucks up and adds incorrect configuration that causes an outage. You know how to fix it, but you must wait for the deployment procedure.

Up until a few updates ago there wasn't even any diff/history feature to compare policy changes. You could only look into the audit log and check log entries one after the other to determine what had really changed. So if somebody made a change and you were not sure whether that configuration should be applied, you were basically fucked and could not find out what exactly changed.

Another massive issue is how the whole deployment procedure works. It generates both ASA + Firepower configuration and pushes it to the device. But what exactly does "generating ASA configuration" mean? I was in for another surprise when one day I checked the logs and found that it basically generates CLI configuration and pushes it into the ASA part of the firewall... And now guess what is different between the FMC UI and the ASA CLI? Exactly, input validation... You enter configuration into FMC, think it is correct, since hey, the fucking UI accepted my input, and deploy it to the firewall... And now you are in for a big surprise: the rollback procedure from hell.

If any of the configuration commands fail, the firewall will roll back the configuration by erasing (!) the running configuration and reloading the startup configuration... Now guess what happens to all your active sessions during that time. :)

They are gone, and you just caused an outage by applying your configuration. Back in older releases (< 6.2.3) there were also various issues with ACL compilation which resulted in ~10 minutes of downtime if the device had to "roll back" a large access control policy / ACL.
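Regarding the missing diff/history view mentioned above: one workaround is to keep your own text snapshots of the configuration and compare them offline. This is only a minimal sketch using Python's standard `difflib`. How you obtain the snapshots (for example, saved CLI output before and after a deployment) is an assumption, and the sample configuration lines are made up for illustration.

```python
import difflib


def config_diff(before, after):
    """Return unified-diff lines between two configuration snapshots (strings)."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )


# Made-up snapshot contents, purely for illustration.
before = "interface Ethernet1/1\n nameif outside\n"
after = (
    "interface Ethernet1/1\n nameif outside\n"
    "interface Ethernet1/2\n nameif dmz\n"
)

for line in config_diff(before, after):
    print(line)
```

Lines starting with `+` or `-` show what was added or removed between the two snapshots, which is a lot quicker than reading audit log entries one by one.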

Programmability

If you ever did a large migration of firewalls, had to audit rule sets or worked with a fancy company that wants to automate its infrastructure, you will want your enterprise gear to have feature parity between UI (CLI/GUI) and API. When it comes to Firepower I was disappointed once again. First of all, there is no API-first approach to the product, since both the Sourcefire and ASA technology were pretty old, so there was no feature parity between UI and API. Even worse, there wasn't even an FMC API until version 6.1.

I was excited when I heard that it would have a REST API and started writing scripts to audit rule sets and automate changes for some of my customers, and holy fuck was I in for some surprises. I think I ran into 10 different bugs with adding / editing firewall rules, which mostly ended with me not being able to open the access control policy from the UI anymore. Then there were a shitload of undocumented issues where I had to decompile the REST API Java code to find out why perfectly valid API requests wouldn't work, and found that the API required me to delete various fields I got via a GET request before using a PUT operation to update a rule. I got into a very weird situation where I had to map out every possible bug that could destroy my policy and work around it (like not having identity objects, applications or URL objects in my rules), etc. etc.
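The GET-then-PUT problem described above can be sketched as follows. This is a hedged illustration, not the FMC API's documented contract: the post does not name the offending fields, so `links` and `metadata` are assumptions, the sample rule is invented, and stripping the keys at every nesting level is also an assumption. No HTTP calls are made here; the function only prepares the body.

```python
# Assumption: these server-generated keys are the ones rejected on PUT.
# The post does not name them; adjust the set for your FMC version.
READ_ONLY_KEYS = {"links", "metadata"}


def strip_read_only(obj):
    """Recursively drop READ_ONLY_KEYS from a JSON-like structure.

    Returns a new structure; the input is left untouched.
    """
    if isinstance(obj, dict):
        return {
            key: strip_read_only(value)
            for key, value in obj.items()
            if key not in READ_ONLY_KEYS
        }
    if isinstance(obj, list):
        return [strip_read_only(item) for item in obj]
    return obj


# Invented example of a rule as it might come back from a GET request.
rule_from_get = {
    "id": "rule-1",
    "name": "allow-dns",
    "action": "ALLOW",
    "links": {"self": "https://fmc.example/rule-1"},
    "metadata": {"ruleIndex": 1},
    "sourceNetworks": {
        "objects": [{"id": "net-1", "links": {"self": "https://fmc.example/net-1"}}]
    },
}

put_body = strip_read_only(rule_from_get)
put_body["name"] = "allow-dns-v2"  # the actual change you want to push
print(put_body)
```

Keeping the original GET response unmodified makes it easy to log both versions side by side when a PUT is rejected anyway.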

Troubleshooting

Ever wanted to become a Full Stack engineer? Firepower is exactly what you are looking for. Whenever something breaks you will have two choices: open a TAC case and play ping pong with the support engineer, who will escalate to engineering after he finds that it is yet another bug, or get into the dirty details of this "solution". Troubleshooting Firepower is like troubleshooting a Linux server running three different web servers, five different back ends and a shitload of databases. Since most issues are related to the management plane of the product, you will end up tailing tons of ultra-verbose application log files that throw random errors all the time, looking into Perl code from 2002 to determine what is going on, and asking yourself why the fuck some information is in the MySQL database, other information is in the Sybase database, and some is for some reason in that weird MongoDB that was just added because of the TID feature.

It doesn't feel like troubleshooting a firewall, because the tooling is so random and sometimes even breaks the product itself. On various occasions I had TAC engineers use some on-board scripts that broke things like HA between FMCs or destroyed the management interface configuration of the firewall. Features like user identity are probably the most fun ones to troubleshoot. Before 6.2.x there wasn't even an official way to check whether a firewall knew the correct user-to-IP mappings, so you had to write a SQL query to get that information out of the database running on the firewall.

Long story short - troubleshooting firepower is weird and without knowing the exact system architecture you will feel lost pretty quickly.

Software Reliability / Quality

It might not come as a surprise that the quality of Firepower Threat Defense (or rather the whole Firepower line) is beyond saving. The architecture is so fucked up that it will inevitably fail imo. Combining two legacy solutions into one package and not re-engineering any major part of the different products had to end like this. During my last three years of working with it I had to open about 85 (!) cases, and mind you, I tried my very best to solve every possible issue by myself. At some point I didn't even bother to report bugs or open cases anymore, because the issues just kept coming. I did my fair share of working with engineering to reproduce bugs and really wanted this product to succeed, but even after all the promises by Cisco to invest in software quality (for which they basically stopped the roadmap, because there were too many escalations) it is still a mess. It is not as bad as Firepower 5.4 - 6.1.0, but there are still a ton of issues with features like FMC High Availability, FTD High Availability, FMC performance, FMC REST API, etc. etc. and I feel like they would have to start from zero to produce anything good.

---

I don't normally bash products like this, but this one has stolen so much time from me and I don't want anybody else to go through this shit. I know this post is very long, but believe me, I could go on for many more pages about all the issues with Firepower and why I think it will never get to a point where it is competitive.

TL;DR - Don't buy Cisco Firepower, it's not worth it


