Tuesday, September 1, 2020

Cisco Firepower Rant III (from a Firepower TAC engineer)

This is a throwaway account.

I am/was/will be a TAC engineer at Cisco.

Before I start my rant, here is my free advice for Firepower customers:

If you are a customer and want proper support for Firepower cases, try to open your case during the working hours of the Krakow (EMEA), Bangalore (APAC), or US (NASA) team. They have the best engineers (for Firepower, at least).

All other teams are subpar when it comes to Firepower cases.

Apart from the product itself, there are so many internal TAC things that I just can't tolerate anymore.

Now coming to the BU (or engineering team)

It all depends on the contract that you have with Cisco. If you are a big customer (you have paid them big money), you will get the best support. Your case will be FTS around the clock. A dedicated TAC team / engineering team will monitor your cases 24/7. The engineering team will be fast in fixing your software bugs. You just need the right contacts and some stern e-mails to the right people.

If you are a small customer, the situation is different. Even if the TAC engineer tries to push the case to the BU / engineering, they are very slow in responding to any new bugs. It makes sense up to a point. I get it, you want to prioritise customers who pay you more, but that doesn't mean that you should fully ignore the small customers.

I have seen bugs being moved around from the Firepower team / unit to the ASA team / unit and vice versa.

Since Firepower uses ASA LINA, it is sometimes really hard to figure out which team should pick up the bug. And sometimes engineers use this back-and-forth to keep delaying work.

The most bullshit reason that I have ever heard from the BU team is: "TS file does not contain the required logs from that timeframe". Getting an engineer from the BU / engineering on a live TAC troubleshooting session is a huge pain. So many formalities, so many internal mails. It's a mess.

The Entire Firepower product

The integration between Firepower / ASA / FXOS is really really bad.

Imagine you are building a car. You start out like Toyota (ASA). The car is really good, and it has gotten even better over the years.

Now a new car manufacturer comes into the market (BMW). It has really good interiors and up-to-date electronics.

If you take the engine from a Toyota and put it in a BMW body, would the car be good?

BMW and Toyota are made for different use cases and different market segments. The same goes for Firepower.

The product lineup and integration are so shitty. Firepower modules go on ASA, or Firepower can be a standalone device like FTD (which is basically a combination of ASA + Firepower). On top of that, there is the Firepower chassis, Firepower Management Center and what not. Compatibility issues everywhere.

Why can't the product line be streamlined like the Palo Alto firewall? For anyone reading this, please compare Palo Alto with Firepower before buying. Remember that their entire TAC team is in the US, so you get the best support possible. They do not outsource their work to employees working in Noida, India.

You can refer to the Firepower Rant II post on Reddit for more on this.

Forget about production environments. FPR devices don't even work properly in my lab environment.

Now coming to fake CCIE certification

When a TAC engineer tells you that he is a CCIE, you should not believe him outright. Cisco has an internal program that rewards engineers for completing the CCIE. Last time I checked, the reward was about 1.5 times one month's salary for all blue badge employees.

Candidates know the questions ('CCIE dumps', as we call them in our lingo) beforehand, both theory and lab. They reach out to third-party vendors and set up CCIE labs. The labs are exactly the same, topology-wise, as the CCIE exam labs. Candidates practice the same questions over and over again on the devices until it becomes part of their muscle memory. I have seen people writing down the CLI commands and learning the sequence of commands without knowing what each command does. Even the IP addresses / subnets are set to be the same.
Even I know all the questions that were being asked in the lab exams in recent years (pre-covid times).

Now, I don't know how these third-party vendors know the CCIE questions beforehand, but something really fishy is going on here. It is simply impossible to study for 6 months and pass the certification.

Also, one major reason why people opt for the CCIE exam is that they get to go off-queue in TAC for a few days (it is basically a few days of holiday). People have booked CCIE exams with no intention of passing, just so that they can get some paid holidays.

So it is very possible that you might get someone who is a CCIE in Security but who, while working on your case, has no idea what ARP is or how it works.

Next time you see a shiny LinkedIn post about CCIE, or an engineer mentions his CCIE ID in his e-mail signature, do not get fooled. He/she is just good at mugging up and donkey work, not at actual troubleshooting or networking.

I did my CCIE last year. Honestly speaking, I did it for the money and the off-queue time. But I never mentioned it to any customer, nor added a CCIE number to my e-mail signature. Deep down I know I just cheated on the exam. I have met some good, experienced engineers who do not even have a CCNA certification but are far better than me.

Now the pressure that we have to deal with

Everyone is after us, eating our heads at the same time. I have been on Webex calls with people who have no idea what is going on in the case. HTOMs / account managers / duty managers / sales guys have interfered in ongoing cases so many times. Well, if you want a case update, why don't you go over the case notes? I just get fed up when some random person who has not even gone through the case notes pings me and asks for the problem description and action plan.

Working in TAC is really stressful.

I once saw one of my fellow engineers crying on a call. Another engineer came over and took the call from there, but things like this really demotivate us on the floor.

TAC engineers work on weekends. They work on all holidays. Some of us are really hard working. We have to work at odd hours. Some teams go to work at 10 PM (local time), some teams go to work at 6 AM (local time).

All of this takes a toll on our work.

We are overworked. We sometimes have to take 4-5 cases per day. Backlogs of over 40-50 cases were pretty common in pre-covid times. Even right now, the VPN + Webex teams are getting a lot of cases because everyone is working from home.

Now coming to managers

Some managers are good, some are bad. A few managers really care about the team. A few just sit there thinking only about SLA misses and NPS scores. In all the team meetings I have to go through the same bullshit, the same questions over and over again. Managers should understand that metrics (such as NPS) are not everything.

I have gotten 0 NPS scores many times. I have also gotten 10 NPS scores a lot as well. But the NPS score is tied to the record of the engineer who closes the case. Even if 10 different engineers worked on one case, the NPS score gets assigned to the person who closed it.

My manager once sent a wrong e-mail to a customer. He was supposed to follow up on a different case with a different bug, but ended up sending the e-mail on the case I was working on. That meant another week of confusion and back-and-forth over Webex / e-mail / phone calls. If that happens one time, I can understand it, but such things have happened so many times.

Parity between blue badge and red badge engineers

When a case comes into the queue, it is assigned to an available engineer. Since red badge employees are on contract and are paid less than blue badge employees, they always have this attitude of "I am being paid less for the same work". That reflects in their work as well.

There was a recent incident where some red badge employees went on a call pretending to be either a tech lead or a manager. They were fired. I am not sure what exactly happened; I just heard it through internal communication. It was a different team (probably VPN) and a different shift.
This is what cost cutting does. It degrades the work quality.

I have seen a lot of Firepower TAC cases, and one thing I can say is that nothing has improved in the Firepower line of devices over the past few years. Yes, the engineering team keeps rolling out changes, but the core functionality is still very bad.

If you raise a case and it is picked up by a really good TAC engineer, you will get the best support from Cisco. You will get proper updates, proper hand-off / FTS, and proper resolution of your issue.

But there are some really poor TAC engineers as well. They do not want to do any lab repro (because lab repro is time consuming and tedious), they don't know the product, and they are not willing to learn either. Some engineers are too lazy to even add internal case notes and do a proper hand-off / requeue.

I have received requeued / handed-off cases that make no sense. Wrong bugs attached to wrong cases, missing logs, missing action plans, case notes pasted into another, irrelevant case: I have seen them all. I have seen cases that have been going on for 6 months or even more without any relevant internal case notes. Sometimes I have to start my troubleshooting again from the beginning (because the case notes give me no clue). I don't mind troubleshooting from scratch, but it gets annoying after some time, for the customer and the TAC engineer alike.


