Monday, August 27, 2018

Just had my first zero-touch zero-downtime automated ASA HA pair upgrade!

I am of course aware that Kirk Byers has a tutorial for a zero-touch ASA upgrade. However, we run HA pairs exclusively, and I wanted a production-ready way to upgrade my remote pairs with zero touch and zero downtime. Additionally, my scripts use NetBox as a backend data source and notify me via Microsoft Teams when taking action. I've been playing with my lab 5515-X pair for the last several days, and today I scheduled my Corporate pair (the hub for my 30+ sites) and a remote site, simply because they run AnyConnect and needed to be upgraded for a vulnerability. Where before I would have scheduled one site per night, I'll now be able to schedule all remaining sites in one night and just watch them work. Moving forward, this lets me be more consistent and timely with upgrades.

Future improvements will include a web interface from which you can schedule uploads/upgrades and view the current status of jobs, but that's once I get a better understanding of Flask and can figure out non-blocking asynchronous celery sticks, or something. At this time they're simple Python scripts that run in Docker containers, one for uploads and one for the upgrade process alone. A simple shell script lets me run the containers with a single command and a couple of parameters (pair hostname + version). In this case I just scheduled a cron job to run each job.
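For illustration, here is roughly what that wrapper does, written in Python rather than shell. This is only a sketch: the image name `asa-tools:latest`, the `ASA_HOSTNAME`/`ASA_VERSION` environment variables, the `asa_upload_` container prefix and the `python -m app.asa_<job>` entrypoint are all assumptions made up for the example. Only the `asa_upgrade_<hostname>` container naming comes from my actual setup.

```python
import shlex


def build_run_command(job, hostname, version, image="asa-tools:latest"):
    """Return the argv for a detached `docker run` of one upload or upgrade job.

    The image name, environment variable names and module entrypoint are
    hypothetical placeholders, not the real ones.
    """
    if job not in ("upload", "upgrade"):
        raise ValueError(f"unknown job: {job!r}")
    return [
        "docker", "run", "-d",
        # Hostname appended to the container name so its logs are easy to find.
        "--name", f"asa_{job}_{hostname}",
        "-e", f"ASA_HOSTNAME={hostname}",
        "-e", f"ASA_VERSION={version}",
        image,
        "python", "-m", f"app.asa_{job}",
    ]


cmd = build_run_command("upgrade", "CORP-5525", "9.8(2)38")
# shlex.join quotes the parentheses in the version so the line is safe to paste
# into a shell or a crontab entry.
print(shlex.join(cmd))
```

Building the command as a list and quoting it only for display keeps the version string, parentheses and all, from being mangled by the shell.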

Of course my upload container is separate from the upgrade one and is intended to be run ahead of time. The upload script does a lot of hoopla but ultimately just uploads via SCP. The upgrade process takes the following steps:

  • Send a message to Teams telling me the process is starting
  • Verify the image file for the target software version exists on both ASAs
  • Set boot variables
  • Reload the standby
  • Wait for it to come up to Standby Ready status
  • Failover
  • Reload the new standby
  • Wait for it to come up
  • Fail back over (if the 'primary' is the current standby, otherwise stay where it is)
  • Double check the current running software version on both boxes is the new one
  • Enable the ASA REST API module
  • Configure the ASA REST API module
  • Update NetBox so the pair is documented as having been upgraded
  • Finally, send me a Teams message of completion.
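The two "wait for it to come up" steps and the conditional failback are the parts that need the most care, so here is a minimal sketch of that logic. It is not my actual script: `get_status` stands in for whatever function reads the standby state from the active unit, the timeout and interval values are arbitrary, and the function names are invented for the example. The state names are the ones that show up in the log output further down.

```python
import time


def wait_for_standby_ready(get_status, timeout=1800.0, interval=15.0,
                           sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns 'Standby Ready' or the timeout passes.

    Returns the distinct states observed, in order. get_status is a
    hypothetical callable; sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout
    seen = []
    while True:
        status = get_status()
        # Record each state once, even if several polls in a row return it.
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == "Standby Ready":
            return seen
        if clock() >= deadline:
            raise TimeoutError(f"standby stuck in {status!r} after {timeout}s")
        sleep(interval)


def needs_failback(active_role):
    """Fail back only when the unit that is currently active is not the primary."""
    return active_role.lower() != "primary"


# Simulated reload: the states are fed from a list instead of a real ASA.
states = iter(["Failed", "Failed", "Negotiation", "Sync Config",
               "Bulk Sync", "Standby Ready"])
observed = wait_for_standby_ready(lambda: next(states), sleep=lambda s: None)
print(observed)
print(needs_failback("Secondary"))
```

Raising on timeout rather than returning a flag means a standby that never comes back stops the run before the failover step, which is the whole point of waiting.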

My try/excepts ensure that if there are any issues I get a failure Teams message instead of the completion one, and as a last resort my containers all have the hostname appended to their name so I can review the logs if something goes horribly, horribly wrong. Here's a snippet of the logs.

totallyroot@my-linux-server-but-not-real-name:~# docker logs -f asa_upgrade_CORP-5525

Sending Microsoft Teams message

INFO:app.asa_upgrade:Upgrading CORP-5525

INFO:app.asa_upgrade:File asa982-38-smp-k8.bin found on active CORP-5525

INFO:app.asa_upgrade:File asa982-38-smp-k8.bin found on standby CORP-5525

INFO:app.asa_upgrade:File asa982-38-smp-k8.bin found where required

INFO:app.asa_upgrade:Active platform: 9.4(4)16

INFO:app.asa_upgrade:Standby platform: 9.4(4)16

INFO:app.asa_upgrade:Updating configuration boot variables

INFO:app.asa_upgrade:Reloading original standby

INFO:app.asa_upgrade:Watching standby status

INFO:app.asa_upgrade:Current standby status is: Failed

INFO:app.asa_upgrade:Current standby status is: Negotiation

INFO:app.asa_upgrade:Current standby status is: Sync Config

INFO:app.asa_upgrade:Current standby status is: Bulk Sync

INFO:app.asa_upgrade:Current standby status is: Standby Ready

INFO:app.asa_upgrade:Standby reload successful

INFO:app.asa_upgrade:Failing over

INFO:app.asa_upgrade:Reloading second standby

INFO:app.asa_upgrade:Watching standby status

INFO:app.asa_upgrade:Current standby status is: Failed

INFO:app.asa_upgrade:Current standby status is: Cold Standby

INFO:app.asa_upgrade:Current standby status is: Sync Config

INFO:app.asa_upgrade:Current standby status is: Bulk Sync

INFO:app.asa_upgrade:Current standby status is: Standby Ready

INFO:app.asa_upgrade:Standby reload successful

INFO:app.asa_upgrade:Currently active is not the primary, failing over

INFO:app.asa_upgrade:Current active device is now the Primary

INFO:app.asa_upgrade:Active platform: 9.8(2)38

INFO:app.asa_upgrade:Active device upgraded to ASA 9.8(2)38

INFO:app.asa_upgrade:Standby platform: 9.8(2)38

INFO:app.asa_upgrade:Standby device upgraded to ASA 9.8(2)38

Upgrade completed for CORP-5525

INFO:app.asa_upgrade:Enabling REST API image

INFO:app.asa_upgrade:Enabling REST API configuration

API enabled

Updating NetBox

Updating NetBox complete.

Sending Microsoft Teams message
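That log is the happy path. For the failure path mentioned earlier (a failure Teams message instead of the completion one), the pattern looks roughly like the sketch below. The names `run_with_notifications`, `teams_payload`, `upgrade` and `notify` are invented for the example, and the `{"text": ...}` body assumes a plain Teams incoming webhook, so treat it as an illustration rather than my real code. The `INFO:app.asa_upgrade:` prefix in the log is simply the default format of Python's `logging.basicConfig`.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)  # default format is LEVEL:logger:message
log = logging.getLogger("app.asa_upgrade")


def teams_payload(text):
    """JSON body for a simple text message (assumes a Teams incoming webhook)."""
    return json.dumps({"text": text}).encode("utf-8")


def run_with_notifications(hostname, upgrade, notify):
    """Run upgrade(hostname) and notify on start, then on success or failure.

    upgrade and notify are hypothetical callables supplied by the caller;
    notify would normally POST teams_payload(message) to the webhook URL.
    """
    notify(f"Starting upgrade of {hostname}")
    try:
        upgrade(hostname)
    except Exception as exc:
        # Full traceback goes to the container log, short summary to Teams.
        log.exception("Upgrade of %s failed", hostname)
        notify(f"Upgrade FAILED for {hostname}: {exc}")
        return False
    notify(f"Upgrade completed for {hostname}")
    return True


# Demonstration with a deliberately failing upgrade and an in-memory notifier.
def broken_upgrade(hostname):
    raise RuntimeError("standby never reached Standby Ready")


messages = []
ok = run_with_notifications("CORP-5525", broken_upgrade, messages.append)
print(ok, messages)
```

Passing `notify` in as a callable keeps the upgrade logic testable without ever touching the real webhook.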

Next one in 10 minutes!


