Saturday, November 24, 2018

[UPDATE] I shut down my company over Thanksgiving to do a network migration and somehow it all worked.

I'm baffled as to the how and why, but it's all back up.

Interestingly, when I got to the office at ~8am on Thanksgiving day my first couple hours were spent dealing with a problem of actual tubes instead of metaphorical tubes. Our fizzy water machine had decided to start leaking sometime in the middle of the night. If I hadn't been doing this work today, no one would have seen this until Monday and the damage would have been MUCH worse.

So after shutting off the water, making the required phone calls and letting in the cleanup crews, I was able to get to work. Wearing my lucky shirt I took one last snapshot of all configs and a deep breath, and at 10:43am shut down all the interfaces on the edge firewalls.

By 11:30 I'd cleaned out enough of the old mess to be able to start building new security policies. We previously had ~180 rules, and almost all of them needed to be adjusted in some way, and none of the changes were common enough to script. Once I was done with those I moved on to the NAT policies, then the policy-based forwarding, and by 1pm I was starting to re-patch everything.

The patching took a solid 4 hours, but it was some of the most fun I've had in my career. I knew what needed to be done and was so excited to finally be getting to clean up this mess after months of planning that I was jogging from IDF to IDF because walking was just too damn slow. I had my charts and diagrams printed out and I'd pre-staged most of the new cables beforehand, so much of the time was removing the old pile of spaghetti and installing new cable management.

At 5:15pm, I got OSPF neighbor relationships forming between the new core switch and the edge firewalls. At 7:15pm all of the IDFs were back online, and at 7:25 the DHCP relays were pointed back to the servers and I was greeted by the "Bloo-loo-loo-looop!" noises from around the office as all of our VoIP phones started regaining connectivity. At 9pm, after verifying that all of the VLANs had internet connectivity and I could get to our network drives and AWS VPC, I sent a status e-mail and went home.

Yesterday (Friday) I arrived at 9am to let in the next round of clean-up crews for the water damage, and got back to testing and documenting the changes. There have been some minor glitches that I've taken care of, but almost all of yesterday and today has simply been cleaning up the old switch configs, clearing out now-unused VLANs and labeling things both in the configs and with physical labels on the devices themselves (which almost completely depleted our stock of label maker tape). This cleaning up has taken a surprisingly long time in the Palo Altos because VLANs and interfaces are referenced in SO MANY DIFFERENT PLACES that you have to hunt through the entire system clearing everything out before you can actually delete them.
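For anyone facing the same hunt, here is a rough sketch of how the "find every reference before you delete" step could be done against a config exported as text. This is not what I ran during the migration; the function name `find_references` and the sample lines are invented for illustration and are not real PAN-OS syntax.

```python
import re


def find_references(config_text, name):
    """Return (line_number, line) pairs where `name` appears as a whole token.

    The lookarounds stop "vlan.120" from matching inside "vlan.1200"
    or inside a longer dotted or slashed name.
    """
    pattern = re.compile(r"(?<![\w./-])" + re.escape(name) + r"(?![\w./-])")
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if pattern.search(line):
            hits.append((number, line.strip()))
    return hits


# Invented sample lines, only meant to show the matching behavior.
SAMPLE = """\
interface vlan.120 address 10.0.120.1/24
vlan office-old uses vlan.120
zone trust member vlan.1200
router default member vlan.120
"""

for number, line in find_references(SAMPLE, "vlan.120"):
    print(f"line {number}: {line}")
```

Anything this turns up still has to be cleared out by hand in the right order, but at least the list of places to look is in front of you before you start.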

I only ran into one real roadblock that required a call to support. It turns out that when you're putting an EtherChannel pair through a Palo Alto firewall in Virtual Wire mode, you have to create two separate virtual wires, one per link, instead of aggregating the EtherChannel onto the firewall itself (as described in this article that the nice support lady linked me to: https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000ClHTCA0). Once I got that straightened out, it was mostly clear sailing and everything went according to plan. Which still kinda freaks me out . . .

Now I'm standing here looking at everything and it's all working. Quiet, happy little packets just humming along, getting where they're going in a much more logical way than before. But there's a very real part of me that is utterly perplexed that I was able to implement this whole thing and that it actually worked.

If there's a single most valuable lesson that I can take away from this, it's to ALWAYS MAKE A CHECKLIST. There were so many times that I was getting overly excited and flustered that I absolutely would have missed something major if I had not made a very long and detailed list beforehand when I was not quite so fizzy-brained. Every time I started to get ahead of myself, I could turn back to the list and just focus on the next step.

And now I'm going to spend the rest of the day relaxing by writing documentation, and then head home this evening to do some MORE obsessive planning and engineering. Because a transfer window to Jool is opening soon and I've got contracts to land my little green men on Laythe as well as do a rescue from Vall, and when you're playing with an unforgiving life support mod you need to plan your missions out in great detail if you're going to have a hope of getting home. ;)


