Long post, so I'll start with the TL;DR: This week I learned that Fortiswitches come with spanning tree disabled out of the box. Fun times were had.
I'm not exactly sure what happened in my brain this week, but everything that could go wrong did go wrong, and all of it came down to stressful previous weeks, too much work and bad planning.
A couple of months ago I had drawn up a proposal for an upgrade for a customer. My boss was a bit eager to try out a couple of new solutions for this customer and not just go for a standard Cisco setup on the layer 2 segment. As my company and our customers widely use Fortigates as their primary firewalls, we started checking out other hardware products besides FortiWLC, and landed on Fortiswitches. Fortinet's Security Fabric structure is pretty cool, and we went for a couple of 248s. At first we ordered a FW cluster and a 248 to test at the customer site. This worked quite well, and the customer ordered two additional 248s to replace their stone-age 2950, some old HP switches and an Allied Telesis stack.
The week before the migration, all my planning went to shit, as I was side-tracked by more pressing issues. I wrote up a quick plan, but forgot essentials such as creating a change, getting that change accepted in CAB, and preparing more details around a roll-back plan and the overall plan for the migration to the new equipment, as well as doing some much-needed research on Fortiswitches.
Migration day came, and I prepared two possible configurations: one where I set aside dedicated Virtual Switch Link interfaces for the Fortiswitches directly to the Fortigate, and one where we daisy-chained the Fortiswitches. The first topology worked fine for all the equipment, but you can't view the switches on the Fortigates, which was one of my main goals in order to make configuration easier. I rolled back and went for the daisy-chained topology. All switches showed up in the Fortigate management view, and I started configuring all the necessary trunk ports and access ports. Only the last two cables remained, and the job would be done within the agreed-upon billable time.
The last two cables, however, were attached to another old HP switch, which was a dedicated AP and WLC switch. This switch didn't have any LAG config, but two cables were connected to the Allied Telesis stack nonetheless. One port was of course in blocking state and the other one was forwarding. As I was interested in getting the job done, I didn't give this much thought, and hooked the cables up to the Fortiswitches.
Big mistake.
Apparently STP isn't enabled on Fortiswitches out of the box. However, everything seemed to be working fine and I got the good ol' pat on the back and thanks.
One and a half hours later, our monitoring guys were calling frantically: the WHOLE site was down and nothing was working as it should. From our device database we could reach the servers at the site, but devices on the same VLANs couldn't reach each other. Luckily, though it was of no help in resolving the issue, I was able to reach the Fortigates over the loopback interface, and lo and behold: topology changes and duplicate OSPF router IDs everywhere. I quickly disabled all interfaces I knew were connected to other bridges and set the Fortiswitch directly connected to the Fortigate as root bridge. The topology changes stopped, but this didn't help at all. Devices on the same VLANs, or with policies in place between them, were still unable to reach each other.
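For anyone wondering why two plain cables can take down a whole site: Ethernet frames have no TTL, so once there is a physical loop and nothing is blocking a port, a broadcast just keeps circulating. Below is a rough Python sketch of that, assuming a toy model where every switch floods a broadcast out of every link except the one it arrived on. The switch names and the three-switch triangle are made up for illustration and are not the actual site topology.

```python
def flood(links, source, max_hops=10):
    """Count broadcast copies in flight after each hop.

    Toy model: every switch floods a broadcast out of every link
    except the one the frame arrived on. No MAC learning, no STP.
    """
    in_flight = [(source, None)]  # (switch holding a copy, index of ingress link)
    history = []
    for _ in range(max_hops):
        next_hop = []
        for node, ingress in in_flight:
            for index, (a, b) in enumerate(links):
                if index == ingress or node not in (a, b):
                    continue
                neighbour = b if node == a else a
                next_hop.append((neighbour, index))
        history.append(len(next_hop))
        if not next_hop:
            break  # the broadcast has died out
        in_flight = next_hop
    return history


# Hypothetical names: two daisy-chained switches plus the old AP switch.
looped = [
    ("fsw-1", "fsw-2"),
    ("fsw-1", "hp-ap-switch"),
    ("fsw-2", "hp-ap-switch"),  # second uplink, nothing blocking it
]
blocked = looped[:2]  # what STP achieves by blocking the second uplink

print(flood(looped, "hp-ap-switch"))   # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
print(flood(blocked, "hp-ap-switch"))  # [1, 1, 0]
```

With the redundant uplink left open, the copies never die out. With one uplink blocked, the broadcast reaches every switch once and stops. Real switches are worse than this model, since every new broadcast adds to what is already circulating.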
After 16 hours of troubleshooting, completely messing up the whole environment, and using up my poor colleagues' time on troubleshooting with me, we rolled everything back to the old equipment, and everything except the poor vCenter came up and worked again.
This sucked so bad, and I'm stuck with a guilty conscience towards my colleagues who also had to help get everything up. I'm now in a re-planning phase, setting up a lab identical to the proposed solution for the customer, to really cover all aspects and do everything from scratch. A hard lesson was learned this week.
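One thing that could go into the re-planning is a sanity check of the cabling plan before anything gets patched. Here is a minimal Python sketch, assuming the plan is just a list of (device, device) cable pairs with made-up device names. It uses a small union-find to flag any cable that would close a loop, so that each flagged cable can get a conscious decision (LAG it, leave it unplugged, or make sure STP is actually running on both ends).

```python
def find_loop_closing_cables(cables):
    """Return the cables that connect two devices that are already connected.

    Each such cable closes a layer 2 loop unless it is part of a LAG
    or a spanning tree protocol blocks one of the ports.
    """
    parent = {}

    def find(device):
        parent.setdefault(device, device)
        while parent[device] != device:
            parent[device] = parent[parent[device]]  # path compression
            device = parent[device]
        return device

    redundant = []
    for a, b in cables:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append((a, b))  # both ends already reachable: loop
        else:
            parent[root_a] = root_b
    return redundant


# Hypothetical cabling plan, not the real site.
cables = [
    ("fortigate", "fsw-1"),
    ("fsw-1", "fsw-2"),
    ("fsw-1", "hp-ap-switch"),
    ("fsw-2", "hp-ap-switch"),
]
print(find_loop_closing_cables(cables))  # [('fsw-2', 'hp-ap-switch')]
```

Which cable of a loop gets reported depends on the order of the list, so the output only says that a loop exists and where it gets closed, not which cable is the "wrong" one.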