We have a SaaS application colocated at a couple of datacentres in different parts of the world for prod and DR.
Currently we have a spaghetti mess of different switches (2960s, 4509s, 3750s, etc.); some are stacked, some are standalone, and all are running spanning tree since the topology is looped.
Hanging off of this we have a couple of handfuls of physical servers, a couple of UCS domains, a couple of blade systems, 3-4 SANs, a couple of sets of load balancers, all of our iLO/iDRAC, serial KVMs, etc.
Most physical compute is running VMware or Windows, and most virtual compute is hosting Windows guests.
The network is used only for the SaaS application itself and management of that application, although QA and some semi-production instances are housed in the same network segments.
Currently, all of these networks plug into a core switch, which has non-redundant connections to several sets of ASA firewalls: one pair for management traffic and VPNs, another for client VPNs, and a third used entirely for application traffic.
These firewalls each connect to all networks through a single interface and max out at around 1 Gbit/s of throughput and 50k connections or so.
None of the VLANs are currently packet-filtered or routed at layer 3 within the existing switches.
Instead, all traffic moves through them up to the firewalls, which do the routing and ACLs between hosts on the networks.
There is concern about long failover times interrupting the application when some of the existing switches have gone down.
We have also repeatedly run into issues where the admin firewalls' connection tables have been exhausted while running vulnerability scans.
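For what it's worth, this failure mode is easy to confirm from the admin ASA while a scan is running; show conn count is standard ASA CLI, though the numbers below are made up for illustration:

    ciscoasa# show conn count
    49862 in use, 50000 most used

When "in use" sits right at the platform limit during a scan window, new flows through that firewall start getting dropped.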
We have purchased new Nexus 9000 switches to act as a single pair of root devices inside of the firewalls.
We are in agreement to remove as much of the old 1G switching infrastructure as possible (although the blade systems will need to stay, and we may need to keep at least one pair of old switches to plug iLO/iDRAC/LBs into).
My proposed design is to migrate to a collapsed network that is a true spine-leaf setup.
My goal in this is to completely eliminate spanning tree and reduce the chance of any connection dropping due to a device or port failure.
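Concretely, the spanning-tree piece would come from running the 9k pair as a vPC domain, so every downstream switch or host sees the pair as one logical device. This is only a minimal sketch; the domain ID, keepalive addresses, and port-channel numbers are all made up:

    feature vpc
    feature lacp

    vpc domain 10
      peer-keepalive destination 192.168.100.2 source 192.168.100.1

    interface port-channel 1
      switchport mode trunk
      vpc peer-link

    ! downstream port-channel to a blade switch or host; both 9ks
    ! present it as one logical link, so no redundant path blocks
    interface port-channel 20
      switchport mode trunk
      vpc 20

Strictly speaking, spanning tree stays enabled underneath as a failsafe, but with everything dual-homed over vPC no port should ever actually sit in a blocking state.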
Additionally, I would like to move all east/west traffic within these data center networks down to the new Nexus 9k switches.
This includes implementing ACLs between hosts in as similar a fashion as possible to the existing ASA firewall rules.
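For the east/west routing itself, the gateways would move from the ASAs to SVIs on the 9k pair, with an FHRP between the two switches. Again just a sketch, with the VLAN number and addressing invented:

    feature interface-vlan
    feature hsrp

    ! gateway for a hypothetical app-tier VLAN now lives on the 9ks,
    ! so app-to-DB traffic routes here instead of hairpinning
    ! through the ASAs
    interface Vlan20
      no shutdown
      ip address 10.10.20.2/24
      hsrp 20
        ip 10.10.20.1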
I believe this should let us reduce the chance of dropping application traffic due to firewall failover or a device/port outage to practically nil.
I have gone over this with my manager, and after hours of meetings he finally agreed. However, the next day he sent me an email saying he wasn't convinced, asking to see more detail on how things would work, and pushing again to just implement the switches as dumb spanning-tree switches in the core, doing no layer 3 routing or packet filtering.
He then set up a meeting with his manager where he called my work ethic into question and said the project was going nowhere.
His manager suggested writing the whole thing up and going over it again as a group, which I am fine with on the face of it.
However, I have found out he is working to collect as many opinions as possible to contradict my reasoning: that ACLs on the switches are too cumbersome to manage and will be too confusing; that they add more places to look when troubleshooting, which is too much effort; and that only he, I, and maybe one other person will be able to look in on the ACLs regardless, so he doesn't want them to be too hard.
Speaking with him over the phone, he attacked the whole thing as unmanageable and as taking too long to implement, and said that others he knows say managing ACLs on the switches is too much trouble (subjective).
He's also afraid that we'll "have to check two places for any issue" (firewall and switch), when to me it just means: east/west = check the switch, north/south = check the firewalls.
However, I admit he might be right. I imagined we could use a couple of "deny internal network range" entries and then an "allow any IP" after that so traffic bound for the firewalls still passes; what might be a better method there?
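To make that concrete, the cleanest ordering I can think of is: specific east/west permits first, then the internal-range deny, then a catch-all permit so north/south traffic still reaches the ASAs. All subnets, ports, and names below are invented for illustration:

    ip access-list VLAN20-IN
      statistics per-entry
      ! specific east/west permits first
      10 permit tcp 10.10.20.0/24 10.10.30.0/24 eq 1433
      20 permit tcp 10.10.20.0/24 10.10.40.0/24 eq 443
      ! then block everything else internal-to-internal
      30 deny ip 10.10.20.0/24 10.10.0.0/16
      ! finally let the rest through to the ASAs (north/south)
      40 permit ip any any

    interface Vlan20
      ip access-group VLAN20-IN in

The statistics per-entry hit counters would also partially answer the "two places to troubleshoot" objection, since show ip access-lists VLAN20-IN on the switch shows exactly which entry a flow is matching.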
He also complained that the firewalls will basically be doing "only NAT", but they will still handle the ACLs for inbound/outbound traffic, including VPNs, so to me that's still a fair bit.
He did suggest maybe making the change only for the management network, and after some time I am leaning towards accepting that change, although to my mind that makes the exact issue he raised even worse: having to check two places to see where traffic goes wrong.
Is there a solid rebuttal to this? Am I overthinking things? Is it really worth doing what I'm doing?
I came here hoping to get some thoughts on how to rebut these items, but I also admit maybe it is worth rethinking the ACL migration to the switches. What do you think?
How can I manage my boss overall on this if I'm going down the right path?
Is there any caveat that I or my manager didn't think of that should be considered?
Thanks to all for your help and support on this! :)