Wednesday, January 22, 2020

Peer Review Tool Idea - Interest?

Syn/Ack and all that... happy 3rd Monday everyone.

So throughout my career I've seen a number of outages caused by small, simple changes, like adding a new VLAN/SVI/VRF/trunk/etc., where one small part of the change was done incorrectly. I also worked at Google for a few years and had good exposure to the changelist/peer review process that SWEs and SREs use for software. What I haven't seen is a good product that finds a safe middle ground: something that gets a small team closer to the way the big 4 tech companies treat their infrastructure, without first requiring a full, feet-first commitment to infrastructure as code (new hardware/OSes, or a tool like Salt or Ansible).

So here's the idea I'd like to gauge community/industry interest in: I plan to develop a tool in Python, with a backend, that provides a lightweight map or listing of the network devices in your environment. When you want to implement a new change, you select the devices the change will touch, and the software pulls the current configuration from each of them and opens a changelist. You then make your proposed changes to each device's config and submit the changelist for peer review, which rolls up a diff of old vs. proposed configs so another network engineer can quickly sanity-check it. Once approved, the changes could be applied (manually, or potentially through automation). I know that if all network configs were already in Git or some other version control system, this would be easy to do... but some traditional environments have no version control integration at all, and that's where I think this software could help.
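A minimal sketch of the changelist/diff step, assuming the configs have already been pulled from each device as plain text (the pull itself, e.g. over SSH, is left out). The `Changelist` and `DeviceChange` classes, their fields, and the rule that an author can't approve their own change are hypothetical choices for illustration; the diff itself comes from Python's standard-library `difflib.unified_diff`.

```python
from __future__ import annotations

import difflib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeviceChange:
    """One device in a changelist (hypothetical model)."""
    hostname: str
    current_config: str   # snapshot pulled when the changelist was opened
    proposed_config: str  # the engineer's edited copy

    def diff(self) -> str:
        """Unified diff of current vs. proposed config; empty string if unchanged."""
        return "\n".join(
            difflib.unified_diff(
                self.current_config.splitlines(),
                self.proposed_config.splitlines(),
                fromfile=f"{self.hostname} (current)",
                tofile=f"{self.hostname} (proposed)",
                lineterm="",
            )
        )


@dataclass
class Changelist:
    """A reviewable bundle of per-device config changes (hypothetical model)."""
    description: str
    author: str
    changes: list[DeviceChange] = field(default_factory=list)
    approved_by: Optional[str] = None

    def review_bundle(self) -> str:
        """Roll up every non-empty device diff into one text for the reviewer."""
        diffs = [d for d in (c.diff() for c in self.changes) if d]
        header = f"Changelist: {self.description} (author: {self.author})"
        return "\n\n".join([header] + diffs)

    def approve(self, reviewer: str) -> None:
        """Record approval; the author may not approve their own change."""
        if reviewer == self.author:
            raise ValueError("a changelist must be reviewed by someone other than its author")
        self.approved_by = reviewer


if __name__ == "__main__":
    # Hypothetical device and addressing, for illustration only.
    current = "interface Vlan100\n description users\n ip address 10.0.100.1 255.255.255.0\n"
    proposed = current + "interface Vlan200\n description voice\n ip address 10.0.200.1 255.255.255.0\n"
    cl = Changelist("Add voice SVI", author="alice")
    cl.changes.append(DeviceChange("core-sw1", current, proposed))
    print(cl.review_bundle())
    cl.approve("bob")
```

Devices whose proposed config matches the snapshot are left out of the bundle, so the reviewer only sees what actually changes.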

There are plenty of other ideas in my head, like bringing in unit/integration testing, but at a very basic level the idea is to give a traditional infrastructure a tool that bridges it slightly toward a more infrastructure-as-code-managed environment and potentially cuts down on small outages driven by user error.
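As a rough illustration of the "unit testing" idea, here is one lint-style check that could run before a changelist goes to review. It assumes Cisco IOS syntax and that the tool has the list of commands to be pushed (not the full running-config); the function name and the single rule are hypothetical examples, not a proposed rule set. It targets a classic trunk mistake: typing `switchport trunk allowed vlan 200` instead of `switchport trunk allowed vlan add 200`, which replaces the trunk's allowed list rather than appending to it.

```python
import re
from typing import List

# Hypothetical pre-review check. Run it on the commands an engineer plans to
# push, not on a full running-config (where every trunk line would match).
# On Cisco IOS, "switchport trunk allowed vlan <list>" replaces the trunk's
# entire allowed list; only the "add"/"remove" forms change it incrementally.
_REPLACES_ALLOWED_LIST = re.compile(
    r"^\s*switchport trunk allowed vlan\s+(?!add\b|remove\b)\S"
)


def check_trunk_vlan_overwrite(commands: List[str]) -> List[str]:
    """Return one warning per command that would replace a trunk's allowed VLAN list."""
    warnings = []
    for lineno, command in enumerate(commands, start=1):
        if _REPLACES_ALLOWED_LIST.match(command):
            warnings.append(
                f"line {lineno}: '{command.strip()}' replaces the trunk's entire "
                "allowed VLAN list; use 'add' or 'remove' for an incremental change"
            )
    return warnings


if __name__ == "__main__":
    proposed_commands = [
        "interface GigabitEthernet1/0/1",
        " switchport trunk allowed vlan 200",      # flagged: overwrites the list
        " switchport trunk allowed vlan add 300",  # fine: appends VLAN 300
    ]
    for warning in check_trunk_vlan_overwrite(proposed_commands):
        print(warning)
```

A failing check like this could block submission or simply be attached to the review so the second engineer sees it next to the diff.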

Is this something anyone would want to try out or could see themselves using? Is it something that's a waste of effort due to XYZ? Thanks for any feedback.
