Friday, July 3, 2020

Chasing Ghosts in Monitoring

We have a few edge devices (thankfully only a few) that will occasionally go down according to our monitoring servers (sometimes only some of them report it). Thing is, by the time you can react to the alerts, even if you were paying sharp attention, the problem has already resolved itself. There is no clear evidence along the path as to why we lost monitoring for a minute, or any indication that the device was actually down in the first place. On top of all this, there is an exec CC'd on these emails who wants explanations for everything and has nothing to contribute to the hunt.

Have you fine people ever had stuff like this happen in your networks? What was the cause?


