Saturday, December 5, 2020

Alternatives to static alert thresholds?

I work as a network engineer for a regional ISP. We use SolarWinds NPM as our network monitoring system. One disappointing limitation of this system is that alerts can only be built on static thresholds. For example:

If an interface with a description containing "Backbone" is at 80% utilization, send alert.

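This sort of static threshold doesn't scale well. A single percentage has to fit every matching link regardless of its normal traffic pattern, so it either fires constantly on links that routinely run hot or stays silent when a link's traffic departs sharply from normal while still sitting below 80%. I would love to implement an alerting system that fires when utilization deviates from its expected level by some number of standard deviations, or that uses some other method of anomaly detection.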
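As a rough illustration of the standard-deviation idea (not anything SolarWinds NPM provides), here is a minimal Python sketch. It assumes utilization arrives as a list of percentages sampled at a fixed interval. `zscore_alerts`, `window`, `threshold`, and `min_sigma` are hypothetical names chosen for this example. Each new sample is compared against the mean and standard deviation of the preceding `window` samples, and anything more than `threshold` standard deviations away is flagged.

```python
from statistics import mean, stdev


def zscore_alerts(samples, window=12, threshold=3.0, min_sigma=1.0):
    """Flag samples that deviate from a trailing baseline.

    Hypothetical helper: `samples` is assumed to be utilization in percent,
    collected at a fixed polling interval. Returns (index, value, z) tuples.
    """
    if window < 2:
        raise ValueError("window must hold at least 2 samples for stdev")
    alerts = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]  # the `window` samples before i
        mu = mean(baseline)
        # Floor sigma so a perfectly flat baseline doesn't divide by zero
        # or turn a tiny wobble into a huge z-score.
        sigma = max(stdev(baseline), min_sigma)
        z = (samples[i] - mu) / sigma
        if abs(z) > threshold:
            alerts.append((i, samples[i], z))
    return alerts


if __name__ == "__main__":
    # Illustrative data only: a link idling around 40-43% with one spike.
    util = [40, 42, 41, 43, 42, 40, 41, 42, 43, 41, 42, 40, 70, 42]
    for idx, value, z in zscore_alerts(util):
        print(f"sample {idx}: {value}% (z = {z:.1f})")
```

With the sample data, only the jump to 70% at index 12 is flagged, even though it would also trip an 80%-style rule only if the spike were larger. A trailing window is the simplest baseline; for ISP links with strong daily cycles, a common refinement is to compare against the same time of day or week instead, and to keep flagged samples out of the baseline so one spike doesn't inflate the next window's standard deviation.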

I'm curious: what have other people's experiences been with static thresholds? How have you moved beyond them? What tools do you use for this purpose?


