Coming from a systems engineering background, but with a strong interest in networking and routing, I've been trying to learn which aspects of network devices are worth monitoring and which metrics should be collected. That is, from a generic point of view, independent of the monitoring system or the switch/router vendor.
After some reading (including this sub's wiki), I've put together a list of things I'd monitor and collect metrics for. What would you add, and what would you consider critical to monitor?

* Hardware sensors such as temperature and fan status (depending on what is exposed, e.g. via SNMP)
* Link status of relevant links (ifOperStatus, link speed)
* SFP status (if exposed)
* Availability of the management interfaces (SSH, HTTPS)
* Resource monitoring (CPU load, memory usage), e.g. via SNMP GET
* PPS alerts (e.g. on ingress on uplinks, mostly SNMP-based)
* Graphing of interface usage
* SNMP trap evaluation?
* Remote syslog fed into Logstash or similar, filtered for patterns (auditing?)
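To make the PPS item a bit more concrete, here is a rough sketch of how I imagine a packets-per-second value could be derived from two polls of an interface packet counter (for example IF-MIB's ifHCInUcastPkts, which is a 64-bit counter). The function names, the threshold and the sample values are all made up for illustration, and the actual SNMP polling is left out on purpose, since that depends on the monitoring system.

```python
def packets_per_second(prev_count, curr_count, interval_s, counter_bits=64):
    """Rate between two samples of a monotonically increasing SNMP counter.

    The modulo handles a single counter wrap between the two polls.
    It cannot detect a counter reset (e.g. after a device reboot),
    which would show up as a bogus, very large delta.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = (curr_count - prev_count) % (1 << counter_bits)
    return delta / interval_s


def pps_alert(pps, threshold_pps):
    """Return True if the measured rate exceeds the (hypothetical) threshold."""
    return pps > threshold_pps


if __name__ == "__main__":
    # Hypothetical samples taken 30 seconds apart on an uplink.
    rate = packets_per_second(1_000, 4_000, 30)
    print(rate, pps_alert(rate, threshold_pps=50))
```

On devices that only expose 32-bit counters (ifInUcastPkts), `counter_bits=32` would be needed, and the polling interval has to be short enough that the counter cannot wrap more than once between polls.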
If anyone can point me to a book or an article on the topic, I'd appreciate that as well.