Monday, December 18, 2017

Network Performance Monitors

This is a cross-post of mine from r/sysadmin. I am looking for all the help I can get.

I have wandered into an interesting problem that could use some insight from more experienced hands.

Background: I am a network engineer for a Fortune 500 company. We have a very large presence across the entire lower 48 states, and we are very short-staffed.

Problem: We are currently trying to monitor the uptime of our devices across the US with SolarWinds NPM. We are monitoring 108,187 unique devices, which appears to be more than SolarWinds can handle.

SolarWinds infrastructure: seven servers, all VMs. The primary has 8 threads of an Intel Xeon E5-2670 v3 and 32 GB of RAM; it runs the primary poller, the web console, and config backups for a small portion of devices (about 1,000). The six secondary pollers each have half that: 4 threads and 16 GB of RAM. We are on SolarWinds NPM 12.1 and NCM 7.6, which are being upgraded today to 12.2 and 7.7 respectively.

Problems with this setup: the issue is twofold. First, each poller is designed to handle only around 11,000 devices. Second, the collector service that aggregates the data from the pollers is 32-bit only.

Observations on the first problem: the vast majority of the data we collect with SolarWinds is ICMP status and round-trip times, so the pollers themselves are not stressed. However, each one polls far more devices than it is rated for: between 15,000 and 19,000.
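To put numbers on the first problem, here is a small arithmetic sketch using only the figures above. It assumes seven pollers (the primary plus the six secondaries) and treats the ~11,000-device figure as a hard per-poller limit; the function names are just for illustration.

```python
import math

# Figures taken from the post.
TOTAL_DEVICES = 108_187
POLLERS = 7                 # primary poller + six secondary pollers
RATED_PER_POLLER = 11_000   # approximate per-poller design limit


def average_load(devices: int, pollers: int) -> float:
    """Average devices per poller if the load were spread evenly."""
    return devices / pollers


def pollers_needed(devices: int, rated: int) -> int:
    """Smallest poller count that keeps every poller at or under its rating."""
    return math.ceil(devices / rated)


if __name__ == "__main__":
    print(f"Average load: {average_load(TOTAL_DEVICES, POLLERS):,.0f} devices/poller")
    print(f"Pollers needed at {RATED_PER_POLLER:,} each: "
          f"{pollers_needed(TOTAL_DEVICES, RATED_PER_POLLER)}")
```

Run as-is, it reports an average of about 15,455 devices per poller, roughly 40% over the rating and consistent with the 15,000 to 19,000 we see, and a minimum of 10 pollers, three more than we have today.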

Observations on the second problem: when the primary server aggregates the pollers' data to push it to our SQL database, the process that does so (which is 32-bit) climbs to 4 GB of RAM usage, hangs, and then crashes.
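For scale, a 32-bit process can address at most 2^32 bytes (4 GiB), which is under 40 KB per device across 108,187 devices, and the usable share is smaller in practice. One general way to keep an aggregation step under a fixed ceiling is to stream results to the database in bounded batches rather than buffering a whole polling cycle first. We don't know how the SolarWinds collector is built internally, so this is not its design, only a minimal sketch of the technique. The `response_time` table, its columns, the 5,000-row batch size, and the `batched`/`flush_in_batches` helpers are all hypothetical, and an in-memory SQLite database stands in for the real SQL database.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical record shape: (node_id, rtt_ms); rtt_ms is None when a ping failed.
Sample = Tuple[int, Optional[float]]

ADDRESS_SPACE_32BIT = 2 ** 32  # bytes addressable by a 32-bit process (4 GiB)


def batched(samples: Iterable[Sample], size: int) -> Iterator[List[Sample]]:
    """Yield lists of at most `size` samples without materialising the whole stream."""
    it = iter(samples)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def flush_in_batches(conn: sqlite3.Connection, samples: Iterable[Sample],
                     batch_size: int = 5_000) -> int:
    """Insert samples in fixed-size batches, committing after each one.

    Peak memory is bounded by one batch, not by the size of the polling cycle.
    """
    written = 0
    for chunk in batched(samples, batch_size):
        conn.executemany(
            "INSERT INTO response_time (node_id, rtt_ms) VALUES (?, ?)", chunk)
        conn.commit()
        written += len(chunk)
    return written


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the real SQL database
    conn.execute("CREATE TABLE response_time (node_id INTEGER, rtt_ms REAL)")
    # A generator: samples are produced lazily, never held as one big list.
    samples = ((node, 12.5) for node in range(108_187))
    print(flush_in_batches(conn, samples))   # → 108187
    print(ADDRESS_SPACE_32BIT // 108_187)    # → 39699 (bytes per device, upper bound)
```

The design choice is simply that memory scales with the batch size instead of the device count, so adding devices lengthens the flush rather than growing the process.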

I have been going back and forth with SolarWinds Support on this for over a month, and we are now looking at other options.

Has anyone else run into this issue and can offer some insight? Alternatively, can anyone suggest programs that can handle ICMP and round-trip-time monitoring for 108K devices?
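For anyone weighing a purpose-built ICMP sweep instead, the core of one is a bounded fan-out: probe every host, but keep only a fixed number of probes in flight. The Python sketch below is a minimal illustration under stated assumptions, not a recommendation of a specific product. It shells out to the system `ping` with Linux iputils flags (`-c 1` for one echo, `-W 1` for a one-second timeout; Windows uses different flags), and the `sweep`, `system_ping`, and `parse_rtt` names and the default limit of 256 are hypothetical.

```python
import asyncio
import re
from typing import Awaitable, Callable, Dict, Iterable, Optional, Tuple

# Matches the RTT field in ping output, e.g. "time=12.3 ms" or "time<1ms".
RTT_RE = re.compile(r"time[=<]([\d.]+) ?ms")


def parse_rtt(output: str) -> Optional[float]:
    """Return the round-trip time in ms from ping output, or None if absent."""
    match = RTT_RE.search(output)
    return float(match.group(1)) if match else None


async def system_ping(host: str) -> Optional[float]:
    """Probe one host by running `ping` once (Linux iputils flags assumed)."""
    proc = await asyncio.create_subprocess_exec(
        "ping", "-c", "1", "-W", "1", host,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.DEVNULL,
    )
    stdout, _ = await proc.communicate()
    if proc.returncode != 0:
        return None  # no reply within the timeout, or host unreachable
    return parse_rtt(stdout.decode(errors="replace"))


async def sweep(
    hosts: Iterable[str],
    probe: Callable[[str], Awaitable[Optional[float]]] = system_ping,
    limit: int = 256,
) -> Dict[str, Optional[float]]:
    """Probe every host, with at most `limit` probes in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def one(host: str) -> Tuple[str, Optional[float]]:
        async with semaphore:
            return host, await probe(host)

    results = await asyncio.gather(*(one(h) for h in hosts))
    return dict(results)
```

For example, `asyncio.run(sweep(["192.0.2.1", "192.0.2.2"]))` returns a dict mapping each host to its RTT in milliseconds, or `None` if it did not answer. Forking one `ping` per host is simply the easiest thing that runs; at 108K devices, a tool that sends ICMP from a single raw socket (which usually requires elevated privileges) would be far cheaper. The semaphore-bounded fan-out is the part that carries over.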
