munin is a server status monitor. The graphs show statistics for various devices/daemons/... for the raspberry. For instance, if you look at the bottom of the image you'll see how the system time has compared to network time (through ntp) since the start of the server.
> keegan@lyle$ cat oomnomnom.c
Made me LOL.
/proc is a treasure trove of useful and fascinating data about your system. If you want to see some examples of real-world uses for this data, install Munin. Many of its monitoring plugins use /proc.
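To give a feel for how a monitoring plugin might pull numbers out of /proc, here is a minimal sketch that parses /proc/meminfo. The function name `parse_meminfo` is made up for illustration, and this is not how any particular Munin plugin is written, just the general idea.

```python
from pathlib import Path


def parse_meminfo(text):
    """Turn the 'Key:   value kB' lines of /proc/meminfo into a dict of ints."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not 'Key: value'
        fields = rest.split()
        if fields and fields[0].isdigit():
            # Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
            info[key.strip()] = int(fields[0])
    return info


if __name__ == "__main__":
    meminfo_path = Path("/proc/meminfo")  # Linux only
    if meminfo_path.exists():
        mem = parse_meminfo(meminfo_path.read_text())
        print("MemTotal kB:", mem.get("MemTotal"))
        print("MemAvailable kB:", mem.get("MemAvailable"))
```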
Depends on what you like and what your demands are.
All "graphical network monitoring systems" will do SNMP. Not all will do SNMPv3 (most will do v1 and v2c).
Some can also use netflow and/or sflow to absorb statistics.
PRTG is great if you use a Windows server (or workstation) to collect the stats, but it costs money (there is a crippled freeware edition limited to 10 sensors) - about $440 for 100 sensors including 12 months of updates (other price points exist, such as $9500 for 5000 sensors or $13500 for an unlimited number of sensors).
Another option (especially if you are not too happy to pay or don't need to run on a Windows box) is Munin.
Then of course you have various full-blown solutions which might or might not fit your needs (in your case perhaps you could run them virtualized if you can't dedicate a full server for this):
http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems
The tricky part with many of the full solutions is that they take some effort to get going and to maintain over time. Compare that with, say, Munin, which is more minimalistic and is my personal favorite because you don't need to maintain any database backend (such as PostgreSQL or MySQL).
There are lots of ways to do this. If the output of "top" provides the information you want, you could use something like "top -b -n1 >> /var/log/topdump.log". Put that in cron and set it to run every X minutes.
If you want a graphical representation of memory usage for the whole system (not per process), munin could provide that.
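If you would rather have a timestamp on each dump, the same idea can be sketched as a small script that cron runs instead of the bare shell redirect. The helper name `append_snapshot` is invented here; treat this as one possible shape, not the canonical way to do it.

```python
import subprocess
from datetime import datetime


def append_snapshot(command, log_path):
    """Run command once and append its output, under a timestamp header, to log_path."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"===== {datetime.now().isoformat(timespec='seconds')} =====\n")
        log.write(result.stdout)
        if not result.stdout.endswith("\n"):
            log.write("\n")
    return result.returncode
```

A script run from cron would then call something like append_snapshot(["top", "-b", "-n1"], "/var/log/topdump.log"), using the same command and log path as above.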
It's relatively trivial, actually...
http://munin-monitoring.org/wiki/LinuxInstallation
http://munin-monitoring.org/wiki/munin.conf
Just make sure that the htmldir in munin.conf is writable by the munin user and readable/shared by your webserver user (apache?).
What were the problems you were having with SNMP?
I'm new to Synology and have Prometheus deployed elsewhere already, so I'm going the node-exporter route and planning to use SNMP as well - though SNMP exporter looks like it has a learning curve to it.
Any good monitoring system will involve some amount of work to set up. If you want something really basic, I used Munin a while ago with success. I have no idea if it'd work on a Synology, but it's quite simple so I imagine it wouldn't be too hard.
I'd look into Apport - this tool is famous for getting on your nerves on Ubuntu, but it does basically all you want for you and it should work fine on servers. However, what do you mean by crashing? Apport cannot detect (I think) something like swap of death or daemons going into zombie state...
Maybe it's worth modifying these tools... or just using this
/proc/sys/kernel/core_pattern
$ cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c %P
So just write something for yourself... the variables are args to your program see here: http://man7.org/linux/man-pages/man5/core.5.html
And the coredump is piped to the program on stdin
Then in your tool you can save the coredump somewhere and just call ps auxf or top or whatever you need!
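A rough sketch of such a handler, assuming core_pattern is set to pipe to it with "%p %s" (PID and signal number, both documented in core(5)). The script path and the /var/crash/corecatch directory are hypothetical names picked for this example.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def save_core(stream, dest_dir, pid, signal_number):
    """Copy the core dump from a binary stream into dest_dir and return its path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"core.{pid}.sig{signal_number}"
    with open(target, "wb") as out:
        shutil.copyfileobj(stream, out)
    return target


def main(argv):
    # Assumes a core_pattern such as (hypothetical path):
    #   |/usr/local/bin/corecatch.py %p %s
    pid, signal_number = argv[1], argv[2]
    # The kernel pipes the core dump to the handler on stdin.
    target = save_core(sys.stdin.buffer, "/var/crash/corecatch", pid, signal_number)
    # Snapshot the process table next to the dump.
    snapshot = subprocess.run(["ps", "auxf"], capture_output=True, text=True)
    Path(str(target) + ".ps.txt").write_text(snapshot.stdout)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv)
```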
Another alternative could be auditd and the audit subsystem... it's fucking complicated but allows logging down to the system call level... with just a few weeks of your time I'm sure you'll manage to pull a visualization out of it...
Personally I'm a friend of graphing tools like munin or similar... there is a plugin for munin that monitors CPU and memory for users or processes and this gives you a nice view of historic data - e.g. http://munin-monitoring.org/browser/munin-contrib/plugins/processes/multicpu
Nagios and Munin are what I use.
Munin polls every 5 minutes, so it logs general trends with little interference. http://munin-monitoring.org/wiki/HowToMonitorWindows
Nagios is just a great product, you can go opensource or commercial.
Those two are really the gold standard for Opensource monitoring software
I'm not sure how DIDWW works - is each DID a SIP account which registers as a SIP peer from Asterisk -> DIDWW's call routing servers?
If each number does register to the server from the carrier over its own account, you could use munin (http://munin-monitoring.org/) for monitoring - there are plenty of plugins out there for Asterisk which will monitor SIP peers every 5 minutes and report back how many are online. You'd just need to modify the script so that it only reports back on the SIP accounts which are DIDs. E.g.: http://exchange.munin-monitoring.org/plugins/asterisk_sippeers/details
This is what we load onto all of our customers' Asterisk PBX systems - works perfectly for us. You can even set up email alerts.
Or you could just run some basic command line stuff to check the log files for outages:
Check the SIP accounts and manually look for offline accounts: asterisk -rx "sip show peers" | less
Or IAX2: asterisk -rx "iax2 show peers" | less
Or look through the logs for peers going offline: egrep -r "Reachable|UNREACHABLE" /var/log/asterisk/messages*
Any of these would still be less tedious and a lot less time consuming than continuously ringing the numbers.
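If you want something a bit more digestible than raw egrep output, the log scan can be sketched as below. It assumes the log lines contain text shaped like "Peer 'name' is now UNREACHABLE" / "Peer 'name' is now Reachable"; check your own /var/log/asterisk/messages first, since the exact wording can vary. The function names are invented for this example.

```python
import re

# Assumed line shape: ... Peer 'name' is now UNREACHABLE! ...  or  ... is now Reachable. ...
PEER_EVENT = re.compile(r"Peer '([^']+)' is now (Reachable|UNREACHABLE)")


def last_peer_states(lines):
    """Return {peer: most recent state} from an iterable of log lines, read in order."""
    states = {}
    for line in lines:
        match = PEER_EVENT.search(line)
        if match:
            states[match.group(1)] = match.group(2)
    return states


def offline_peers(lines):
    """Peers whose most recent event in the log was UNREACHABLE."""
    return sorted(
        peer for peer, state in last_peer_states(lines).items()
        if state == "UNREACHABLE"
    )
```

Feeding it an open log file, e.g. offline_peers(open("/var/log/asterisk/messages")), would list the peers that went offline and have not come back since.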
I think it's awesome that we've managed to come to a place of agreement on this! Thanks for your reply!
As for norse gods, I'm an Atheist, but culturally I keep saying "god" as an exclamation and having to delete it. Otherwise I'd be more inclined towards Munin, but Munin isn't even a god, and it's more about the software (http://munin-monitoring.org/) anyway.
Oh fuck, i hate name collisions.
http://munin-monitoring.org/ is very well established and is still one of the easiest ways to get very detailed monitoring of your network & hosts in just a few minutes to set up, with dead simple flow for making your own plugins.
Munin (http://munin-monitoring.org) is handy for me. Free, and out of the box will track all sorts of useful system metrics. And adding new 'sensors' is easy, you just have to learn the syntax for how to report the data to the Munin engine, and then you can write your own plugins. There's plenty of good documentation.
Munin is written in Perl, but don't let that scare you away, because plugins you write can be in any language you prefer.
The main munin engine can run on almost anything that can run Perl, and it doesn't use a lot of resources.
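To show how little there is to the plugin syntax, here is a minimal sketch of a plugin written in Python rather than Perl. When called with the argument "config" it prints the graph description; otherwise it prints one "<field>.value <number>" line per field. The graph title and the field name `load` are just choices made for this example.

```python
import os
import sys


def config_lines():
    """Lines Munin expects when the plugin is called with the 'config' argument."""
    return [
        "graph_title Load average (example)",
        "graph_vlabel load",
        "graph_category system",
        "load.label 1 minute load",
    ]


def read_load():
    """Return the 1-minute load average, or None where it is unavailable."""
    try:
        return os.getloadavg()[0]
    except (AttributeError, OSError):
        return None


def value_lines(load_1min):
    """Lines Munin expects on a normal fetch; 'U' marks an unknown value."""
    if load_1min is None:
        return ["load.value U"]
    return [f"load.value {load_1min:.2f}"]


def main(argv):
    if len(argv) > 1 and argv[1] == "config":
        print("\n".join(config_lines()))
    else:
        print("\n".join(value_lines(read_load())))


if __name__ == "__main__":
    main(sys.argv)
```

On a typical install you would make the file executable, link it into the node's plugin directory (often /etc/munin/plugins, but that depends on your distribution) and try it with munin-run before letting the node pick it up.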
There is some stuff like http://munin-monitoring.org/ which you could install locally - but I'm always reluctant to install new software on a server for this kind of reason - and it requires some Internet connectivity to be installed.
If you're really in trouble and you have some cash, it could make sense to get some external help - make sure you select someone who knows the OS/network parts as well as the application stack (Mongo, Redis, whatever language is used for the application).
IPerf would be my choice, but make sure that the boxes you run it from have good NICs and good TCP stack settings along with an updated OS.
Preferably I would set up a live CD to do this, so the box just needs a good enough CPU and, let's say, an Intel NIC to be able to properly push the traffic needed (using Realtek NICs for this purpose is often a bad choice, for example).
Once you have your probes set up you can also use them to monitor link quality, such as latency and so on, through scripts and SNMP, for example with the help of http://munin-monitoring.org/
Not sure how much detail you are looking for, but http://munin-monitoring.org has built in mysql monitoring plugins, plus it's easy to write your own plugins in the language of your choice.
Zabbix and nagios may also fit the bill.
If you're looking for per-page related statistics and monitoring that may be another beast.
I think you should take a look at New Relic (http://newrelic.com/) and Munin (http://munin-monitoring.org/) server monitors (or any similar server monitors, these two came off the top of my head). They would give you insight into which area is the most critical and provide you with information on your server resource usage. If I understood correctly, it should at least point you in the right direction so that you know where you can add a bit more parallelism to improve performance.
Did you take a look at munin? I found it was easy to set up and configure. I basically followed the documentation, but as far as I remember it was as easy as 'apt-get install munin' (on the server that collects data from the nodes) and 'apt-get install munin-node' on each node... or something along those lines. You can set up e-mail alerts, configure overview pages to get an overall glimpse, etc.
People have already done the whole "Security" part.
For performance, you might want to check out Munin or monitorix for monitoring.
I prefer munin for the fact that it can give you an overview of all boxes. You just need to whitelist that IP:PORT. 1 Master, Many Clients. (And it can work on all systems)
But if you're talking about your systems as in local/single boxes, monitorix will do the same.
Munin http://munin-monitoring.org/
and
Nagios https://www.nagios.org/
and will be exploring the usefulness of the ELK stack, but I have access to a Splunk License. All are good and useful, and can be tailored to different needs and audiences.
E.G. I never show my management nagios, but I will very happily share with them selected Munin graphs or ELK outputs, as pretty pictures make for way less explanation. Do be careful not to make your customers data nerds with anything you share, they will start nagging you for too much information.
Or use http://munin-monitoring.org/ which is more lightweight and can monitor anything you wish.
Also VIP as in Very Important Person or Virtual IP (for loadbalancing)?
Because if it's the latter then the loadbalancer should be able to keep track of the servers.
Otherwise, to monitor remote sites that you have no direct access to, it's wise to have a locally installed monitor such as munin, so even if the WAN link breaks the servers themselves (and whatever else) are still being monitored.
Does it have to be a single commercial product?
I'm asking because many of your requirements can be dealt with by individual opensource products.
For example using load map for graphing and reporting of topologies:
http://stats.sunet.se/stats/map-doc.html
http://stats.sunet.se/stat-q/load-map/optosunet-core,,traffic,peak
Using munin for individual graphs (of interfaces, latencies and whatever):
Using vsftpd + git or even subversion for storing the actual configs.
This way you can use the archive function in your equipment so it will upload on its own, and/or create scripts that log in to your equipment and dump the configs every night or so.
With git or subversion you can have easy guis to do diffs between current and some older revision of the config from your workstations.
The above is very easy to scale and even loadbalance over multiple servers if needed (depending on how many devices and interfaces you need/want to monitor and maintain).
Sure, the above will need some config to make it work, but so will any commercial product as well.
I don't know much about monitoring, but I would give this a try. http://munin-monitoring.org/wiki/HowToMonitorWindows It is the same monitoring tool that is integrated with bamt, accessible through a browser, though you may have to install WAMP or some sort of server software first.
I find it easiest to just install teamviewer, this allows you to remotely access your computer from any computer or smartphone, not only can you view the screen.. but you can also change settings remotely. It has saved my ass at times when my system crashed and I was miles away.
At a previous employer we used a combination of Munin (http://munin-monitoring.org/) for resource utilization monitoring, and Nagios (http://www.nagios.org/) for automated "System Down!" alerts. The setup seemed to work pretty well.
I'd go with munin http://munin-monitoring.org/ over cacti personally. Much simpler setup and configuration. Cacti is fine if all you're going to graph is mostly SNMP data I guess, but I doubt that's the case.