We have been using checkmk for the last 5 years, and it's been working great.
They have a free version (raw) and 2 paid versions (enterprise and managed services).
There are of course some differences between the versions, but I would say the free version works fine.
We monitor ~3500 servers and network devices.
Web: https://checkmk.com/
If you need more than just the basics, something easy to set up and full-featured, there's https://checkmk.com
It's probably way more than what you need if you're just doing host ping checks.
I monitor 163 hosts with ours today, that's a total of 8997 services being actively monitored.
Maybe look at https://xymon.com ?
This works to an extent in 1.6, but my impression is they are doing things in a much more k8s-aligned way in 2.0
Their upcoming remote conference next month has a 45 minute presentation on this for free (registration required):
We either use Check_MK's mk-job, or have our Jenkins run the job.
With check_mk, we have the job status, runtime and more in our monitoring as a service, and can notify based on that.
In Jenkins we can get full logs and get Email notifications for failures.
Nobody mentioned Checkmk. My go-to monitoring tool. Really easy to deploy and works with an agent or SNMP. Dead simple to add your own checks too.
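To give a rough idea of how little a custom check needs, here is a minimal sketch of a Checkmk local check in Python. A local check prints one line per service in the form `<status> <service name> <metrics> <text>`, where status 0/1/2 means OK/WARN/CRIT. The service name `Root_disk_usage`, the thresholds and the function names are made up for this example and are not taken from any shipped plugin.

```python
import shutil


def disk_status(used_pct, warn=80.0, crit=90.0):
    """Map a usage percentage to a Checkmk state: 0=OK, 1=WARN, 2=CRIT."""
    if used_pct >= crit:
        return 2
    if used_pct >= warn:
        return 1
    return 0


def local_check_line(service, used_pct, warn=80.0, crit=90.0):
    """Build one local-check line: <status> <service> <metric> <text>."""
    status = disk_status(used_pct, warn, crit)
    # Metric syntax is name=value;warn;crit (thresholds here are hypothetical).
    metric = f"used_pct={used_pct:.1f};{warn:.0f};{crit:.0f}"
    return f"{status} {service} {metric} {used_pct:.1f}% of the filesystem is used"


if __name__ == "__main__":
    usage = shutil.disk_usage("/")
    print(local_check_line("Root_disk_usage", 100.0 * usage.used / usage.total))
```

Dropped as an executable file into the agent's local-check directory (typically `/usr/lib/check_mk_agent/local` on Linux, but check your installation), a script like this should show up as a new service on the next service discovery.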
https://checkmk.com/cms_distributed_monitoring.html
A monitoring server for each location and one master server. You don't interact directly with the monitoring servers. The master server collects and displays statuses, and you make configuration changes on the master server which then pushes it out to the monitoring servers.
Is the Sonicwall configured to do inter-VLAN routing? If that's the case you could configure a workstation to be on the same subnet as the local resource. If you don't experience connection drops it could be the Sonicwall that's causing the issue.
I'd also recommend setting up some kind of monitoring server like Checkmk to do continuous monitoring of your network.
The best all-around check I have found is check_systemd
It runs `systemctl --failed` and a few other checks to alert on things such as access to an NFS export failing. The other type of alert I have found helpful in a server deployment is one that checks coredumpctl.
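For anyone curious what such a check boils down to, here is a simplified sketch of just the `systemctl --failed` part, written as a Nagios-style plugin in Python. It is not the code of check_systemd, which does more and may work differently; the function names are invented for this example.

```python
import subprocess


def parse_failed_units(output):
    """Return unit names from `systemctl --failed --no-legend --plain` output."""
    units = []
    for line in output.splitlines():
        fields = line.split()
        if fields:
            units.append(fields[0])  # first column is the unit name
    return units


def evaluate(units):
    """Return (exit_code, message) following Nagios plugin conventions."""
    if units:
        return 2, "CRITICAL - failed units: " + ", ".join(units)
    return 0, "OK - no failed units"


def main():
    """Run systemctl and report; returns the plugin exit code."""
    try:
        result = subprocess.run(
            ["systemctl", "--failed", "--no-legend", "--plain"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        print("UNKNOWN - systemctl not found")
        return 3
    code, message = evaluate(parse_failed_units(result.stdout))
    print(message)
    return code
```

To use it as an actual plugin, end the script with `raise SystemExit(main())` so the monitoring core sees the exit code.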
I'm using Icinga2 with this type of plugin, but had used Nagios Community Edition a decade or so ago. The modern ops-friendly implementation appears to be CheckMK.
If you use checkmk as your monitoring tool, the community already has a local check there that you can distribute to your servers with the agent bakery: https://checkmk.com/blog/automatically-detecting-log4j-vulnerabilities-in-your-it
What brand of firewall? If it is supported here, https://checkmk.com/integrations, then it is probably already set up for you. Add the host and, in the settings, add the SNMP credentials. Configure SNMP on the firewall, then click "save and go to service configuration" in checkmk and you should see the various checks. Depending on the firewall, what you need may already be there. If not, you could do a custom check via the MIB.
I haven't used CheckMK, but I use Icinga2 with a Thruk front-end with check_by_ssh performing outbound checks from the Icinga collectors to the monitored systems. [ Note, it looks like livestatus support is deprecated by Icinga ]
One option may be to use check_by_ssh (https://checkmk.com/integrations/check_by_ssh) in case you are allowed to make outbound port 22 connections to the monitored systems.
Thruk supports multiple monitoring system types through MK livestatus. The way I am using Icinga, an Icinga server can be put into each network segment with a central Thruk management server pulling them all into a single pane of glass. That way, only one port needs to be open from Thruk to Icinga.
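To show what a front-end like Thruk is doing over that one port, here is a minimal sketch of a Livestatus query in Python. Livestatus is a plain-text protocol: you send a `GET <table>` request with optional headers and read the reply. The host, port and function names are assumptions for illustration (6557 is a common choice for Livestatus over TCP, but yours may differ).

```python
import json
import socket


def build_query(table, columns, filters=()):
    """Build a Livestatus query string that asks for JSON output."""
    lines = [f"GET {table}", "Columns: " + " ".join(columns)]
    lines += [f"Filter: {f}" for f in filters]
    lines.append("OutputFormat: json")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the query


def parse_rows(raw, columns):
    """Turn the JSON list-of-lists reply into a list of dicts."""
    return [dict(zip(columns, row)) for row in json.loads(raw)]


def query_tcp(host, port, table, columns, filters=()):
    """Send one query to a Livestatus TCP endpoint and read the whole reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_query(table, columns, filters).encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_rows(b"".join(chunks).decode("utf-8"), columns)
```

For example, `query_tcp("icinga-site1.example", 6557, "hosts", ["name", "state"], ["state != 0"])` would list the hosts that are not UP on that (hypothetical) collector.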
Check out CheckMK. Like Nagios, but way better IMHO. Also free and self-hosted. "someone" (me) also wrote farming checks so you can monitor your farm status as well...
You may have to make a little adjustment to the smart plugin so it recognizes your USB drives, but in the end it's doable.
[Imgur](https://imgur.com/bO078q1)
checkmk and WUG are both fairly simple and allow grouping if you need it such as with hundreds or thousands of devices.
https://checkmk.com/ You'd want a Linux server. If there are a lot of similar devices, its tag-based configuration makes it easy to configure them together. You can also use folders to group/organize devices. It can connect to LDAP. The guides are for version 1 currently, so if you use version 2 they won't match.
WUG runs on Windows, with a web UI for most things and a desktop app for some things that aren't in the web UI. You can use groups to organize devices. Since it's on Windows, it ties right into AD for authentication.
I think WUG is easier to get started on than checkmk, but you'll get used to either. If you prefer googling for solutions, though, I feel like the WUG community is smaller.
There are simpler free options too like statping or vigil if you only want up/down and not snmp monitoring.
https://checkmk.com, or just google it; thousands of users use it. The ‘raw’ version is free. Very easy to set up, although more complex things are also possible. I also offer training for a small price, if you want some more advanced use, to write your own checks, etc.
Check_MK is also open source (raw edition). Enterprise edition will just add neater graphs and some agent baking improvements. Their soon to be released 2.0 will add these pretty graphs to the raw edition as well.
All of the three will do the job, but we evaluated them and went with Check_MK due to the feature set. Very easy to configure once you get the hang of it, supports almost everything out of the box.
Usually you define the time servers for chronyd (via your CM).
Then chrony will make sure your hosts are always in sync.
Those "servers" could be external Internet pool servers, or you may choose to house local servers that are fed from such sources and sit high up in your infrastructure. For example, many configure their local time servers at the network layer, assuming pretty much nothing happens if the network is down. So, client hosts are all configured to use a set of local time servers coming off the network sources (which in turn are fed by the Internet pool). If the Internet dies, your local network sources are still handing out time.
With regard to Windows clients, it's usually OK to let them get their time off the domain servers (which in turn would get their time from your local network time sources). I allow this just because... Windows...
For monitoring of all of our equipment we use Checkmk (https://checkmk.com). It will monitor chrony on your CentOS 7/8 hosts (and even ntp on older systems, though I recommend you stick with chrony) and will notify you when there is clock skew.
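To illustrate what a clock-skew check looks at, here is a small sketch in Python that reads the "System time" line from `chronyc tracking` output and maps the offset to a state. This is not Checkmk's own chrony check, just the idea behind it; the thresholds (0.2 s and 0.5 s) and function names are made up for the example.

```python
import re

# Matches e.g. "System time     : 0.000006523 seconds slow of NTP time"
SYSTEM_TIME = re.compile(
    r"^System time\s*:\s*([0-9.]+) seconds (fast|slow) of NTP time",
    re.MULTILINE,
)


def parse_offset(tracking_output):
    """Return the signed clock offset in seconds (+ means fast), or None."""
    match = SYSTEM_TIME.search(tracking_output)
    if match is None:
        return None
    seconds = float(match.group(1))
    return seconds if match.group(2) == "fast" else -seconds


def skew_state(offset, warn=0.2, crit=0.5):
    """Map an offset to 0=OK, 1=WARN, 2=CRIT, 3=UNKNOWN (hypothetical limits)."""
    if offset is None:
        return 3
    if abs(offset) >= crit:
        return 2
    if abs(offset) >= warn:
        return 1
    return 0
```

In practice you would feed it the output of `chronyc tracking` (for example via `subprocess.run`) and alert on any state other than 0.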
Checkmk is very configurable, so you can send alerts in a myriad of different ways. We feed all alerts to our "firehose", which is an MS Teams channel. Additionally, targeted events are sent to contact groups via email. And even that can be configured by each individual user as to what they want or don't want to see.
Alerting rules can be set per service, and even per service and host. So you can have the same rules for all your time services on all hosts, or different rules per host (or class of host, etc.).
The bug has been reported. For a workaround, see the comments:
>This problem doesn’t depend on the edition or operating system used. The commit that broke the function is https://github.com/tribe29/checkmk/commit/7d7abee2d1e220f818f4ed46f7fd2649f16d4733
>The problem is, if you use the “Save and Test” button there are no variables starting with “vs_host” and “vs_rules” but with the commit the “vs_rules” variables are mandatory.
>Workaround (not really): press the “Save and Test” button, and then you can directly use the “Test” button on the test page. This button then also generates and transmits the missing parameters.
Reply
>Thanks for the hint. Werk #11259 (https://checkmk.com/check_mk-werks.php?werk_id=11259) will fix this issue in version 1.6.0p17.
>If you like, you can use any of the upcoming nightly builds; the fix will be included there.
Check_mk CRE https://checkmk.com/download.php?edition=cre&version=stable
Web-based, nice GUI. It includes a Docker plugin that will detect and check the containers. The notification system is very easy to configure and notifies on downtime by default.
Am I missing something here? https://checkmk.com/editions.html shows that the Checkmk Raw is most flexible (but has a terrible 60 s timer), but Checkmk Enterprise Free is really what you'd want (with the 1 s freq)... however, the Enterprise Free is just too limited. I don't really feel like either of these are good enough per se. I need the unlimited monitoring of Raw, but having the other one dangling out there just makes one not want to use them at all since the Enterprise Free is too limited and there's no way to afford the latter options. :(
We use Checkmk
While some might say it's "complex", we found it to be one of the few tools that allowed us to do whatever we needed it to do.
I've used both the free RAW version and CEE (at work).
Hi,
you should check out SIGNL4 as it enables Checkmk to notify the mobile teams in the field or on call in real-time. Mobile workers are not able to sit in front of a Checkmk dashboard and actively monitor problems. Staff can acknowledge and take ownership for critical events that occur, alerts are escalated in case of no response, you can communicate within an alert to address a particular problem etc.
Checkmk compatibility allows you to distribute operational alerts to a mobile SIGNL4 team by using a webhook or email. Persistent notifications and acknowledgement requirements ensure that issues will be handled before it is too late.
This speeds up their response significantly and frees resources in the operations.
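For readers wondering what the webhook route involves, here is a rough sketch of a Checkmk notification script in Python that forwards an alert as JSON. Checkmk passes notification context to such scripts as `NOTIFY_*` environment variables; everything else here is an assumption for illustration: the payload field names (`title`, `message`, `type`) are invented, and the webhook URL is whatever your receiving service hands you.

```python
import json
import urllib.request


def build_payload(env):
    """Build a generic JSON-ready dict from Checkmk NOTIFY_* variables."""
    host = env.get("NOTIFY_HOSTNAME", "?")
    if env.get("NOTIFY_WHAT", "HOST") == "SERVICE":
        state = env.get("NOTIFY_SERVICESTATE", "UNKNOWN")
        subject = host + "/" + env.get("NOTIFY_SERVICEDESC", "?")
        output = env.get("NOTIFY_SERVICEOUTPUT", "")
    else:
        state = env.get("NOTIFY_HOSTSTATE", "UNKNOWN")
        subject = host
        output = env.get("NOTIFY_HOSTOUTPUT", "")
    # Field names below are hypothetical; adapt them to the receiver's schema.
    return {
        "title": f"{state}: {subject}",
        "message": output,
        "type": env.get("NOTIFY_NOTIFICATIONTYPE", "PROBLEM"),
    }


def send(url, payload):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A real script would call `send(webhook_url, build_payload(os.environ))` after importing `os`, with `webhook_url` taken from the receiving service's setup page.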
Try out CheckMK https://checkmk.com/. Its base is like Nagios but there are a lot of enhancements. I've used it at multiple businesses. It will do SNMP or there's an agent, plus it will also do the basics like ping, http, etc...
Make sure to have a look at Checkmk. Originally based on Nagios, it uses a very lightweight agent to collect all the basic data points about your monitored system and to run any app-specific monitoring plugins. It also supports SNMP for a shit-ton of devices out of the box and you can write definitions for anything it doesn't.
The open source version is here: https://checkmk.com/open-source-monitoring.html
I have been using the enterprise version ($) at work for about 5 years, AMA.
I would suggest using check_mk.
It has autodiscovery: it will detect the services that it needs to monitor, and it has many plugins. Unlike Nagios, you do not have to add the disks or partitions manually; it will autodiscover them. For hardware monitoring, it can find the sensors based on the SNMP OID to monitor CPU, temperature, and motherboard health.
I have no connection with check_mk, but it helped me a lot to be more productive when I was the sole monitoring admin for many customers. You can output the notifications in many formats (JSON, XML) and integrate with other tools. Seriously, it is like Zabbix, Nagios, and PRTG combined. It also has graphs; you can choose Grafana or pnp4nagios.
For example, for Synology it has all this working out of the box:
https://checkmk.com/cms_check_plugins_catalog.html
Just search for Synology. It can already monitor disk, RAID, fan, and update status; all you need to do is configure SNMP.
You can check out the Check_MK Raw edition. It's based on Nagios, and you can use Raw for free. Also, if you do not have more than 10 hosts, you could give the Enterprise Free Edition a try, which has some more features. https://checkmk.com/editions.html
Mail, Slack, SMS, and Opsgenie notifications are also supported.
Also, you can define what you want to be checked. If you just want to ping, just enable the Uptime check and disable the rest.
We use CheckMk (https://checkmk.com), which does agentless monitoring for vCenter and the ESXi hosts (requires a read-only user) and uses agents on the VMs. I also enabled SNMP on our ESXi hosts, so that is in the monitored info as well.
(Our cluster is a hyperconverged VxRail)
Checkmk Raw seemed pretty solid to me when we tried it, even though we did go with PRTG instead. If budget had been a huge issue I would have kept it going probably.
Disclaimer: I work at tribe29, the team developing Checkmk.
I think it is always difficult to say, product x is the right thing for monitoring. It very much depends on what you would like to monitor.
If you can leave a couple of words on that, it might be easier to give recommendations. For example, it will make a huge difference, if you are monitoring an infrastructure consisting of Linux servers, Windows servers, cloud infrastructure, networking, storage systems - or if you only want to monitor some specific thing in there. And if you then want to go deeper in these systems and want to monitor databases, applications etc.. This is where you will see the differences then as well.
On a side note - we are doing our best to simplify getting started with Checkmk. There is a new entry guide (https://checkmk.com/cms_intro.html), which helps in setting up your monitoring. And we have published some introductory videos (currently only in German, because.. we are Germans, but at least we got English subtitles...).
Sounds like Check_MK is polling for a lot of data all at once. Since the CPUs in these things aren't the fastest, you may need to limit what you're polling for and how often. I also found this:
Great job, OP! As an alternative -- specifically for Tomcat or Jetty -- one might also consider using the Jolokia WAR as a webapp. Check_MK already has built-in support for a jolokia check and you can extract a lot of useful info from MBeans.
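To give a feel for what "extracting info from MBeans" looks like, here is a minimal Python sketch that reads heap usage through Jolokia's HTTP read endpoint. It is not how the Check_MK plugin is implemented; the base URL and function names are assumptions for the example. The MBean `java.lang:type=Memory` with attribute `HeapMemoryUsage` is a standard JVM one.

```python
import json
import urllib.request


def read_url(base_url, mbean, attribute):
    """Build a Jolokia read URL: <base>/read/<mbean>/<attribute>.

    Names containing slashes need Jolokia's own escaping, not handled here.
    """
    return f"{base_url.rstrip('/')}/read/{mbean}/{attribute}"


def heap_used_percent(response_text):
    """Compute heap usage in percent from a Jolokia read response."""
    reply = json.loads(response_text)
    if reply.get("status") != 200:
        raise ValueError(f"Jolokia returned status {reply.get('status')}")
    heap = reply["value"]
    if heap["max"] <= 0:
        raise ValueError("heap maximum is undefined")
    return 100.0 * heap["used"] / heap["max"]


def fetch_heap_used_percent(base_url):
    """Query a running Jolokia agent (base_url is an assumption, see above)."""
    url = read_url(base_url, "java.lang:type=Memory", "HeapMemoryUsage")
    with urllib.request.urlopen(url, timeout=10) as response:
        return heap_used_percent(response.read().decode("utf-8"))
```

With the WAR deployed under its default context, something like `fetch_heap_used_percent("http://localhost:8080/jolokia")` would be the typical call, though the host, port and path depend on your setup.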
This is incorrect (assuming you meant this is ONLY a paid for product now). There are 3 editions of Check_MK as per:
https://checkmk.com/download.php
You want:
'The free Checkmk Raw Edition (CRE)'
Two that I could think of quickly:
FrameFlow - Best Methods for Running PowerShell Scripts Using FrameFlow