Glad you like the site, always happy to talk about this stuff!
> Like, is it a monolith or broken down into Microservices?
The frontend and backend are separate, but both are monoliths.
> If so are they containerized and deployed with docker/k8?
They are containerized (into "frontend", "api", "worker" and "scheduler", but the latter three all run the same code with different config). I use Docker Swarm, but if I started again or found some time for it I would swap to K8s. I've found K8s (which I use for other projects/at work) to be much more stable.
> Is it all hosted on the same server or distributed?
It is distributed across 17 servers.
> What kind of database are you using?
MongoDB, but I would probably use PostgreSQL if I were to start again.
> Is it all written in js (node) or are there other things involved (besides html/css for the website obviously)?
It's all JS/TS, a good chunk used to be PHP years ago. There's also a little bit of Python for specialized tooling, but that doesn't directly run the site.
> What kind tools do you use to monitor your service/s? (Uptime, load, statistics etc)
Self-hosted Open Distro ElasticSearch/Kibana for logging and statistics, https://uptimerobot.com/ for uptime alerts, and https://healthchecks.io/ for server & backup healthchecks (load, storage space, sanity-checking that containers are running, etc.)
I tried DataDog and New Relic for a little while, but both are way too pricey for our volume.
> Do you have some kind of special API key? I can't imagine the default rate limit is enough for such a big site.
The rate limiting is based on IP, and the IPs of our worker servers are whitelisted and excluded from the rate limit. You can request the whitelisting if you have a use-case that requires it :)
On top of what's listed already - I'm using the external service https://healthchecks.io/ to monitor my connection with HA. This way I can receive a notification on Telegram when my internet is down (or HA died) and I'm outside. I can't do a lot with it, but at least I know something went wrong.
Full disclosure: I only back up ~1.5-2TB using Duplicacy, but I haven't had reason to doubt its ability at higher capacities.
Anyway, all my machines are Linux so I just have a systemd service:
[Unit]
Description=Duplicacy backup
After=network-online.target
[Service]
Type=simple
WorkingDirectory=/your/duplicacy/source
ExecStart=/usr/bin/duplicacy --verbose backup --stats --threads=16
Restart=on-failure
RestartSec=1min
StartLimitInterval=10min
StartLimitBurst=3
And a systemd timer:
[Unit]
Description=Run duplicacy backup hourly

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
to back up my files. Inside .duplicacy/scripts/post-backup, I have:
#!/usr/bin/env bash
curl -fsS --retry 3 https://hchk.io/REDACTED
Which just uses the HealthChecks.io service to email me if a backup doesn't occur at least once a day (for my desktop & NAS) or once a week (for my laptop).
Everything is backed up to both BackBlaze B2 and AWS S3 Glacier Deep Archive.
I am surprised I don't see a mention yet of https://healthchecks.io. This is a highly configurable web service that you HTTPS POST out to after creating the API URL. It's super effective for scripts in cron definitions... at the end of the command definition, just add && curl ...healthchecks.io url,
and if your script doesn't end in success then the curl won't run, and thus the ping won't make it out to POST to the service. You configure Healthchecks when to expect the ping to arrive, and if it doesn't within a configurable range, it will send an email.
I like the "dead man switch" approach from healthchecks.io
it's really simple if up-ness of a service is what you need to know. Your background jobs running on a schedule do a GET request to a specific URL whenever a successful execution completes.
The URL has a schedule: if it doesn't receive a GET ping within the defined schedule, it's an error (leading to email, SMS or webhook). Your tasks server can also do a GET to {url}/fail to immediately trigger the error. You group URLs by tag, and invite clients to the portal for their specific URLs so they can watch the healthchecks too.
healthchecks.io is not free, but it's cheap, nice to support a small developer.
Did I mention how simple it is? :) This is great for small teams or one-person developers.
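As a quick sketch of that ping-on-success / /fail-on-error flow (the UUID in PING_URL is a placeholder, and report() would be called from your own task wrapper):

```python
import urllib.request

# Placeholder UUID - copy the real ping URL from your healthchecks.io check.
PING_URL = "https://hc-ping.com/your-uuid-here"

def ping_target(base_url: str, ok: bool) -> str:
    """Plain URL signals success; appending /fail raises the alert immediately."""
    return base_url if ok else base_url + "/fail"

def report(ok: bool) -> None:
    # Short timeout so a hung monitoring call never blocks the job itself.
    urllib.request.urlopen(ping_target(PING_URL, ok), timeout=10)
```

If the job simply stops running, no GET arrives and the schedule on the URL triggers the alert on its own; /fail is only for failures your code can detect itself.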
What? That's absurd. You can absolutely have a functioning monitoring system without the need for manual human-checked reports. This is a failure of meta-monitoring in your stack. Manually checked regular reports are a red flag anti-pattern that something is very wrong with your monitoring.
For example, we have meta-monitoring rules that sanity check various components to make sure they're not missing data or getting errors.
Then we also have a heartbeat alert that fires continuously, exercising the alert delivery system to an external service that pages us if the end-to-end fails. We use Dead Man's Snitch, but there are other options like [healthchecks.io](https://healthchecks.io/).
Hi, founder of Healthchecks.io here! Just to clarify, the "100 log entries" is how many historic records of received pings are stored and shown in the dashboard. It's not the total number of pings a single check can receive – there's no upper limit on that.
I use Healthchecks.io on all of my devices: router, Home Assistant, NAS, even backup jobs.
If something goes down, it can alert you with MANY methods! You just need cron and curl!
It means your site/service is behind the Cloudflare proxy, so the certificate visible to Uptime Kuma (and others on the internet) is the valid Cloudflare one.
There is a long-awaited feature scheduled for Uptime Kuma 1.7.0 called push-based monitoring. Similarly to Healthchecks.io, you can use your own scripts on a remote host and "ping" a heartbeat URL.
For one-off cases: https://healthchecks.io/
1) Your crawler, when healthy, pings healthchecks.io on some interval - 1s, 1m, 1h - doesn't matter; it depends completely on the granularity you want.
2) When healthchecks.io hasn't received a ping in, say, 5x the interval, have it send an email.
The docs should walk you through it - incredibly simple.
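A minimal sketch of step 1 in Python - the ping URL is a placeholder, and pinged() is just one hypothetical way to wrap a crawl cycle:

```python
import urllib.request

def pinged(url: str):
    """Decorator: run the wrapped job, then GET the check URL only on success.
    If the job raises, no ping goes out and the check's schedule lapses,
    which is exactly what triggers the email."""
    def wrap(fn):
        def inner(*args, **kwargs):
            result = fn(*args, **kwargs)
            urllib.request.urlopen(url, timeout=10)
            return result
        return inner
    return wrap

# Hypothetical usage - decorate one crawl cycle with your check's ping URL:
# @pinged("https://hc-ping.com/your-uuid-here")
# def crawl_once(): ...
```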
> the Zabbix agents can be set to phone home
That's exactly what I need.
healthchecks.io looks nice too!
Most people here said Zabbix so I'm going to try it first,
Some of the other suggestions are a bit overkill for my needs or not exactly what I am looking for.
Thank you all anyway!
Depends on the level you are working at, but this is a whole industry.
I use https://healthchecks.io
So:
0 * * * * /foo/bar/my/script && curl -fsS --retry 3 http://my.monitoring.service.com/werafoihj-wfwefw-wefoiuhas
There are tons of services like this. Dead man's snitch is another. WDT.io another...
If you need something you can run yourself, check out nagios or icinga.
Maybe you can achieve this with https://github.com/healthchecks/healthchecks - it looks like it can export the status of the jobs to Prometheus: https://healthchecks.io/docs/configuring_prometheus/
So what you need is something like reporting that works in reverse: when something doesn't happen, you want to be notified.
So SQL creates its backup on a schedule, right? You can create a script on DSM that runs every x hours that pulls from that location.
The other thing you can do is, and I'm sure you can do this, in SQLBackupAndFTP set a 'when done' action.
For example on https://healthchecks.io you can register a deadman's switch. If your SQLBackupAndFTP doesn't call the healthcheck every 2 hours (or whatever) then it'll raise an alert.
healthchecks.io, or even the self-hosted version. It's dumb simple and does its job for small setups. Extremely easy to set up and integrate with email or many internet messengers via webhooks.
https://healthchecks.io/ healthchecks works by providing a URL to curl on success or failure on regular intervals. They have a number of integrations as well to notify you if the service fails or succeeds.
I use https://healthchecks.io/. It can send a Slack message, WhatsApp message, or email on failures. I think custom integrations are possible. I use it for all critical periodic jobs, including backups. There is a free version, but it has limitations on the number of WhatsApp messages per month. There is API support to indicate start, success, and failure. You can also host it yourself, but then you won't get the integrations which cost money, like WhatsApp.
This. If your HA isn't exposed to the internet then a heartbeat monitor is the best option, and healthchecks.io offers 20 monitors in their free plan.
Then have a cron job on your HA host that curls the URL every minute/5 mins/however long.
Here's an easy solution:
Set up a RESTful binary sensor to point to https://healthchecks.io/
It's free, can alert (through email or other notification tools) when it hasn't checked in after so long, and will tell you if Home Assistant stops processing and checking the RESTful sensor.
Reminds me of Healthchecks, but I wish it had more of the controlled alerts/thresholds of healthchecks.io.
For instance, some endpoints return 401 because they aren't authorized, but I wish these weren't failures... since they are 'up', just not authenticated.
I also wish the webpage itself had the ability to enable logins.
I've tried a few. This one is really simple: https://healthchecks.io/
You ping it via an HTTPS GET (visit a custom URL).
It works well for scheduled services running in Django.
healthchecks.io: curlable, has a REST API, not yet fully mature, but the fact that you can make checks from within any step of a program/script...
Also, the fact that it waits for the check to be requested within a specified timeframe helps a lot with false positives.
Sure, it lacks some advanced stuff for now, and it's not a replacement for what already exists - it's a friendly hand.
Just to pitch in.
And as soon as I figure out how to tweak ELK, I will be using that for log analysis and as a simple log viewer via the Kibana plugin Logtrail. Atm the amount of incoming logs was too much for our ELK instance to handle; IO wait in the end was killing it, and I don't know yet if I want to sacrifice an SSD array for it... :P
Also, resource monitors such as htop and glances come in handy.
I just recently found out about Pushover and I'm working on setting it up with https://healthchecks.io/ to notify me if my backups fail. I do want to get it integrated into my home automation to notify me of things like when my garage door opens and things like that.
NewRelic for historic health stats of VPS instances
ElasticSearch+LogStash+Kibana for a Django app on a Heroku-type platform. It took some time to figure out what to log and what dashboards to set up, but it works ok now.
And my own healthchecks.io for monitoring various cron jobs
I had a problem in my summerhouse where the circuit breaker would trip for half the house, leaving the fridge without power.
Now, arriving at a summerhouse where the fridge has been unpowered for a week or two is not something I’d recommend, and I solved it by setting up a Raspberry Pi Zero on the same breaker circuit, and having it call Healthchecks.IO every 15 minutes. It’s just a simple curl call in Cron.
Healthchecks is a monitoring service, and whenever my RPi misses two checkins in a row, I get a message through pushover.net
It is true, but they may leave Organizr open all the time. As great as Monitorr was, something more real-time like Healthchecks.io or Uptime Kuma tends to be better, as it runs all the time and can notify you in a variety of ways.
Not the guy you quoted, but you can probably make do with something like
[ $(df --output=pcent /mnt/hd | tr -dc '0-9') -lt 90 ] && echo "all good"
Replace the echo with a ping to something like https://healthchecks.io, add it to a daily cron, and you're good.
This is my script; you could use tee here though to allow for logging WHILE stuff is happening:

#!/bin/sh
ping_endpoint="https://healthchecks.io/ping/<guid>"
curl -S -s -o /dev/null -m 10 --retry 5 "${ping_endpoint}/start"
result=$(my \
-long \
-running \
-command \
2>&1)
code=$?
# only enable this when doing an rsync
# code 24 occurs when rsync builds the list of files
# and before it can send it, the file is gone
# like for example a log that moves around
# we don't care
# if [ ${code} -eq 24 ];
# then
# code=0
# fi
echo "${result}"
result_trimmed=$(echo "${result}" | tail -c 10000)
curl -S -s -o /dev/null -m 10 --retry 5 --data-raw "${result_trimmed}" "${ping_endpoint}/${code}"
exit ${code}
I have an Asus router with Merlin firmware (https://www.asuswrt-merlin.net/). That firmware allows me to load connmon (https://github.com/jackyaz/connmon), which then allows me to run a ping test against https://healthchecks.io/, which then emails/SMSes me when the scheduled ping gets missed.
> ssh keys
I suggest you do something like what the following service is doing: https://healthchecks.io/
They have a cron job that sends vital signals to the main server. You can send disk stats to your server instead.
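A rough sketch of that idea in Python, assuming a placeholder check UUID - Healthchecks stores the request body of each ping, so the report shows up in the ping log:

```python
import shutil
import urllib.request

PING_URL = "https://hc-ping.com/your-uuid-here"  # placeholder check UUID

def disk_report(path: str = "/") -> str:
    """Build a one-line plain-text report of disk usage for the given mount."""
    usage = shutil.disk_usage(path)
    pct = 100 * usage.used / usage.total
    return f"{path}: {pct:.1f}% used, {usage.free // 2**30} GiB free"

def send_vitals() -> None:
    # POSTing a body attaches the report to the ping itself.
    req = urllib.request.Request(PING_URL, data=disk_report().encode())
    urllib.request.urlopen(req, timeout=10)
```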
Best of luck with your service.
PS. This is my service (https://privacybunker.io/). Ping me if you need GDPR compliance and a startup coupon at ().
Hell, here's an entire function for you to use. I added a check for this to my existing Healthchecks.io script:
https://github.com/tronyx/HealthChecks-Linux/commit/31ecb73db364b3a072c49b3900a18d7a9c6fe47f
You could use something like Healthchecks.io and set up a script that periodically grabs the status of the auth servers and, if it is anything other than operational, sends a failure ping to HC.io, which can then notify you via Discord. This simple bash command will get you the status from the status page:
curl -s https://status.plex.tv/ | grep -B4 'Authentication and API server' | grep status | awk -F= '{print $2}' | tr -d '"'
Yeah. I'd need a TCP/UDP service hosted on the public web, which would need its own monitoring. It could, though, function as a system uptime health check 🤔. Although, I don't know if I want sysit to do that. healthchecks.io has been super reliable.
> I am planning to move it to run from ansible from a separate trusted machine, which will will notify me on failures.
Healthchecks.io offers a free tier that allows you to monitor up to 20 jobs. In short, Healthchecks will notify you if a job fails or misses its deadline (i.e. running every day, taking more than 6 hours). It can also be configured to send you weekly/monthly status emails, and it supports a lot of different notification channels (email, Pushover, Discord, Slack, and more).
Borgmatic has a hook that integrates with Healthchecks.
I’ve been using it for a year or so. Used to selfhost it before that, but the free tier is enough for me, and less stuff to keep running.
If anyone comes across this old thread of mine, I eventually found something that covers perfectly my needs: https://healthchecks.io/
Here is the source code: https://github.com/healthchecks/healthchecks
Ok, so if anyone else is interested, I spent some time putting together a nice little script that will check all your Docker services, attempt to restart any that are down, and provide a notification in unRAID about that. It also checks in with healthchecks.io as long as all the containers are running. When it doesn't check in, you can have that site send you an email.
#!/bin/bash
Containers=$(docker ps -a --format "{{.Names}}")
Checkin=true
for val in $Containers; do
    if [ "$(docker container inspect -f '{{.State.Running}}' "$val")" = "false" ]
    then
        docker container start "$val"
        sleep 5
        if [ "$(docker container inspect -f '{{.State.Running}}' "$val")" = "false" ]
        then
            Checkin=false
            /usr/local/emhttp/plugins/dynamix/scripts/notify -s Docker -d "$val service is down, restart failed"
        else
            /usr/local/emhttp/plugins/dynamix/scripts/notify -s Docker -d "$val service is down, restart succeeded"
        fi
    fi
done
if [ "$Checkin" = true ]
then
    curl https://hc-ping.com/<your url>
fi
I'm only seeing an option for alerts on a failure.
But the cloud sync task does have a pre/post script option. So you could setup something with that. I assume there is someway to send an alert email manually so you could write a script to send an email on success.
I think a better solution would be healthchecks.io; it can absolutely do what you want. Setting it up is super easy, and they have a bunch of examples that you could probably get to work with the post-script option. And it's even self-hostable.
Thanks for the suggestions!
Personally I don't like AWS - the cost calculation is too confusing. See my other comment: I ended up using a cron job running mysqldump, uploading to Backblaze B2 for backups (10GB free), with healthchecks.io for monitoring the cron job (free) and sending notifications.
I'm working on a collection of reasonably simple system healthchecks here: https://gitlab.com/jokeyrhyme/healthcheck
There's a basic "systemd has failed units" check that you could take a look at
My setup wires them up to checks at https://healthchecks.io/, which is basically a dead-man switch: if https://healthchecks.io/ stops getting pings from your system, it'll alert you via a whole bunch of different integrations (I chose to be notified via Signal).
It provides quite a bit of high level info about your network, but it sounds like you might be looking for service monitoring. Statping or Healthchecks might be what you’re looking for.
I also get the outlook.office.com URL.
I'm testing it with a simple free Microsoft (or is it Outlook? or Office? or Live? confusing!) account. Maybe the webhook format depends on the account type? I cannot read the linked article, does it say anything about that?
The reason I'm interested is I run a SaaS monitoring service that supports notifications to MS Teams. I'm wondering if I need to do something, e.g. give a heads-up notice to the users who use the Teams integration...
I have a hammer and this looks like a nail. I run a cron monitoring service https://healthchecks.io. It is not precisely what you are describing, but it is similar:
It is also open source and can be self-hosted, but that is of course significantly more work than using the hosted service. The hosted service has a free plan that lets you monitor up to 20 cron jobs / periodic tasks.
/plug :-)
I use https://healthchecks.io/. I have a cron job which runs every minute that sends a ping along with some metadata: uptime, fiber ONT signal levels, etc.
You can configure an alert interval in such a way that if there is no ping in the last x minutes, an alert is triggered.
Works great for me.
healthchecks.io is great, but works in reverse: services ping it at regular intervals to let it know they’re healthy.
FWIW, I use this to tell me if my Home Assistant is up by using a rest sensor to ping my URL every 5 minutes.
For this particular example - very little. I already use Pushover for a few other applications (Healthchecks.io, Freshping, etc) so I just prefer to have everything in one place.
https://healthchecks.io/ is self-hostable
I'm running the Linuxserver.io docker container. Some stuff I monitor via email, some via HTTP requests in cron jobs, etc.
Notifications can be via email, or push services, and webhooks. I'm currently using Teams and Discord for notifications.
https://github.com/caronc/apprise can also be used for selfhosted push notifications.
CheckCentral is pretty awesome and support is very responsive. It's a service you don't know you need until you do! We run all our backups into it, then pull info with the API into our own dashboards.
If you really want to roll your own system then https://healthchecks.io/ has an opensource version.
This worked for me:
<img id="healthCheckSVG" src="https://healthchecks.io/badge/138eb523-5089-452e-9915-8a9e1f/V9zdKoCH/WSL2Docker.svg">
<script>
function refresh(node) {
    var times = 3000; // gap in milliseconds
    (function startRefresh() {
        var address;
        if (node.src.indexOf('?') > -1)
            address = node.src.split('?')[0];
        else
            address = node.src;
        node.src = address + "?time=" + new Date().getTime();
        setTimeout(startRefresh, times);
    })();
}
window.onload = function() {
    var node = document.getElementById('healthCheckSVG');
    refresh(node); // repeat the above steps for any other images you want to refresh
}
</script>
As the other comment mentions, this is likely the wrong sub for this.
> Clearly, I could write my Python code to take this into account and write the data to a local CSV file and then post the stored data when the link comes back up
This is the correct way to do this. Check for success and, if none received, have the script try again after a delay. Have an additional script check to make sure your original script did its job and, if not, send you an alert email. You can automate the alerting with a service like https://healthchecks.io/.
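A sketch of that store-and-forward pattern, with a hypothetical endpoint and spool path:

```python
import csv
import os
import urllib.error
import urllib.request

SPOOL = "readings.csv"                    # local buffer that survives outages
ENDPOINT = "https://example.com/ingest"   # hypothetical upload URL

def record(row, spool: str = SPOOL) -> None:
    """Always append the reading locally first, whether the link is up or not."""
    with open(spool, "a", newline="") as f:
        csv.writer(f).writerow(row)

def flush(spool: str = SPOOL, endpoint: str = ENDPOINT) -> bool:
    """Try to upload the whole spool; truncate it only after a successful POST."""
    if not os.path.exists(spool) or os.path.getsize(spool) == 0:
        return True
    with open(spool, "rb") as f:
        payload = f.read()
    try:
        urllib.request.urlopen(urllib.request.Request(endpoint, data=payload), timeout=10)
    except (urllib.error.URLError, OSError):
        return False                      # keep the buffer; retry on the next run
    open(spool, "w").close()
    return True
```

record() runs on every sample; flush() runs on the same schedule and simply returns False while the link is down, so nothing is lost.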
https://healthchecks.io is free and hooks into Slack, Discord, etc. It's sort of similar to Cronitor, where you push your pings using scripts you write - I ping when systemctl status docker returns active, for example.
https://uptimerobot.com is also free - it does all my site checking and allows you to create a public status page with a custom domain (your domain CNAMEs to theirs).
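The "ping only when the Docker unit is active" idea could look roughly like this in Python (the check UUID is a placeholder):

```python
import subprocess
import urllib.request

PING_URL = "https://hc-ping.com/your-uuid-here"  # placeholder check UUID

def service_active(name: str) -> bool:
    """`systemctl is-active --quiet UNIT` exits 0 only when the unit is active."""
    return subprocess.run(["systemctl", "is-active", "--quiet", name]).returncode == 0

def check_and_ping(name: str = "docker") -> None:
    # No ping when the unit is down, so the check's schedule lapses and alerts.
    if service_active(name):
        urllib.request.urlopen(PING_URL, timeout=10)
```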
You could have it email a check at https://healthchecks.io/ - I use this to catch situations exactly like this. Saves filling up your inbox and makes it really easy to spot problems with routine jobs failing/crashing entirely :)
MacOS user here. I just moved off Arq completely and over to Duplicacy. I've been running Duplicacy since ~2 days into the v6 release as a "just in case," but that turned into Duplicacy being my only backup app this week.
It's fast - really fast - the GUI is nice, the CLI is very powerful, and it's using slightly less RAM than Arq, although Arq's usage was in line with any other app.
How I back up: MacBook Pro -> Synology. Synology -> CloudSync with GDrive AND Synology -> HyperBackup to B2 (using new S3 API). Duplicacy is managing just over 2TB of data for me.
A few things:
Overall, I'm happy with it. One day I might return to Arq but for now, I'm happy with Duplicacy. The community over there is a huge help and it's comforting to know the dev. is super active with everyone.
If that won’t work — for instance, your router may not support a WAN-side ping — you can check out Healthchecks. I use their service as a dead man’s switch for critical cron jobs (e.g. backup routines) but you can use it to monitor connectivity by pinging their service on a regular interval (like 1/min). It’s not as reliable since it’s client-side but it would work in a pinch.
Hi Pēteris,
Thank you very much for the example with explanation. It works like a charm!
I never expected the owner of Healthchecks.io to reply to my question on reddit. Keep up the great work!
I use https://healthchecks.io/; I have a command line sensor hit the URL every 5 minutes. This way I get alerts if HA stops firing events.
I also monitor the front end with https://uptimerobot.com
I have the following script in a cron on my router:
#!/bin/bash
pcheck() {
local phost=$1
local phck=$2
ping -c1 $phost && curl -fsS --retry 3 https://hc-ping.com/$phck > /dev/null
}
pcheck google.nl health-check-id
pcheck 192.168.1.1 another-health-check-id
If that curl is 15 minutes delayed I get a notification through https://healthchecks.io/
Thanks for the reply. I was hoping to use Borg on unRAID as well as my Linux PC. Borg is part of the 'Nerd Pack' in unRAID, so it'll get updates. BorgMatic isn't :(
So I'm trying to achieve it with Borg for now. I have heard of healthchecks.io, but in all honesty it too is beyond me. If I can't figure out how to use systemd to email me, I doubt I'll figure out the additional layer.
I've seen BorgBase, and the Vorta app is pretty neat. Because Vorta isn't in a Community Application (or Docker) in unRAID, I'd rather just use Borg for both systems, with minor tweaks in the config as appropriate. Now there's a potentially lucrative market for you :) unRAID people are often like me: somewhat tech savvy but struggling with deeper system configuration like cron and systemd.
If I can make a couple of suggestions for Vorta:
BorgBase looks like a good service, but I have access to another PC at my parents' house, so I plan to use that.
> Monitoring
> Logging — This one is pretty easy to implement, but as long as no one view or monitor theses logs, it isn’t very beneficial
So exactly the same as for the rest of your application?
> Developer Accessibility — Crontab is a system-level process
YMMV, but use containers. Also:
> We wanted our developers to have easy access to both add and review our application background jobs.

> Deployment — We needed to deploy code changes the same way we release new versions
Run crontab /app/config/crontab during deployment.
> What happens when one task starts when previous one hasn’t finished running yet
Don’t confuse queue systems with cron jobs then.
TL;DR: if you were using cron instead of a queue worker, your point is valid. But the argument can be made the other way around if you use a queue for periodic tasks that fit better in the crontab.
If you're doing backups with something like cron, you could call some HTTP endpoint upon successful completion. There are a bunch of services that offer this, e.g. https://healthchecks.io
For BorgBase.com, the backup hosting service I run, you can also set alerts based on the last index modification date. This is the date, new data was added to the backup repo.
The service is pretty easy, but you should know about their reliability:
>The hosted Healthchecks.io service currently runs on Hetzner bare metal servers, with healthy excess capacity to handle traffic spikes (which cron jobs with common schedules are prone to create). The app servers are load balanced. The PostgreSQL database has a hot standby as well as daily encrypted backups to S3. The database fail-over is handled manually. The ops team consists of a single person, so multi-hour or even multi-day outages are possible.
UPDATE:
I decided Slack will be my overall monitoring, by routing everything to it.
I created a TIG (telegraf/influxdb/grafana) stack server/client running in docker that can be spun up in a few minutes, forwarding notifications to slack.
https://gitlab.com/carverhaines/tig-stack-server
Healthchecks.io forwarding notifications to slack as well
Thanks for bringing my attention to this – the Privacy Notice probably needs some of this stuff in it. I will look into improving it.
Visitor IP addresses are not being logged (except for pings, where Healthchecks *does* log the client IP addresses and show them in ping log).
Healthchecks.io uses no 3rd party tracking services like Google Analytics. It means I have only limited visibility on traffic patterns and statistics, and cannot set up conversion tracking. OTOH I don't need to have the cookie warning or load any 3rd party JS includes.
Healthchecks.io uses the following sub-processors, and have DPAs with each:
* AWS for sending emails
* Braintree for processing payments
* Cloudflare for load balancing
* Hetzner for hosting
* Twilio for sending SMS
I'm getting asked about a smaller personal plan semi-regularly, and am thinking about it. Also I'm noting your request about accepting cryptocurrencies. Not sure if recurring billing and cryptocurrencies mesh too well, but I haven't investigated this much yet.
>then I'm screwed with Duplicacy. With Duplicacy I can easy
You keep saying Duplicacy for everything, it's making things really confusing. Care to edit your comment and I'll read it again?
>Duplicacy Windows GUI and new web beta are able send only mail for all backups - failed or successful. I have to work around that with some scripting...
Ah, yes. While I agree Duplicacy doesn't have the best notification system, I personally use HealthChecks.io and a small script in .duplicacy/scripts/post-backup, which is:
#!/usr/bin/env bash
curl -fsS --retry 3 https://hc-ping.com/KEY
Duplicacy then automatically runs this after every backup, and HealthChecks.io will email me if it doesn't check in at least once a day.
>It's simple, in Duplicacy if I have problem with backup data - for example missing chunk in version 1 - then backup job is probably 99% irreparable.
Not at all?
duplicacy check --tabular, find the erroneous chunk, duplicacy cat, find the file said chunk relates to, and restore any other file without issue, no?
I've used https://healthchecks.io/ for the past year, and am super happy with it. Basically, I have a monitoring script that rips through a list of URLs, triggering pagerduty if the URL doesn't give me the correct status code (200 for public stuff, 403 for VPN protected stuff, 301 for redirects, etc.) Once the check finishes, it hits the healthchecks api endpoint. If the monitoring script dies for some reason, healthchecks triggers pagerduty if it hasn't been pinged in five minutes.
All in all, it's pretty straightforward, and took me about an hour to cobble together.
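A stripped-down sketch of such a loop - the URL list, expected codes, and ping UUID are all made up, and the PagerDuty side is left out:

```python
import urllib.error
import urllib.request

CHECKS = {  # hypothetical: URL -> the status code that counts as healthy
    "https://example.com/": 200,
    "https://vpn.example.com/admin": 403,
}
PING_URL = "https://hc-ping.com/your-uuid-here"  # placeholder check UUID

def healthy(url: str, expected: int, fetch=urllib.request.urlopen) -> bool:
    """A target passes only when it answers with exactly the expected status."""
    try:
        return fetch(url, timeout=10).status == expected
    except urllib.error.HTTPError as e:
        return e.code == expected   # 4xx/5xx arrive as exceptions
    except OSError:
        return False                # DNS failure, refused connection, timeout

def run_checks(fetch=urllib.request.urlopen):
    failed = [u for u, code in CHECKS.items() if not healthy(u, code, fetch)]
    if not failed:  # only ping the dead man's switch when everything passed
        fetch(PING_URL, timeout=10)
    return failed
```

The fetch parameter just makes the sketch testable; in a cron job you'd call run_checks() as-is and alert on the returned list.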
I just added a new feature in Healthchecks.io: you can now specify a "Subject Must Contain" parameter. For an incoming email, if this value is not found in the subject line, Healthchecks will ignore that email.
https://i.imgur.com/5wrYmKu.png
For both Backup Exec and Veeam, you would set the value to "Success", so Healthchecks accepts the success emails and ignores the rest.
You're welcome to try this out, and I would be interested in any and all feedback!
Yeah, Healthchecks.io can listen for incoming emails, but does not support configurable success criteria.
It would not be too hard to add a hard-coded rule like "If subject contains 'fail' then signal a failure". But I'm guessing you are working with existing, inflexible systems that each have their own email subject / body convention.
Could you give examples of the email subject lines for the services you monitor? Maybe there is a common denominator?
Instead of an external service hitting your IP address (which, for a home user, can change a lot), you could go the other way around, i.e. ping an external IP from your internal IP. I use a similar setup wherein my home server pings an external domain name, and I get notified when the ping doesn't go through for X interval. Healthchecks.io is one such service you can use, and it has a free tier too!
If any of these are cron jobs, https://healthchecks.io/ is a great way to get notified if they fail to run. You just include a curl call to an endpoint and if healthchecks doesn't receive another ping after the amount of time you configure, they'll alert you.
Send an ICMP request here every 60 seconds... (from the monitoring server).
If the remote site doesn't receive one for 5 minutes, it pings work emails, if it doesn't get one for 10 minutes it drops across to SMS alerts...
Oh god yes. I love cryptocurrency, but I stopped watching them completely, especially if you plan to hold anyway :)
Slack integrates perfectly well with my whole stack, actually. I use New Relic (or Datadog since recently) to monitor my servers; both offer native Slack integrations (and have free plans). For background jobs and cron jobs I use Healthchecks.io, which also has a native Slack integration. For service monitoring my go-to is Uptime Robot - I guess the native Slack integration is no surprise at this point. Because that is far from enough to not feel alone in my one-man Slack, I also pipe all interesting events in my applications (I do SaaS & web apps) directly into Slack (with slack-notifier [Rails]).
And this is only the monitoring part. Stripe, GitHub, continuous integration, support systems - nearly every aspect of my business tools provides Slack integration.
However, Slack is a chat; it's a bad fit for keeping older information in your sight. As much as I love it for time-critical/chronological things, I still want a live dashboard now :D