In this episode, possibly the shortest since Jon joined the team, we have a conversation with Stuart (Mastodon | Twitter), who is a member of our Telegram community. We’re also missing Al.
Stuart talks about Prometheus and compares it to Nagios. He describes how the two differ in collecting data, particularly how Prometheus talks to local exporters to collect metrics rather than polling data every 5 minutes. He lists exporters for a whole range of products (too many to list here!), and then Jerry and Stuart discuss rewriting native data sources into a format that Prometheus can work with.
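To give a flavour of what "rewriting a native data source" can mean, here is a minimal sketch in Python that turns a dictionary of readings into the plain-text format Prometheus scrapes. It was not discussed in the episode: the function name `render_metrics`, the `ups` prefix and the sample readings are all hypothetical, and every value is assumed to be a gauge.

```python
def render_metrics(prefix, readings):
    """Render a dict of native readings as Prometheus text-format lines.

    Assumes every reading is a gauge and that the keys are already valid
    metric name fragments (letters, digits and underscores).
    """
    lines = []
    for name, value in sorted(readings.items()):
        metric = f"{prefix}_{name}"
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {float(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical readings pulled from some native data source.
    sample = {"battery_charge_percent": 87, "load_percent": 12.5}
    print(render_metrics("ups", sample), end="")
```

A real exporter would serve this text over HTTP for Prometheus to scrape; the sketch covers only the format conversion.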
Stuart has linked to some additional sources of information about Prometheus:
- Prometheus Monitoring Experts – posts by Brian Brazil, with lots of information about how Prometheus works and how to make use of it.
- Chris’s Wiki :: blog/sysadmin – interesting blog posts about their experience with Prometheus.
- Grafana is heavily involved in Prometheus development now, and frequently blogs about using Prometheus.
- cronmanager – generates cron statistics in Prometheus format (e.g. job status, time to run) and outputs them to the textfile collector directory.
- Textfile collector scripts – generates textfile outputs in Prometheus formats for a number of applications, including smartmon, apt, yum and pacman.
- Thanos and Cortex – two of the Prometheus High Availability options.
Moving on with the show, we cover for the fact we’re missing Al by asking two questions on his behalf. The first covers how we believe Al is suffering from Alert Fatigue, and how he can collect results from scripts that run on his servers in a specific way. Stuart explains how he’d use Prometheus for this. Jerry mentions that he’d collect logs for later parsing and only forward them when the script has failed to run successfully. Jon mentions that he’d consider using Monit to run the tasks, as that will notify if the job fails to run. He also suggests using triggers for bash scripts to send an email on failure, and changing email titles based on the outcome of the task.
The second question is about monitoring disks on a homemade NAS. Jon mentions he’s used Monit with SmartMonTools (similar to this page) to monitor disk statuses in the past. Jerry and Stuart also mention that Al could use Prometheus for this. We also discuss that this may in fact be built into the NAS he’s trying to build. We discussed monitoring with Lucy in Episode 77.
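For the first of Al's questions, collecting results from scripts, one possible approach is a small wrapper that runs the job and drops its outcome into the node_exporter textfile collector directory. The Python sketch below is our own illustration, not something from the episode: the metric names, the `run_and_record` helper and the demo job are hypothetical, and it assumes node_exporter is pointed at the same directory via `--collector.textfile.directory`.

```python
import os
import subprocess
import sys
import tempfile
import time


def run_and_record(job_name, command, textfile_dir):
    """Run a command, then write its outcome as a .prom file.

    Metric names are illustrative only. The file is written to a temporary
    name first and renamed, so a scrape never sees a half-written file.
    """
    start = time.time()
    result = subprocess.run(command)
    duration = time.time() - start

    body = (
        f'job_exit_code{{job="{job_name}"}} {result.returncode}\n'
        f'job_duration_seconds{{job="{job_name}"}} {duration:.3f}\n'
        f'job_last_run_timestamp_seconds{{job="{job_name}"}} {int(start)}\n'
    )

    fd, tmp_path = tempfile.mkstemp(dir=textfile_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    # mkstemp creates the file readable by its owner only; open it up so an
    # exporter running as another user can read it.
    os.chmod(tmp_path, 0o644)

    final_path = os.path.join(textfile_dir, f"{job_name}.prom")
    os.replace(tmp_path, final_path)
    return result.returncode, final_path


if __name__ == "__main__":
    # Demo: a stand-in job that fails with exit code 3, written to a
    # throwaway directory rather than a real collector directory.
    demo_dir = tempfile.mkdtemp()
    failing_job = [sys.executable, "-c", "raise SystemExit(3)"]
    code, path = run_and_record("demo_backup", failing_job, demo_dir)
    with open(path) as handle:
        print(handle.read(), end="")
```

An alert on a non-zero `job_exit_code`, or on a stale `job_last_run_timestamp_seconds`, could then replace a stream of per-run emails, which may help with the alert fatigue mentioned above.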
Jon talks about the testing he’s been doing with Nebula, a meshed overlay VPN (Virtual Private Network) product, and compares it to a Hub-and-Spoke (or Star) VPN topology. He also compares it briefly with ZeroTier, and mentions that he needs to explore ZeroTier further.
Jerry asks Stuart some questions about SaltStack, and compares it to Ansible.
As always, we’d encourage any listeners to join our Telegram Group, or contact us using the other links! We also have a Patreon which you can use to support the show if you’re so inclined.