Monitoring WordPress with Prometheus in Kubernetes
A year ago (or probably two now), I moved all of my personal services onto a Kubernetes cluster. We’ve been moving toward Kubernetes at work, so I wanted to know how it worked from the inside by building a cluster myself. Part of that hosting work includes a site for the homeschool co-op my wife leads, a web site for a weekly “geek lunch”, and our city mayor’s campaign web site. I’m not an especially huge fan of WordPress, but it does the job as long as you’re careful with it.1
Up until now, I’ve always performed the necessary WordPress updates semi-manually. I set up a CI/CD system that builds my custom WordPress container image whenever WordPress releases a new patch and then sends me a text message. I then have a script on my laptop that installs the latest patch with a single click, after which I check my WordPress sites manually to make sure they came back up. Tedious, but WordPress releases are not all that frequent and it’s only three sites, so no biggie. However, it has been on my to-do list to set up automatic updates. The prerequisite for that is monitoring, so I will know when an update goes pear-shaped. Since I also need monitoring to succeed at another volunteer project I’ve taken on, it is now time to get that monitoring set up. Here’s what I did, in case it helps others, particularly those with micro-sized Kubernetes clusters like mine.
Setting up Prometheus
First, I based my work on the work of others. I virtually copied the setup from Linux Academy’s Running Prometheus on Kubernetes, written earlier this year. It got me started in five minutes without my really understanding anything about how Prometheus works. I will not repeat anything you can find there.
Once this is in place, you will have a pod that will give you various statistics about your Kubernetes cluster. However, the Linux Academy article doesn’t explicitly tell you how to get at them. There are a couple of options, but if you don’t intend to publish your Prometheus server through a load balancer, I suggest just using a port forward whenever you need to see what’s going on in Prometheus. I need to put the following into a script so I don’t have to remember it, but here’s the formula:
kubectl port-forward -n monitors deployment/prometheus-deployment 9090
(I renamed the monitoring namespace to monitors for my use, because reasons.)
As long as that command is running, I can hit localhost port 9090 in my browser and see what Prometheus is doing.
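Since I mentioned wanting a script for this, here is about the smallest possible sketch of one. The file name is mine and purely hypothetical; it just wraps the command above:

#!/bin/sh
# prom-ui.sh -- tiny wrapper so I don't have to remember the port-forward incantation.
# Forwards local port 9090 to the Prometheus deployment until interrupted with Ctrl-C.
exec kubectl port-forward -n monitors deployment/prometheus-deployment 9090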
Setting up WordPress
Okay, so now I have Prometheus monitoring Kubernetes, but that tells me nothing about WordPress yet. For that, you have to understand something about Prometheus: it depends on something called an “exporter” to provide it with metrics. Basically, for anything you want to monitor, you need an HTTP endpoint that returns a set of text lines describing the current state of the service.
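To make that concrete, exporter output is just plain text in the Prometheus exposition format, one metric per line. A made-up sample is below; the metric names here are illustrative, not necessarily what any particular WordPress exporter emits:

# HELP wordpress_num_posts Number of published posts
# TYPE wordpress_num_posts gauge
wordpress_num_posts 42
# HELP wordpress_num_users Number of registered user accounts
# TYPE wordpress_num_users gauge
wordpress_num_users 7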
If you look for published WordPress plugins to do this, you probably won’t find much. After some Googling, I came across an article on Erwin Müller’s blog titled Monitoring WordPress with Prometheus in a Kubernetes Cluster. He employs a couple of different exporters, but I think one of them is redundant. I chose to go with just the second one he uses because it’s simple and I can install it straight from GitHub. Therefore, I forked wordpress-exporter-prometheus from origama to start. That way, I control the source code in case he decides to make some drastic change or even just removes the project from his GitHub account or whatever.
I added this plugin to the configuration of all of my WordPress pods and activated it in each. So now I have an endpoint in each named /wp-json/metrics that serves metrics in the format Prometheus can use. I’ve just kept it public because it’s really not too scary if someone finds out how many draft posts or total user accounts there are on these sites. However, if the metrics were sensitive, I would want to put basic auth or something in front of them.
Then, I added the following lines to each of the service configurations for WordPress deployments in Kubernetes:
annotations:
  prometheus.io/scrape: 'true'
  prometheus.io/port: '80'
  prometheus.io/path: '/wp-json/metrics'
Prometheus can use these annotations to automatically discover which services to scrape and where to find their metrics. With the configuration from the blog post mentioned above, Prometheus will also attach metadata about each such service, deployment, pod, and so on as labels on the metrics it collects.
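For reference, the part of the Prometheus configuration that makes these annotations work is a scrape job using kubernetes_sd_configs plus relabeling rules keyed off the annotations. The Linux Academy setup includes one; a trimmed sketch of what such a job looks like (treat it as a sketch, not a drop-in config):

scrape_configs:
  - job_name: 'kubernetes-service-endpoints'
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Only scrape services annotated with prometheus.io/scrape: 'true'
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Honor a custom metrics path such as /wp-json/metrics
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Honor a custom port from the prometheus.io/port annotation
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Copy the Kubernetes service labels onto the scraped metrics
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      # Record which service the metrics came from as kubernetes_name
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name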
Setting up all the rest
At this point, I have a working Prometheus server, a plugin for reporting metrics (including the up/down signal I am most interested in), and Prometheus is collecting those metrics. But now what? I have to do something with those metrics; I need to get from here to paging me when something goes awry. I pieced the rest together from reading the Prometheus GitHub repositories and the not-so-very-nice reference docs on the Prometheus web site. In the end, though, I found and completed the following steps to reach my goal.
To get from here to the finish we need the following:
- We need to set up rules to identify the metrics we are interested in signalling on.
- We need a way to receive the pager alerts.
- We need to set up Alertmanager to do the work of turning the alert signals into working pager alerts.
Writing the alert rules
The rules are part of Prometheus proper. To set these up, first you need to add something like this to your prometheus.yml configuration:
rule_files:
  - 'alerts.rules'
Rules are a way to ask Prometheus to store extra computed information about your metrics. Rules are also how alerts are defined. From here, I created my alerts.rules file like so:
groups:
  - name: AppMonitors
    rules:
      - alert: CriticalDown
        expr: up{monitor_priority="critical"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Critical App {{ $labels.kubernetes_name }} down"
          description: "{{ $labels.kubernetes_name }} has been down for more than 5 minutes."
      - alert: ImportantDown
        expr: up{monitor_priority="important"} == 0
        for: 15m
        labels:
          severity: important
        annotations:
          summary: "Important App {{ $labels.kubernetes_name }} down"
          description: "{{ $labels.kubernetes_name }} has been down for more than 15 minutes."
I am a complete amateur at this, so it is probably badly done. I also know something is off with my template variables, since they are not being interpolated correctly in my alerts, but this is the gist. Despite these problems, it still does what I want: when something goes down, my phone nags me about it. I can fix the details later.
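By the way, if you want to sanity-check a rules file before loading it, the Prometheus image ships with promtool, which can validate it. Something like the following should work, assuming Docker locally; pin the image tag to whatever version your cluster actually runs:

# Validate alerts.rules using the promtool binary shipped in the Prometheus image
docker run --rm -v "$PWD:/work" --entrypoint /bin/promtool prom/prometheus check rules /work/alerts.rules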
As an aside, you should make liberal use of labels on your services when coordinating tools like Prometheus. For example, the service config for each of my most important WordPress sites includes this label:
labels:
  monitor_priority: critical
The alert rules above match the up metric for sites carrying these labels. If a service with these labels reports 0 for that metric, it will show up in the alert expression and trigger an alert. After installing this configuration, I can see the status of these alerts in the Prometheus web interface. If I deliberately take a service offline, the status on the Alerts section of the web interface changes. Therefore, we now have the alerts identified.
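Putting the two pieces together, the metadata on one of these WordPress Service definitions ends up looking something like this. The name is a stand-in rather than my actual config:

apiVersion: v1
kind: Service
metadata:
  name: wordpress-example        # hypothetical site name
  labels:
    monitor_priority: critical   # copied onto the metrics by the labelmap rule above
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '80'
    prometheus.io/path: '/wp-json/metrics'
spec:
  selector:
    app: wordpress-example
  ports:
    - port: 80
      targetPort: 80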
Setting up Opsgenie
Now we’re ready to set up how we want to receive our notifications. I am not going to use email for alerts; this is not 2001. My email is flooded with too much noise already and I’d just ignore alerts there like I ignore 99% of my email. These alerts need to make my phone ding and annoy me until I fix them. This is where Opsgenie comes in. The two obvious picks (in my mind) were either PagerDuty or Opsgenie. However, Opsgenie has a free tier and PagerDuty does not, so Opsgenie wins. Honestly, I could probably get by with an SNS topic, but Opsgenie is easy to configure in Prometheus, so let’s go with it.
I set up a free account for myself, set up a team with myself as the sole member (so I get to be the one on call all the time), and configured an API integration. I copied down the API key, and now I’m ready to configure Alertmanager to connect my alerts to Opsgenie.
Setting up Alertmanager
The last step is that I need something running in my cluster to forward the alerts from Prometheus to Opsgenie. The tool for this is Alertmanager, another piece of the Prometheus ecosystem. For this part, I crafted my own setup from scratch. My Kubernetes configuration for Alertmanager looks like this:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-alertmanager-conf
  labels:
    name: prometheus-alertmanager-conf
  namespace: monitors
data:
  alertmanager.yml: |
    route:
      receiver: opsgenie
    receivers:
      - name: opsgenie
        opsgenie_configs:
          - api_key: SECRET
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: alertmanager
  namespace: monitors
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: alertmanager
        software: prometheus
    spec:
      containers:
        - name: alertmanager
          image: prom/alertmanager:v0.18.0
          args:
            - "--config.file=/etc/alertmanager/alertmanager.yml"
          ports:
            - containerPort: 9093
              name: alertmanager
          volumeMounts:
            - name: prometheus-config
              mountPath: /etc/alertmanager/
      volumes:
        - name: prometheus-config
          configMap:
            defaultMode: 420
            name: prometheus-alertmanager-conf
---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: monitors
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8080'
spec:
  selector:
    app: alertmanager
  ports:
    - port: 8080
      targetPort: alertmanager
That’s it. I loaded that into Kubernetes, and Alertmanager is now ready to accept alerts. We can test it real quick by setting up a port forward:
kubectl port-forward -n monitors service/alertmanager 8080
And then send it a test alert via curl
like so:
curl -H "Content-Type: application/json" -d '[{"labels":{"alertname":"Test"}}]' localhost:8080/api/v1/alerts
About 5 minutes after running that curl command, my phone dings to let me know an alert has been received. (The delay is because I’ve left all the group delay and other defaults in place for now.)
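Those defaults live in the route block of alertmanager.yml. If I ever want the page to arrive faster, or want to be re-nagged about an alert that stays open, these are the knobs to turn. The values below are just an example, not a recommendation:

route:
  receiver: opsgenie
  group_wait: 30s        # how long to wait before sending the first notification for a new group
  group_interval: 5m     # how long to wait before notifying about new alerts added to an existing group
  repeat_interval: 4h    # how often to re-send a notification for an alert that is still firing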
That’s pretty much it for Alertmanager. However, there’s one teensy little thing we need to do in Prometheus to complete the configuration: Prometheus needs to push alerts to Alertmanager. This is done with the following configuration in prometheus.yml:
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'alertmanager.monitors.svc:8080'
After all this, I can restart my Prometheus pod, and when one of my WordPress sites has downtime, my phone pages me within 10-20 minutes. That’s good enough for my little teeny sites. I tested by deliberately taking one of the sites offline.
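For completeness, the restart itself is nothing fancy; a rollout restart of the deployment does it, assuming the deployment name from earlier:

kubectl rollout restart -n monitors deployment/prometheus-deployment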
Yay!
I hope this information is useful to someone on the Internet. If not, it will at least be useful to me in 14 months when I next have to figure out what I did to set this up and what I need to remember to work with it again.
Cheers.
1. By being careful, I mean always keeping WordPress and all plugins and themes patched and up to date, and being very careful and conservative about which plugins and themes you install. ↩︎