Secure your Prometheus install

There are a number of things you can do to secure your Prometheus installation.

Securing node_exporter

At the moment, node_exporter is open to everyone: anyone who repeatedly hit the scrape endpoint could cause a denial of service. Furthermore, the traffic is not encrypted.

We could secure node_exporter by putting a reverse proxy in front of it, such as apache, nginx, or exporter_exporter.

However, node_exporter has TLS support built-in, so we will use this. Not only will the traffic be encrypted, but we will use TLS certificates to authenticate.

To do this properly, you would set up a certificate authority. However, we’re going to use a very simple config, where all the nodes share one key and certificate, and the prometheus server uses a different key and certificate.

This allows you to push out the same key and certificate on all your nodes, but only the prometheus server is authorised to scrape them.

NOTE: this requires node_exporter 1.0.0 or later

Open windows

In this exercise, some of the steps need to be done in the srv1 virtual machine, and some inside the prometheus container.

To make life easier, open two ssh sessions to srv1 in separate windows.

In the first one, switch to the root user:

sudo -s

The prompt should look like this:

root@srv1-campusX:~#

And in the second one, enter the prometheus container:

incus shell prometheus

The prompt should look like this:

root@prometheus:~#

Both shells are for the root user. One is in srv1 (where node_exporter is running: the target which we are monitoring), and one is in prometheus (where prometheus is running: the system performing the monitoring).

Make sure you do the steps below in the correct windows!

Enable node_exporter TLS

Firstly, check that your node exporter is running as expected. The following steps are all done in the srv1 VM.

# In the SRV1 vm
curl localhost:9100/metrics

(You can also try scraping another campus’ node_exporter, to prove that it is insecure!)

Now we will create a key and certificate for node_exporter to use:

# Still in the SRV1 vm
mkdir -p /etc/node_exporter/ssl
cd /etc/node_exporter/ssl
openssl genpkey -genparam -algorithm EC -pkeyopt ec_paramgen_curve:P-256 -out p-256.param
openssl req -x509 -newkey ec:p-256.param -keyout prom_node_key.pem -out prom_node_cert.pem \
  -days 29220 -nodes -subj /commonName=prom_node/ -addext "subjectAltName=DNS:prom_node"

Type ls and you should see the files prom_node_cert.pem and prom_node_key.pem. This is how node_exporter identifies itself to prometheus, i.e. the equivalent of a web server’s “server certificate”, together with its corresponding private key. (There is also a third file with elliptic curve parameters, p-256.param)
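If you ever want to confirm that a certificate and private key belong together (for example after copying files around), you can compare the public key embedded in each. This is an optional check, demonstrated here on a throwaway pair generated in a temporary directory; substitute prom_node_cert.pem and prom_node_key.pem to check the real files:

```shell
# Optional: verify that a certificate and private key match by comparing
# the public key each one contains. Uses a throwaway pair for illustration.
cd "$(mktemp -d)"
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:P-256 \
  -keyout test_key.pem -out test_cert.pem -days 30 -nodes -subj /commonName=test/
# Extract the public key from each file; identical output means they match
if [ "$(openssl x509 -in test_cert.pem -noout -pubkey)" = \
     "$(openssl pkey -in test_key.pem -pubout)" ]; then
  echo "key matches certificate"
fi
```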

Next, create a file /etc/node_exporter/node_web_config.yml with the following contents:

tls_server_config:
  cert_file: /etc/node_exporter/ssl/prom_node_cert.pem
  key_file: /etc/node_exporter/ssl/prom_node_key.pem

Next, edit /etc/default/node_exporter to add option --web.config.file=/etc/node_exporter/node_web_config.yml. For example, if it was like this:

OPTIONS='--collector.textfile.directory=/var/lib/node_exporter'

then it will become:

OPTIONS='--collector.textfile.directory=/var/lib/node_exporter --web.config.file=/etc/node_exporter/node_web_config.yml'

Restart it and check for errors:

systemctl restart node_exporter
journalctl -eu node_exporter

Check that the log output includes msg="TLS is enabled.", which confirms we’re now running with TLS.

Now we can do a test scrape using curl and https:

curl https://localhost:9100/metrics

This should fail because of a certificate error. It’s possible to ignore the certificate completely:

curl -k https://localhost:9100/metrics

… but that would accept any certificate and is insecure, so instead we should check the certificate matches the one we expect:

curl --cacert /etc/node_exporter/ssl/prom_node_cert.pem --resolve prom_node:9100:127.0.0.1 -v https://prom_node:9100/metrics

The scrape should be successful, this time over HTTPS. We used the fake hostname “prom_node” to match the certificate, told curl to use address 127.0.0.1 for the actual TCP connection, and verified the certificate against prom_node_cert.pem.

Updating prometheus to use TLS

At this point, prometheus should be failing to scrape our node, because it’s still trying to use HTTP. You can check like this:

# Inside the PROMETHEUS container
/opt/prometheus/promtool query instant http://localhost:9090/prometheus 'up{job="node"}'

Look for a line in the results like this:

up{instance="srv1-campusX.ws.nsrc.org:9100", job="node"} => 0 @[1582210727.15]

The “=> 0” is the scrape result (0=fail, 1=success).

We now have to update prometheus to scrape using TLS, in the same way as we have been doing with curl.

In order to do so, prometheus will need a copy of the node_exporter certificate to be able to verify it.

# Inside the PROMETHEUS container
mkdir /etc/prometheus/ssl

# Inside the SRV1 vm
incus file push /etc/node_exporter/ssl/prom_node_cert.pem prometheus/etc/prometheus/ssl/

The incus file push copies the certificate prom_node_cert.pem from the outer host (srv1) into the filesystem in the container.

Next, check we can use this certificate to verify the node_exporter over HTTPS:

# Inside the PROMETHEUS container
curl --cacert /etc/prometheus/ssl/prom_node_cert.pem --resolve prom_node:9100:192.0.2.1 -v https://prom_node:9100/metrics

(192.0.2.1 is the local IP address of the srv1 VM, as seen from the prometheus container)

Hopefully it works. If not, debug before continuing: check that the certificate was copied correctly and that the srv1 IP address is right.

Now we need to get prometheus to do the same.

Still inside the prometheus container, edit /etc/prometheus/prometheus.yml and find the section which starts job_name: 'node'. Edit it so it looks like this:

  - job_name: 'node'
    file_sd_configs:
      - files:
          - /etc/prometheus/targets.d/node.yml
    scheme: https
    tls_config:
      # Verifying remote identity
      ca_file: /etc/prometheus/ssl/prom_node_cert.pem
      server_name: prom_node

The “scheme” line tells prometheus to use https instead of http (the default), and the tls_config block gives the information on how to verify the certificate.

Signal prometheus to re-read its configuration:

systemctl reload prometheus
journalctl -eu prometheus

Re-run the “promtool query” command from earlier to check the “up” metric. Within a minute, the result should change from 0 to 1. We are successfully scraping over TLS!

NOTE: if you are scraping other campus servers, these will still FAIL. This is because the other campuses are either still on HTTP, or are using a different key and certificate for node_exporter.

Client authentication

So far, we’ve made the scrape encrypted over TLS, but still anyone is allowed to scrape. Now we will make a new key and cert for the prometheus server to use when scraping (a “client certificate”), and configure node_exporter so that it only accepts scrapes from someone with this certificate and the matching key.

Still in the prometheus container, create the new key and cert for prometheus:

# in the PROMETHEUS container
cd /etc/prometheus/ssl
openssl genpkey -genparam -algorithm EC -pkeyopt ec_paramgen_curve:P-256 -out p-256.param
openssl req -x509 -newkey ec:p-256.param -keyout prometheus_key.pem -out prometheus_cert.pem \
  -days 29220 -nodes -subj /commonName=prometheus/ -addext "subjectAltName=DNS:prometheus"
chown prometheus prometheus_key.pem

Now we need to get node_exporter to allow connections only from the holder of the private key (i.e. the prometheus server).

Go back to the SRV1 vm and copy the prometheus certificate (not the key):

# In the SRV1 vm
incus file pull prometheus/etc/prometheus/ssl/prometheus_cert.pem /etc/node_exporter/ssl/

Still on SRV1, edit /etc/node_exporter/node_web_config.yml so it looks like this:

tls_server_config:
  cert_file: /etc/node_exporter/ssl/prom_node_cert.pem
  key_file: /etc/node_exporter/ssl/prom_node_key.pem
  client_auth_type: RequireAndVerifyClientCert
  client_ca_file: /etc/node_exporter/ssl/prometheus_cert.pem

Restart node_exporter:

systemctl restart node_exporter
journalctl -eu node_exporter

Go back to the prometheus container, and re-run the exact same curl command as you did before:

# Inside the PROMETHEUS container
curl --cacert /etc/prometheus/ssl/prom_node_cert.pem --resolve prom_node:9100:192.0.2.1 -v https://prom_node:9100/metrics

You should see an error:

* TLSv1.3 (IN), TLS alert, unknown (628):
* OpenSSL SSL_read: OpenSSL/3.0.13: error:0A00045C:SSL routines::tlsv13 alert certificate required, errno 0
* Failed receiving HTTP2 data: 56(Failure when receiving data from the peer)
* Connection #0 to host prom_node left intact
curl: (56) OpenSSL SSL_read: OpenSSL/3.0.13: error:0A00045C:SSL routines::tlsv13 alert certificate required, errno 0

This is because the client isn’t presenting a certificate to the server to identify itself, and node_exporter now requires it.

To make it work, we need to give a longer curl line (split for clarity):

curl --cert /etc/prometheus/ssl/prometheus_cert.pem \
     --key /etc/prometheus/ssl/prometheus_key.pem \
     --cacert /etc/prometheus/ssl/prom_node_cert.pem \
     --resolve prom_node:9100:192.0.2.1 \
     -v https://prom_node:9100/metrics

This should now work. We’ve proved our identity to node_exporter using our private key, and it will now talk to us.

Prometheus doesn’t know about this yet though, so you should find its scrapes are failing again if you repeat the query from before:

/opt/prometheus/promtool query instant http://localhost:9090/prometheus 'up{job="node"}'

Therefore, we need to update prometheus to scrape in the same way. Still in the prometheus container, edit /etc/prometheus/prometheus.yml and update the tls_config section of the node scrape job so it looks like this:

  - job_name: 'node'
    file_sd_configs:
      - files:
          - /etc/prometheus/targets.d/node.yml
    scheme: https
    tls_config:
      # Verifying remote identity
      ca_file: /etc/prometheus/ssl/prom_node_cert.pem
      server_name: prom_node
      # Asserting our identity
      cert_file: /etc/prometheus/ssl/prometheus_cert.pem
      key_file: /etc/prometheus/ssl/prometheus_key.pem

Prometheus will start proving its identity with its own key and cert, and scraping should now start working again.

Summary

                          SCRAPE (HTTPS)
node_exporter <------------------------------------- prometheus

[key]         ----server certificate "prom_node"--->   (verify)

(verify)      <---client certificate "prometheus"---     [key]

If you don’t understand what’s going on here, please talk to your instructors!

Deployment to other nodes

Of course, normally prometheus isn’t just scraping a single node, it’s scraping multiple remote nodes. To deploy this change to other nodes (such as host1-6 in your campus), you would copy the following files to them:

/etc/node_exporter/ssl/prom_node_cert.pem
/etc/node_exporter/ssl/prom_node_key.pem
/etc/node_exporter/ssl/prometheus_cert.pem
/etc/node_exporter/node_web_config.yml

but NOT prometheus_key.pem. That file is private to the prometheus server only; it’s ownership of this key which proves the prometheus server’s identity.

With a proper CA signing the certificates, each node would have its own private key, and its own certificate with its own hostname. In this simplified setup, all the nodes share the same private key and certificate, and all with the same identity “prom_node”. This is not quite as secure as it could be (since any node with this private key could impersonate any other node), but it’s still pretty good.
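With a proper CA, the prometheus side of the config would change slightly. The sketch below is hypothetical (the ca_cert.pem path is illustrative, not something created in this exercise): prometheus would trust the CA certificate, each node’s certificate would carry that node’s real hostname, and the server_name override would no longer be needed.

```yaml
    # Hypothetical tls_config when using a real CA (not part of this exercise)
    tls_config:
      ca_file: /etc/prometheus/ssl/ca_cert.pem   # the CA certificate (illustrative path)
      cert_file: /etc/prometheus/ssl/prometheus_cert.pem
      key_file: /etc/prometheus/ssl/prometheus_key.pem
      # no server_name: each target's certificate matches its own hostname
```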

Optional: if you have deployed node_exporter to your hostX virtual machines, you can update them to use TLS now.

Reference

These are some other suggested improvements, which we won’t be doing in this workshop.

Run as non-root

node_exporter does not need root privileges to run, so you can secure it further by making it run as a non-root user. You’ll need to make sure it can read prom_node_key.pem (and that no other users can).
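With systemd, one way this might look is a drop-in override for the service unit. This is a sketch only: it assumes you have created a dedicated "node_exporter" system user, made prom_node_key.pem owned by and readable only by that user, and run systemctl daemon-reload afterwards.

```ini
# /etc/systemd/system/node_exporter.service.d/override.conf
# (hypothetical drop-in; assumes a "node_exporter" system user exists
# and can read the files under /etc/node_exporter)
[Service]
User=node_exporter
Group=node_exporter
```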

Basic auth

It’s also possible to configure node_exporter to require “basic auth” (username and password auth). The full set of options is documented in the node_exporter web configuration (exporter-toolkit) documentation.
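A sketch of what this might look like in the web config file, on top of the TLS settings from earlier. This is illustrative only and not part of this exercise: the username “prometheus” is an example, and the hash shown is a placeholder; you would generate your own bcrypt hash, e.g. with htpasswd -nBC 10 prometheus (from the apache2-utils package).

```yaml
tls_server_config:
  cert_file: /etc/node_exporter/ssl/prom_node_cert.pem
  key_file: /etc/node_exporter/ssl/prom_node_key.pem
basic_auth_users:
  # username: bcrypt hash of the password (placeholder shown here)
  prometheus: $2y$10$...
```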

Securing prometheus, alertmanager and grafana web interfaces

The standard way to secure these applications is to put them behind an HTTPS reverse proxy (e.g. apache or nginx) which:

- terminates TLS, so that the traffic is encrypted
- requires authentication (e.g. HTTP basic auth) before passing requests through

Grafana has its own authentication and TLS capabilities, so it can be configured without a proxy.

Doing this configuration is outside the scope of this exercise. However there are a couple of things which you should remember:

Further reading