There are a number of things you can do to secure your prometheus installation.
At the moment, node_exporter is open to everyone. Someone who kept hitting the scrape endpoint repeatedly could cause a denial-of-service. Furthermore, the traffic is not encrypted.
We could secure node_exporter by putting a reverse proxy in front of it, such as apache, nginx, or exporter_exporter.
However, node_exporter has TLS support built-in, so we will use this. Not only will the traffic be encrypted, but we will use TLS certificates to authenticate.
To do this properly you would set up a certificate authority. However we’re going to use a very simple config, where all the nodes use one key and certificate, and the prometheus server uses a different key and certificate.
This allows you to push out the same key and certificate on all your nodes, but only the prometheus server is authorised to scrape them.
NOTE: this requires node_exporter 1.0.0 or later
In this exercise, some of the steps need to be done in the srv1 virtual machine, and some inside the prometheus container.
To make life easier, open two ssh sessions to srv1 in separate windows.
In the first one, switch to the root user:
sudo -s
The prompt should look like this:
root@srv1-campusX:~#
And in the second one, enter the prometheus container:
incus shell prometheus
The prompt should look like this:
root@prometheus:~#
Both shells are for the root user. One is in srv1 (where node_exporter, the target we are monitoring, is running); the other is in the prometheus container (where prometheus, the system performing the monitoring, is running).
Make sure you do the steps below in the correct windows!
Firstly, check that your node exporter is running as expected. The following steps are all done in the srv1 VM.
# In the SRV1 vm
curl localhost:9100/metrics
(You can also try scraping another campus’ node_exporter, to prove that it is insecure!)
Now we will create a key and certificate for node_exporter to use:
# Still in the SRV1 vm
mkdir -p /etc/node_exporter/ssl
cd /etc/node_exporter/ssl
openssl genpkey -genparam -algorithm EC -pkeyopt ec_paramgen_curve:P-256 -out p-256.param
openssl req -x509 -newkey ec:p-256.param -keyout prom_node_key.pem -out prom_node_cert.pem \
-days 29220 -nodes -subj /commonName=prom_node/ -addext "subjectAltName=DNS:prom_node"
Type ls and you should see files prom_node_cert.pem and prom_node_key.pem. This is how the node_exporter identifies itself to prometheus, i.e. the “server certificate” for a web server, with its corresponding private key. (There is also a third file with elliptic curve parameters, p-256.param)
Next, create a file /etc/node_exporter/node_web_config.yml with the following contents:
tls_server_config:
  cert_file: /etc/node_exporter/ssl/prom_node_cert.pem
  key_file: /etc/node_exporter/ssl/prom_node_key.pem
Next, edit /etc/default/node_exporter to add option --web.config.file=/etc/node_exporter/node_web_config.yml. For example, if it was like this:
OPTIONS='--collector.textfile.directory=/var/lib/node_exporter'
then it will become:
OPTIONS='--collector.textfile.directory=/var/lib/node_exporter --web.config.file=/etc/node_exporter/node_web_config.yml'
Restart it and check for errors:
systemctl restart node_exporter
journalctl -eu node_exporter
Check the log output includes msg="TLS is enabled." which confirms we’re running with TLS now.
Now we can do a test scrape using curl and https:
curl https://localhost:9100/metrics
This should fail with a certificate verification error, because curl does not trust our self-signed certificate. It’s possible to ignore the certificate completely with the -k (insecure) flag:
curl -k https://localhost:9100/metrics
… but that would accept any certificate and is insecure, so instead we should check the certificate matches the one we expect:
curl --cacert /etc/node_exporter/ssl/prom_node_cert.pem --resolve prom_node:9100:127.0.0.1 -v https://prom_node:9100/metrics
The scrape should be successful, this time over https. We used the fake hostname “prom_node” to match the certificate, told curl to use address 127.0.0.1 for the actual TCP connection, and told it to verify the certificate against prom_node_cert.pem.
At this point, prometheus should be failing to scrape our node, because it’s still trying to use HTTP. You can check like this:
# Inside the PROMETHEUS container
/opt/prometheus/promtool query instant http://localhost:9090/prometheus 'up{job="node"}'
Look for a line in the results like this:
up{instance="srv1-campusX.ws.nsrc.org:9100", job="node"} => 0 @[1582210727.15]
The “=> 0” is the scrape result (0=fail, 1=success).
We now have to update prometheus to scrape using TLS, in the same way as we have been doing with curl.
In order to do so, prometheus will need a copy of the node_exporter certificate to be able to verify it.
# Inside the PROMETHEUS container
mkdir /etc/prometheus/ssl
# Inside the SRV1 vm
incus file push /etc/node_exporter/ssl/prom_node_cert.pem prometheus/etc/prometheus/ssl/
The incus file push copies the certificate prom_node_cert.pem from the outer host (srv1) into the filesystem in the container.
Next, check we can use this certificate to verify the node_exporter over HTTPS:
# Inside the PROMETHEUS container
curl --cacert /etc/prometheus/ssl/prom_node_cert.pem --resolve prom_node:9100:192.0.2.1 -v https://prom_node:9100/metrics
(192.0.2.1 is the local IP address of the srv1 VM, as seen from the prometheus container)
Hopefully it works. If not, debug: re-check the previous steps (was node_exporter restarted with TLS enabled? was the certificate copied across intact?) and use curl’s -v output to see where the handshake fails.
Now we need to get prometheus to do the same.
Still inside the prometheus container, edit /etc/prometheus/prometheus.yml and find the section which starts job_name: 'node'. Edit it so it looks like this:
  - job_name: 'node'
    file_sd_configs:
      - files:
          - /etc/prometheus/targets.d/node.yml
    scheme: https
    tls_config:
      # Verifying remote identity
      ca_file: /etc/prometheus/ssl/prom_node_cert.pem
      server_name: prom_node
The “scheme” line tells prometheus to use https instead of http (the default), and the tls_config block gives the information on how to verify the certificate.
Signal prometheus to re-read its configuration and check the logs for errors (you can run /opt/prometheus/promtool check config /etc/prometheus/prometheus.yml beforehand to catch syntax mistakes):
systemctl reload prometheus
journalctl -eu prometheus
Re-run the “promtool query” command from earlier to check the “up” metric. Within a minute, the result should change from 0 to 1. We are successfully scraping over TLS!
NOTE: if you are scraping other campus servers, these will still FAIL. This is because the other campuses are either still on HTTP, or are using a different key and certificate for node_exporter.
So far, we’ve made the scrape encrypted over TLS, but still anyone is allowed to scrape. Now we will make a new key and cert for the prometheus server to use when scraping (a “client certificate”), and configure node_exporter so that it only accepts scrapes from someone with this certificate and the matching key.
Still in the prometheus container, create the new key and cert for prometheus:
# in the PROMETHEUS container
cd /etc/prometheus/ssl
openssl genpkey -genparam -algorithm EC -pkeyopt ec_paramgen_curve:P-256 -out p-256.param
openssl req -x509 -newkey ec:p-256.param -keyout prometheus_key.pem -out prometheus_cert.pem \
-days 29220 -nodes -subj /commonName=prometheus/ -addext "subjectAltName=DNS:prometheus"
chown prometheus prometheus_key.pem
Now we need to get node_exporter to allow connections only from the holder of the private key (i.e. the prometheus server).
Go back to the SRV1 vm and copy the prometheus certificate (not the key):
# In the SRV1 vm
incus file pull prometheus/etc/prometheus/ssl/prometheus_cert.pem /etc/node_exporter/ssl/
Still on SRV1, edit /etc/node_exporter/node_web_config.yml so it looks like this:
tls_server_config:
  cert_file: /etc/node_exporter/ssl/prom_node_cert.pem
  key_file: /etc/node_exporter/ssl/prom_node_key.pem
  client_auth_type: RequireAndVerifyClientCert
  client_ca_file: /etc/node_exporter/ssl/prometheus_cert.pem
Restart node_exporter:
systemctl restart node_exporter
journalctl -eu node_exporter
Go back to the prometheus container, and re-run the exact same curl command as you did before:
# Inside the PROMETHEUS container
curl --cacert /etc/prometheus/ssl/prom_node_cert.pem --resolve prom_node:9100:192.0.2.1 -v https://prom_node:9100/metrics
You should see an error:
* TLSv1.3 (IN), TLS alert, unknown (628):
* OpenSSL SSL_read: OpenSSL/3.0.13: error:0A00045C:SSL routines::tlsv13 alert certificate required, errno 0
* Failed receiving HTTP2 data: 56(Failure when receiving data from the peer)
* Connection #0 to host prom_node left intact
curl: (56) OpenSSL SSL_read: OpenSSL/3.0.13: error:0A00045C:SSL routines::tlsv13 alert certificate required, errno 0
This is because the client isn’t presenting a certificate to the server to identify itself, and node_exporter now requires it.
To make it work, we need to give a longer curl line (split for clarity):
curl --cert /etc/prometheus/ssl/prometheus_cert.pem \
--key /etc/prometheus/ssl/prometheus_key.pem \
--cacert /etc/prometheus/ssl/prom_node_cert.pem \
--resolve prom_node:9100:192.0.2.1 \
-v https://prom_node:9100/metrics
This should now work. We’ve proved our identity to node_exporter using our private key, and it will now talk to us.
Prometheus doesn’t know about this yet though, so you should find its scrapes are failing again if you repeat the query from before:
/opt/prometheus/promtool query instant http://localhost:9090/prometheus 'up{job="node"}'
Therefore, we need to update prometheus to scrape in the same way. Still in the prometheus container, edit /etc/prometheus/prometheus.yml and update the tls_config section of the node scrape job so it looks like this:
  - job_name: 'node'
    file_sd_configs:
      - files:
          - /etc/prometheus/targets.d/node.yml
    scheme: https
    tls_config:
      # Verifying remote identity
      ca_file: /etc/prometheus/ssl/prom_node_cert.pem
      server_name: prom_node
      # Asserting our identity
      cert_file: /etc/prometheus/ssl/prometheus_cert.pem
      key_file: /etc/prometheus/ssl/prometheus_key.pem
Prometheus will now prove its identity with its own key and cert, and scraping should start working again.
SCRAPE (HTTPS)
node_exporter <------------------------------------- prometheus
[key] ----server certificate "prom_node"---> (verify)
(verify) <---client certificate "prometheus"--- [key]
If you don’t understand what’s going on here, please talk to your instructors!
Of course, normally prometheus isn’t just scraping a single node, it’s scraping multiple remote nodes. To deploy this change to other nodes (such as host1-6 in your campus), you would copy the following files to them:
- /etc/default/node_exporter
- /etc/node_exporter/node_web_config.yml
- /etc/node_exporter/ssl/prom_node_cert.pem
- /etc/node_exporter/ssl/prom_node_key.pem
- /etc/node_exporter/ssl/prometheus_cert.pem

but NOT prometheus_key.pem. That file is private to the prometheus server only; it’s ownership of this key which proves the prometheus server’s identity.
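One way to push those files out is a simple scp loop from srv1. The sketch below is a dry run: it only echoes the commands it would run (the hostnames host1..host6 and root ssh access are assumptions about your campus, and prometheus_key.pem is deliberately absent from the list). Remove the echo once you are happy with the output:

```shell
# Files that every monitored node needs (NOT prometheus_key.pem):
files="/etc/default/node_exporter
/etc/node_exporter/node_web_config.yml
/etc/node_exporter/ssl/prom_node_cert.pem
/etc/node_exporter/ssl/prom_node_key.pem
/etc/node_exporter/ssl/prometheus_cert.pem"
# Dry run: echo the scp command for each host/file combination.
for h in host1 host2 host3 host4 host5 host6; do
  for f in $files; do
    echo scp "$f" "root@$h:$f"
  done
done
```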
With a proper CA signing the certificates, each node would have its own private key, and its own certificate with its own hostname. In this simplified setup, all the nodes share the same private key and certificate, and all with the same identity “prom_node”. This is not quite as secure as it could be (since any node with this private key could impersonate any other node), but it’s still pretty good.
Optional: if you have deployed node_exporter to your hostX virtual machines, you can update them to use TLS now.
These are some other suggested improvements, which we won’t be doing in this workshop.
node_exporter does not need root privileges to run, so you can secure it further by making it run as a non-root user. You’ll need to make sure it can read prom_node_key.pem (and that no other users can).
It’s also possible to configure node_exporter to require “basic auth” (username and password). The full set of options is documented here.
The standard way to secure these applications is to put them behind an HTTPS reverse proxy (e.g. apache or nginx) which terminates TLS and can enforce authentication.
Grafana has its own authentication and TLS capabilities, so it can be configured without a proxy.
Doing this configuration is outside the scope of this exercise. However there are a couple of things which you should remember:
- configure prometheus and alertmanager to listen only on localhost, so that only the reverse proxy can reach them: --web.listen-address=127.0.0.1:9090 (or 9093 for alertmanager)
- one option is to have a separate virtual host for each (e.g. prometheus.example.net, alertmanager.example.net) which both point to the IP address of your reverse proxy. You’ll need to generate a TLS certificate with all the names in it.
another option is to use URL path prefixes like /prometheus and /alertmanager. If you do this, you’ll need to configure more options so that all generated URLs have the correct prefixes:
--web.external-url=https://noc.example.net/prometheus
This is what we’ve chosen to do in this workshop.