This worksheet is a roadmap to some of the options for scaling prometheus.
A single prometheus node can scale quite large.
There are calculators available online which can estimate the RAM requirements for a single node. Note that additional RAM may be needed while queries are executing.
Experimentally, users have observed a single node with 12 cores and 64GB RAM ingesting 500,000 data points per second across 11 million timeseries. That is a lot of metrics!
Check for yourself how many metrics you are currently collecting. In the Prometheus web interface, go to Status > TSDB Status, and under "Head Stats" look at "Number of Series". This gives you the number of distinct timeseries which were active over approximately the last two hours.
Write down the value you see (number of series). With a 15-second scrape interval, your ingestion rate is roughly this number divided by 15. How far are you from ingesting 500,000 data points per second?
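This back-of-the-envelope arithmetic can be sketched in shell, using the example figures quoted above (substitute your own "Number of Series" value):

```shell
# Rough ingestion rate: active series divided by the scrape interval
# in seconds. 11 million series scraped every 15s:
series=11000000   # active timeseries (example figure)
interval=15       # scrape interval in seconds
echo $((series / interval))   # samples ingested per second
```

This prints 733333, i.e. roughly 730,000 samples per second for the example node.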
The simplest way to scale up is to have multiple prometheus servers - one per datacentre, per campus, per cloud region etc.
We already have our classroom set up this way - with one prometheus server per campus. Now we just need a way to get a global view of these servers.
One option is to configure Grafana to talk to multiple prometheus servers.
Go to your grafana instance at http://oob-srv1-campusX.ws.nsrc.org/grafana
Add a new Prometheus data source with the name campusY (where this is a different campus to yours) and the URL http://srv1-campusY.ws.nsrc.org/prometheus (for the same remote campus). Click "Save & test": it should come back green. If not, check your work, and check that the other campus has a working prometheus instance.
You can add all the other campuses if you wish.
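If you prefer configuration-as-code, Grafana can also provision data sources from a YAML file instead of clicking through the UI. A minimal sketch, assuming a stock Grafana install (the file path and campus URL are examples, not part of this exercise):

```yaml
# e.g. /etc/grafana/provisioning/datasources/campuses.yaml (path is an assumption)
apiVersion: 1
datasources:
  - name: campus2
    type: prometheus
    access: proxy
    url: http://srv1-campus2.ws.nsrc.org/prometheus
```

Grafana reads this directory at startup, so adding all six campuses is one file edit rather than six UI forms.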
Now you need to modify your dashboards to be able to select these additional data sources. This is quite involved the first time you do it.
Go to one of your dashboards: we suggest the “SNMP Traffic” one that you created before
On the top menu, select Dashboard Settings (Cog)
Click on “Variables”. You should see your existing variables, which may be something like this:
instance label_values(ifIndex,instance)
ifDescr label_values(ifIndex{instance="$instance"},ifDescr)

You are now going to create a new variable called source which selects which Prometheus server you are querying. Click on "New variable", then enter:

Name: source
Type: Data source
Data source type: Prometheus
You’re back to the list of variables
Your new source variable will be at the bottom. Drag
the “domino” control at the right to bring it to the top of the list, so
your variables look like this:
source prometheus
instance label_values(ifIndex,instance)
ifDescr label_values(ifIndex{instance="$instance"},ifDescr)

Click on the instance variable. Change its data source from Prometheus to ${source}. Repeat for the ifDescr variable, again changing its data source from Prometheus to ${source}.

Click "Close" at the top to go back to your dashboard.

For each of the widgets and graphs on the page: edit it, and change its data source from Prometheus to ${source}.

Click "Save dashboard" (floppy disk) at the top of the page. Add a note like "Change to use $source", and Save.
Your dashboard will now have a drop-down source selector. Choose the prometheus server at one of the other campuses, and browse their data!
As you can see, the problem with this approach is that you need to
modify all your dashboards to include a $source
selector - and this includes dashboards you may have imported from third
parties. It can be quicker to edit the JSON form of the dashboard rather
than editing every panel by hand.
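For example, after exporting a dashboard to JSON, a single substitution can replace every hard-coded datasource at once. A sketch (the filenames are assumptions, and ${source} is written literally into the JSON, not expanded by the shell):

```shell
# Replace every hard-coded "Prometheus" datasource in an exported
# dashboard JSON with the ${source} dashboard variable, then re-import
# the modified file into Grafana.
sed 's/"datasource": "Prometheus"/"datasource": "${source}"/g' \
    dashboard.json > dashboard-multi.json
```

The single quotes stop the shell from expanding ${source}, so the literal variable reference lands in the JSON for Grafana to resolve.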
Another option is to run a frontend called promxy in front of your prometheus servers. You send a query to promxy, and it sends it to the different prometheus backends and combines the query results. We will not do this in this exercise.
The advantage is that you can set up Grafana with a single prometheus data source (pointing to Promxy) and not have to configure multiple Prometheus backends or modify dashboards.
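For reference, a minimal promxy configuration might look like the following sketch (check the promxy documentation for the exact schema; the hostnames are this classroom's, and the port and path_prefix are assumptions based on how the campus servers proxy prometheus):

```yaml
# promxy config sketch: one server group covering two campus servers.
promxy:
  server_groups:
    - static_configs:
        - targets:
            - srv1-campus1.ws.nsrc.org:80
            - srv1-campus2.ws.nsrc.org:80
      path_prefix: /prometheus
```

Each server group is queried in parallel and the results are merged before being returned to Grafana.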
Prometheus has the ability to write to a remote database. This can be used to combine metrics from multiple prometheus servers into one central store, and to keep a longer-term archive than the local storage retains.
There are a number of existing integrations, and indeed a recent version of Prometheus can itself be configured as a receiver for remote writes, but this exercise is going to use one called VictoriaMetrics.
Every campus will configure their own prometheus server to write to a central VictoriaMetrics database running on the NOC, which has already been set up by the instructors.
VictoriaMetrics listens on port 8428 by default, and exposes a
prometheus-compatible API. On the NOC, path /vmetrics is
proxied to port 8428.
Run the following command on your srv1 instance to check that you can communicate with the remote VictoriaMetrics instance running on the NOC:
/opt/prometheus/promtool query instant http://admin:password123@noc.ws.nsrc.org/vmetrics up
If this is a fresh install it may return no results at all, but what’s important is that you don’t get an error.
If you try the query without the username and password, you should get a “401” (unauthorized) error.
/opt/prometheus/promtool query instant http://noc.ws.nsrc.org/vmetrics up
On your srv1, enter the prometheus container if you’re not already there:
incus shell prometheus
Edit your /etc/prometheus/prometheus.yml.
You will add an “external_labels” section under “global”. This is so
that all metrics written to VictoriaMetrics will have an extra label
like campus="campus1" to distinguish the metrics written
from the different campuses. You will also add a remote_write
section.
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    campus: campusX

# Archiving to VictoriaMetrics
remote_write:
  - url: http://noc.ws.nsrc.org/vmetrics/api/v1/write
    basic_auth:
      username: admin
      password: password123
    queue_config:
      max_samples_per_send: 10000
      max_shards: 30

... leave the rest of the file unchanged (from alertmanager configuration
... onwards)
Test your configuration:
/opt/prometheus/promtool check config /etc/prometheus/prometheus.yml
If this shows any errors, fix them. Ask for help if you need to.
When this is OK, tell prometheus to re-read its configuration, then do a final check for errors:
systemctl reload prometheus
journalctl -eu prometheus
Repeat your query to the remote VictoriaMetrics server:
/opt/prometheus/promtool query instant http://admin:password123@noc.ws.nsrc.org/vmetrics up
Within a couple of minutes you should see your campus’ metrics
appearing. These will have campus="campusX" as an
additional label. If there are too many to see, then filter them in the
query:
/opt/prometheus/promtool query instant http://admin:password123@noc.ws.nsrc.org/vmetrics 'up{campus="campusX"}'
Getting Grafana to talk to VictoriaMetrics is just the same as when you got Grafana to talk to a prometheus server in another campus.
Go to your grafana instance at http://oob-srv1-campusX.ws.nsrc.org/grafana
Add a new Prometheus data source with the name VictoriaMetrics, the URL http://noc.ws.nsrc.org/vmetrics, and Basic auth enabled with username admin and password password123. Click "Save & test": it should say "Successfully queried the Prometheus API" in green (if not, ask for help).
Go to your SNMP Traffic dashboard. Select “VictoriaMetrics” as the source from the dropdown, and you should be able to see all the merged data collected from the various campuses and stored in the central VictoriaMetrics database.
This makes it very easy to do queries which span multiple campuses; and you also can be sure that any expensive queries done here will not affect the scraping done by the remote prometheus servers. You could also use this central storage to keep a longer-term archive.
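For example, a cross-campus query might sum inbound traffic per campus. A sketch (it assumes the ifHCInOctets metric from the SNMP collection set up earlier in the workshop):

```
sum by (campus) (rate(ifHCInOctets{job="snmp"}[5m]) * 8)
```

The campus label added by external_labels is what makes this per-campus grouping possible.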
Other large-scale storage options worth looking at include Thanos, Cortex and Mimir.
Thanos can store unlimited volumes of data to cheap S3 cloud storage, and performs downsampling of data which makes queries which cover long time periods much faster. It normally runs as a “sidecar” to prometheus, reading prometheus data chunks directly and uploading them to S3, although it can also act as a remote write receiver. Thanos has several components, so we are not going to set it up here, but it has a straightforward design where the components can be deployed incrementally.
Cortex is designed for huge cloud-scale, multi-tenant installations. It is an open-source CNCF project; Grafana Labs were the biggest contributor.
Mimir is a fork of Cortex by Grafana themselves, with a different license.
Another way to centralise storage is with federation. In this approach, a central prometheus server scrapes the remote prometheus servers to collect data out of them. You can limit it to scraping only selected metrics. If you wish, you can configure a larger scrape interval, so that the central server stores data at lower resolution.
Ask your instructor to set up federation on noc.ws.nsrc.org to collect data from all the campuses. They will need to add a new scrape job to prometheus.yml:
- job_name: 'federate'
  scrape_interval: 2m
  honor_labels: true
  metrics_path: '/prometheus/federate'
  params:
    'match[]':
      - '{job="snmp"}'
      - '{job="node"}'
  static_configs:
    - targets:
        - 'srv1-campus1.ws.nsrc.org'
        - 'srv1-campus2.ws.nsrc.org'
        - 'srv1-campus3.ws.nsrc.org'
        - 'srv1-campus4.ws.nsrc.org'
        - 'srv1-campus5.ws.nsrc.org'
        - 'srv1-campus6.ws.nsrc.org'
When this is done, you should be able to access the web interface at http://noc.ws.nsrc.org/prometheus and perform queries - or add noc.ws.nsrc.org as another data source in your grafana dashboard.
Use up{job="federate"} as a query to check you’re seeing
data ingested using this federation job.
By default, prometheus stores data for 15 days. You can change
this by setting the configuration flag
--storage.tsdb.retention.time. This setting is global and
applies to all metrics.
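For example, to keep 90 days of data you would add the flag to prometheus' startup command. A sketch of the relevant line in a systemd service file (the binary path matches this workshop's install; the other flags and the unit file location are assumptions):

```
# Hypothetical excerpt from the prometheus systemd service unit:
ExecStart=/opt/prometheus/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.retention.time=90d
```

After editing a unit file you would run systemctl daemon-reload and restart prometheus for the new retention to take effect.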
However, prometheus’ database is not really designed for long-term storage. For long-term metric archival, you may be better off using a remote storage system such as VictoriaMetrics or Thanos.
To save storage and to speed up querying, you may also wish to store your long-term data at a lower resolution. This can be done, for example, by federating with a larger scrape interval (as described above), or by using a remote storage system such as Thanos which performs downsampling.
This is just information for reference.
For high availability in prometheus, simply run multiple prometheus servers scraping the same targets. You can use promxy in front of them to get a merged view: promxy will “fill in the gaps” where one server doesn’t have any data.
For high availability in alertmanager, you can run multiple alertmanagers in a cluster. You need to add flags to each alertmanager so they know about each other, and configure prometheus to talk to all alertmanagers.
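As an illustration, each node in a two-node alertmanager cluster might be started with flags like the following sketch (the hostnames are assumptions; 9094 is alertmanager's default cluster port):

```
alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --cluster.listen-address=0.0.0.0:9094 \
  --cluster.peer=alertmanager2.ws.nsrc.org:9094
```

On the prometheus side, the alerting section would then list both alertmanagers as targets, so every alert reaches the whole cluster and the cluster deduplicates notifications.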
If you have separate prometheus servers in multiple campuses or data centres, you might want a separate alertmanager (or alertmanager cluster) in each campus or data centre. To get a global dashboard which shows you all the alertmanagers, you can install karma or alerta.