This worksheet is a roadmap to some of the options for scaling prometheus.
A single prometheus node can scale to handle a surprisingly large workload.
There is an online calculator which can estimate the RAM requirements for a single node. Note that additional RAM may be needed while queries are executing.
Experimentally, users have observed a single node with 12 cores and 64GB RAM ingesting 500,000 data points per second across 11 million timeseries. That is a lot of metrics!
Check for yourself how many metrics you are currently collecting:
In the prometheus web interface, go to Status and then TSDB Status. Under “Head Stats”, look at “Number of Series”. This gives you the number of distinct timeseries which were active over approximately the last two hours. Write down the value you see (number of series). How far are you away from 500,000 metrics per second?
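If you prefer the command line, prometheus exports the same figure as the metric prometheus_tsdb_head_series, so a quick sketch of an equivalent check is the following (assuming promtool is installed under /opt/prometheus as used later in this worksheet, and that your server answers under the /prometheus path prefix):
/opt/prometheus/promtool query instant http://srv1.campusX.ws.nsrc.org/prometheus prometheus_tsdb_head_series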
The simplest way to scale up is to have multiple prometheus servers - one per datacentre, per campus, per cloud region etc.
We already have our classroom set up this way - with one prometheus server per campus. Now we just need a way to get a global view of these servers.
One option is to configure Grafana to talk to multiple prometheus servers.
Go to your grafana instance at http://oob.srv1.campusX.ws.nsrc.org/grafana
Add a new Prometheus data source with:
Name: campusY (where this is a different campus to yours)
URL: http://srv1.campusY.ws.nsrc.org/prometheus (for the same remote campus)
Save and test the data source. It should come back green. If not, check your work, and check that the other campus has a working prometheus instance.
You can add all the other campuses if you wish.
Now you need to modify your dashboards to be able to select these additional data sources. This is quite involved the first time you do it.
Open your dashboard’s settings and click on “Variables”. You should see your existing variables, which may be something like this:
$instance    up{job="snmp"}
$ifDescr     ifIndex{instance="$instance"}
You will add a new variable, $source, which selects which Prometheus server you are querying. Click on “New”, then create a variable named “source” of type “Data source”, selecting “Prometheus” as the data source type.
Your new $source variable will be at the bottom. Drag the “domino” control at the right to bring it to the top of the list, so your variables look like this:
$source      prometheus
$instance    up{job="snmp"}
$ifDescr     ifIndex{instance="$instance"}
Now edit the $instance variable, changing its data source from Prometheus to ${source}. Do the same for the $ifDescr variable: change its data source from Prometheus to ${source}.
Then edit each panel in the dashboard, changing its data source from Prometheus to ${source} as well.
Click “Save dashboard (floppy disk)” at the top of the page. Add a note like “Change to use $source”, and Save.
Your dashboard will now have a drop-down source selector. Choose the prometheus server at one of the other campuses, and browse their data!
As you can see, the problem with this approach is that you need to modify all your dashboards to include a $source selector - and this includes dashboards you may have imported from third parties. It can be quicker to edit the JSON form of the dashboard rather than editing every panel by hand.
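For example, one hedged way to do that bulk edit on an exported dashboard (the filename is illustrative; keep a backup and check the result, since dashboards can reference data sources in more than one way) is a simple search-and-replace on the JSON:
sed -i.bak 's/"datasource": "Prometheus"/"datasource": "${source}"/g' my-dashboard.json
You can then re-import the edited JSON over the existing dashboard.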
Another option is to run a frontend called promxy in front of your prometheus servers. You send a query to promxy, and it sends it to the different prometheus backends and combines the query results. We will not do this in this exercise.
The advantage is that you can set up Grafana with a single prometheus data source (pointing to Promxy) and not have to configure multiple Prometheus backends or modify dashboards.
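For reference only, a minimal promxy configuration might look roughly like the sketch below. The server_groups layout follows promxy’s config format, but the hostnames and path prefix are assumptions based on this classroom, so treat it as illustrative rather than tested:
# promxy merges query results from all the backends listed in its server groups
promxy:
  server_groups:
    - static_configs:
        - targets:
            - srv1.campus1.ws.nsrc.org:80
            - srv1.campus2.ws.nsrc.org:80
      # the classroom prometheus servers are served under /prometheus
      path_prefix: /prometheus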
Prometheus has the ability to write to a remote database. This can be used to consolidate metrics from all your prometheus servers into a single place, and to keep a long-term archive outside of prometheus itself.
There are a number of existing integrations, and indeed a recent version of Prometheus can itself be configured as a receiver for remote writes, but this exercise is going to use one called VictoriaMetrics.
Every campus will configure their own prometheus server to write to a central VictoriaMetrics database running on the NOC.
DO NOT DO THE FOLLOWING STEPS - the instructor has already done them on noc.ws.nsrc.org. Instead skip to the next step, “Test VictoriaMetrics”
This is just for reference, to show what the instructor did to install VictoriaMetrics.
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/vX.Y.Z/victoria-metrics-vX.Y.Z.tar.gz
mkdir /opt/victoria-metrics
tar -C /opt/victoria-metrics -xvzf victoria-metrics-vX.Y.Z.tar.gz
mkdir -p /var/lib/victoria-metrics/data
chown prometheus:prometheus /var/lib/victoria-metrics/data
They have also created the systemd service unit and its defaults file:
==> /etc/systemd/system/victoria-metrics.service <==
[Unit]
Description=VictoriaMetrics server
Documentation=https://github.com/VictoriaMetrics/VictoriaMetrics/wiki/Single-server-VictoriaMetrics#operation
After=network-online.target
# Fix shutdown delays: if prometheus is running on the same host,
# VictoriaMetrics should start first, and shutdown after.
Before=prometheus.service
[Service]
User=prometheus
Restart=on-failure
RestartSec=5
WorkingDirectory=/var/lib/victoria-metrics
EnvironmentFile=/etc/default/victoria-metrics
ExecStart=/opt/victoria-metrics/victoria-metrics-prod $OPTIONS
[Install]
WantedBy=multi-user.target
==> /etc/default/victoria-metrics <==
OPTIONS='-storageDataPath=/var/lib/victoria-metrics/data -retentionPeriod=6 -httpAuth.username=admin -httpAuth.password=password123 -http.pathPrefix=/vmetrics'
(this sets a 6-month retention period, enables simple username/password authentication, and serves the API under the /vmetrics path prefix used in the URLs below)
To start:
systemctl daemon-reload
systemctl enable victoria-metrics
systemctl start victoria-metrics
VictoriaMetrics listens on port 8428 by default, and exposes a prometheus-compatible API.
Run the following command on your srv1 instance to check that you can communicate with the remote VictoriaMetrics instance running on the NOC:
/opt/prometheus/promtool query instant http://admin:password123@noc.ws.nsrc.org:8428/vmetrics up
If this is a fresh install it may return no results at all, but what’s important is that you don’t get an error.
If you try the query without the username and password, you should get a “401” (unauthorized) error.
/opt/prometheus/promtool query instant http://noc.ws.nsrc.org:8428/vmetrics up
On your srv1, edit your /etc/prometheus/prometheus.yml.
You will add an “external_labels” section under “global”. This is so that all metrics written to VictoriaMetrics will have an extra label like campus="campus1" to distinguish the metrics written from the different campuses. You will also add a remote_write section.
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    campus: campusX

# Archiving to VictoriaMetrics
remote_write:
  - url: http://noc.ws.nsrc.org:8428/vmetrics/api/v1/write
    basic_auth:
      username: admin
      password: password123
    queue_config:
      max_samples_per_send: 10000
      max_shards: 30
... leave the rest of the file unchanged (from alertmanager configuration
... onwards)
Test your configuration:
/opt/prometheus/promtool check config /etc/prometheus/prometheus.yml
If this shows any errors, fix them. Ask for help if you need to.
When this is OK, tell prometheus to re-read its configuration, then do a final check for errors:
systemctl reload prometheus
journalctl -eu prometheus
Repeat your query to the remote VictoriaMetrics server:
/opt/prometheus/promtool query instant http://admin:password123@noc.ws.nsrc.org:8428/vmetrics up
Within a couple of minutes you should see your campus’ metrics appearing. These will have campus="campusX" as an additional label.
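As a quick check, you can filter on that new label; for example, using the same promtool command (substituting your own campus number for the value you set in external_labels):
/opt/prometheus/promtool query instant http://admin:password123@noc.ws.nsrc.org:8428/vmetrics 'up{campus="campusX"}'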
Getting Grafana to talk to VictoriaMetrics works just the same as when you configured Grafana to talk to the prometheus server in another campus.
Go to your grafana instance at http://oob.srv1.campusX.ws.nsrc.org/grafana
Add a new Prometheus data source with:
Name: VictoriaMetrics
URL: http://noc.ws.nsrc.org:8428/vmetrics
Basic auth enabled, with user admin and password password123
Save and test the data source. It should say “Data source is working” in green (if not, ask for help).
Go to your SNMP Traffic dashboard. Select “VictoriaMetrics” as the source from the dropdown, and you should be able to see all the merged data collected from the various campuses and stored in the central VictoriaMetrics database.
This makes it very easy to do queries which span multiple campuses, and you can also be sure that any expensive queries done here will not affect the scraping done by the remote prometheus servers.
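As a small illustration, here is a sketch of a cross-campus query you could run against the VictoriaMetrics data source; it counts scrape targets per campus, using the up metric and the campus label added by external_labels:
count by (campus) (up)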
Other large-scale storage options worth looking at include Thanos and Cortex.
Thanos can store unlimited volumes of data in cheap S3 cloud storage, and performs downsampling, which makes queries covering long time periods much faster. It normally runs as a “sidecar” to prometheus, reading prometheus data chunks directly and uploading them to S3, although it can also act as a remote write receiver. Thanos has several components, so we are not going to set it up here, but it has a straightforward design where the components can be deployed incrementally.
Cortex is designed for huge cloud-scale, multi-tenant installations.
Another way to centralise storage is with federation. In this approach, a central prometheus server scrapes the remote prometheus servers to collect data out of them. You can limit it to scraping only selected metrics. If you wish, you can configure a larger scrape interval, so that the central server stores data at lower resolution.
Ask your instructor to set up federation on noc.ws.nsrc.org to collect data from all the campuses. They will need to add a new scrape job to prometheus.yml:
- job_name: 'federate'
  scrape_interval: 2m
  honor_labels: true
  metrics_path: '/prometheus/federate'
  params:
    'match[]':
      - '{job="snmp"}'
      - '{job="node"}'
  static_configs:
    - targets:
      - 'srv1.campus1.ws.nsrc.org'
      - 'srv1.campus2.ws.nsrc.org'
      - 'srv1.campus3.ws.nsrc.org'
      - 'srv1.campus4.ws.nsrc.org'
      - 'srv1.campus5.ws.nsrc.org'
      - 'srv1.campus6.ws.nsrc.org'
When this is done, you should be able to access the web interface at http://noc.ws.nsrc.org/prometheus and perform queries - or add noc.ws.nsrc.org as another data source in your grafana dashboard.
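If you are curious what the central server actually receives, you can fetch the federation endpoint yourself; here is a sketch with curl against one campus (the match[] selector corresponds to the scrape job above):
curl -G 'http://srv1.campus1.ws.nsrc.org/prometheus/federate' --data-urlencode 'match[]={job="node"}'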
By default, prometheus stores data for 15 days. You can change this by setting the configuration flag --storage.tsdb.retention.time. This setting is global and applies to all metrics.
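As a sketch of where that flag goes, assuming prometheus on your server is started from a systemd unit similar to the VictoriaMetrics one above (the binary path and the 90-day value are illustrative), the ExecStart line might become:
ExecStart=/opt/prometheus/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.retention.time=90d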
However, prometheus’ database is not really designed for long-term storage. For long-term metric archival, you may be better off using a remote storage system such as VictoriaMetrics or Thanos.
To save storage and to speed up querying, you may also wish to store your long-term data at a lower resolution. This can be done by federating at a longer scrape interval (as shown above), or by using a backend such as Thanos which downsamples older data.
This is just information for reference.
For high availability in prometheus, simply run multiple prometheus servers scraping the same targets. You can use promxy in front of them to get a merged view: promxy will “fill in the gaps” where one server doesn’t have any data.
For high availability in alertmanager, you can run multiple alertmanagers in a cluster. You need to add flags to each alertmanager so they know about each other, and configure prometheus to talk to all alertmanagers.
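A hedged sketch of what that looks like, with illustrative hostnames: each alertmanager is started with --cluster.peer flags pointing at its peers, and prometheus lists all of them under its alerting section.
# on each alertmanager host, peer with the other member(s) of the cluster
alertmanager --config.file=/etc/alertmanager/alertmanager.yml \
  --cluster.listen-address=0.0.0.0:9094 \
  --cluster.peer=alertmanager2.example.org:9094

# in prometheus.yml, point prometheus at every alertmanager in the cluster
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'alertmanager1.example.org:9093'
            - 'alertmanager2.example.org:9093'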
If you have separate prometheus servers in multiple campuses or data centres, you might want a separate alertmanager (or alertmanager cluster) in each campus or data centre. To get a global dashboard which shows you all the alertmanagers, you can install karma.