This worksheet is a roadmap to some of the options for scaling prometheus.

Single node scaling

A single prometheus node can scale to handle quite a large workload.

There is a calculator here which can estimate the RAM requirements for a single node. Note that additional RAM may be needed while queries are executing.

Experimentally, users have observed a single node with 12 cores and 64GB RAM ingesting 500,000 data points per second across 11 million timeseries. That is a lot of metrics!

Check for yourself how many metrics you are currently collecting:
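
For example, assuming your prometheus web UI on srv1 is reachable (in this workshop it is served under /prometheus), the following query shows the rate at which samples are currently being appended to the local TSDB, in samples per second:

rate(prometheus_tsdb_head_samples_appended_total[5m])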

How far away are you from 500,000 data points per second?

Horizontal scaling

The simplest way to scale up is to have multiple prometheus servers - one per datacentre, per campus, per cloud region, etc.

We already have our classroom set up this way - with one prometheus server per campus. Now we just need a way to get a global view of these servers.

Multiple data sources in Grafana

One option is to configure Grafana to talk to multiple prometheus servers.

Go to your grafana instance at http://oob.srv1.campusX.ws.nsrc.org/grafana

Add a new Prometheus data source which points at the prometheus server in one of the other campuses, and use "Save & Test" to check it.

It should come back green. If not, check your work, and check that the other campus has a working prometheus instance.

You can add all the other campuses if you wish.

Modify dashboards

Now you need to modify your dashboards to be able to select these additional data sources: add a dashboard variable of type "Data source" (for example called "source"), then change each panel to read from $source instead of a fixed data source. This is quite involved the first time you do it.

Your dashboard will now have a drop-down source selector. Choose the prometheus server at one of the other campuses, and browse their data!

As you can see, the problem with this approach is that you need to modify all your dashboards to include a $source selector - and this includes dashboards you may have imported from third parties. It can be quicker to edit the JSON form of the dashboard rather than editing every panel by hand.
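
For reference, the data source variable ends up in the dashboard JSON as an entry roughly like the one below (field names vary a little between Grafana versions; "source" is just the example variable name used above), and each panel then refers to "$source" instead of a fixed data source:

  "templating": {
    "list": [
      {
        "name": "source",
        "label": "Source",
        "type": "datasource",
        "query": "prometheus"
      }
    ]
  }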

Alternative: Promxy

Another option is to run a frontend called promxy in front of your prometheus servers. You send a query to promxy, and it sends it to the different prometheus backends and combines the query results. We will not do this in this exercise.

The advantage is that you can set up Grafana with a single prometheus data source (pointing to Promxy) and not have to configure multiple Prometheus backends or modify dashboards.
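
For reference only, a minimal promxy configuration might look something like the sketch below. Check the option names against the promxy documentation; the targets and path prefix here simply reuse the hostnames used elsewhere in this workshop:

promxy:
  server_groups:
    # one server group per campus prometheus server (example targets)
    - static_configs:
        - targets:
            - srv1.campus1.ws.nsrc.org:80
      path_prefix: /prometheus
    - static_configs:
        - targets:
            - srv1.campus2.ws.nsrc.org:80
      path_prefix: /prometheus

Grafana is then pointed at promxy as its single prometheus data source.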

Remote storage

Prometheus has the ability to write to a remote database. This can be used to collect metrics from several prometheus servers into a single central database, and to keep metrics for longer than a single prometheus server normally would.

VictoriaMetrics

There are a number of existing integrations, and indeed a recent version of Prometheus can itself be configured as a receiver for remote writes, but this exercise is going to use one called VictoriaMetrics.
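
For reference, a recent prometheus can be turned into a remote write receiver with a single flag (older versions used --enable-feature=remote-write-receiver instead); we will not use this here:

prometheus --config.file=prometheus.yml --web.enable-remote-write-receiver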

Every campus will configure their own prometheus server to write to a central VictoriaMetrics database running on the NOC.

Install VictoriaMetrics

DO NOT DO THE FOLLOWING STEPS - the instructor has already done them on noc.ws.nsrc.org. Instead skip to the next step, “Test VictoriaMetrics”

This is just for reference, to show what the instructor did to install VictoriaMetrics.

wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/vX.Y.Z/victoria-metrics-vX.Y.Z.tar.gz
mkdir /opt/victoria-metrics
tar -C /opt/victoria-metrics -xvzf victoria-metrics-vX.Y.Z.tar.gz

mkdir -p /var/lib/victoria-metrics/data
chown prometheus:prometheus /var/lib/victoria-metrics/data

They have also created the systemd unit file and its defaults file:

==> /etc/systemd/system/victoria-metrics.service <==
[Unit]
Description=VictoriaMetrics server
Documentation=https://github.com/VictoriaMetrics/VictoriaMetrics/wiki/Single-server-VictoriaMetrics#operation
After=network-online.target
# Fix shutdown delays: if prometheus is running on the same host,
# VictoriaMetrics should start first, and shutdown after.
Before=prometheus.service

[Service]
User=prometheus
Restart=on-failure
RestartSec=5
WorkingDirectory=/var/lib/victoria-metrics
EnvironmentFile=/etc/default/victoria-metrics
ExecStart=/opt/victoria-metrics/victoria-metrics-prod $OPTIONS

[Install]
WantedBy=multi-user.target

==> /etc/default/victoria-metrics <==
OPTIONS='-storageDataPath=/var/lib/victoria-metrics/data -retentionPeriod=6 -httpAuth.username=admin -httpAuth.password=password123 -http.pathPrefix=/vmetrics'

(this sets a 6-month retention period, enables simple username/password authentication, and serves the API under the /vmetrics path prefix)

To start:

systemctl daemon-reload
systemctl enable victoria-metrics
systemctl start victoria-metrics

Test VictoriaMetrics

VictoriaMetrics listens on port 8428 by default and exposes a prometheus-compatible API; in this setup it is served under the /vmetrics path prefix.

Run the following command on your srv1 instance to check that you can communicate with the remote VictoriaMetrics instance running on the NOC:

/opt/prometheus/promtool query instant http://admin:password123@noc.ws.nsrc.org:8428/vmetrics up

If this is a fresh install it may return no results at all, but what’s important is that you don’t get an error.

If you try the query without the username and password, you should get a “401” (unauthorized) error.

/opt/prometheus/promtool query instant http://noc.ws.nsrc.org:8428/vmetrics up

Configure remote write

On your srv1, edit your /etc/prometheus/prometheus.yml.

You will add an “external_labels” section under “global”. This is so that all metrics written to VictoriaMetrics will have an extra label like campus="campus1" to distinguish the metrics written from the different campuses. You will also add a remote_write section.

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    campus: campusX

# Archiving to VictoriaMetrics
remote_write:
  - url: http://noc.ws.nsrc.org:8428/vmetrics/api/v1/write
    basic_auth:
      username: admin
      password: password123
    queue_config:
      max_samples_per_send: 10000
      max_shards: 30

... leave the rest of the file unchanged (from alertmanager configuration
... onwards)

Test your configuration:

/opt/prometheus/promtool check config /etc/prometheus/prometheus.yml

If this shows any errors, fix them. Ask for help if you need to.

When this is OK, tell prometheus to re-read its configuration, then do a final check for errors:

systemctl reload prometheus
journalctl -eu prometheus

Repeat your query to the remote VictoriaMetrics server:

/opt/prometheus/promtool query instant http://admin:password123@noc.ws.nsrc.org:8428/vmetrics up

Within a couple of minutes you should see your campus’ metrics appearing. These will have campus="campusX" as an additional label.
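
For example, you can ask VictoriaMetrics for just your own campus' targets (replace campusX as usual):

/opt/prometheus/promtool query instant http://admin:password123@noc.ws.nsrc.org:8428/vmetrics 'up{campus="campusX"}'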

Configure grafana

Getting Grafana to talk to VictoriaMetrics works in exactly the same way as adding a prometheus server from another campus as a data source.

Go to your grafana instance at http://oob.srv1.campusX.ws.nsrc.org/grafana

Add a new Prometheus data source named "VictoriaMetrics", set the URL to http://noc.ws.nsrc.org:8428/vmetrics, enable basic authentication, and enter the username admin and password password123. Then use "Save & Test".

It should say "Data source is working" in green (if not, ask for help)

Go to your SNMP Traffic dashboard. Select “VictoriaMetrics” as the source from the dropdown, and you should be able to see all the merged data collected from the various campuses and stored in the central VictoriaMetrics database.

This makes it very easy to run queries which span multiple campuses, and you can also be sure that any expensive queries run here will not affect the scraping done by the remote prometheus servers.
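
As a simple example of a multi-campus query, running the following against the VictoriaMetrics data source (for example in Grafana's Explore view) shows how many targets are currently up in each campus:

sum by (campus) (up)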

Remote storage: other options

Other large-scale storage options worth looking at include Thanos and Cortex.

Thanos can store unlimited volumes of data in cheap S3-compatible object storage, and performs downsampling of data which makes queries covering long time periods much faster. It normally runs as a "sidecar" to prometheus, reading prometheus data chunks directly and uploading them to S3, although it can also act as a remote write receiver. Thanos has several components, so we are not going to set it up here, but it has a straightforward design where the components can be deployed incrementally.

Cortex is designed for huge cloud-scale, multi-tenant installations.

Federation

Another way to centralise storage is with federation. In this approach, a central prometheus server scrapes the remote prometheus servers to collect data out of them. You can limit it to scraping only selected metrics. If you wish, you can configure a larger scrape interval, so that the central server stores data at lower resolution.
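
You can see what a federation scrape would return by fetching the federate endpoint directly. For example, this asks one campus server for all of its node-exporter series (the /prometheus path matches how the campus servers are set up in this workshop):

curl -sG 'http://srv1.campus1.ws.nsrc.org/prometheus/federate' --data-urlencode 'match[]={job="node"}' | head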

Ask your instructor to set up federation on noc.ws.nsrc.org to collect data from all the campuses. They will need to add a new scrape job to prometheus.yml:

  - job_name: 'federate'
    scrape_interval: 2m

    honor_labels: true
    metrics_path: '/prometheus/federate'

    params:
      'match[]':
        - '{job="snmp"}'
        - '{job="node"}'

    static_configs:
      - targets:
          - 'srv1.campus1.ws.nsrc.org'
          - 'srv1.campus2.ws.nsrc.org'
          - 'srv1.campus3.ws.nsrc.org'
          - 'srv1.campus4.ws.nsrc.org'
          - 'srv1.campus5.ws.nsrc.org'
          - 'srv1.campus6.ws.nsrc.org'

When this is done, you should be able to access the web interface at http://noc.ws.nsrc.org/prometheus and perform queries - or add noc.ws.nsrc.org as another data source in your grafana dashboard.

Long term storage

By default, prometheus stores data for 15 days. You can change this with the command-line flag --storage.tsdb.retention.time. This setting is global and applies to all metrics.
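
For example, starting prometheus with the flag below keeps 90 days of data (exactly where you add the flag depends on how prometheus is started; on these servers the binaries live under /opt/prometheus):

/opt/prometheus/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.retention.time=90d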

However, prometheus’ database is not really designed for long-term storage. For long-term metric archival, you may be better off using a remote storage system such as VictoriaMetrics or Thanos.

To save storage and to speed up querying, you may also wish to store your long-term data at a lower resolution. This can be done by federating with a larger scrape interval (as described below), or by using a remote store such as Thanos which downsamples older data.

High availability

This is just information for reference.

For high availability in prometheus, simply run multiple prometheus servers scraping the same targets. You can use promxy in front of them to get a merged view: promxy will “fill in the gaps” where one server doesn’t have any data.

For high availability in alertmanager, you can run multiple alertmanagers in a cluster. You need to add flags to each alertmanager so they know about each other, and configure prometheus to talk to all alertmanagers.
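
As a rough sketch only (the hostnames are just examples from this workshop), two alertmanagers can be clustered and prometheus pointed at both of them like this:

# on the first alertmanager; the second uses the mirror-image --cluster.peer
alertmanager --config.file=alertmanager.yml \
  --cluster.listen-address=0.0.0.0:9094 \
  --cluster.peer=srv1.campus2.ws.nsrc.org:9094

# in prometheus.yml:
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'srv1.campus1.ws.nsrc.org:9093'
            - 'srv1.campus2.ws.nsrc.org:9093'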

If you have separate prometheus servers in multiple campuses or data centres, you might want a separate alertmanager (or alertmanager cluster) in each campus or data centre. To get a global dashboard which shows you all the alertmanagers, you can install karma.

Further reading