Integrating Prometheus with Netbox

This worksheet gives an outline of how to integrate Prometheus with other systems.

Netbox as inventory for Prometheus

So far, all your configuration of Prometheus has been done by editing text files in the console. Wouldn’t it be nice if there were a web user interface where you could add new targets to be scraped, instead of editing a text file?

Prometheus is very flexible and supports a range of service discovery mechanisms (DNS, Consul, Kubernetes, files, HTTP, and more) which allow it to retrieve its list of targets dynamically.

This exercise will outline how to connect Prometheus to Netbox, so that adding a device into Netbox enables it to be scraped by Prometheus. It will make heavy use of target relabelling, so make sure you’ve read that exercise first!

The approach outlined here is to use Prometheus’ HTTP service discovery mechanism (http_sd_configs), which periodically queries a URL to get the list of targets to scrape. We add a plugin to Netbox which exposes the inventory in the format that Prometheus expects.
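
For reference, the URL queried by HTTP service discovery must return a JSON array of target groups, each with a “targets” list and an optional “labels” map (hostname and label values here are illustrative):

[
    {
        "targets": ["host1.example.net:9100"],
        "labels": {"site": "campusX"}
    }
]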

You don’t have to use Netbox for this - there is a whole range of inventory sources you could use; this is just one example.

Step 1: Prepare Netbox

The following steps have already been done by the instructor on the central classroom instance of Netbox (netbox.ws.nsrc.org); they are described in the “Reference: Netbox setup” section at the end of this worksheet.

Step 2: Write the API token into a file

SSH to your campus srv1 instance, and enter the prometheus container:

incus shell prometheus

Use a text editor to create a file /etc/prometheus/netbox.token with the following contents (must be exactly 40 characters):

0123456789abcdef0123456789abcdef01234567
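
Since this token is a credential for the Netbox API, it’s worth restricting who can read the file. This assumes Prometheus runs as the prometheus user, as in a typical setup:

chown prometheus:prometheus /etc/prometheus/netbox.token
chmod 600 /etc/prometheus/netbox.token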

Step 3: Test API/credentials

Now you need to test the API. Still inside the prometheus container, run the following command:

curl -gsS -H "Authorization: Token $(cat /etc/prometheus/netbox.token)" \
  "http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/devices/?status=active&has_primary_ip=true" |
  python3 -mjson.tool

You should get back a list of devices like this:

[
    {
        "targets": [
            "bdr1-campusX"
        ],
        "labels": {
            "__meta_netbox_status": "active",
            "__meta_netbox_model": "Device",
            "__meta_netbox_name": "bdr1-campusX",
            "__meta_netbox_primary_ip": "100.68.1.1",
...

If you get an error, or an empty list [], then talk to the instructor.

It should return all devices in the Netbox database whose status is “active” and which have a primary IP address assigned. This is exactly the format that Prometheus consumes as a list of targets.

Step 4: Scraping node_exporter

Change to your Prometheus config directory, and take a backup of the config, as we’re about to make a number of changes.

cd /etc/prometheus
cp prometheus.yml prometheus.yml.bak

Edit prometheus.yml. Find the node_exporter scrape job, that is, the section starting job_name: 'node'.

Remove or comment out this section:

    file_sd_configs:
      - files:
          - /etc/prometheus/targets.d/node.yml

and replace it with:

    http_sd_configs:
      - url: http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/devices/?status=active&has_primary_ip=true
        refresh_interval: 5m
        authorization:
          type: Token
          credentials_file: /etc/prometheus/netbox.token
      - url: http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/virtual-machines/?status=active&has_primary_ip=true
        refresh_interval: 5m
        authorization:
          type: Token
          credentials_file: /etc/prometheus/netbox.token

(Note that the first URL is the one we tested with curl earlier; the second does the same, but retrieves virtual machines rather than physical devices.)

Find the section relabel_configs and replace it with the following:

    relabel_configs:
      # Labels which control scraping
      - source_labels: [__address__]
        target_label: instance
      - source_labels: [__meta_netbox_primary_ip4]
        regex: '(.+)'
        target_label: __address__
      - source_labels: [__meta_netbox_primary_ip6]
        regex: '(.+)'
        target_label: __address__
        replacement: '[${1}]'
      - source_labels: [__address__]
        target_label: __address__
        replacement: '${1}:9100'
      # Optional extra metadata labels
      - source_labels: [__meta_netbox_cluster_slug]
        target_label: cluster
      - source_labels: [__meta_netbox_device_type_slug]
        target_label: device_type
      - source_labels: [__meta_netbox_model]
        target_label: netbox_model
      - source_labels: [__meta_netbox_platform_slug]
        target_label: platform
      - source_labels: [__meta_netbox_role_slug]
        target_label: role
      - source_labels: [__meta_netbox_site_slug]
        target_label: site
      - source_labels: [__meta_netbox_tag_slugs]
        target_label: tags
      - source_labels: [__meta_netbox_tenant_slug]
        target_label: tenant

(An explanation of these rules is given in the “Relabelling configuration explained” section below.)

Check your configuration is valid, and fix any errors if not:

/opt/prometheus/promtool check config /etc/prometheus/prometheus.yml
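
If the configuration is valid, you should see output similar to this (exact wording varies between Prometheus versions):

Checking /etc/prometheus/prometheus.yml
 SUCCESS: /etc/prometheus/prometheus.yml is valid prometheus config file syntax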

In the Prometheus web interface, run the PromQL query up{job="node"} and note the label values you are currently getting.

Finally, reload the configuration:

systemctl reload prometheus

Keep re-executing the up{job="node"} query in the web UI; you should see some of the labels change to values picked up from Netbox.
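
For example, a series which previously looked like up{instance="srv1-campusX", job="node"} might now carry extra labels like this (label values are illustrative - yours depend on your Netbox data):

up{instance="srv1-campusX", job="node", netbox_model="Device", role="server", site="campusX"}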

Also look at “Status > Targets”, and under “node” look at the list of targets you’re scraping, and the labels which have been picked up.

Congratulations: you should now be able to remove your static list of targets, /etc/prometheus/targets.d/node.yml.

Step 5: Restrict node_exporter scraping

There’s a problem. At the moment, we scrape every device as if it’s running node_exporter on port 9100. But that means we’ll try to scrape switches and routers as well (when we should be using snmp_exporter for those).

The solution proposed here is to use tags. We limit scraping to only devices and VMs with tag prom_node.

Check with curl, adding tag=prom_node to the query string:

curl -gsS -H "Authorization: Token $(cat /etc/prometheus/netbox.token)" \
  "http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/devices/?tag=prom_node&status=active&has_primary_ip=true" |
  python3 -mjson.tool

Check that only devices with tag “prom_node” are shown. The tags should also be visible in the curl response:

            "__meta_netbox_tags": "prom_node",
            "__meta_netbox_tag_slugs": "prom_node",

Finally, change your scrape config in the http_sd_configs section to add this additional filter to each of the URLs, e.g.

      - url: http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/devices/?status=active&has_primary_ip=true

becomes

      - url: http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/devices/?tag=prom_node&status=active&has_primary_ip=true

and reload Prometheus. Then look at “Status > Targets” in the Prometheus web UI to confirm that you’re only scraping the targets tagged prom_node.

In the real world, if you have Windows servers too, then you can just duplicate all this into a scrape job ‘windows’: filter on tag=prom_windows, and use port 9182 (the windows_exporter default) instead of 9100. Apart from that it’s exactly the same - a sketch is shown below.
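
Here is a minimal sketch of such a job, assuming you reuse the same relabelling rules; the tag prom_windows is one you would have to create in Netbox first:

  - job_name: 'windows'
    http_sd_configs:
      - url: http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/devices/?tag=prom_windows&status=active&has_primary_ip=true
        refresh_interval: 5m
        authorization:
          type: Token
          credentials_file: /etc/prometheus/netbox.token
      # (add a second url for virtual-machines, as in the 'node' job)
    relabel_configs:
      # same rules as the 'node' job, except the port-appending rule:
      - source_labels: [__address__]
        target_label: __address__
        replacement: '${1}:9182'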

Step 6: SNMP scraping

We can do the same for SNMP. But there’s one additional wrinkle: we need to obtain the module and auth parameters to pass to snmp_exporter (these tell it which MIBs to scrape and which credentials to use), and they may vary between devices.
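
As a reminder of how these parameters are used: when Prometheus scrapes via snmp_exporter, it makes a request equivalent to the following (values illustrative; the separate auth parameter assumes a recent snmp_exporter, 0.23 or later):

curl -gsS 'http://127.0.0.1:9116/snmp?target=100.68.1.1&module=if_mib&auth=public_v2'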

To do this, we will add custom fields to Netbox: again, the instructor has already done this in the central instance.

Now you need to update your snmp scrape config in prometheus.yml, so it looks something like this:

  - job_name: 'snmp'
    file_sd_configs:
      - files:
         - /etc/prometheus/targets.d/snmp.yml
    http_sd_configs:
      - url: http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/devices/?tag=prom_snmp&status=active&has_primary_ip=true
        refresh_interval: 5m
        authorization:
          type: Token
          credentials_file: /etc/prometheus/netbox.token
      - url: http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/virtual-machines/?tag=prom_snmp&status=active&has_primary_ip=true
        refresh_interval: 5m
        authorization:
          type: Token
          credentials_file: /etc/prometheus/netbox.token
    metrics_path: /snmp
    relabel_configs:
      # Labels which control scraping
      - source_labels: [__address__]
        target_label: instance
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__meta_netbox_primary_ip]
        regex: '(.+)'
        target_label: __param_target
      - source_labels: [__meta_netbox_custom_field_snmp_module]
        target_label: __param_module
      # Multiselect is of form ['foo', 'bar'] and we need foo,bar. There is no gsub.
      - source_labels: [__param_module]
        regex: "\\['(.*)'\\]"
        target_label: __param_module
      - source_labels: [__param_module]
        regex: "(.*)', *'(.*)"
        replacement: "$1,$2"
        target_label: __param_module
      - source_labels: [__param_module]
        regex: "(.*)', *'(.*)"
        replacement: "$1,$2"
        target_label: __param_module
      - source_labels: [__meta_netbox_custom_field_snmp_auth]
        target_label: __param_auth
      - target_label: __address__
        replacement: 127.0.0.1:9116  # SNMP exporter
      # Optional extra metadata labels
      - source_labels: [__meta_netbox_cluster_slug]
        target_label: cluster
      - source_labels: [__meta_netbox_device_type_slug]
        target_label: device_type
      - source_labels: [__meta_netbox_model]
        target_label: netbox_model
      - source_labels: [__meta_netbox_platform_slug]
        target_label: platform
      - source_labels: [__meta_netbox_role_slug]
        target_label: role
      - source_labels: [__meta_netbox_site_slug]
        target_label: site
      - source_labels: [__meta_netbox_tag_slugs]
        target_label: tags
      - source_labels: [__meta_netbox_tenant_slug]
        target_label: tenant

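To see what the multiselect rules do, trace the __param_module value for a device with two modules selected (module names illustrative):

    ['if_mib', 'hrDevice']      initial value from Netbox
    if_mib', 'hrDevice          after \['(.*)'\] strips the brackets and outer quotes
    if_mib,hrDevice             after (.*)', *'(.*) joins the pair with a comma

Each application of the join rule removes one separator, which is why it appears twice: with three modules, the second pass turns a', 'b,c into a,b,c. A rule whose regex doesn’t match leaves the value unchanged.
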
As before, use “promtool check config” to validate your configuration, then reload with systemctl reload prometheus.

Now all your scraping of devices is controlled via the Netbox web interface. SNMP scraping is applied to devices with the tag “prom_snmp”, and the module(s) to scrape are selected via the “snmp_module” custom field. Check the Status > Targets view in the Prometheus web interface.

Congratulations, this exercise is complete! You are controlling which targets to scrape using node_exporter and snmp_exporter from the Netbox database.

Scraping of other targets, such as those for blackbox_exporter, still uses text file configuration. It’s possible to generate those configurations automatically too, of course, but you’ll need a suitable “source of truth” database that lists them.
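
For reference, a file_sd target list is just a list of target groups in YAML (or JSON) form. A hypothetical /etc/prometheus/targets.d/blackbox.yml might look like this (URL and module label are illustrative):

- targets:
    - https://www.example.com/
  labels:
    module: http_2xx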

Relabelling configuration explained

The relabelling rules set up the labels ready for scraping, and add extra labels which will end up on all the metrics scraped from the target. This is an explanation of the rules for job_name: 'node'.

If you recall, the targets you got from scraping Netbox with curl looked like this:

[
    {
        "targets": [
            "srv1-campusX"
        ],
        "labels": {
            "__meta_netbox_status": "active",
            "__meta_netbox_model": "Device",
            "__meta_netbox_name": "srv1-campusX",
            "__meta_netbox_primary_ip": "100.68.1.130",
...

When the relabelling rules run, the __address__ label is set to the value inside the “targets” list, and all the other labels are set as shown.

After the relabelling rules have been run, __address__ is expected to contain an “address:port” value to be scraped. Also, if there is no instance label, then instance is copied from __address__. And finally, any other labels starting with double-underscore are dropped.

So, here’s what we do. First step:

      - source_labels: [__address__]
        target_label: instance

This copies the __address__ label to instance explicitly, which disables the automatic copying at the end of relabelling. The result is a nice clean value like instance="srv1-campusX", instead of a messy default like instance="100.68.1.130:9100".

      - source_labels: [__meta_netbox_primary_ip4]
        regex: '(.+)'
        target_label: __address__

If the device has a primary IPv4 address, then put it in the __address__ label. This is what we’ll use to scrape. Unless…

      - source_labels: [__meta_netbox_primary_ip6]
        regex: '(.+)'
        target_label: __address__
        replacement: '[${1}]'

…if the device has a primary IPv6 address, then overwrite the __address__ label with this value, wrapped in square brackets because that’s needed when a target is IPv6. In other words: we’ll prefer the IPv6 address if we have one.

Note that (.+) matches a string with at least one character, so doesn’t match an empty or missing value. Hence we’ll only overwrite the IPv4 address if we do have an IPv6 address.

      - source_labels: [__address__]
        target_label: __address__
        replacement: '${1}:9100'

Rewrite the __address__ label to its current value, with “:9100” appended. The implied default is regex: (.*), so the whole value is captured and made available in ${1}.
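
Putting the three address rules together, the transformation looks like this (addresses illustrative):

    100.68.1.130   ->  100.68.1.130:9100     (primary IPv4 only)
    2001:db8::1    ->  [2001:db8::1]:9100    (primary IPv6 takes precedence)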

At this point, we have what’s necessary for scraping: we’ve set __address__ to point to the “host:port” to scrape, and we’ve set instance to the visible unique instance name.

Now for the remaining labels. Those supplied by the service discovery mechanism are all prefixed with a double underscore, which means they will be dropped at the end of the relabelling phase; if we want them to be visible, we have to copy them to regular labels:

      # Optional extra metadata labels
      - source_labels: [__meta_netbox_cluster_slug]
        target_label: cluster
      - source_labels: [__meta_netbox_device_type_slug]
        target_label: device_type
      - source_labels: [__meta_netbox_model]
        target_label: netbox_model
      - source_labels: [__meta_netbox_platform_slug]
        target_label: platform
      - source_labels: [__meta_netbox_role_slug]
        target_label: role
      - source_labels: [__meta_netbox_site_slug]
        target_label: site
      - source_labels: [__meta_netbox_tag_slugs]
        target_label: tags
      - source_labels: [__meta_netbox_tenant_slug]
        target_label: tenant

Why do this? These remaining labels are added to all metrics which are scraped from the target. This means they are available in all the expressions using those metrics, including alerting expressions. This gives you lots of useful data for filtering, aggregating, and routing queries and alerts.
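
For example, with the site and role labels attached, queries like these become possible (metric and label values illustrative):

# average non-idle CPU rate per site, across all nodes
avg by (site) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))

# in an alerting expression: only match servers at one site
up{job="node", role="server", site="campusX"} == 0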

Reference: Netbox setup

In this exercise, the instructor had preconfigured the central Netbox server for your Prometheus to talk to. These are the extra steps you’d have to do if you were using your own Netbox server.

Install netbox-plugin-prometheus-sd

If you’re running a Netbox instance that was installed from source, then installing plugins is relatively straightforward.

Edit the file /opt/netbox/local_requirements.txt (it may not exist; if so, just create it), and put the following line in it:

netbox-plugin-prometheus-sd

Save and exit. Run the following commands to install the plugin:

cd /opt/netbox
. venv/bin/activate
pip install -r local_requirements.txt
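
You can confirm the package was installed (inside the same virtualenv) with:

pip show netbox-plugin-prometheus-sd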

This should install the Python package for the plugin. Now edit /opt/netbox/netbox/netbox/configuration.py and change the PLUGINS line so that it looks like this (if you already have other plugins enabled, add 'netbox_prometheus_sd' to the existing list):

PLUGINS = ['netbox_prometheus_sd']

Finally, restart Netbox:

systemctl restart netbox

This plugin has no configuration required, and no migrations to run.

If you’re running Netbox under Docker, then installing plugins is more awkward - see “Using Netbox Plugins” on the netbox-docker wiki.

Create the API key

If you are running your own Netbox instance, it may or may not be configured to require authentication. That is, it might allow read-only access to everyone.

If authentication is required, then you’ll need to create an API token for Prometheus to use: in the Netbox web interface, go to your user profile, add a token under “API Tokens”, and copy its value into /etc/prometheus/netbox.token as in Step 2.

Create tags

In the Netbox web interface, under “Other” create a new tag “prom_node”. Then edit your Linux devices and VMs which are running node_exporter, and add this tag to each one. (This can be applied to multiple devices at once using the bulk edit functionality).

Similarly, create a new tag “prom_snmp” and apply this to your network devices.

Create and set the snmp_auth custom field

In the Netbox web interface, create a custom field named snmp_auth of type “Selection”, applied to devices (and to virtual machines, if you scrape any of those via SNMP). Its choices must match the auth names defined in your snmp_exporter configuration, since this value is passed directly as the auth parameter.

Create and set the snmp_module custom field

This is the same, except we will allow multiple selections (so that you can poll multiple MIBs on the same device)

Then for each SNMP device (router or switch), as well as adding tag “prom_snmp”, you would select an auth and one or more modules.

Note: the use of multiple selections for MIBs complicates the relabelling configuration. snmp_exporter needs a comma-separated list like if_mib,hrDevice, but netbox_prometheus_sd gives ['if_mib', 'hrDevice'], so relabelling has to convert the latter to the former. See this issue.