This worksheet outlines how to integrate Prometheus with other systems.

Netbox as inventory for Prometheus

So far, all your Prometheus configuration has been done by editing text files in the console. Wouldn’t it be nice if there were a web user interface where you could add new targets to be scraped, instead of editing a text file?

Prometheus is very flexible and supports a range of service discovery mechanisms which allow it to retrieve a list of targets dynamically.

This exercise will outline how to connect Prometheus to Netbox, so that adding a device into Netbox enables it to be scraped by Prometheus. It will make heavy use of target relabelling, so make sure you’ve read that exercise first!

The approach outlined here is to use Prometheus’ HTTP service discovery mechanism, which periodically queries a URL to get the list of targets to scrape. We add a plugin to Netbox which exposes the inventory in the format that Prometheus expects.

Step 1: Install netbox-plugin-prometheus-sd

Depending on how the instructor decides to run the exercise, they may either have you use a central Netbox instance which they manage, or have you run your own Netbox instance on your group’s srv1.

If the exercise uses a central Netbox instance then you can skip straight to step 2.

If you’re using your own Netbox instance on srv1, first make sure it’s working correctly and that you can access its web interface. Then ssh in and get a root shell.

Edit the file /opt/netbox/local_requirements.txt (it may not exist; if so, just create it), and put the following line in it:

netbox-plugin-prometheus-sd

Save and exit. Run the following commands to install the plugin:

cd /opt/netbox
. venv/bin/activate
pip install -r local_requirements.txt

This should install the Python package for the plugin. Now edit /opt/netbox/netbox/netbox/configuration.py and change the PLUGINS line so that it looks like this:

PLUGINS = ['netbox_prometheus_sd']

Finally, restart Netbox:

systemctl restart netbox

Step 2: Create credentials

Now log in to your Prometheus server, which is most likely on srv1.

If you are running with a central Netbox instance, the instructor will give you the API key to use. Write this into a file /etc/prometheus/netbox.token and go to step 3.
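
For example (the token value here is just a placeholder, and the chown assumes Prometheus runs with a “prometheus” group; adjust to match your setup):

echo "REPLACE-WITH-YOUR-TOKEN" > /etc/prometheus/netbox.token
chmod 640 /etc/prometheus/netbox.token
chown root:prometheus /etc/prometheus/netbox.token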

If you are running your own Netbox instance, it may or may not be configured to require authentication. If it is, then you’ll have to do the following steps, which the instructor can walk you through:

- log into the Netbox web interface as a user who can create API tokens
- create an API token (in recent Netbox versions, this is under your username menu, “API Tokens”, then add a token)
- write the token value into /etc/prometheus/netbox.token on your Prometheus server, as shown above

Step 3: Test API/credentials

Now you need to test the API. Use the following command if using the instructor’s central Netbox instance:

curl -gsS -H "Authorization: Token $(cat /etc/prometheus/netbox.token)" \
  "http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/devices/?status=active&has_primary_ip=true" |
  python3 -mjson.tool

If it’s your own instance on srv1 and it doesn’t require authentication, then use:

curl -gsS "http://127.0.0.1/netbox/api/plugins/prometheus-sd/devices/?status=active&has_primary_ip=true" |
  python3 -mjson.tool

You should get back a list of devices like this:

[
    {
        "targets": [
            "bdr1"
        ],
        "labels": {
            "__meta_netbox_status": "active",
            "__meta_netbox_model": "Device",
            "__meta_netbox_name": "bdr1",
            "__meta_netbox_primary_ip": "100.68.1.1",
...

If you get an empty list [] then check the following:

- the device actually exists in Netbox, and its status is set to “Active” (the query filters on status=active)
- the device has a primary IP address assigned (the query filters on has_primary_ip=true)
- if you installed the plugin yourself, that it’s listed in PLUGINS and Netbox was restarted afterwards
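
One way to narrow this down is to drop the query filters and see whether the plugin returns any devices at all (shown here against your own instance; use the noc.ws.nsrc.org URL and the token header if you’re on the central instance):

curl -gsS "http://127.0.0.1/netbox/api/plugins/prometheus-sd/devices/" |
  python3 -mjson.tool

If devices appear without the filters but not with them, then the status or primary IP is what needs fixing in Netbox.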

Step 4: Scraping node_exporter

Change to your Prometheus config directory, and take a backup of the config, as we’re about to make a bunch of changes.

cd /etc/prometheus
cp prometheus.yml prometheus.yml.bak

Edit this file and find the node_exporter scrape job: the section starting with job_name: 'node'.

Remove this section:

    file_sd_configs:
      - files:
          - /etc/prometheus/targets.d/node.yml

and replace it with:

    http_sd_configs:
      - url: http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/devices/?status=active&has_primary_ip=true
        refresh_interval: 5m
        authorization:
          type: Token
          credentials_file: /etc/prometheus/netbox.token
      - url: http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/virtual-machines/?status=active&has_primary_ip=true
        refresh_interval: 5m
        authorization:
          type: Token
          credentials_file: /etc/prometheus/netbox.token

Change ‘noc.ws.nsrc.org’ to ‘127.0.0.1’ if you’re using your own Netbox server rather than the central class instance. (Note that the first URL is the one we tested with curl; the second does the same, but retrieves virtual machines rather than physical devices.)

Find the section relabel_configs and replace it with the following:

    relabel_configs:
      # Labels which control scraping
      - source_labels: [__address__]
        target_label: instance
      - source_labels: [__meta_netbox_primary_ip4]
        regex: '(.+)'
        target_label: __address__
      - source_labels: [__meta_netbox_primary_ip6]
        regex: '(.+)'
        target_label: __address__
        replacement: '[${1}]'
      - source_labels: [__address__]
        target_label: __address__
        replacement: '${1}:9100'
      # Optional extra metadata labels
      - source_labels: [__meta_netbox_cluster_slug]
        target_label: cluster
      - source_labels: [__meta_netbox_device_type_slug]
        target_label: device_type
      - source_labels: [__meta_netbox_model]
        target_label: netbox_model
      - source_labels: [__meta_netbox_platform_slug]
        target_label: platform
      - source_labels: [__meta_netbox_role_slug]
        target_label: role
      - source_labels: [__meta_netbox_site_slug]
        target_label: site
      - source_labels: [__meta_netbox_tag_slugs]
        target_label: tags
      - source_labels: [__meta_netbox_tenant_slug]
        target_label: tenant

(an explanation is given below!)

Check your configuration is valid, and fix any errors if not:

/opt/prometheus/promtool check config /etc/prometheus/prometheus.yml

In the Prometheus web interface, run the PromQL query up{job="node"} and look at the values you are currently getting.

Finally, reload the configuration:

systemctl reload prometheus

Keep re-executing the up{job="node"} query in the web UI; you should see some of the labels change to values picked up from Netbox.

Congratulations: you should now be able to remove your static list of targets, /etc/prometheus/targets.d/node.yml.
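
For example, you might move the old file aside rather than deleting it outright, so you can roll back if needed:

mv /etc/prometheus/targets.d/node.yml /etc/prometheus/targets.d/node.yml.bak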

Step 5: Restrict node_exporter scraping

There’s a problem, though: at the moment, we scrape everything as if it were running node_exporter on port 9100. That means we’ll try to scrape switches and routers as well, when we should be using snmp_exporter for those.

The solution suggested here is to use tags: we scrape any device or VM with the tag prom_node set on it.

If using the central Netbox instance, the instructor will do this for you.

If running your own Netbox instance: under “Other” create a new tag “prom_node”. Then edit your device and add this tag to it.
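
If you prefer to work with the API, the same tag can also be created with curl. This is just a sketch: it assumes you have already created an API token with write permission and saved it as described in step 2:

curl -sS -X POST \
  -H "Authorization: Token $(cat /etc/prometheus/netbox.token)" \
  -H "Content-Type: application/json" \
  -d '{"name": "prom_node", "slug": "prom_node"}' \
  "http://127.0.0.1/netbox/api/extras/tags/"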

Check with curl, adding tag=prom_node to the query string:

curl -gsS "http://127.0.0.1/netbox/api/plugins/prometheus-sd/devices/?tag=prom_node&status=active&has_primary_ip=true" |
  python3 -mjson.tool

Check that only devices with tag “prom_node” are shown. The tags should also be visible in the curl response:

            "__meta_netbox_tags": "prom_node",
            "__meta_netbox_tag_slugs": "prom_node",

Finally, change your scrape config in the http_sd_configs section to add this additional filter: e.g.

      - url: http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/devices/?status=active&has_primary_ip=true

becomes

      - url: http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/devices/?tag=prom_node&status=active&has_primary_ip=true

and reload Prometheus. Then look at “Targets” in the Prometheus web UI to confirm that you’re only scraping the targets with prom_node set.

In the real world, if you have Windows servers too, then you can just duplicate all this into a scrape job ‘windows’. You’d filter on tag=prom_windows, and use port 9182 instead of 9100. Apart from that it’s exactly the same.
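
As a rough sketch (hypothetical, not part of this lab), the devices half of such a job might look like this, with the optional metadata labels left out for brevity:

  - job_name: windows
    http_sd_configs:
      - url: http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/devices/?tag=prom_windows&status=active&has_primary_ip=true
        refresh_interval: 5m
        authorization:
          type: Token
          credentials_file: /etc/prometheus/netbox.token
    relabel_configs:
      # Same scraping rules as the 'node' job, but windows_exporter listens on 9182
      - source_labels: [__address__]
        target_label: instance
      - source_labels: [__meta_netbox_primary_ip4]
        regex: '(.+)'
        target_label: __address__
      - source_labels: [__meta_netbox_primary_ip6]
        regex: '(.+)'
        target_label: __address__
        replacement: '[${1}]'
      - source_labels: [__address__]
        target_label: __address__
        replacement: '${1}:9182'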

Step 6: SNMP scraping

We can do the same for SNMP. But there’s one additional wrinkle: we need to obtain the module parameter to pass to snmp_exporter (which says which MIB to scrape). This may vary between devices.

To do this, we will add a custom field to Netbox: again, the instructor may have done this already in the central instance.

If you’re using your own Netbox instance, then:

- create a tag “prom_snmp”, in the same way as you created “prom_node”, and add it to each device that should be scraped via SNMP
- create a custom field named “snmp_module”, applying to devices (and to virtual machines, if you use them), and on each tagged device set it to the snmp_exporter module to use (e.g. “if_mib”)

The custom field appears in service discovery as __meta_netbox_custom_field_snmp_module, which is what the relabelling below relies on.

Now you need to update your snmp scrape config in prometheus.yml, so it looks something like this (adjust noc.ws.nsrc.org to 127.0.0.1 if required):

  - job_name: snmp
    http_sd_configs:
      - url: http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/devices/?tag=prom_snmp&status=active&has_primary_ip=true
        refresh_interval: 5m
        authorization:
          type: Token
          credentials_file: /etc/prometheus/netbox.token
      - url: http://noc.ws.nsrc.org/netbox/api/plugins/prometheus-sd/virtual-machines/?tag=prom_snmp&status=active&has_primary_ip=true
        refresh_interval: 5m
        authorization:
          type: Token
          credentials_file: /etc/prometheus/netbox.token
    metrics_path: /snmp
    relabel_configs:
      # Labels which control scraping
      - source_labels: [__address__]
        target_label: instance
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__meta_netbox_primary_ip]
        regex: '(.+)'
        target_label: __param_target
      - source_labels: [__meta_netbox_custom_field_snmp_module]
        target_label: __param_module
      - target_label: __address__
        replacement: 127.0.0.1:9116  # SNMP exporter
      # Optional extra metadata labels
      - source_labels: [__meta_netbox_custom_field_snmp_module]
        target_label: module
      - source_labels: [__meta_netbox_cluster_slug]
        target_label: cluster
      - source_labels: [__meta_netbox_device_type_slug]
        target_label: device_type
      - source_labels: [__meta_netbox_model]
        target_label: netbox_model
      - source_labels: [__meta_netbox_platform_slug]
        target_label: platform
      - source_labels: [__meta_netbox_role_slug]
        target_label: role
      - source_labels: [__meta_netbox_site_slug]
        target_label: site
      - source_labels: [__meta_netbox_tag_slugs]
        target_label: tags
      - source_labels: [__meta_netbox_tenant_slug]
        target_label: tenant

As before: use “promtool check config” to validate your configuration, then reload Prometheus:
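
/opt/prometheus/promtool check config /etc/prometheus/prometheus.yml
systemctl reload prometheus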

Now all your scraping of devices is controlled via the Netbox web interface. Scraping of other targets, such as those for blackbox_exporter, will still use text file configuration. It’s possible to generate those configurations automatically too, of course, but you’ll need a suitable “source of truth” database that lists them.

Relabelling configuration explained

The relabelling rules are used to set up the labels ready for scraping, and to add extra labels which will end up in all the scraped metrics.

If you recall, the targets you got from querying Netbox with curl looked like this:

[
    {
        "targets": [
            "bdr1"
        ],
        "labels": {
            "__meta_netbox_status": "active",
            "__meta_netbox_model": "Device",
            "__meta_netbox_name": "bdr1",
            "__meta_netbox_primary_ip": "100.68.1.1",
...

When the relabelling rules run, the __address__ label is set to the value inside the “targets” list, and all the other labels are set as shown.

After the relabelling rules have been run, __address__ is expected to contain an “address:port” value to be scraped. Also, if there is no instance label, then instance is copied from __address__. And finally, any other labels starting with double-underscore are dropped.

So, here’s what we do. First step:

      - source_labels: [__address__]
        target_label: instance

This copies the __address__ label to instance explicitly, which disables the automatic copying later. That way we get a nice clean value like instance="bdr1" instead of a messy default like instance="100.68.1.1:9100".

      - source_labels: [__meta_netbox_primary_ip4]
        regex: '(.+)'
        target_label: __address__

If the device has a primary IPv4 address, then put it in the __address__ label. This is what we’ll use to scrape. Unless…

      - source_labels: [__meta_netbox_primary_ip6]
        regex: '(.+)'
        target_label: __address__
        replacement: '[${1}]'

…if the device has a primary IPv6 address, then overwrite the __address__ label with this value, wrapped in square brackets because that’s needed when a target is IPv6. In other words: we’ll prefer the IPv6 address if we have one.

Note that (.+) matches a string with at least one character, so doesn’t match an empty or missing value. Hence we’ll only overwrite the IPv4 address if we do have an IPv6 address.

      - source_labels: [__address__]
        target_label: __address__
        replacement: '${1}:9100'

Rewrite the __address__ label to its current value, with “:9100” appended. The implied default is regex: (.*), so the whole value is captured and made available in ${1}.

At this point, we have what’s necessary for scraping: we’ve set __address__ to point to the “host:port” to scrape, and we’ve set instance to the visible unique instance name.
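
For example, take the bdr1 target shown earlier, and assume its __meta_netbox_primary_ip4 is 100.68.1.1 and it has no primary IPv6 address. The first four rules transform its labels like this:

Before relabelling:
    __address__ = "bdr1"
    __meta_netbox_primary_ip4 = "100.68.1.1"

After relabelling:
    instance = "bdr1"
    __address__ = "100.68.1.1:9100"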

Now the remaining labels. The ones supplied by the service discovery mechanism are all prefixed by double-underscore, which means they will be removed at the end of the relabelling phase, so if we want them to be visible we have to copy them to regular labels:

      # Optional extra metadata labels
      - source_labels: [__meta_netbox_custom_field_snmp_module]
        target_label: module
      - source_labels: [__meta_netbox_cluster_slug]
        target_label: cluster
      - source_labels: [__meta_netbox_device_type_slug]
        target_label: device_type
      - source_labels: [__meta_netbox_model]
        target_label: netbox_model
      - source_labels: [__meta_netbox_platform_slug]
        target_label: platform
      - source_labels: [__meta_netbox_role_slug]
        target_label: role
      - source_labels: [__meta_netbox_site_slug]
        target_label: site
      - source_labels: [__meta_netbox_tag_slugs]
        target_label: tags
      - source_labels: [__meta_netbox_tenant_slug]
        target_label: tenant

Why do this? These remaining labels are added to all metrics which are scraped from the target. This means they are available in all the expressions using those metrics, including alerting expressions. This gives you lots of useful data for filtering, aggregating, and routing queries and alerts.
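
For example, once every metric carries a site label, a single query can aggregate across a whole site. This is just a sketch using node_exporter’s standard CPU metric; it shows the average CPU utilisation per site:

avg by (site) (1 - rate(node_cpu_seconds_total{job="node",mode="idle"}[5m]))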