Network devices such as routers and switches cannot run prometheus exporters themselves. Fortunately, there is snmp_exporter, which converts prometheus scrapes into SNMP queries.
Do this on your campus server instance (srv1.campusX.ws.nsrc.org)
(If snmp_exporter is pre-installed, skip to the next section “Start snmp_exporter”)
Fetch and unpack the latest release from the releases page and create a symlink so that /opt/snmp_exporter refers to the current version.
wget https://github.com/prometheus/snmp_exporter/releases/download/v0.19.0/snmp_exporter-0.19.0.linux-amd64.tar.gz
tar -C /opt -xvzf snmp_exporter-0.19.0.linux-amd64.tar.gz
ln -s snmp_exporter-0.19.0.linux-amd64 /opt/snmp_exporter
Use a text editor to create a systemd unit file /etc/systemd/system/snmp_exporter.service with the following contents:
[Unit]
Description=Prometheus SNMP Exporter
Documentation=https://github.com/prometheus/snmp_exporter
After=network-online.target
[Service]
User=prometheus
Restart=on-failure
RestartSec=5
EnvironmentFile=/etc/default/snmp_exporter
ExecStart=/opt/snmp_exporter/snmp_exporter $OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
Tell systemd to read this new file:
systemctl daemon-reload
Also create an options file /etc/default/snmp_exporter with the following contents:
OPTIONS='--config.file=/etc/prometheus/snmp/snmp.yml --web.listen-address=127.0.0.1:9116'
Create the initial default configuration:
mkdir /etc/prometheus/snmp
cp /opt/snmp_exporter/snmp.yml /etc/prometheus/snmp/
Let’s start snmp_exporter:
systemctl enable snmp_exporter # start on future boots
systemctl start snmp_exporter # start now
journalctl -eu snmp_exporter # check for "Listening on address" address=127.0.0.1:9116
Use cursor keys to move around the journalctl log output, and “q” to quit. If there are any errors, then go back and fix them.
snmp_exporter’s configuration file is generated by a separate “generator” tool. Its inputs are the MIBs and a higher-level description of requirements, and it also embeds the security credentials (such as SNMPv2c communities and SNMPv3 keys) in the output.
Unfortunately, the “generator” tool is not currently bundled and has to be built from source, which makes it inconvenient to use.
Instead, we will modify the existing file as a workaround.
Edit /etc/prometheus/snmp/snmp.yml and search for the line which starts if_mib: (this is around line 5,405). Don’t scroll by hand - use your editor’s search function!
Add &if_mib to the end of that line, so it looks like this:
if_mib: &if_mib
Now scroll down to the very end of the file (again - don’t scroll by hand, it’s over 17,000 lines), and add the following:
if_mib_v3:
  <<: *if_mib
  version: 3
  timeout: 3s
  retries: 3
  auth:
    security_level: authNoPriv
    username: admin
    password: NetManage
    auth_protocol: SHA
This creates a new module “if_mib_v3”, which is a copy of “if_mib” but with the security settings overridden for SNMPv3 with our credentials.
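To make the anchor-and-merge behaviour concrete, here is a minimal Python sketch of what `&if_mib` and `<<: *if_mib` amount to. It is only an analogy: the contents of `if_mib` below are placeholders (the real module is far larger), and the variable names are chosen for illustration.

```python
# Placeholder stand-in for the existing "if_mib" module; the real one in
# snmp.yml is much larger. These keys and values are illustrative only.
if_mib = {
    "walk": ["placeholder-oid-1", "placeholder-oid-2"],
    "metrics": ["...many metric definitions..."],
}

# The keys written explicitly under "if_mib_v3" in this exercise.
overrides = {
    "version": 3,
    "timeout": "3s",
    "retries": 3,
    "auth": {
        "security_level": "authNoPriv",
        "username": "admin",
        "password": "NetManage",
        "auth_protocol": "SHA",
    },
}

# "<<: *if_mib" behaves like: start from the keys of if_mib, then let any
# key written explicitly in the new module take precedence.
if_mib_v3 = {**if_mib, **overrides}

print(sorted(if_mib_v3))    # keys from both the original and the overrides
print("version" in if_mib)  # the original module is left unchanged
```

The merge is shallow: a top-level key written in the new module replaces the inherited one as a whole, and the original “if_mib” module is not modified.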
Save, and signal snmp_exporter to pick up the change:
killall -HUP snmp_exporter
journalctl -eu snmp_exporter # check for errors
If there are any errors, fix them.
Perform manual scrapes of two devices, using the following commands:
curl 'localhost:9116/snmp?module=if_mib_v3&target=gw.ws.nsrc.org'
curl 'localhost:9116/snmp?module=if_mib_v3&target=core1.campusX.ws.nsrc.org'
Note that in each case the scrape is sent to localhost (where snmp_exporter is running), but it includes two parameters: module says which MIB and credentials to use, and target tells snmp_exporter where to send the SNMP query (this can be either a resolvable DNS name or an IP address).
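As a small illustration of how these two parameters make up the scrape URL, the following Python sketch builds the same URL that the curl commands use. The function name `snmp_scrape_url` and its default exporter address are assumptions made for this example, not part of snmp_exporter.

```python
from urllib.parse import urlencode


def snmp_scrape_url(target, module, exporter="localhost:9116"):
    """Build the URL that asks snmp_exporter to query `target` using `module`.

    The request itself always goes to the exporter; the device to be
    queried over SNMP is only named in the query string.
    """
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/snmp?{query}"


print(snmp_scrape_url("gw.ws.nsrc.org", "if_mib_v3"))
```

The host part of the URL never changes; only the target and module parameters differ from one device to the next.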
You should get a large number of metrics back in prometheus format, e.g.
# HELP ifHCInOctets The total number of octets received on the interface, including framing characters - 1.3.6.1.2.1.31.1.1.1.6
# TYPE ifHCInOctets counter
ifHCInOctets{ifAlias="",ifDescr="Intel Corporation Ethernet Connection (2) I219-LM",ifIndex="2",ifName="eno1"} 448744
...
# HELP sysUpTime The time (in hundredths of a second) since the network management portion of the system was last re-initialized. - 1.3.6.1.2.1.1.3
# TYPE sysUpTime gauge
sysUpTime 1.20071e+06
The comment shows the SNMP OID, but in each case it has been translated to a plain prometheus metric.
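If you want to see how one of these lines breaks down into a metric name, a set of labels and a value, the following Python sketch parses a single sample line. It is a deliberately simplified parser written for this illustration (it ignores optional timestamps and does not unescape label values); `parse_sample` is a hypothetical helper, not part of any prometheus library.

```python
import re

# One sample line copied from the example output above.
SAMPLE = ('ifHCInOctets{ifAlias="",ifDescr="Intel Corporation Ethernet '
          'Connection (2) I219-LM",ifIndex="2",ifName="eno1"} 448744')

# metric name, optional {labels}, then the value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# label="value" pairs inside the braces
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (name, labels, value)."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample(SAMPLE)
print(name, labels["ifName"], value)

# sysUpTime has no labels and is reported in hundredths of a second.
_, _, uptime = parse_sample("sysUpTime 1.20071e+06")
print(uptime / 100, "seconds")
```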
Now we are ready to move onto configuring prometheus.
Firstly, create a targets file /etc/prometheus/targets.d/snmp.yml containing the following:
- labels:
    module: if_mib_v3
  targets:
    - gw.ws.nsrc.org
    - bdr1.campusX.ws.nsrc.org
    - core1.campusX.ws.nsrc.org
However, we have a slight problem: we don’t want prometheus to scrape these targets directly. We want it to scrape the snmp_exporter on localhost and pass the target and module as parameters in the URL. To do this, we need to use prometheus’ relabeling feature.
Edit /etc/prometheus/prometheus.yml and add the following to the bottom of the scrape_configs: section:
  - job_name: 'snmp'
    file_sd_configs:
      - files:
          - /etc/prometheus/targets.d/snmp.yml
    metrics_path: /snmp
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [module]
        target_label: __param_module
      - target_label: __address__
        replacement: 127.0.0.1:9116  # SNMP exporter
Again, be careful with spacing. The dash before job_name should align exactly with the dashes of earlier job_name entries.
For details of how this works, see the end of this sheet.
Now get prometheus to pick up the changes:
killall -HUP prometheus
journalctl -eu prometheus # CHECK FOR ERRORS!
Return to the prometheus web interface at http://oob.srv1.campusX.ws.nsrc.org/prometheus
Run the following queries:
up{job="snmp"}
scrape_samples_scraped{job="snmp"}
The query “up” will return 1 for all target devices - even if the SNMP query fails - because snmp_exporter itself is working. However “scrape_samples_scraped” will show the number of values retrieved; if it’s 0 then that means there was a problem with SNMP.
If there is a problem, sometimes it is helpful to use tcpdump to see the scrape attempts between prometheus and snmp_exporter:
tcpdump -i lo -nnA -s0 tcp port 9116
If scraping is successful, then you can now browse some of the values using the Console tab, for example:
ifOperStatus # this is a gauge (values 1,2 etc defined in the MIB)
ifHCInOctets # this is a counter
Can you remember how to change this counter into a rate in bits-per-second, so that you can get a traffic graph? Refer to the node_exporter exercise if you need to.
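As a reminder of the arithmetic behind such a graph (not the query itself), here is a minimal Python sketch that turns two readings of an octet counter into an average bit rate. The function name and the sample numbers are made up for illustration, and the sketch ignores counter resets.

```python
def bits_per_second(octets_then, octets_now, seconds_elapsed):
    """Average bit rate between two readings of an octet (byte) counter."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    # Counter difference is in octets; multiply by 8 to get bits.
    return (octets_now - octets_then) * 8 / seconds_elapsed


# Hypothetical readings taken 60 seconds apart.
print(bits_per_second(448744, 448744 + 1_500_000, 60))  # 200000.0
```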
Add the border and core routers for ONE other campus in your targets file. Don’t do them all in case the 15-second polling interval overwhelms our platform.
Remember that you don’t need to HUP prometheus after updating the targets file.
When prometheus reads a targets file, it puts each entry into a hidden label called __address__. It also uses __address__ as the endpoint to scrape. After scraping, the __address__ is copied to a label called “instance” if one doesn’t exist. Finally, any label beginning with __ is removed from the result.
However, before scraping there is an optional relabeling phase, where a set of relabeling steps is applied in order. What we have done is:
- source_labels: [__address__]
  target_label: instance
This copies the __address__ label to the instance label. Therefore we end up with a label like instance="gw.ws.nsrc.org"
- source_labels: [__address__]
  target_label: __param_target
We also copy the __address__ label to __param_target; this gets applied as a parameter called “target” in the final URL.
- source_labels: [module]
  target_label: __param_module
Similarly, we copy the label module (which was applied in the targets file to the group of targets) to __param_module, which becomes the “module” parameter in the URL.
- target_label: __address__
  replacement: 127.0.0.1:9116  # SNMP exporter
Finally, we replace __address__ with “127.0.0.1:9116”, which means that the actual scrape is sent to the snmp_exporter running on the local host. We also set metrics_path to /snmp, instead of the default which is /metrics, because this is what snmp_exporter requires.
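The four relabeling steps can be traced by hand with a small Python sketch. This is only a simulation of the steps described above for one target, not how prometheus is implemented; the function names are invented for this example, and the job label that prometheus also attaches is left out.

```python
from urllib.parse import urlencode


def relabel(labels):
    """Apply the four relabel steps of the 'snmp' job to one target's labels."""
    out = dict(labels)
    out["instance"] = out["__address__"]        # step 1: keep the device name
    out["__param_target"] = out["__address__"]  # step 2: becomes ?target=...
    out["__param_module"] = out["module"]       # step 3: becomes ?module=...
    out["__address__"] = "127.0.0.1:9116"       # step 4: scrape the exporter
    return out


def scrape_url(labels, metrics_path="/snmp"):
    """Build the URL that would be scraped from the relabeled labels."""
    prefix = "__param_"
    params = {k[len(prefix):]: v for k, v in labels.items() if k.startswith(prefix)}
    return f"http://{labels['__address__']}{metrics_path}?{urlencode(sorted(params.items()))}"


def stored_labels(labels):
    """Drop the hidden labels (those beginning with __)."""
    return {k: v for k, v in labels.items() if not k.startswith("__")}


# One entry from the targets file, as prometheus first sees it.
target = {"__address__": "gw.ws.nsrc.org", "module": "if_mib_v3"}
after = relabel(target)
print(scrape_url(after))
print(stored_labels(after))
```

Running this prints a URL of the same form as the manual curl test earlier, and shows that only instance and module remain as visible labels on the resulting metrics.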