Prometheus can control the labels in two main phases:

- at scrape time, using relabel_configs to rewrite a target’s labels before the scrape happens;
- after the scrape, using metric_relabel_configs to rewrite or drop the metrics that were returned.

It’s also possible to change labels when alerting and in remote write, but we will not look at those here.
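As a sketch of where these two phases hook into a scrape configuration (the job name and target here are placeholders, not part of the lab config):

scrape_configs:
  - job_name: 'example'              # placeholder
    static_configs:
      - targets: ['localhost:9100']
    relabel_configs: []              # phase 1: rewrites target labels before the scrape
    metric_relabel_configs: []       # phase 2: rewrites or drops scraped metrics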
Run the PromQL query up{job="node"}, either in the Prometheus web interface at http://oob.srv1.campusX.ws.nsrc.org/prometheus or at the command line:
/opt/prometheus/promtool query instant http://localhost:9090/prometheus 'up{job="node"}'
You should see that the instance label includes the port number 9100:

up{instance="srv1.campusX.ws.nsrc.org:9100", job="node"} 1
                                     ^^^^^
This applies to all the metrics collected by node_exporter. This is somewhat ugly, and it also makes the instance labels harder to use in more complex queries where you are joining metrics from different sources.
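For example, a query that joins node_exporter data with blackbox results on the instance label (the job name "ping" here is an assumption, standing in for whatever your blackbox job is called) only matches when both sides agree on the label value, and "srv1.campusX.ws.nsrc.org:9100" does not match "srv1.campusX.ws.nsrc.org":

up{job="node"} * on(instance) probe_success{job="ping"}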
We can fix this using relabeling. We’ve already seen this as part of our SNMP and blackbox configuration, but we’ll look at how to apply this to the node_exporter job.
Firstly, a few important pieces of information:

- the address of each scrape target is initially stored in a special label called __address__
- __address__ is also copied to a label called instance, unless the user has already set a label called instance
- all labels beginning with __ are removed just before scraping

We can now improve our scrape config by doing some relabeling. Edit /etc/prometheus/prometheus.yml and under the ‘node’ job add a relabel_configs section as follows:
- job_name: 'node'
  file_sd_configs:
    - files:
        - /etc/prometheus/targets.d/node.yml
  # (if you have "scheme" or "tls_config" sections,
  # keep them as they are). Then add:
  relabel_configs:
    - source_labels: [__address__]
      target_label: instance
    - source_labels: [__address__]
      target_label: __address__
      replacement: '$1:9100'
Pick up the changes and check for errors:
/opt/prometheus/promtool check config /etc/prometheus/prometheus.yml
systemctl reload prometheus
journalctl -eu prometheus
Now edit /etc/prometheus/targets.d/node.yml and remove :9100 from each of the targets, so they look like this:
- targets:
  - 'srv1.campusX.ws.nsrc.org'
Now re-run the query up{job="node"}, either in the Prometheus web interface at http://oob.srv1.campusX.ws.nsrc.org/prometheus or at the command line:
/opt/prometheus/promtool query instant http://localhost:9090/prometheus 'up{job="node"}'
Within a minute you should see results like
up{instance="noc.ws.nsrc.org", job="node"} 1
If the up status is 0, then scraping is no longer working, and you’ll need to fix the problem.
This form of labeling is highly recommended. The only problem you might find is that some third-party Grafana dashboards expect the port number to be present in the instance label, and need to be modified to work without it.
We’ve added two relabeling steps. The first step simply copies the initial __address__ label from the targets file (which now doesn’t include the port) to the instance label. This prevents the instance label from being automatically set later.
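For completeness, this first rule relies on the same defaults explained for the second rule below; written out in full it would be:

- source_labels: [__address__]
  regex: '(.*)'
  target_label: instance
  replacement: '$1'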
The second step replaces the __address__ label with a new value. We wrote:

- source_labels: [__address__]
  target_label: __address__
  replacement: '$1:9100'
but we are relying on a default setting. If we wrote it out in full, it would be:

- source_labels: [__address__]
  regex: '(.*)'
  target_label: __address__
  replacement: '$1:9100'
What this means is:

- take the value of the source label __address__
- match it against the regular expression (.*). This matches any sequence of zero or more characters, therefore it is always successful. The parentheses then “capture” this string into $1
- write the replacement value '$1:9100' into the target label __address__

In short: we’ve appended :9100 onto the end of the __address__ label.
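A worked example may help (the input value is the lab hostname used above):

# Before relabeling (value taken from the targets file):
#   __address__ = "srv1.campusX.ws.nsrc.org"
# regex '(.*)' matches the whole string, so:
#   $1 = "srv1.campusX.ws.nsrc.org"
# replacement '$1:9100' then gives:
#   __address__ = "srv1.campusX.ws.nsrc.org:9100"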
The Prometheus scraper then has the complete <address>:<port> that it needs to know where to send the HTTP (or HTTPS) scrape.
Suppose we want to label devices with a string which is not the DNS name, or to show one name in the label but connect to an IP address to avoid DNS lookups? Both are possible.
With the next configuration, we will allow the targets file to contain data in two formats: either a bare name or IP address, or “name address” with a space between. In the second form, the “name” forms the instance label, and “address” is the IP address or DNS name that we want to connect to.
Change the relabel_configs section under the ‘node’ job to the following:
relabel_configs:
  # When __address__ consists of just a name or IP address,
  # copy it to the "instance" label. This keeps the port
  # number out of the instance label.
  - source_labels: [__address__]
    regex: '([^ ]+)'
    target_label: instance
  # When __address__ is of the form "name address", extract
  # name to "instance" label and address to "__address__"
  - source_labels: [__address__]
    regex: '(.+) (.+)'
    target_label: instance
    replacement: '${1}'
  - source_labels: [__address__]
    regex: '(.+) (.+)'
    target_label: __address__
    replacement: '${2}'
  # Append port number to __address__ so that scrape gets
  # sent to the right port
  - source_labels: [__address__]
    target_label: __address__
    replacement: '${1}:9100'
When you’ve deployed this, it will still work with your existing targets.d/node.yml file, but you can also change the entries to look like this:
- targets:
  - 'srv1.campusX 100.68.X.130'
You now have full control of the instance label (left-hand part) and the actual target address or hostname (right-hand part).
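Once deployed, up{job="node"} reflects whichever form each entry uses; for example (instance values here assume entries in the two forms shown above):

up{instance="srv1.campusX.ws.nsrc.org", job="node"} 1
up{instance="srv1.campusX", job="node"} 1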
It is also possible to apply relabeling after the metric has been scraped. The main reason for doing this is to drop metrics which are too expensive to ingest; it’s also possible here to change labels which were returned as part of the scrape.
For example, you may have noticed that node_exporter returns some internal metrics about its own Go language runtime:
go_gc_duration_seconds{quantile="0.25"} 4.0238e-05
go_gc_duration_seconds{quantile="0.5"} 5.8057e-05
go_gc_duration_seconds{quantile="0.75"} 9.9787e-05
go_gc_duration_seconds{quantile="1"} 0.008141333
go_gc_duration_seconds_sum 0.864084447
go_gc_duration_seconds_count 2641
go_goroutines 8
go_info{version="go1.13.8"} 1
go_memstats_alloc_bytes 1.693776e+06
go_memstats_alloc_bytes_total 6.14876608e+09
... etc
Let’s say you’re not interested in recording these, and you want to save some disk space in the Prometheus time series database. You can drop all metrics which start with go_ like this:
metric_relabel_configs:
  - source_labels: [__name__]
    regex: 'go_.*'
    action: drop
(In this particular case, there’s a simpler solution: run node_exporter with the flag --web.disable-exporter-metrics, which disables these metrics at source.)
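To check the drop is taking effect, you can count the matching series; note that an instant query looks back a few minutes, so recently-dropped series may linger briefly before this returns an empty result:

/opt/prometheus/promtool query instant http://localhost:9090/prometheus 'count({__name__=~"go_.*", job="node"})'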
Or suppose you want to drop all node_filesystem_xxx metrics which have label fstype equal to “tmpfs”, “nfs” or “nfs4”:
- source_labels: [__name__, fstype]
  regex: 'node_filesystem_.*;(tmpfs|nfs|nfs4)'
  action: drop
Note that when you use multiple source_labels, the values are joined together by a separator, which by default is a semicolon (;), and therefore the regular expression has to match on this complete string.
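For example, a sample of node_filesystem_avail_bytes (a metric node_exporter really exposes) with fstype="tmpfs" is tested as the joined string:

node_filesystem_avail_bytes;tmpfs

which the regex above matches, so the sample is dropped.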