We now have all this data collected in Prometheus, but currently the only way to access it is by writing PromQL queries. In this exercise we’re going to add Grafana as a dashboard frontend.
Do this on your campus server instance (srv1.campusX.ws.nsrc.org)
(If grafana is pre-installed, skip to the next section “Start grafana”)
Grafana is available pre-packaged in many formats, including deb packages for Ubuntu.
curl -sS https://packages.grafana.com/gpg.key | apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" >/etc/apt/sources.list.d/grafana.list
apt-get update
apt-get install grafana
If you get the error “Received HTTP code 403 from proxy after CONNECT”, this is because of how the classroom proxy is set up. Replace the repository line with the one below, then repeat the apt-get steps:
echo "deb http://HTTPS///packages.grafana.com/oss/deb stable main" >/etc/apt/sources.list.d/grafana.list
Start grafana using:
# systemctl enable grafana-server # start on future boots
# systemctl start grafana-server # start now
# journalctl -eu grafana-server # check for "Server is ready to receive web requests."
The web interface is available at http://oob.srv1.campusX.ws.nsrc.org/grafana, or from the virtual training platform web interface, select Web > srv1 under your campus, then click on Grafana.
The initial login is “admin” and “admin”. It will prompt you to change the password - use the class password, so that everyone in your group knows it.
On the left hand side are several icons:
And lower down:
Go to Configuration > Data Sources and add a Prometheus data source. Set its URL to http://127.0.0.1:9090/prometheus and click Save & Test.
You should get a green box saying “Data source is working”. If not, debug.
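If the test fails, one way to debug is to talk to Prometheus directly over its HTTP API, bypassing Grafana. The sketch below assumes you run it on srv1 itself and that Prometheus is served under the /prometheus prefix, as in the data source URL above. The helper names query_url and run_query are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: same base URL as entered in the Grafana data source.
BASE_URL = "http://127.0.0.1:9090/prometheus"


def query_url(expr: str, base_url: str = BASE_URL) -> str:
    """Build the instant-query URL of the Prometheus HTTP API for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def run_query(expr: str) -> dict:
    """Send the query and return the decoded JSON reply (needs a reachable Prometheus)."""
    with urllib.request.urlopen(query_url(expr), timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only prints the URL; nothing is sent over the network here.
    print(query_url("up"))
```

On the server you could then call run_query("up") and check that the "status" field of the reply is "success". If that works but Grafana still reports an error, the problem is more likely in the data source settings than in Prometheus.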
Dashboard definitions can be exported as JSON and shared - this means you can load in third-party dashboards. Beware they may need customizing to work in your environment, and may be of variable quality, so you may need to try several.
As an example, we will import some dashboards for Node Exporter stats.
First, go to https://grafana.com/grafana/dashboards. Under “Filter by”, enter “node exporter” and you will get a large number of matches.
These are sorted by popularity of downloads. The most popular node_exporter dashboard (i.e. the one with the most downloads) is called “Node Exporter Full” with ID 1860. (You can find the dashboard ID by hovering over the link, or by clicking on the link for more details).
Now return to your own Grafana instance at http://oob.srv1.campusX.ws.nsrc.org/grafana
A dashboard should appear, with many detailed node_exporter graphs hidden under folding sub-headings which you can expand. This dashboard is very useful for analyzing server performance.
Now let’s say you find this dashboard too complicated, so you’d like to try a simpler one.
Go back to https://grafana.com/grafana/dashboards. Again, under “Filter by”, enter “node exporter” and you will get a large number of matches.
Near the top you should find “1 Node Exporter for Prometheus Dashboard EN 20201010”, and if you hover or click you should see it has ID “11074”.
This seems pretty popular, so return to your Grafana server and install this dashboard just as you did before.
A simpler dashboard will appear. Yay!
But on closer inspection, there are problems: for example, “Disk Space Used Basic” is showing no graphs. It’s up to you to edit the query to see what’s going on. So, next to the panel title (“Disk Space Used Basic(EXT?/XFS)”), click the down-arrow and Edit.
It turns out that the author of this dashboard has decided to limit filesystems to just these two types:
fstype=~"ext.*|xfs"
But the classroom is using btrfs, so you need to change this in all queries to
fstype=~"btrfs|ext.*|xfs"
Note that there are 3 sets of queries (C, A and B) and several places to add in “btrfs|” to the fstype label in the B query box.
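To see why the panel was empty, recall that a PromQL =~ matcher has to match the whole label value. The sketch below imitates that with Python's re.fullmatch. This is an approximation (Prometheus uses RE2 syntax, which behaves the same for patterns this simple), and the helper name label_matches is made up for this example.

```python
import re


def label_matches(pattern: str, value: str) -> bool:
    """Approximate a PromQL =~ matcher: the pattern must match the entire value."""
    return re.fullmatch(pattern, value) is not None


original = "ext.*|xfs"        # the dashboard author's filter
patched = "btrfs|ext.*|xfs"   # the filter after adding btrfs

for fstype in ("ext4", "xfs", "btrfs", "tmpfs"):
    print(fstype, label_matches(original, fstype), label_matches(patched, fstype))
```

With the original pattern, btrfs filesystems are filtered out entirely, which is why no data reached the panel.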
Now that we’ve added btrfs, you should also update the panel title. Edit the title in the right-hand side so it says:
【$show_hostname】:Disk Space Used Basic(BTRFS/EXT?/XFS)
Unfortunately you now get a warning:
This panel is deprecated. Please migrate to the new Table panel.
This is because this dashboard was written using an old version of Grafana (version 6), and uses an old panel which is no longer available in Grafana 9. Click the button “Migrate to Table panel” to fix this. And… it seems to work!
Once done, be sure to click Apply to save your changes, then click the floppy disk icon at the upper right to save the updated dashboard as well.
Hopefully, this exercise has given a small flavour of the sort of problems you might find when importing other people’s dashboards. They might have been written with older versions of Grafana, and they might make assumptions about your environment which aren’t true. They are a useful starting point, but may need adjusting or fixing.
We’ll now look at how you could construct your own dashboard from scratch, using snmp_exporter as an example.
Go to Dashboards > + New Dashboard. You will see “New dashboard” with “Add a new panel”, “Add a new row”, “Add a panel from the panel library”.
Click “Add a new panel”. You will get the panel editor, which displays a graph, below it tabs for “Query / Transform / Alert”, and to the right the panel settings.
At the top just above the graph, there is a query duration which probably says “Last 6 hours”. Change this to “Last 15 minutes” so that it is easier to see recent data.
There is a block with heading “A (Prometheus)”. At the top right, there are three selections: “Explain | Builder | Code”. Click on “Code” - this lets you enter raw PromQL queries.
You should then see a section “Metrics browser >”. In this box, enter rate(ifHCInOctets[$__rate_interval])*8 and then click somewhere outside. After a few seconds, graphs should be drawn.
Note: the variable $__rate_interval gives the appropriate interval to calculate a Prometheus rate() between two adjacent graph points, taking into account the current zoom level. It was introduced in Grafana 7.2.
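As a rough illustration of what this query computes, the sketch below converts two samples of an octet counter into bits per second. It is a simplification: the real rate() also handles counter resets and extrapolates to the edges of the window. The second function follows the formula given in Grafana's documentation for $__rate_interval. The 15-second scrape interval and the sample numbers are assumptions, not values from the classroom setup.

```python
def bits_per_second(first_octets: int, last_octets: int, seconds: float) -> float:
    """Average bit rate between two counter samples (no counter-reset handling)."""
    return (last_octets - first_octets) / seconds * 8


def rate_interval(interval_s: float, scrape_interval_s: float = 15) -> float:
    """$__rate_interval as documented by Grafana: at least four scrape intervals."""
    return max(interval_s + scrape_interval_s, 4 * scrape_interval_s)


# Made-up samples: 750,000 octets received in 60 seconds.
print(bits_per_second(1_000_000, 1_750_000, 60))  # 100000.0

# Zoomed in (10 s between graph points) and zoomed out (120 s between points).
print(rate_interval(10), rate_interval(120))  # 60 135
```

The lower bound of four scrape intervals means the range always spans several samples, so rate() has enough data to work with even when the graph is zoomed in.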
Below this, expand the section labelled “Options”. Change Legend to Custom, which will give you a text field. In this field enter {{instance}} {{ifDescr}} in. This should simplify the labels on the legend. (Click outside the field for the change to take effect).
At the bottom, click the “+ Query” button. This will give you a query “B”. Enter rate(ifHCOutOctets[$__rate_interval])*8 in the Metrics browser field.
Again, select Options, Legend, Custom. Enter {{instance}} {{ifDescr}} out.
Hover your mouse over one of the data lines in the graph: it should show the time and value of the timeseries. (If it doesn’t, click in any unused area of the left-hand panel, and try again).
Now move your attention to the right-hand area of the screen.
You will note that the type of graph selected says “Time series”. This is the type of visualization to use. Clicking the down arrow shows many more options (e.g. “Bar Chart”, “Stat”, “Gauge”) but we won’t be using these. Click on the up arrow to hide them again.
Below this you should see a section “Panel options”. It may already be expanded - if not, click on it to expand. In the “Title” field write “Traffic In/Out” and click outside. You should see the title of the panel change.
Now move to the section entitled “Graph styles”, again expand it if necessary. Under “Line interpolation” are four options: Linear, Smooth, Step before, Step after. Try clicking “Step before” and notice the difference in the graph. You can keep this, or go back to Linear or Smooth, whichever you prefer. You can also try setting “Show points” to “Never”.
In this same section, try increasing the “Fill opacity” to 10 (percent). This can be quite a pleasing effect.
Now what we’re going to do is to make the “in” and “out” graphs appear above and below the central axis.
Scroll down to and expand the section “Series overrides”, and click “+ Add field override”, then select “Fields returned by query”. Under “Fields returned by query”, select “B”.
Below this click “+ Add override property”. Select “Graph Styles > Transform”. Under “Graph Styles > Transform”, select “Negative-Y”.
What you should now see is that the inbound traffic is shown above the line, and outbound traffic below the line. You’ve told it to do this for every timeseries which was returned from the “B” query.
Next we’re going to sort out the graph labeling.
Expand the section “Standard options”. Under “Unit”, click to change to “Data rate > bits/sec(SI)”. The graph Y axis should change to showing kb/s or Mb/s.
Next, expand the section “Legend”. Set Legend Mode to Table, and Legend Placement to Right. Notice the difference in the graph.
In the top-right corner of the screen, click “Apply” to finalise the settings for this panel.
You can stretch the panel to fit the screen width, by dragging on its bottom right-hand corner. When you are happy, click the floppy disk (Save dashboard) icon: enter “SNMP Traffic” as the dashboard name, and click the blue Save button.
Note: if you want to change the name of your dashboard after the first save, then click on the cog at the very top of the screen (next to the floppy disk icon). And if you want to re-edit your panel, click to the right of the panel title (i.e. “Traffic In/Out”), where you’ll find a drop-down menu where you can select “Edit”
We now have a workable graph. The problem is that it includes all interfaces across all devices. Not only is this cluttered, but it will become unworkably slow once you have many interfaces.
We need to filter it down, and this is done using variables.
Click on the top cog (Dashboard Settings, next to the floppy disk), then select Variables from the left-hand menu.
Click “Add Variable” and enter the following values in the form:
instance
prometheus
label_values(ifIndex, instance)
instance
Click the “Go back” (left arrow at very top-left corner). This will show your panel again. Notice there is now an “instance” drop-down at the top, which lets you select between instances. However, it doesn’t actually have any effect on the graph yet!
Click on the panel title, where it says “Traffic In/Out”. To the right of this title is a small drop-down arrow. Click on this and select “Edit”. This brings you back into the editor.
Change your queries in the Metrics browser field for A and B: you are going to insert the string {instance="$instance"} after each metric name, so the queries now look like this:
rate(ifHCInOctets{instance="$instance"}[$__rate_interval])*8
rate(ifHCOutOctets{instance="$instance"}[$__rate_interval])*8
Now try selecting devices on the “instance” drop-down at the top left of the screen. You’ll be switching between different devices; only the interfaces from the selected device will be shown.
Since only one device is being shown at once, you can change the Legend for Query A and B so that it only shows the interface description:
{{ifDescr}} in
{{ifDescr}} out
Click Apply, then save the dashboard - add a comment about your changes, such as “Select one instance at a time”
This is much better, but what if we only want to see one interface at a time? Then we add another variable, named ifDescr, that allows multiple values and an “All” choice. It uses this query and regex:
ifIndex{instance="$instance"}
/ifDescr="(.*?)"/
You should now have two variables:
Variable | Definition |
---|---|
$instance | label_values(ifIndex, instance) |
$ifDescr | ifIndex{instance="$instance"} |
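The regex /ifDescr="(.*?)"/ turns each series returned by the $ifDescr query into a menu entry by keeping only the first capture group. The sketch below tries the same regex in Python on two made-up series strings. The exact text Grafana matches against may differ slightly, and extract_ifdescr is a name invented for this example.

```python
import re

# Made-up series, roughly in the form returned by the variable query.
series = [
    'ifIndex{ifDescr="eth0",ifIndex="2",instance="gw.ws.nsrc.org",job="snmp"}',
    'ifIndex{ifDescr="lxdbr0",ifIndex="6",instance="gw.ws.nsrc.org",job="snmp"}',
]

# Same expression as in the variable definition, without the surrounding slashes.
pattern = re.compile(r'ifDescr="(.*?)"')


def extract_ifdescr(lines):
    """Return the first capture group from every line that matches the regex."""
    return [m.group(1) for line in lines if (m := pattern.search(line))]


print(extract_ifdescr(series))  # ['eth0', 'lxdbr0']
```

The non-greedy (.*?) stops at the first closing quote, so only the ifDescr value is captured and not the labels that follow it.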
Return to the panel editor (Back via left-hand arrow in top-left corner; if necessary select Edit next to the panel title).
You are now going to change the queries in the Metrics browser again, for both Query A and B, to include ,ifDescr=~"[[ifDescr]]" as part of the label matching expression. They should now look like this:
rate(ifHCInOctets{instance="$instance",ifDescr=~"[[ifDescr]]"}[$__rate_interval])*8
rate(ifHCOutOctets{instance="$instance",ifDescr=~"[[ifDescr]]"}[$__rate_interval])*8
Note that because we defined ifDescr to be a “multi-valued” field, we must use regex compare (=~) instead of equality (=). We could have used $ifDescr to identify the variable, but the alternative form [[ifDescr]] avoids confusion with the special meaning of $ in a regex.
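To make the need for =~ concrete: when several values are selected, Grafana substitutes a multi-value variable into a Prometheus query as a regex alternation such as (eth0|eth1), with regex characters escaped. The sketch below imitates that substitution. It is a simplified model, not Grafana's actual code, and the function names and interface names are made up.

```python
import re


def expand_multi_value(values):
    """Join the selected values into one regex alternation, escaping metacharacters."""
    # Simplification: a real PromQL string literal would also need backslashes doubled.
    return "(" + "|".join(re.escape(v) for v in values) + ")"


def build_query(metric, instance, if_descrs):
    """Fill in the instance and ifDescr matchers the way the panel query does."""
    return (
        f'rate({metric}{{instance="{instance}",'
        f'ifDescr=~"{expand_multi_value(if_descrs)}"}}[$__rate_interval])*8'
    )


print(build_query("ifHCInOctets", "gw.ws.nsrc.org", ["eth0", "eth1"]))
```

An equality matcher would compare ifDescr with the literal text (eth0|eth1) and match nothing, which is why the regex form is required here.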
Now on the interface drop-down, you can select any single interface, combination of interfaces, or “All”, and get the graphs overlaid. Click “Apply” at the top-right when you’re happy with this.
Click “Save Dashboard” (the floppy disk at the top) to keep your changes. Again, you can optionally enter a note about what you changed, e.g. “added interface selection”.
Note: this approach relies on all your interfaces having a unique “ifDescr”. If this isn’t true, you can use “ifIndex” instead, but then the drop-down menu will show the less friendly index number instead.
For a front-page dashboard, it can be useful to show some headline stats such as the number of hosts which are up or down.
To add a new panel to your dashboard, click the icon on the top row which looks like a bar chart with a plus sign, to the left of the floppy disk. This is the “Add panel” button.
Click “Add a new panel”. In the query (next to “Metrics browser”), enter
count(up{job="snmp"})
This gives the number of devices being polled via SNMP. (Note: this counts both up and down devices, because count() counts time series regardless of whether the value of up is 1 or 0.)
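The difference between counting series and adding up their values can be shown with a small sketch. The device names and values below are made up. In PromQL terms, the first figure corresponds to count(up{job="snmp"}) and the second to sum(up{job="snmp"}).

```python
# Hypothetical result of the instant query up{job="snmp"}: instance -> value.
up_samples = {
    "gw.ws.nsrc.org": 1,
    "bdr1.campus1.ws.nsrc.org": 1,
    "core1.campus1.ws.nsrc.org": 0,  # polled, but not responding
}

polled = len(up_samples)               # like count(): one per time series
reachable = sum(up_samples.values())   # like sum(): adds the 1s and 0s
down = polled - reachable

print(polled, reachable, down)  # 3 2 1
```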
Open up the “Options” section under the query. Change the type from “Range” to “Instant”. This turns it into a single point-in-time query, rather than sweeping the entire selected graph range (e.g. “Last 15 minutes”).
Open the “Visualization” section (upper right, drop-down menu, next to where it currently says “Time series”). Change from “Time series” to “Stat”.
On the right, under the “Settings” section change the Title to “Number of SNMP devices”.
Under “Stat styles” set “Graph mode” to “None”. (This panel is able to display a graph of how the value has changed over the time period, but we decided only to query a single instant)
Click “Apply” at top right. Shrink your panel to make it a sensible size, by dragging on the bottom right corner.
On your new panel, next to the title, click the down-arrow and select “More… Duplicate” to get a copy of this panel. Edit it, change the query to
count(ifOperStatus != 1)
and change the title to “SNMP interfaces down”.
On the right side of the screen go down to the section “Thresholds”, and change the threshold for red to 1. This means that the value “0” will display in green, and anything else in red. Then click “Apply” at top right.
Optional: try changing the visualization to “Gauge”, and under “Standard Options” set Min 0, Max 10 (or larger if needed), so you get a speedometer-like display.
Let’s get a table of MAC addresses of our interfaces. The information is buried in the labels of “ifPhysAddress”, one time series per interface. Here is an example of the kind of metric returned:
ifPhysAddress{ifDescr="lxdbr0",ifIndex="6",ifName="lxdbr0",ifPhysAddress="FE:19:86:D7:1D:AE",instance="gw.ws.nsrc.org",job="snmp",module="if_mib1"} 1
ifPhysAddress{instance="$instance"}
Click the “eye” icon next to Time, __name__, instance, job, module and Value to hide all those columns. The “eye” will be crossed out.
Click “Apply” at top right. Move your interfaces table in the dashboard and resize until you like how it looks. Note that by clicking on the column headings you can sort by different columns. You can also drag the whole panel to a new location.
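The table is in effect built from the labels of each series. As a rough sketch of that idea, the code below parses the example ifPhysAddress line shown earlier into label/value pairs and drops the labels hidden in the panel. The metric name and the value sit outside the braces, so they are not picked up as labels. The helper name visible_labels is made up for this example.

```python
import re

# The example metric line from earlier in this exercise.
sample = (
    'ifPhysAddress{ifDescr="lxdbr0",ifIndex="6",ifName="lxdbr0",'
    'ifPhysAddress="FE:19:86:D7:1D:AE",instance="gw.ws.nsrc.org",'
    'job="snmp",module="if_mib1"} 1'
)

# Labels hidden in the table panel.
HIDDEN = {"instance", "job", "module"}


def visible_labels(line):
    """Parse label="value" pairs and drop the ones hidden in the table."""
    labels = dict(re.findall(r'(\w+)="([^"]*)"', line))
    return {name: value for name, value in labels.items() if name not in HIDDEN}


print(visible_labels(sample))
```

What remains (ifDescr, ifIndex, ifName and ifPhysAddress) corresponds to the columns left visible in the table.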
Optional: if you edit the panel, go to the visualization settings under “Table” and turn on “Column filter”, then you can also enable Excel-style filtering of values.
Remember to save your changes by clicking on the floppy disk icon in the upper right and adding a comment if you wish.
Unfortunately, there seem to be very few dashboards published for snmp_exporter and Prometheus. You could try importing 12489 and 12492: these are simple dashboards put together by NSRC for this workshop. One shows the traffic on all interfaces overlaid on a single graph, and one shows more detailed stats for a single interface.
If you make a better one, please publish it for us to use!