Grafana as dashboard front-end to Prometheus

We now have all this data collected in Prometheus, but currently the only way to access it is by writing PromQL queries. In this exercise we’re going to add Grafana as a dashboard front-end.

Do this on your campus server instance (srv1-campusX.ws.nsrc.org).

Start grafana

Grafana should already be running inside the “prometheus” container. You can check this by logging into srv1, entering the prometheus container, and checking its status:

incus shell prometheus
systemctl status grafana-server
exit

Or in one line:

incus exec prometheus -- systemctl status grafana-server

The web interface is available at http://oob-srv1-campusX.ws.nsrc.org/grafana, or from the virtual training platform web interface, select Web > srv1 under your campus, then click on Grafana.

The initial login is “admin” and “admin”. It will prompt you to change the password - use the class password, so that everyone in your group knows it.

On the left-hand side is a set of menu items.

The top right-hand corner has an icon representing the currently logged-in user.

Add prometheus data source

You should get a green box saying “Successfully queried the Prometheus API”. If not, debug.

Importing dashboards

Dashboard definitions can be exported as JSON and shared - this means you can load in third-party dashboards. Beware they may need customizing to work in your environment, and may be of variable quality, so you may need to try several.
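Before importing a third-party dashboard, it can help to see which queries it contains. The following is a minimal sketch, assuming the exported JSON keeps its panels in a "panels" list, with each panel's queries in "targets" and the PromQL text in "expr" (collapsed rows nest further panels). The sample dashboard here is made up and heavily trimmed; real exports contain many more keys.

```python
import json

# Hypothetical, heavily trimmed dashboard export used only for illustration.
SAMPLE_DASHBOARD = """
{
  "title": "Example",
  "panels": [
    {"title": "CPU",
     "targets": [{"refId": "A", "expr": "rate(node_cpu_seconds_total[5m])"}]},
    {"title": "Row",
     "panels": [
       {"title": "Disk",
        "targets": [{"refId": "A", "expr": "node_filesystem_avail_bytes"}]}
     ]}
  ]
}
"""


def list_queries(dashboard):
    """Return (panel title, refId, expr) for every query, including nested panels."""
    found = []

    def walk(panels):
        for panel in panels:
            for target in panel.get("targets", []):
                found.append(
                    (panel.get("title", ""), target.get("refId", ""), target.get("expr", ""))
                )
            # Collapsed rows keep their child panels in a nested "panels" list.
            walk(panel.get("panels", []))

    walk(dashboard.get("panels", []))
    return found


queries = list_queries(json.loads(SAMPLE_DASHBOARD))
for title, ref_id, expr in queries:
    print(f"{title} [{ref_id}]: {expr}")
```

Scanning the expressions this way is a quick check for metric names or label filters that may not exist in your environment.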

As an example, we will import some dashboards for Node Exporter stats.

Node Exporter dashboard 1860

First, go to https://grafana.com/grafana/dashboards. In the “Search dashboards” box enter “node exporter” and you will get a large number of matches.

These are sorted by popularity of downloads. The most popular node_exporter dashboard (i.e. the one with the most downloads) is called “Node Exporter Full” with ID 1860. (You can find the dashboard ID by hovering over the link, or by clicking on the link for more details).

Now return to your own Grafana instance at http://oob-srv1-campusX.ws.nsrc.org/grafana

A dashboard should appear, with many detailed node_exporter graphs hidden under folding sub-headings which you can expand. This dashboard is very useful for analyzing server performance.

Node Exporter dashboard 11074

Now let’s say you find this dashboard too complicated, so you’d like to try a simpler one.

Go back to https://grafana.com/grafana/dashboards. Again, search for “node exporter” and you will get a large number of matches.

Near the top you should find “1 Node Exporter for Prometheus Dashboard EN 20201010”, and if you hover or click you should see it has ID “11074”.

This seems pretty popular, so return to your Grafana server and install this dashboard just as you did before.

A simpler dashboard will appear. Yay!

But on closer inspection, there are problems: for example, “Disk Space Used% Basic” is showing no graphs. It’s up to you to edit the query to see what’s going on. So, on the right of the panel title (“Disk Space Used% Basic”), click the three-dots and select “Edit” from the drop-down menu.

It turns out that the author of this dashboard has decided to limit filesystems to just these two types:

fstype=~"ext.*|xfs"

But the classroom is using zfs, so you need to change this in all queries to:

fstype=~"ext.*|xfs|zfs"

Note that there are 2 sets of queries (A and B) and several places to add in “|zfs” to the fstype label in each query box.
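To see why the original filter hides the classroom filesystems, recall that PromQL regex matchers are anchored at both ends, so the whole label value has to match. The sketch below imitates that with Python's re.fullmatch. Prometheus uses a different regex engine (RE2), so treat this as an approximation that happens to agree for simple patterns like these.

```python
import re


def fstype_matches(pattern, fstype):
    """Imitate a PromQL =~ matcher, which must match the entire label value."""
    return re.fullmatch(pattern, fstype) is not None


original = "ext.*|xfs"      # filter used by the dashboard author
patched = "ext.*|xfs|zfs"   # filter after adding zfs

for fstype in ["ext4", "xfs", "zfs", "tmpfs"]:
    print(fstype, fstype_matches(original, fstype), fstype_matches(patched, fstype))
```

With the original pattern, zfs is rejected, which is consistent with the empty panel. With the patched pattern it is accepted, while tmpfs is still filtered out.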

Now we’ve added zfs, let’s change the panel title. On the right-hand side, under Panel options, edit the title so it says:

【$show_hostname】:Disk Space Used% (EXT/XFS/ZFS)

Once done, click Apply and check whether this has fixed your graph. If so, save the updated dashboard using the Save icon, which is a small floppy disk towards the top of the screen, to the left of “Share”.

Hopefully, this exercise has given a small flavour of the sort of problems you might find when importing other people’s dashboards. They might have been written with older versions of Grafana, and they might make assumptions about your environment which aren’t true. They are a useful starting point, but may need adjusting or fixing.

Creating dashboards from scratch

We’ll now look at how you could construct your own dashboard from scratch, using snmp_exporter as an example.

Create a basic SNMP panel

Go to Dashboards, New > New Dashboard. You will see “Start your new dashboard by adding a visualization” and some other text.

It will say “Select data source”: choose Prometheus.

Now you will get the panel editor, which displays an empty graph (“No data”). Below it are tabs for “Query / Transform data / Alert”, and to the right the panel settings.

At the top just above the graph, there is a query duration which probably says “Last 6 hours”. Change this to “Last 15 minutes” so that it is easier to see recent data.

Under “Query”, there is a block with heading “A (Prometheus)”. This is your first query. It has options “Kick start your query”, “Explain”, “Run queries”, and then “Builder | Code”. Click on “Code” - this lets you enter raw PromQL queries.

You should then see a section “Metrics browser >”. In this box, enter rate(ifHCInOctets[$__rate_interval])*8 and then click “Run queries”. If you are collecting SNMP data, then some graphs should be drawn.

Note: the variable $__rate_interval gives the appropriate interval to calculate a prometheus rate() between two adjacent graph points, taking into account the current zoom level. It was introduced in Grafana 7.2.
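The arithmetic behind rate(...)*8 can be sketched as follows. This is a simplification, assuming only two samples of an octet counter; the real rate() function uses all samples inside the window and extrapolates to the window edges, so its results will differ slightly. The function name is hypothetical.

```python
def bits_per_second(t0, octets0, t1, octets1):
    """Average bit rate between two samples of an octet counter."""
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    delta = octets1 - octets0
    if delta < 0:
        # The counter went backwards, so assume it was reset and restarted from zero.
        delta = octets1
    # Octets per second, times 8 bits per octet.
    return delta * 8 / (t1 - t0)


# 750,000 octets transferred in 60 seconds.
print(bits_per_second(0, 1_000_000, 60, 1_750_000))  # 100000.0
```

Multiplying by 8 is what converts the octet counters into bits per second, which is why the unit is later set to a bit rate rather than a byte rate.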

Below this, expand the section labelled “Options”. Click on Legend, change it to Custom, which will give you a text field. In this field enter {{instance}} {{ifDescr}} in. This should simplify the labels on the legend. (Click outside the field for the change to take effect).

At the bottom, click the “+ Add query” button. This will give you a query “B”. Enter rate(ifHCOutOctets[$__rate_interval])*8 in the Metrics browser field, and click Run queries.

Again, select Options, Legend, Custom. Enter {{instance}} {{ifDescr}} out

Hover your mouse over one of the data lines in the graph: it should show the time and value of the timeseries. (If it doesn’t, click in any unused area of the left-hand panel, and try again).

Now move your attention to the right-hand area of the screen.

You will note that the type of graph selected says “Time series”. This is the type of visualization to use. Clicking the down arrow shows many more options (e.g. “Bar Chart”, “Stat”, “Gauge”) but we won’t be using these. Click on the up arrow to hide them again.

Below this you should see a section “Panel options”. It may already be expanded - if not, click on it to expand. In the “Title” field write “Traffic In/Out” and click outside. You should see the title of the panel change.

Now move to the section entitled “Graph styles”, again expand it if necessary. Under “Line interpolation” are icons for four options: Linear, Smooth, Step before, Step after. Try clicking “Step before” and notice the difference in the graph. You can keep this, or go back to Linear or Smooth, whichever you prefer. You can also try setting “Show points” to “Never”.

In this same section, try increasing the “Fill opacity” to 10 (percent). This can be quite a pleasing effect.

Now what we’re going to do is to make the “in” and “out” graphs appear above and below the central axis.

Scroll down to the very bottom and click “+ Add field override”, then select “Fields returned by query”. Under “Fields returned by query”, select “Query: B”.

Below this click “+ Add override property”. Select “Graph Styles > Transform”. Under “Graph Styles > Transform”, select “Negative-Y”.

What you should now see is that the inbound traffic is shown above the line, and outbound traffic below the line. You’ve told it to do this for every timeseries which was returned from the “B” query.

Next we’re going to sort out the graph labeling.

Scroll back up to “Standard options” and expand it if necessary. Under “Unit”, click “Data rate” and then select “bits/sec(SI)”. The graph Y axis should change to showing kb/s or Mb/s.
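As a rough illustration of what the SI unit setting does to the axis labels, the sketch below scales a value by powers of 1000. It is not Grafana's own formatting code, and the number of decimals Grafana chooses may differ; the function name is hypothetical.

```python
def format_bits_per_second(value):
    """Scale a bit rate by powers of 1000 (SI) and attach a unit suffix."""
    units = ["b/s", "kb/s", "Mb/s", "Gb/s", "Tb/s"]
    magnitude = abs(value)
    index = 0
    while magnitude >= 1000 and index < len(units) - 1:
        magnitude /= 1000
        index += 1
    # Keep the sign so that series drawn below the axis still read sensibly.
    sign = "-" if value < 0 else ""
    return f"{sign}{magnitude:.2f} {units[index]}"


print(format_bits_per_second(100_000))    # 100.00 kb/s
print(format_bits_per_second(2_500_000))  # 2.50 Mb/s
```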

Next, scroll up to the section “Legend”. Set Legend Mode to Table, and Legend Placement to Right. Notice the difference in the graph.

In the top-right corner of the screen, click “Apply” to finalise the settings for this panel.

You can stretch the panel to fit the screen width, by dragging on its bottom right-hand corner. When you are happy, click the floppy disk (Save dashboard) icon: enter “SNMP Traffic” as the dashboard name, and click the blue Save button.

Note: if you want to change the name of your dashboard after the first save, then click on the cog at the very top of the screen (next to the floppy disk icon). And if you want to re-edit your panel, click to the right of the panel title (i.e. “Traffic In/Out”), where you’ll find a drop-down menu where you can select “Edit”.

Filtering with Variables

We now have a workable graph. The problem is that it includes all interfaces across all devices. Not only is this cluttered, but it will become unworkably slow once you have many interfaces.

We need to filter it down, and this is done using variables.

Click on the top cog (Dashboard Settings, next to the floppy disk), then select the Variables tab.

Click “Add Variable”

Click the “Close” button at the top right; this will show your panel again. Notice there is now an “instance” drop-down at the top, which lets you select between instances. However, it doesn’t actually have any effect on the graph yet!

Click on the panel title, where it says “Traffic In/Out”. Click the three-dots menu on the right hand side, and select “Edit”. This brings you back into the editor.

Change your queries in the Metrics browser field for A and B: you are going to insert the string {instance="$instance"} after each metric name, so the queries now look like this:

rate(ifHCInOctets{instance="$instance"}[$__rate_interval])*8

rate(ifHCOutOctets{instance="$instance"}[$__rate_interval])*8

Now try selecting devices on the “instance” drop-down at the top left of the screen, and clicking “Run queries” after changing. You’ll be switching between different devices; only the interfaces from the selected device will be shown.

Since only one device is being shown at once, you can change the Legend for Query A and B so that it only shows the interface description:

{{ifDescr}} in

{{ifDescr}} out

Click Apply, then save the dashboard, adding a comment about your changes, such as “Filter by instance”.

This is much better, but what if we only want to see one interface at a time? Then we add another variable.

You should now have two variables:

Variable    Definition
instance    label_values(ifIndex, instance)
ifDescr     label_values(ifIndex{instance="$instance"},ifDescr)
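What these two definitions return can be sketched in a few lines. This is an illustration only, assuming a small in-memory list of ifIndex series; apart from gw.ws.nsrc.org and lxdbr0, the host and interface names are made up, and the Python function merely borrows the label_values name.

```python
# Each dict stands for the label set of one ifIndex time series.
SERIES = [
    {"__name__": "ifIndex", "instance": "gw.ws.nsrc.org", "ifDescr": "lxdbr0"},
    {"__name__": "ifIndex", "instance": "gw.ws.nsrc.org", "ifDescr": "eth0"},
    {"__name__": "ifIndex", "instance": "sw1.example.org", "ifDescr": "ge-0/0/1"},
]


def label_values(series, label, **matchers):
    """Return the distinct values of one label among series matching the filters."""
    values = {
        s[label]
        for s in series
        if label in s and all(s.get(k) == v for k, v in matchers.items())
    }
    return sorted(values)


instances = label_values(SERIES, "instance")
interfaces = label_values(SERIES, "ifDescr", instance="gw.ws.nsrc.org")
print(instances)
print(interfaces)
```

The second call shows why the ifDescr variable depends on $instance: the interface list is recomputed for whichever device is currently selected.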

Return to the panel editor (click Close at top; if necessary select Edit from the three-dots menu on the panel title).

You are now going to change the queries in the Metrics browser again, for both Query A and B, to include ,ifDescr=~"[[ifDescr]]" as part of the label matching expression. They should now look like this:

rate(ifHCInOctets{instance="$instance",ifDescr=~"[[ifDescr]]"}[$__rate_interval])*8

rate(ifHCOutOctets{instance="$instance",ifDescr=~"[[ifDescr]]"}[$__rate_interval])*8

Note that because we defined ifDescr to be a “multi-valued” field, we must use regex compare (=~) instead of equality (=). We could have used $ifDescr to identify the variable, but the alternative form [[ifDescr]] avoids confusion with the special meaning of $ in a regex.
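The reason a regex match is needed becomes clearer when you look at what a multi-value selection might expand to. The sketch below is a rough imitation, assuming the selected values are regex-escaped and joined with | inside parentheses; the exact escaping Grafana applies may differ, and the helper names are hypothetical.

```python
import re


def regex_value(selected):
    """Turn one or more selected values into a single regex alternation."""
    escaped = [re.escape(value) for value in selected]
    if len(escaped) == 1:
        return escaped[0]
    return "(" + "|".join(escaped) + ")"


def build_query(metric, instance, if_descrs):
    """Build the panel query with the variables already substituted."""
    # Backslashes are doubled because the regex sits inside a quoted PromQL string.
    regex = regex_value(if_descrs).replace("\\", "\\\\")
    return (
        f'rate({metric}{{instance="{instance}",ifDescr=~"{regex}"}}'
        "[$__rate_interval])*8"
    )


print(build_query("ifHCInOctets", "gw.ws.nsrc.org", ["eth0", "lxdbr0"]))
```

An equality matcher (=) would compare the label against the literal text (eth0|lxdbr0) and match nothing, whereas =~ treats it as “eth0 or lxdbr0”.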

Now on the interface drop-down, you can select any single interface, combination of interfaces, or “All”, and get the graphs overlaid. Click “Apply” at the top-right when you’re happy with this.

Click “Save Dashboard” (the floppy disk at the top) to keep your changes. Again, you can optionally enter a note about what you changed, e.g. “Added interface selection”.

Note: this approach relies on all your interfaces having a unique “ifDescr”. If this isn’t true, you can use “ifIndex” instead, but then the drop-down menu will show the less friendly index number instead.

Adding single stats

For a front-page dashboard, it can be useful to show some headline stats such as the number of hosts which are up or down.

To add a new panel to your dashboard, click “Add > Visualization” from the top row (to the right of the floppy disk and cog).

In the query (next to “Metrics browser”), enter

count(up{job="snmp"})

and click “Run queries”. This gives the number of devices being polled via SNMP. (Note: this counts both up and down devices, because the value of “up” can be 1 or 0 and count() only counts the time series.)
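The difference between counting series and adding up their values can be shown with a small made-up example. The host names other than gw.ws.nsrc.org are hypothetical, and the dictionary simply stands in for the result of up{job="snmp"}.

```python
# Hypothetical "up" samples: 1 means the last scrape succeeded, 0 means it failed.
UP = {
    "gw.ws.nsrc.org": 1,
    "sw1.example.org": 1,
    "sw2.example.org": 0,
}

polled = len(UP)              # what count(up{job="snmp"}) measures
reachable = sum(UP.values())  # what sum(up{job="snmp"}) would measure

print(polled, reachable)  # 3 2
```

So count() answers “how many devices are configured for polling”, while sum() would answer “how many of them responded”.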

Open up the “Options” section under the query. Change the type from “Range” to “Instant”. This turns it into a single point-in-time query, rather than sweeping the entire selected graph range (e.g. “Last 15 minutes”).
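The Range and Instant options correspond to two different endpoints of the Prometheus HTTP API. The sketch below only builds the request URLs and does not contact any server. The base address is an assumption (a Prometheus listening on localhost port 9090 with no path prefix), and the timestamps are arbitrary.

```python
from urllib.parse import urlencode

BASE = "http://localhost:9090"  # assumption: adjust to wherever your Prometheus listens


def instant_query_url(expr, at):
    """One evaluation of the expression at a single timestamp."""
    return f"{BASE}/api/v1/query?" + urlencode({"query": expr, "time": at})


def range_query_url(expr, start, end, step):
    """Repeated evaluations every `step` seconds between start and end."""
    params = {"query": expr, "start": start, "end": end, "step": step}
    return f"{BASE}/api/v1/query_range?" + urlencode(params)


expr = 'count(up{job="snmp"})'
instant_url = instant_query_url(expr, 1700000000)
range_url = range_query_url(expr, 1700000000 - 900, 1700000000, 15)
print(instant_url)
print(range_url)
```

A 15 minute range at a 15 second step asks for about 60 evaluations, whereas the instant query asks for exactly one, which is all a single-number panel needs.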

At the top right, click on the currently-selected visualization which is “Time series”, and change it to “Stat”.

On the right, under the “Settings” section change the Title to “Number of SNMP devices”.

Under “Stat styles” set “Graph mode” to “None”. (This panel is able to display a graph of how the value has changed over the time period, but we decided only to query a single instant.)

Click “Apply” at top right. Shrink your panel to make it a sensible size, by dragging on the bottom right corner.

On your new panel, next to the title, click the three-dots and select “More… Duplicate” to get a copy of this panel. Edit it (three-dots, edit), change the query to

count(ifOperStatus != 1)

and change the title to “SNMP interfaces down”.

On the right side of the screen go down to the section “Thresholds”, and change the threshold for red to 1. This means that the value “0” will display in green, and anything else in red. Then click “Apply” at top right.
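The query and the threshold together behave roughly like the sketch below. It assumes the IF-MIB convention that ifOperStatus 1 means up and 2 means down; the device and interface names are made up. One detail worth checking in your own setup: in PromQL, count() over an empty result returns nothing rather than 0, so when every interface is up the panel may show “No data” instead of a green 0.

```python
# Hypothetical ifOperStatus samples keyed by (device, interface).
IF_OPER_STATUS = {
    ("gw", "eth0"): 1,
    ("gw", "eth1"): 2,
    ("sw1", "ge-0/0/1"): 1,
}


def count_not_up(statuses):
    """Imitate count(ifOperStatus != 1): filter first, then count what is left."""
    remaining = [value for value in statuses.values() if value != 1]
    # An empty filter result yields no value at all, which None stands for here.
    return len(remaining) if remaining else None


def stat_colour(value, red_from=1):
    """Base colour green, switching to red once the threshold is reached."""
    if value is None:
        return "no data"
    return "red" if value >= red_from else "green"


down = count_not_up(IF_OPER_STATUS)
print(down, stat_colour(down))  # 1 red
```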

Optional: try changing the visualization to “Gauge”, and under “Standard Options” set Min 0, Max 10 (or larger if needed), so you get a speedometer-like display.

Adding tables

Let’s get a table of MAC addresses of our interfaces. The information is buried in the labels of “ifPhysAddress”, one time series per interface. Here is an example of the kind of metric returned:

ifPhysAddress{ifDescr="lxdbr0",ifIndex="6",ifName="lxdbr0",ifPhysAddress="FE:19:86:D7:1D:AE",instance="gw.ws.nsrc.org",job="snmp",module="if_mib1"} 1
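Since the useful information lives in the labels and the sample value is always 1, building a table amounts to pulling selected labels out of each series. The sketch below does this for the example line above using a deliberately simple parser that ignores escaped quotes inside label values; the function names and the choice of columns are hypothetical.

```python
import re

SAMPLE = (
    'ifPhysAddress{ifDescr="lxdbr0",ifIndex="6",ifName="lxdbr0",'
    'ifPhysAddress="FE:19:86:D7:1D:AE",instance="gw.ws.nsrc.org",'
    'job="snmp",module="if_mib1"} 1'
)


def parse_labels(line):
    """Extract name="value" pairs from one line of metric text."""
    return dict(re.findall(r'(\w+)="([^"]*)"', line))


def table_row(line, columns=("instance", "ifDescr", "ifPhysAddress")):
    """Pick the labels that should become table columns, in order."""
    labels = parse_labels(line)
    return tuple(labels.get(column, "") for column in columns)


row = table_row(SAMPLE)
print(row)  # ('gw.ws.nsrc.org', 'lxdbr0', 'FE:19:86:D7:1D:AE')
```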

Click “Apply” at top right. Move your interfaces table in the dashboard and resize until you like how it looks. Note that by clicking on the column headings you can sort by different columns. You can also drag the whole panel to a new location.

Optional: if you edit the panel, go to the visualization settings under “Table” and turn on “Column filter”, then you can also enable Excel-style filtering of values.

Remember to save your changes by clicking on the floppy disk icon in the upper right and adding a comment if you wish.

Use an existing dashboard?

Unfortunately, there seem to be very few dashboards published for snmp_exporter and Prometheus. You could try importing 12489 and 12492: these are simple dashboards put together by NSRC for this workshop. One shows the traffic on all interfaces overlaid on a single graph, and one shows more detailed stats for a single interface.

If you make a better one, please publish it for us to use!