Nagios lab parts 1 to 3

Nagios Installation and Configuration

Introduction

Goals

Notes

Exercises

PART I

Log in to your virtual machine as the sysadm user.

Nagios has already been installed on your machine. We repeat these steps for practice.

Install Nagios Version 4

$ sudo apt install nagios4

If you are prompted for the Nagios web administration password, enter the class password. The user ID is nagiosadmin.

$ sudo apt install --reinstall iputils-ping

At this point you will have a web server installed on your host, but you may need to start it. To enable and start your web server do:

$ sudo systemctl enable apache2
$ sudo systemctl start apache2

Let’s see that apache2 is running as we expect:

$ sudo systemctl status apache2

Press ‘q’ to quit if the output fills the page.

You may need to configure Nagios to start whenever your host starts, and then start the service. To do this, run:

$ sudo systemctl enable nagios4
$ sudo systemctl start nagios4

Let’s verify that nagios is now running as we expect:

$ sudo systemctl status nagios4

Press ‘q’ to quit if the output fills the page.

See Initial Nagios Configuration

Open a browser, and go to your machine like this:

http://oob-hostX-campusY.ws.nsrc.org/nagios4/

At the login prompt, login as:

        User Name: nagiosadmin
        Password:  <CLASS PASSWORD>

Click on the “Hosts” link on the left of the initial Nagios page to see what has already been configured.

Click on the “Services” link to see what local services are being monitored (see note 1 at the end of this document).

Add Routers, PCs and Switches

We will create three files, routers.cfg, switches.cfg and pcs.cfg and make entries for the devices in your campus. If you want, you can simply create a single file for all items - Nagios will read any file named *.cfg and sort out the details on its own.
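To see where Nagios looks for object files, you can inspect the main configuration file. The sketch below assumes the main file uses the standard cfg_dir and cfg_file directives; the function name list_cfg_sources is made up for this example.

```shell
#!/bin/sh
# Print the cfg_dir= and cfg_file= lines from a Nagios main config file.
# Commented-out lines (starting with '#') are not shown.
# list_cfg_sources is a hypothetical helper name used only in this sketch.
list_cfg_sources() {
    grep -E '^[[:space:]]*cfg_(dir|file)=' "$1"
}

# Example (run on your Nagios host):
#   list_cfg_sources /etc/nagios4/nagios.cfg
```

If the directory you are working in appears in a cfg_dir line, every file ending in .cfg in that directory is read when Nagios starts or reloads.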

Creating the “routers.cfg” file

If you want some help to understand what your campus network looks like take a look at the detailed network diagram for campus1 linked on the main page for your workshop.

Each group will end up monitoring every item in its campus. This includes:

Routers

Switches

Hosts

$ cd /etc/nagios4/conf.d
$ sudo editor routers.cfg

NOTE: Y is the number of your campus (1, 2, 3, 4, 5, or 6)

define host {
    use         generic-switch
    host_name   transit1-nren
    alias       Campus Y Transit Provider Router
    address     transit1-nren.ws.nsrc.org
}

define host {
    use         generic-switch
    host_name   bdr1-campusY
    alias       Campus Y Border Router
    address     bdr1-campusY.ws.nsrc.org
}

define host {
    use         generic-switch
    host_name   core1-campusY
    alias       Core Router 1, Campus Y
    address     core1-campusY.ws.nsrc.org
}

Now save the file and exit the editor.

Let’s verify that our changes are working. On the command line do:

$ sudo nagios4 -v /etc/nagios4/nagios.cfg

If you don’t have any errors (warnings are OK), then you can reload the Nagios configuration:

$ sudo systemctl reload nagios4

And, in a web browser view:

http://oob-hostX-campusY.ws.nsrc.org/nagios4/

and click on “Hosts”. You should now see your routers listed. They may still be waiting to be checked; they should turn green once Nagios has run a check.

Now we will do the same steps for our campus switches.

Creating the switches.cfg file

$ cd /etc/nagios4/conf.d                                # just to be sure
$ sudo editor switches.cfg

In this file add the following entries. You can COPY and PASTE, but be sure to update “Y” with your campus number:

define host {
    use         generic-switch
    host_name   dist1-b1-campusY
    alias       Distribution Switch 1, Building 1, Campus Y
    address     dist1-b1-campusY.ws.nsrc.org
}

define host {
    use         generic-switch
    host_name   dist1-b2-campusY
    alias       Distribution Switch 1, Building 2, Campus Y
    address     dist1-b2-campusY.ws.nsrc.org
}

Save the file and exit.

Creating the pcs.cfg File

Now we create entries for the 6 hosts (host1 through host6) and the campus shared server (srv1).

$ sudo editor pcs.cfg

For each campus place this entry at the top of the pcs.cfg file (replace “Y” with your campus number):

define host {
    use         linux-server
    host_name   srv1-campusY
    alias       Server 1, Campus Y
    address     srv1-campusY.ws.nsrc.org
}

(Note: the DNS name resolves to both IPv4 and IPv6 addresses, and Nagios will use IPv6 by default. You could instead use a literal IP address like 100.68.Y.X or 2001:db8:Y:1::X)

Here is a sample entry for host1. You can continue with the remaining hosts using this example.

define host {
    use         linux-server
    host_name   host1-campusY
    alias       Host 1, Campus Y
    address     host1-campusY.ws.nsrc.org
}

Now repeat this for all your remaining hosts. You should have a pcs.cfg file with one entry for srv1-campusY defined and your other 6 hosts, or a total of 7 entries.
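If you would rather not type the six host entries by hand, a small shell loop can print them for you. This is only a sketch: gen_pcs is a name made up for this example, and it assumes your hosts follow the hostN-campusY naming shown above.

```shell
#!/bin/sh
# Print Nagios host definitions for host1..host6 of one campus.
# Usage: gen_pcs CAMPUS_NUMBER
# gen_pcs is a hypothetical helper name used only in this sketch.
gen_pcs() {
    campus="$1"
    for n in 1 2 3 4 5 6; do
        printf 'define host {\n'
        printf '    use         linux-server\n'
        printf '    host_name   host%s-campus%s\n' "$n" "$campus"
        printf '    alias       Host %s, Campus %s\n' "$n" "$campus"
        printf '    address     host%s-campus%s.ws.nsrc.org\n' "$n" "$campus"
        printf '}\n\n'
    done
}

# Example: print the six entries for campus 1 (change 1 to your campus number).
gen_pcs 1
```

You can copy the output into pcs.cfg below the srv1 entry. Check the result before saving, because a wrong campus number here produces six wrong entries at once.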

REPEAT THE NEXT THREE STEPS EACH TIME YOU MAKE CHANGES TO YOUR CONFIGURATION

Verify that your configuration files are OK

$ sudo nagios4 -v /etc/nagios4/nagios.cfg

You may see some warnings in the output. You can ignore them for now. The end of the output should look similar to this:

Checking objects...
        Checked 8 services.
        Checked 13 hosts.
        Checked 1 host groups.
        Checked 0 service groups.
        Checked 1 contacts.
        Checked 1 contact groups.
        Checked 212 commands.
        Checked 5 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 13 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

If you do see warnings about hosts that have no services, Nagios is saying that it’s unusual to monitor a device just for its existence on the network, without also monitoring some service.

Reload/Restart Nagios

$ sudo systemctl reload nagios4

HINT: You will be doing this a lot. If you do it all on one line, like this, then you can use arrow-up and call back the command:

$ sudo nagios4 -v /etc/nagios4/nagios.cfg && sudo systemctl reload nagios4

The ‘&&’ ensures that the reload only happens if the config is valid.
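The same check-then-reload logic can be wrapped in a shell function so that you get a clear message when the check fails. This is a sketch, not part of Nagios: the function name nagios_apply and the variables NAGIOS_CHECK and NAGIOS_RELOAD are made up for this example, and they default to the two commands used in this lab.

```shell
#!/bin/sh
# Check the configuration and reload Nagios only if the check passes.
# NAGIOS_CHECK and NAGIOS_RELOAD are hypothetical override variables for this
# sketch; by default they hold the two commands used in this lab.
NAGIOS_CHECK="${NAGIOS_CHECK:-sudo nagios4 -v /etc/nagios4/nagios.cfg}"
NAGIOS_RELOAD="${NAGIOS_RELOAD:-sudo systemctl reload nagios4}"

nagios_apply() {
    # The variables are left unquoted on purpose so that each command string
    # is split into the command name and its arguments.
    if $NAGIOS_CHECK; then
        $NAGIOS_RELOAD && echo "Configuration reloaded."
    else
        echo "Configuration check failed; Nagios was NOT reloaded." >&2
        return 1
    fi
}

# Example (run on your Nagios host):
#   nagios_apply
```

The if/else form does the same job as ‘&&’, but it lets you print a message in the failure case as well.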

Verify via the Web Interface

Go to the web interface (http://oob-hostX-campusY.ws.nsrc.org/nagios4) and check that the hosts you just added are now visible in the interface. Click on the “Hosts” item on the left of the Nagios screen to see this. You may see it in “PENDING” status until the check is carried out.

View Status Map

Go to http://oob-hostX-campusY.ws.nsrc.org/nagios4/

Click on the “Map” item on the left. You should see all your hosts with the Nagios process in the middle. The “Map (Legacy)” view is an alternative, older view; if you want to try this you’ll also have to select Layout Method > Circular (Marked Up) then click Update.

PART II - Configure Service Check for the Campus srv1 Server

Configuring

Now that we have our hardware configured we can start telling Nagios what services to monitor on the configured hardware, how to group the hardware in interesting ways, how to group services, etc.

Associate a service check for your campus server

$ cd /etc/nagios4/conf.d                                # just to be sure
$ sudo editor services.cfg

Now we will create a service check for SSH.

In the services.cfg file add:

define service {
        hostgroup_name                  ssh-servers
        service_description             SSH
        check_command                   check_ssh
        use                             generic-service
        # ssh occasionally flaps. Let's be sure it's down.
        retry_interval                  2
        max_check_attempts              10
        notification_interval           0 ; set > 0 if you want to be renotified
}

Save and exit.

Next, we will create a hostgroup which lists all machines that are running ssh:

$ sudo editor hostgroups.cfg

We’ll start with something simple. Let’s define that localhost and host srv1-campusY run ssh. In the hostgroups.cfg file, write:

define hostgroup {
  hostgroup_name  ssh-servers
  alias           SSH Servers
  members         localhost,srv1-campusY
}

Verify that your changes are OK and apply them:

$ sudo nagios4 -v /etc/nagios4/nagios.cfg && sudo systemctl restart nagios4

In the Nagios web interface, find the “Services” link (left menu), and click on it. You should see that both “localhost” and “srv1-campusY” have an SSH service check visible.
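Behind each service check is a plugin program, and Nagios decides the state of the service from the plugin's exit status: 0 is OK, 1 is WARNING, 2 is CRITICAL, and anything else is treated as UNKNOWN. The sketch below runs any command and prints the state Nagios would assign. The function name plugin_state is made up for this example, and the example path assumes the plugins are installed in /usr/lib/nagios/plugins.

```shell
#!/bin/sh
# Run a plugin (or any command), show its output, then print the state that
# Nagios would derive from its exit status.
# plugin_state is a hypothetical helper name used only in this sketch.
plugin_state() {
    rc=0
    "$@" || rc=$?
    case "$rc" in
        0) echo "State: OK" ;;
        1) echo "State: WARNING" ;;
        2) echo "State: CRITICAL" ;;
        *) echo "State: UNKNOWN" ;;
    esac
}

# Example (run on your Nagios host, replacing Y with your campus number):
#   plugin_state /usr/lib/nagios/plugins/check_ssh -H srv1-campusY.ws.nsrc.org
```

Running a plugin by hand like this is a quick way to find out whether a problem lies with the monitored host or with your Nagios configuration.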

PART III - Defining Services for all Servers

Determine what services to define for what devices

To start we are simply using ping to verify that our servers and network devices are responding or “Up”. So far we are only monitoring ssh on your campus server.

Now let’s add monitoring of services for our various servers and network devices:

In this class we, so far, have:

The classroom NOC is currently running an snmp daemon we can monitor if you wish.

Verify that SSH is running on the routers and workshop PC images

We already have the ssh service check configured, so we just need to add more machines in the /etc/nagios4/conf.d/hostgroups.cfg file. It currently contains:

define hostgroup {
  hostgroup_name  ssh-servers
  alias           SSH Servers
  members         localhost,srv1-campusY
}

Now update with your remaining campus items as well as the transit router. We will give one complete example below:

define hostgroup {
  hostgroup_name  ssh-servers
  alias           SSH Servers
  members         srv1-campusY, host1-campusY, host2-campusY, host3-campusY, \
                  host4-campusY, host5-campusY, host6-campusY, transit1-nren, \
                  bdr1-campusY, core1-campusY, dist1-b1-campusY, dist1-b2-campusY
}

NOTES:

We have removed localhost from the entry above. We will do the same for our remaining hostgroups.

The “members” entry will be a long line and will likely wrap on the screen. If you want to continue the list on a new line, end the current line with "\" as shown.

Only include entries you have defined previously. So, include all servers, routers and switches you have configured previously.

Be sure you change “Y” to your campus number.
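A common mistake is to list a member that has no host definition, which makes the pre-flight check fail. The sketch below compares a members list against the host_name values found in your config files. The function names are made up for this example, and it assumes each host_name sits on its own line, as in the files you created above.

```shell
#!/bin/sh
# Print every host_name value found in the given Nagios config files.
# Note: this also picks up host_name lines inside service definitions.
# defined_hosts and missing_members are hypothetical helper names.
defined_hosts() {
    awk '$1 == "host_name" { print $2 }' "$@"
}

# Print each name from a comma-separated members list that has no matching
# host_name in the given files. Prints nothing if every member is defined.
# Usage: missing_members "name1, name2, ..." FILE...
missing_members() {
    members="$1"
    shift
    hosts=$(defined_hosts "$@")
    # Turn the commas into spaces so the shell splits the list into names.
    for m in $(printf '%s' "$members" | tr ',' ' '); do
        printf '%s\n' "$hosts" | grep -qxF -- "$m" || printf '%s\n' "$m"
    done
}

# Example (run in /etc/nagios4/conf.d, replacing Y with your campus number):
#   missing_members "srv1-campusY, host1-campusY, bdr1-campusY" *.cfg
```

Any name that is printed still needs a host definition, or should be removed from the members list.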

Once you are done, run the pre-flight check and reload Nagios:

$ sudo nagios4 -v /etc/nagios4/nagios.cfg && sudo systemctl reload nagios4

… and view your changes in the Nagios web interface.

To continue with hostgroups you can add additional groups for later use, such as all your
campus routers. Go ahead and edit the file hostgroups.cfg again:

$ cd /etc/nagios4/conf.d                                # just to be sure
$ sudo editor hostgroups.cfg

and add the following to the end of the file (COPY and PASTE this, and edit appropriately):

# A list of our virtual routers

define hostgroup {
        hostgroup_name  routers
        alias           Cisco Routers and Switches for CampusY
        members         bdr1-campusY,core1-campusY,transit1-nren
}

Save and exit from the file. Verify that everything is OK:

$ sudo nagios4 -v /etc/nagios4/nagios.cfg

If everything looks good, then reload the Nagios configuration

$ sudo systemctl reload nagios4

Check that http is running on all your campus servers

Much like the ssh-servers hostgroup check, we will create a check for http on all of the machines in your campus that run a web server. This includes your six hosts and the campus server, srv1.

Edit the file hostgroups.cfg:

$ cd /etc/nagios4/conf.d                                # just to be sure
$ sudo editor hostgroups.cfg

And add this entry:

# A list of your web servers
define hostgroup {
        hostgroup_name  http-servers
        alias           HTTP servers
        members         host1-campusY, host2-campusY, host3-campusY, \
                        host4-campusY, host5-campusY, host6-campusY, srv1-campusY
}

Then edit services.cfg and add this entry:

define service {
        hostgroup_name                  http-servers
        service_description             HTTP
        check_command                   check_http
        use                             generic-service
        max_check_attempts              10
        notification_interval           0 ; set > 0 if you want to be renotified
}

If you have questions or are confused please ask an instructor for help.

When you are done making the changes, save your files and check that everything is OK:

$ sudo nagios4 -v /etc/nagios4/nagios.cfg

If everything looks good, then reload Nagios

$ sudo systemctl reload nagios4

Now go to the Nagios web interface and click on the Services menu choice on the left of the page.

You are ready to go on to the next set of exercises.


  1. If you see the error “DISK CRITICAL - /sys/kernel/debug/tracing is not accessible: Permission denied” then you have come across a minor bug.

    It is safe to ignore this, but if you want to fix it you would edit /etc/nagios-plugins/config/disk.cfg and change the command_line of the check_all_disks command definition like this:

    # 'check_all_disks' command definition
    define command{
            command_name    check_all_disks
            command_line    /usr/lib/nagios/plugins/check_disk -w '$ARG1$' -c '$ARG2$' -e -X tracefs -X cgroup -X tmpfs
            }