Nagios Exercises
Network Monitoring and Management Workshop

PART I
-----------------------------------------------------------------------------

1. Install Nagios version 3

    # apt-get install nagios3

2. Create the Web user password file:

    # htpasswd -c /etc/nagios3/htpasswd.users nagiosadmin

New password:         
Re-type new password: 

   We suggest you use the standard user password from class.


3. You should already have a working Nagios!

    - Open a browser, and go to

    http://localhost/nagios3/

    - At the login prompt, log in as:

        user: nagiosadmin
        pass: (the password you set in step 2)

4. Let's look at the interface together, then at the configuration files:

    # cd /etc/nagios3/

    # ls -l 
    -rw-r--r-- 1 root root    1882 2008-12-18 13:42 apache2.conf
    -rw-r--r-- 1 root root   10524 2008-12-18 13:44 cgi.cfg
    -rw-r--r-- 1 root root    2429 2008-12-18 13:44 commands.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:33 conf.d
    -rw-r--r-- 1 root root      26 2009-02-14 12:36 htpasswd.users
    -rw-r--r-- 1 root root   42539 2008-12-18 13:44 nagios.cfg
    -rw-r----- 1 root nagios  1293 2008-12-18 13:42 resource.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:32 stylesheets
    
    # ls -l conf.d/

    -rw-r--r-- 1 root root 1695 2008-12-18 13:42 contacts_nagios2.cfg
    -rw-r--r-- 1 root root  418 2008-12-18 13:42 extinfo_nagios2.cfg
    -rw-r--r-- 1 root root 1152 2008-12-18 13:42 generic-host_nagios2.cfg
    -rw-r--r-- 1 root root 1803 2008-12-18 13:42 generic-service_nagios2.cfg
    -rw-r--r-- 1 root root  210 2009-02-14 12:33 host-gateway_nagios3.cfg
    -rw-r--r-- 1 root root  976 2008-12-18 13:42 hostgroups_nagios2.cfg
    -rw-r--r-- 1 root root 2167 2008-12-18 13:42 localhost_nagios2.cfg
    -rw-r--r-- 1 root root 1005 2008-12-18 13:42 services_nagios2.cfg
    -rw-r--r-- 1 root root 1609 2008-12-18 13:42 timeperiods_nagios2.cfg

    Notice that most files in the conf.d directory still carry the _nagios2
    suffix: they are the same files as used for the Nagios version 2 Ubuntu
    package. Only the host-gateway configuration file was updated, so it
    alone has been renamed with the _nagios3 suffix.

PART II
Configuring Equipment
-----------------------------------------------------------------------------

1. According to what we saw in class, let's add a new host

    - Pick any PC in the room.

    # cd /etc/nagios3/conf.d/

    # vi pcX.cfg        (Where X is some number)

define host {
    use         generic-host
    host_name   pcX
    alias       PC X at Network Design Workshop
    address     _______________       [pcX's IP address here]
}

    ... Save and quit

2. Let's create a new hostgroup for the occasion, and add our host
   to it

    - Edit the file hostgroups_nagios2.cfg and add a new group:

    # vi hostgroups_nagios2.cfg

define hostgroup {
    hostgroup_name  servers
    alias           Network Design PCs
    members         pcX
}

3. Now let's associate some services to that host

    # vi services_nagios2.cfg

    - Find the section called "check that ssh services are running",
      and change the line:

hostgroup_name                  ssh-servers

    to

hostgroup_name                  ssh-servers, servers



4. Verify that your configuration file is OK:

    # nagios3 -v /etc/nagios3/nagios.cfg 

    ... You should get:

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the check.


5. Reload/Restart Nagios

    # /etc/init.d/nagios3 restart
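
    - Optional: steps 4 and 5 can be chained so that Nagios is restarted
      only when the check passes. The function below is a minimal sketch
      and not part of the Nagios package. The name verify_then_restart and
      the NAGIOS_VERIFY / NAGIOS_RESTART variables are our own invention,
      and the sketch assumes that "nagios3 -v" exits with a non-zero
      status when it finds errors.

```shell
# Hypothetical helper: run the pre-flight check, restart only if it succeeds.
# NAGIOS_VERIFY and NAGIOS_RESTART are illustrative overrides, not Nagios settings.
verify_then_restart() {
    verify_cmd=${NAGIOS_VERIFY:-"nagios3 -v /etc/nagios3/nagios.cfg"}
    restart_cmd=${NAGIOS_RESTART:-"/etc/init.d/nagios3 restart"}

    # Left unquoted on purpose so each command string is split into words.
    if $verify_cmd; then
        $restart_cmd
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}

# Usage (as root), after pasting the function into your shell:
#   verify_then_restart
```

      If the check fails, the running Nagios is left untouched and you
      can fix the reported problem before trying again.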

6. Go to the web interface (http://localhost/nagios3) and check the host
   you just added


7. Add ALL the PCs in the classroom

    - Remember to verify the configuration file!

    - I suggest that you create a single config file called pcs.cfg
      to do this.

    - You will repeat steps 1, 2 and 3 from above. When you edit the 
      file hostgroups_nagios2.cfg to update the members of the servers
      group, the format of the members statement is:

      members       pcX,pcY,pcZ,...

    - If you do not know the names of all the PCs in the classroom or
      their IP addresses, refer to the classroom Network Diagram, which
      is available in the classroom and on the class web site:

      http://nsrc.org/workshops/2010/apricot/

      *Also available for now at http://noc/diagram
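
    - Typing many near-identical host entries by hand is error-prone, so
      you may prefer to generate a first draft of pcs.cfg. The sketch
      below is only an illustration: the function name gen_pc_hosts is
      ours, and it ASSUMES that pc1, pc2, ... have consecutive addresses.
      The 192.0.2.x addresses are documentation placeholders, not the
      classroom addresses. Always check the result against the network
      diagram.

```shell
# Print one "define host" block per PC, pc1..pcN.
# ASSUMPTION: PC n has the address <prefix>.<offset + n>. Verify this against
# the classroom network diagram before using the output.
gen_pc_hosts() {
    prefix=$1        # first three octets, e.g. 192.0.2 (placeholder)
    count=$2         # number of PCs
    offset=${3:-0}   # added to the PC number to form the last octet
    n=1
    while [ "$n" -le "$count" ]; do
        cat <<EOF
define host {
    use         generic-host
    host_name   pc$n
    alias       PC $n at Network Design Workshop
    address     $prefix.$((offset + n))
}

EOF
        n=$((n + 1))
    done
}

# Preview three hosts: pc1..pc3 at 192.0.2.11 .. 192.0.2.13
gen_pc_hosts 192.0.2 3 10
```

      Once the output looks right, redirect it to
      /etc/nagios3/conf.d/pcs.cfg and run the pre-flight check.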

8. Add the routers and switches in your classroom

    - Create files called "routers.cfg" and "switches.cfg" in 
      /etc/nagios3/conf.d
    
    - In the routers file you need to add 4 entries. Here is the initial
      entry for the gateway router for the classroom:

define host {
    use         generic-host
    host_name   bb-gw
    alias       gw router
    address     169.223.142.1
}

      Add entries for the other three routers.

    - There are four switches. Do the same in the switches.cfg file.

    - Remember to look at the network diagram if you do not know their
      names or IP addresses.

    - Use the Nagios "pre-flight" check to verify that your configuration 
      is correct:

    # nagios3 -v /etc/nagios3/nagios.cfg

    - You may see some warnings as there are no services defined for these
      new entries. This is OK and we will be taking care of this later.
    
9. Reload/Restart Nagios

    # /etc/init.d/nagios3 restart

    - Take a look at http://localhost/nagios3 to see your changes.

    - Click on the "Status Map" link to see how things look.

PART III
Defining Parents
-----------------------------------------------------------------------------

1. Define parents for your hardware devices

   - Remember that Nagios is smart about what to check based on the state of
     your network. This "smartness" is largely driven by the concept of 
     parent relationships. Each device in our network (except for the classroom
     gateway router) has a parent device. You need to define what that device is
     for each pc, router and switch in the files pcs.cfg, switches.cfg and 
     routers.cfg.

   - This is extremely simple. To get you started, here is an updated entry
     in the file pcs.cfg for pcX, whose parent is switchY:

define host {
    use         generic-host
    host_name   pcX
    alias       PC X at Network Design Workshop
    address     _______________       [pcX's IP address here]
    parents     switchY
}

   - Note: use the host_name, not the IP address, in parents entries.

   - Repeat this process for all the devices you have defined. If you do not know
     the name of the parent device, or are confused about the network layout for
     the classroom remember to use the network diagram:

     http://noc/diagram

   - Once you are done be sure to do:

   # nagios3 -v /etc/nagios3/nagios.cfg
   
     to check on the status of your work.
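
   - The pre-flight check will not tell you that a parents line is simply
     missing. As a quick sanity check you can list the hosts that have no
     parents directive. This is a rough sketch under our own assumptions:
     the function name hosts_without_parents is hypothetical, and the awk
     script only understands the simple "define host { ... }" layout used
     in this exercise.

```shell
# List the host_name of every "define host" block that has no parents line.
# Hypothetical helper; it is not a full parser for Nagios object files.
hosts_without_parents() {
    awk '
        # Start of a host object (this does not match "define hostgroup").
        /^[ \t]*define[ \t]+host[ \t]*[{]/ { in_host = 1; name = ""; has_parents = 0; next }
        in_host && $1 == "host_name" { name = $2 }
        in_host && $1 == "parents"   { has_parents = 1 }
        # End of the object: report named hosts that had no parents line.
        in_host && /^[ \t]*[}]/ {
            if (name != "" && !has_parents) print name
            in_host = 0
        }
    ' "$@"
}

# Usage:
#   cd /etc/nagios3/conf.d
#   hosts_without_parents pcs.cfg switches.cfg routers.cfg
```

     For the classroom network the only name listed should be the gateway
     router, bb-gw.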

2. Restart Nagios and review the Status Map

   # /etc/init.d/nagios3 restart

   - Now click on the Status Map link again. It should look quite different! 

PART IV
Defining Services
-----------------------------------------------------------------------------

1. Determine what services to define for what devices

   - This is core to how you use Nagios and network monitoring tools in 
     general. So far we are simply using ping to verify that physical hosts
     are up on our network. The next step is to decide what services you wish
     to monitor for each host.

   - In this particular class we have:

     routers:  4 running ssh 
     switches: 3 run ssh and telnet, 1 runs just telnet
     pcs:      All pcs are running ssh and http
               All student pcs (15 of them) are running snmp
              
     So, let's configure Nagios to check for all of these services for these 
     devices.

2. Check that telnet is running on the workshop switches.

   - You will need to edit the file /etc/nagios3/conf.d/services_nagios2.cfg
     to define a service that uses the "check_telnet" command and to state
     which group of hosts this check applies to.

   - Edit the file services_nagios2.cfg:

   # vi /etc/nagios3/conf.d/services_nagios2.cfg

   At the bottom of the file add in the new service definition. It will look
   like this:

# check that telnet is running
define service {
        hostgroup_name                  telnet-servers
        service_description             Telnet
        check_command                   check_telnet
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}

   - By default Nagios (on Ubuntu) is pre-configured with web, ssh and ping
     service definitions. It turns out, once we are completely done, that you 
     may not need the ping service definition - but, don't remove it yet!

   - Notice the parameter that says:

     hostgroup_name                    telnet-servers

     We need to create this before we try to restart Nagios. Edit the file
     /etc/nagios3/conf.d/hostgroups_nagios2.cfg and at the bottom of the
     file add the following entry:

# A list of your telnet-accessible devices (older switches)
define hostgroup {
        hostgroup_name  telnet-servers
        alias           Telnet servers
        members         bb-sw,pc1-5-sw,pc6-10-sw,pc11-15-sw
}

     Note the "members" line. These names must match the host_name
     directives you used when you defined your switches in the
     switches.cfg file.

   - Save your changes and check your configuration:

   # nagios3 -v /etc/nagios3/nagios.cfg

   - Restart Nagios and see if you notice the changes you've made. Note that 
     the actual check of the telnet service will most likely be in a "pending"
     state at first.

3. Verify that SSH is running on the routers and workshop PCs

   - In the file services_nagios2.cfg there is already an entry for the SSH 
     service check, so you do not need to create one. Instead, you 
     simply need to redefine the "ssh-servers" entry in the file
     /etc/nagios3/conf.d/hostgroups_nagios2.cfg. The initial entry in the file
     looks like:

# A list of your ssh-accessible servers
define hostgroup {
        hostgroup_name  ssh-servers
                alias           SSH servers
                members         localhost
        }

     What do you think you should change? Correct, the "members" line. You should
     remove "localhost" and add entries for all the classroom PCs, the routers and 
     the three switches that run ssh. The one switch that does not run ssh
     is "bb-sw". With this information and the network diagram you should be able
     to complete this entry.

    - Once you are done, run the pre-flight check:

    # nagios3 -v /etc/nagios3/nagios.cfg

    If everything looks good, then restart Nagios and see your changes in the 
    Nagios web interface.

4. Check that http is running on all the workshop PCs. 

    - Like ssh, there is already a check_http service defined and it automatically
      applies to the http-servers group. (Note, you can add additional groups of hosts
      for any service check if you wish). So, you need to update the "http-servers" entry
      in the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg to include all the workshop
      PCs running http (i.e. Apache Web Server).

    - See the previous exercise and make the appropriate change to do this. If you have
      any questions ask your instructor for help.

5. Check that SNMP is running on the classroom PCs. 

    - First you will need to add in the appropriate service check for SNMP in the file
      /etc/nagios3/conf.d/services_nagios2.cfg. This is where Nagios is impressive. There
      are hundreds, if not thousands, of service checks available via the various Nagios
      sites on the web. You can see what plugins are installed by Ubuntu in the nagios3
      package that we've installed by looking in the following directory:

    # ls /usr/lib/nagios/plugins

      As you'll see there is already a check_snmp plugin available to us. If you are 
      interested in the options the plugin takes you can execute the plugin from the
      command line by typing:

    # /usr/lib/nagios/plugins/check_snmp

      to see what options are available, etc. You can use the check_snmp plugin and
      Nagios to create very complex or specific system checks.

    - Now to see all the various service/host checks that have been created using the
      check_snmp plugin you can look in /etc/nagios-plugins/config/snmp.cfg. You will
      see that there are a lot of preconfigured checks using snmp, including:

      snmp_load
      snmp_cpustats
      snmp_procname
      snmp_disk
      snmp_mem
      snmp_swap
      snmp_procs
      snmp_users
      snmp_mem2
      snmp_swap2
      snmp_mem3
      snmp_swap3
      snmp_disk2
      snmp_tcpopen
      snmp_tcpstats
      snmp_bgpstate
      check_netapp_uptime
      check_netapp_cpuload
      check_netapp_numdisks
      check_compaq_thermalCondition
      
      And, even better, you can create additional service checks quite easily.
      For the case of verifying that snmpd (the SNMP service on Linux) is running we
      need to ask SNMP a question. If we don't get an answer, then Nagios can assume
      that the SNMP service is down on that host. When you use service checks such as
      check_http, check_ssh and check_telnet this is what they are doing as well.

    - In our case, let's create a new service check and call it "check_system". This
      service check will connect to the specified host, use the private community 
      string we have defined in class and ask SNMP on that host a question. In this
      case we'll ask for the System Description, that is, the OID "sysDescr.0".

    - To do this start by editing the file /etc/nagios-plugins/config/snmp.cfg:

    # vi /etc/nagios-plugins/config/snmp.cfg

      At the top (or the bottom, your choice) add the following entry to the file:

# 'check_system' command definition
define command{
       command_name    check_system
       command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -C '$ARG1$' -o sysDescr.0
        }

      Note that "command_line" is a single line.

    - Now you need to edit the file /etc/nagios3/conf.d/services_nagios2.cfg and add
      in this service check. We'll run this check against all our servers in the 
      classroom, that is, the hostgroup "debian-servers".

    - Edit the file /etc/nagios3/conf.d/services_nagios2.cfg

    # vi /etc/nagios3/conf.d/services_nagios2.cfg

      At the bottom of the file add the following definition:

# check that snmp is up on all servers
define service {
        hostgroup_name                  debian-servers
        service_description             SNMP
        check_command                   check_system!s3cr3t
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}

      Note that we pass our private community string here as an argument (the
      text after the "!") rather than hard-coding
      it in the snmp.cfg file earlier. 
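
      To see how the two definitions fit together, it can help to expand
      the command by hand. The sketch below only imitates what Nagios
      does and is not Nagios code: expand_check is a hypothetical helper
      that handles just the $HOSTADDRESS$ and $ARG1$ macros and a single
      argument, and 192.0.2.11 is a placeholder address.

```shell
# Illustrative only: mimic how Nagios fills in a command_line template.
# expand_check is a hypothetical helper, limited to $HOSTADDRESS$ and $ARG1$.
expand_check() {
    check_command=$1   # e.g. check_system!s3cr3t
    template=$2        # the command_line from the command definition
    address=$3         # the address directive of the host being checked

    arg1=${check_command#*!}   # the text after the first "!" becomes $ARG1$

    printf '%s\n' "$template" |
        sed -e "s|\\\$HOSTADDRESS\\\$|$address|g" \
            -e "s|\\\$ARG1\\\$|$arg1|g"
}

template="/usr/lib/nagios/plugins/check_snmp -H '\$HOSTADDRESS\$' -C '\$ARG1\$' -o sysDescr.0"
expand_check 'check_system!s3cr3t' "$template" 192.0.2.11
```

      The line printed is the command you can run yourself (with a real
      address) to test the check before Nagios schedules it.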

    - Now verify that your changes are correct and restart Nagios.

    - If you click on the Service Detail menu choice in the web interface you should
      see the SNMP check appear.

PART V
Create More Host Groups
-----------------------------------------------------------------------------

1. Update /etc/nagios3/conf.d/hostgroups_nagios2.cfg

    - For the following exercises it will be very useful if we have created
      or updated the following hostgroups:

      debian-servers
      routers
      switches
 
      If you edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg you
      will see an entry for debian-servers that just contains localhost. 
      Update this entry to include all the classroom PCs, including the
      noc (this assumes that you created a "noc" entry in your pcs.cfg
      file).

    # vi /etc/nagios3/conf.d/hostgroups_nagios2.cfg

     Update the entry that says:


# A list of your Debian GNU/Linux servers
define hostgroup {
        hostgroup_name  debian-servers
                alias           Debian GNU/Linux Servers
                members         localhost
        }
      
      So that the "members" parameter contains (all on a single line):

                members         noc,pc1,pc2,pc3,pc4,pc5,pc6,pc7,pc8,pc9,pc10,pc11,pc12,pc13,pc14,pc15

      - Once you have done this, add in two more entries. One for routers and 
        one for switches. Call these entries "routers" and "switches".

      - When you are done be sure to verify your work and restart Nagios.
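
      - Long members lists are easy to mistype. As a convenience you can
        let the shell build the list for you. This is just a sketch: the
        function name members_list is ours, and it assumes your hosts are
        named pc1 to pc15 plus "noc", as in the entry above.

```shell
# Print a comma-separated list such as pc1,pc2,...,pcN on one line.
# members_list is a hypothetical helper, not a Nagios tool.
members_list() {
    prefix=$1
    count=$2
    list=""
    n=1
    while [ "$n" -le "$count" ]; do
        # Add a comma only when the list already has an entry.
        list="${list:+$list,}$prefix$n"
        n=$((n + 1))
    done
    printf '%s\n' "$list"
}

# Print a ready-to-paste members line for the debian-servers hostgroup.
printf '                members         noc,%s\n' "$(members_list pc 15)"
```

        Paste the printed line into the hostgroup definition in place of
        the existing members line.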
    

PART V
Extended Host Information ("making your graphs pretty")
-----------------------------------------------------------------------------

1. Update extinfo_nagios2.cfg 

    - If you would like to use appropriate icons for your defined hosts in
      Nagios, this is where you do it. We have three types of devices:

      Cisco routers
      Cisco switches
      Ubuntu servers

      There is a fairly large repository of icon images available for you to
      use located here:

      /usr/share/nagios/htdocs/images/logos/

      These were installed by default as dependent packages of the nagios3
      package in Ubuntu. In some cases you can find model-specific icons for
      your hardware, but to make things simpler we will use the following 
      icons for our hardware:

      /usr/share/nagios/htdocs/images/logos/base/debian.*
      /usr/share/nagios/htdocs/images/logos/cook/router.*
      /usr/share/nagios/htdocs/images/logos/cook/switch.*

    - The next step is to edit the file /etc/nagios3/conf.d/extinfo_nagios2.cfg
      and tell Nagios what image you would like to use to represent your devices.

    # vi /etc/nagios3/conf.d/extinfo_nagios2.cfg

      Here is what an entry for your routers looks like (there is already
      an entry for debian-servers that will work as is).

define hostextinfo {
        hostgroup_name   routers 
        icon_image       cook/router.png
        icon_image_alt   Cisco Routers (2811) 
        vrml_image       router.png 
        statusmap_image  cook/router.gd2
}

      Now add an entry for your switches. Once you are done check your
      work and restart Nagios. Take a look at the Status Map in the web interface.
      It should be much nicer.      

PART VI
Create Service Groups
-----------------------------------------------------------------------------

1. Create service groups for ssh and http for each set of pcs.

   - The idea here is to create three service groups, one for each group of
     PCs connected to the routers pc1-5-gw, pc6-10-gw and pc11-15-gw. We want
     to see these PCs grouped together along with the status of their ssh and
     http services. To do this, create and edit the file:

   # vi /etc/nagios3/conf.d/servicegroups.cfg

     Here is a sample of the service group for the router pc1-5-gw:

define servicegroup{
        servicegroup_name       group 1 services
        alias                   pcs 1-5
        members                 pc1,SSH,pc1,HTTP,pc2,SSH,pc2,HTTP,pc3,SSH,pc3,HTTP,pc4,SSH,pc4,HTTP,pc5,SSH,pc5,HTTP
        }

      Add groups for pcs 6-10 and for pcs 11-15. You can call these service
      groups anything you want.

    - Save your changes, verify your work and restart Nagios. Now if you click on
      the Servicegroup menu items in the Nagios web interface you should see
      this information grouped together. 
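
    - The servicegroup members list is a series of host,service pairs,
      which is tedious to type. A small sketch for generating it follows.
      The function name sg_members is ours, and it assumes that the
      service_description values are exactly "SSH" and "HTTP", as in the
      sample above.

```shell
# Print host,service pairs for pcFIRST..pcLAST, e.g. pc6,SSH,pc6,HTTP,...
# sg_members is a hypothetical helper; the service names are assumptions
# taken from the sample servicegroup definition.
sg_members() {
    first=$1
    last=$2
    list=""
    n=$first
    while [ "$n" -le "$last" ]; do
        for svc in SSH HTTP; do
            list="${list:+$list,}pc$n,$svc"
        done
        n=$((n + 1))
    done
    printf '%s\n' "$list"
}

# Members for the second group of PCs (pc6 to pc10).
sg_members 6 10
```

      Paste the output after "members" in the matching servicegroup
      definition, keeping it on one line.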


PART VII
Configure Guest Access to the Nagios Web Interface
-----------------------------------------------------------------------------

1. Edit /etc/nagios3/cgi.cfg to give r/o guest access.

    - By default Nagios is configured to give full r/w access via the Nagios
      web interface to the user nagiosadmin. You can change the name of this
      user, add other users, change how you authenticate users, what users
      have access to what resources and more via the cgi.cfg file.

    - First, let's create a "guest" user and password in the htpasswd.users
      file.

    # htpasswd /etc/nagios3/htpasswd.users guest

      You can use any password you want (or none). A password of "guest" is 
      not a bad choice.

    - Next, edit the file /etc/nagios3/cgi.cfg and look for what type
      of access has been given to the nagiosadmin user. By default
      you will see the following directives (note, there are comments between 
      each directive):

      authorized_for_system_information=nagiosadmin
      authorized_for_configuration_information=nagiosadmin
      authorized_for_system_commands=nagiosadmin
      authorized_for_all_services=nagiosadmin
      authorized_for_all_hosts=nagiosadmin
      authorized_for_all_service_commands=nagiosadmin
      authorized_for_all_host_commands=nagiosadmin

      Now let's tell Nagios to allow the "guest" user some access to 
      information via the web interface. You can choose whatever you would
      like, but what is pretty typical is this:

      authorized_for_system_information=nagiosadmin,guest
      authorized_for_configuration_information=nagiosadmin,guest
      authorized_for_system_commands=nagiosadmin
      authorized_for_all_services=nagiosadmin,guest
      authorized_for_all_hosts=nagiosadmin,guest
      authorized_for_all_service_commands=nagiosadmin
      authorized_for_all_host_commands=nagiosadmin

    - Once you make the changes, save the file cgi.cfg, verify your 
      work and restart Nagios. 

    - To see if you can log in as the "guest" user you may need to clear 
      the cookies in your web browser. You will not notice any difference
      in the web interface. The difference is that a number of items that
      are available via the web interface (forcing a service/host check, 
      scheduling checks, comments, etc.) will not work for the guest 
      user.
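
    - If you would rather not edit each directive by hand, the change
      can be previewed with sed. This is a sketch only: add_guest is a
      hypothetical helper of ours, it writes the modified file to
      standard output instead of changing cgi.cfg, and it assumes each
      directive is on its own uncommented line as shown above. Running
      it on its own output would append "guest" a second time.

```shell
# Print FILE with ",guest" appended to the four read-only directives.
# The command directives (system_commands, all_service_commands,
# all_host_commands) are deliberately left alone.
add_guest() {
    sed -E 's/^authorized_for_(system_information|configuration_information|all_services|all_hosts)=.*$/&,guest/' "$1"
}

# Usage (review the difference before replacing the real file):
#   add_guest /etc/nagios3/cgi.cfg > /tmp/cgi.cfg.new
#   diff /etc/nagios3/cgi.cfg /tmp/cgi.cfg.new
```

      Only copy the new file into place once the diff shows exactly the
      four changed lines.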

UPCOMING
New Commands, Updating Contact Information, Connecting Nagios to RT (tickets)
-----------------------------------------------------------------------------

During the ticket management sessions later in the week we will be working on
these items to allow Nagios to automatically create tickets in RT when certain
events take place.

Last update 24 Feb 2010 by HA