Nagios Exercises
Network Monitoring and Management Workshop
PART I
-----------------------------------------------------------------------------
1. Install Nagios version 3
# apt-get install nagios3
2. Create the Web user password file:
# htpasswd -c /etc/nagios3/htpasswd.users nagiosadmin
New password:
Re-type new password:
We suggest you use your standard user password used in class.
3. You should already have a working Nagios!
- Open a browser, and go to
http://localhost/nagios3/
- At the login prompt, log in as:
user: nagiosadmin
pass: (the password you set in step 2)
4. Let's look at the interface together...
# cd /etc/nagios3/
# ls -l
-rw-r--r-- 1 root root 1882 2008-12-18 13:42 apache2.conf
-rw-r--r-- 1 root root 10524 2008-12-18 13:44 cgi.cfg
-rw-r--r-- 1 root root 2429 2008-12-18 13:44 commands.cfg
drwxr-xr-x 2 root root 4096 2009-02-14 12:33 conf.d
-rw-r--r-- 1 root root 26 2009-02-14 12:36 htpasswd.users
-rw-r--r-- 1 root root 42539 2008-12-18 13:44 nagios.cfg
-rw-r----- 1 root nagios 1293 2008-12-18 13:42 resource.cfg
drwxr-xr-x 2 root root 4096 2009-02-14 12:32 stylesheets
# ls -l conf.d/
-rw-r--r-- 1 root root 1695 2008-12-18 13:42 contacts_nagios2.cfg
-rw-r--r-- 1 root root 418 2008-12-18 13:42 extinfo_nagios2.cfg
-rw-r--r-- 1 root root 1152 2008-12-18 13:42 generic-host_nagios2.cfg
-rw-r--r-- 1 root root 1803 2008-12-18 13:42 generic-service_nagios2.cfg
-rw-r--r-- 1 root root 210 2009-02-14 12:33 host-gateway_nagios3.cfg
-rw-r--r-- 1 root root 976 2008-12-18 13:42 hostgroups_nagios2.cfg
-rw-r--r-- 1 root root 2167 2008-12-18 13:42 localhost_nagios2.cfg
-rw-r--r-- 1 root root 1005 2008-12-18 13:42 services_nagios2.cfg
-rw-r--r-- 1 root root 1609 2008-12-18 13:42 timeperiods_nagios2.cfg
Notice that the package has not renamed the files in the conf.d
directory: they are the same files used by the Nagios version 2
Ubuntu package. Only the host-gateway configuration file was updated,
so only that file has been renamed.
PART II
Configuring Equipment
-----------------------------------------------------------------------------
1. According to what we saw in class, let's add a new host
- Pick any PC in the room.
# cd /etc/nagios3/conf.d/
# vi pcX.cfg (Where X is some number)
define host {
use generic-host
host_name pcX
alias PC X at Network Design Workshop
address _______________ [pcX's IP address here]
}
... Save and quit
2. Let's create a new hostgroup for the occasion, and add our host
to it
- Edit the file hostgroups_nagios2.cfg and add a new group:
# vi hostgroups_nagios2.cfg
define hostgroup {
hostgroup_name servers
alias Network Design PCs
members pcX
}
3. Now let's associate some services to that host
# vi services_nagios2.cfg
- Find the section called "check that ssh services are running",
and change the line:
hostgroup_name ssh-servers
to
hostgroup_name ssh-servers, servers
4. Verify that your configuration file is OK:
# nagios3 -v /etc/nagios3/nagios.cfg
... You should get:
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the check.
5. Reload/Restart Nagios
# /etc/init.d/nagios3 restart
6. Go to the web interface (http://localhost/nagios3) and check the host
you just added
7. Add ALL the PCs in the classroom
- Remember to verify the configuration file!
- I suggest that you create a single config file called pcs.cfg
to do this.
- You will repeat steps 1, 2 and 3 from above. When you edit the
file hostgroups_nagios2.cfg to update the members of the servers
group the format of the members statement is:
members pcX,pcY,pcZ,...
- If you do not know the names of all the PCs in the classroom or
their IP addresses refer to the classroom Network Diagram either
available in the classroom, or on the class web site:
http://nsrc.org/workshops/2010/apricot/
*Also available for now at http://noc/diagram
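
Typing fifteen nearly identical host entries by hand is error-prone. As a
rough sketch, the shell functions below print one "define host" block for
each "name address" pair read from standard input. The function names
make_host and make_hosts are hypothetical helpers, and the addresses in the
example comment are placeholders from the 192.0.2.0/24 documentation range,
not the classroom addresses. Take the real ones from the network diagram.

```shell
#!/bin/sh
# Print one Nagios host definition.
# $1 = host name, $2 = IP address (the alias is simply set to the host name)
make_host() {
    printf 'define host {\n'
    printf '    use        generic-host\n'
    printf '    host_name  %s\n' "$1"
    printf '    alias      %s\n' "$1"
    printf '    address    %s\n' "$2"
    printf '}\n\n'
}

# Read "name address" pairs, one per line, skipping blank lines
# and lines that start with #.
make_hosts() {
    while read -r name addr; do
        case "$name" in ''|'#'*) continue ;; esac
        make_host "$name" "$addr"
    done
}

# Example (placeholder addresses, replace with the real ones):
# printf 'pc1 192.0.2.11\npc2 192.0.2.12\n' | make_hosts > /etc/nagios3/conf.d/pcs.cfg
```

Review the generated file and run the pre-flight check before restarting
Nagios, exactly as you would for a hand-written file.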
8. Add the routers and switches in your classroom
- Create files called "routers.cfg" and "switches.cfg" in
/etc/nagios3/conf.d
- In the routers file you need to add 4 entries. Here is the initial
entry for the gateway router for the classroom:
define host {
use generic-host
host_name bb-gw
alias gw router
address 169.223.142.1
}
Add entries for the other three routers.
- There are four switches. Do the same in the switches.cfg file.
- Remember to look at the network diagram if you do not know their
names or IP addresses.
- Use the Nagios "pre-flight" check to verify that your configuration
is correct:
# nagios3 -v /etc/nagios3/nagios.cfg
- You may see some warnings as there are no services defined for these
new entries. This is OK and we will be taking care of this later.
9. Reload/Restart Nagios
# /etc/init.d/nagios3 restart
- Take a look at http://localhost/nagios3 to see your changes.
- Click on the "Status Map" link to see how things look.
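
Every change in these exercises ends with the same verify-then-restart pair,
so the two commands can be chained to restart Nagios only when the
pre-flight check succeeds. This is a minimal sketch that assumes
"nagios3 -v" exits with a non-zero status when it finds errors. The name
check_and_restart and the two NAGIOS_* variables are hypothetical and exist
only for illustration.

```shell
#!/bin/sh
# Commands taken from this handout; override the variables to try the
# function on a machine without Nagios.
NAGIOS_CHECK=${NAGIOS_CHECK:-"nagios3 -v /etc/nagios3/nagios.cfg"}
NAGIOS_RESTART=${NAGIOS_RESTART:-"/etc/init.d/nagios3 restart"}

# Run the pre-flight check and restart only if it succeeds.
check_and_restart() {
    if $NAGIOS_CHECK; then
        $NAGIOS_RESTART
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}
```

Run check_and_restart as root after editing any file under /etc/nagios3.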
PART III
Defining Parents
-----------------------------------------------------------------------------
1. Define parents for your hardware devices
- Remember that Nagios is smart about what to check based on the state of
your network. This "smartness" is largely driven by the concept of
parent relationships. Each device in our network (except for the classroom
gateway router) has a parent device. You need to define what that device is
for each pc, router and switch in the files pcs.cfg, switches.cfg and
routers.cfg.
- This is extremely simple. To get you started here is an updated entry
for pcX, whose parent is switchY, in the file pcs.cfg:
define host {
use generic-host
host_name pcX
alias PC X at Network Design Workshop
address _______________ [pcX's IP address here]
parents switchY
}
- Note, use the hostname, not the IP address for parents entries.
- Repeat this process for all the devices you have defined. If you do not know
the name of the parent device, or are confused about the network layout for
the classroom remember to use the network diagram:
http://noc/diagram
- Once you are done be sure to do:
# nagios3 -v /etc/nagios3/nagios.cfg
to check on the status of your work.
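
With this many devices it is easy to forget a "parents" line. As a hedged
sketch, the function below scans one or more configuration files and prints
the host_name of every "define host" block that has no "parents" entry. The
name hosts_without_parents is a hypothetical helper, and the parsing assumes
one directive per line as in the examples above.

```shell
#!/bin/sh
# Print the host_name of each "define host" block lacking a "parents" line.
# Usage: hosts_without_parents file...
hosts_without_parents() {
    awk '
        # Start of a host block (does not match hostgroup or hostextinfo).
        /^[ \t]*define[ \t]+host[ \t]*\{/ { inhost = 1; name = ""; parent = 0; next }
        inhost && $1 == "host_name" { name = $2 }
        inhost && $1 == "parents"   { parent = 1 }
        # End of the block: report the host if no parents line was seen.
        inhost && index($0, "}") > 0 {
            if (!parent && name != "") print name
            inhost = 0
        }
    ' "$@"
}

# Example:
# cd /etc/nagios3/conf.d && hosts_without_parents pcs.cfg switches.cfg routers.cfg
```

In this classroom only the gateway router should be listed, since it is the
one device without a parent.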
2. Restart Nagios and review the Status Map
# /etc/init.d/nagios3 restart
- Now click on the Status Map link again. It should look quite different!
PART IV
Defining Services
-----------------------------------------------------------------------------
1. Determine what services to define for what devices
- This is core to how you use Nagios and network monitoring tools in
general. So far we are simply using ping to verify that physical hosts
are up on our network. The next step is to decide what services you wish
to monitor for each host.
- In this particular class we have:
routers: 4 running ssh
switches: 3 run ssh and telnet, 1 runs just telnet
pcs: All pcs are running ssh and http
All student pcs (15 of them) are running snmp
So, let's configure Nagios to check for all of these services for these
devices.
2. Check that telnet is running on the workshop switches.
- You will need to edit the file /etc/nagios3/conf.d/services_nagios2.cfg
to define a service that uses the "check_telnet" command and to name the
group of hosts this service will apply to.
- Edit the file services_nagios2.cfg:
# vi /etc/nagios3/conf.d/services_nagios2.cfg
At the bottom of the file add in the new service definition. It will look
like this:
# check that telnet is running
define service {
hostgroup_name telnet-servers
service_description Telnet
check_command check_telnet
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}
- By default Nagios (on Ubuntu) is pre-configured with web, ssh and ping
service definitions. It turns out, once we are completely done, that you
may not need the ping service definition - but, don't remove it yet!
- Notice the parameter that says:
hostgroup_name telnet-servers
We need to create this before we try to restart Nagios. Edit the file
/etc/nagios3/conf.d/hostgroups_nagios2.cfg and at the bottom of the
file add the following entry:
# A list of your telnet-accessible devices (older switches)
define hostgroup {
hostgroup_name telnet-servers
alias Telnet servers
members bb-sw,pc1-5-sw,pc6-10-sw,pc11-15-sw
}
Note the "members" section. Hopefully when you defined your switches
in the switches.cfg file this is what you used for the host_name directive
for the switches.
- Save your changes and check your configuration:
# nagios3 -v /etc/nagios3/nagios.cfg
- Restart Nagios and see if you notice the changes you've made. Note that
the actual check of the telnet service will most likely be in a "pending"
state at first.
3. Verify that SSH is running on the routers and workshop PCs
- In the file services_nagios2.cfg there is already an entry for the SSH
service check, so you do not need to create one. Instead, you
simply need to re-define the "ssh-servers" entry in the file
/etc/nagios3/conf.d/hostgroups_nagios2.cfg. The initial entry in the file
looks like:
# A list of your ssh-accessible servers
define hostgroup {
hostgroup_name ssh-servers
alias SSH servers
members localhost
}
What do you think you should change? Correct, the "members" line. You should
remove "localhost" and add in entries for all the classroom pcs, routers and
the three switches that run ssh. The one switch that does not run ssh
is "bb-sw". With this information and the network diagram you should be able
to complete this entry.
- Once you are done, run the pre-flight check:
# nagios3 -v /etc/nagios3/nagios.cfg
If everything looks good, then restart Nagios and see your changes in the
Nagios web interface.
4. Check that http is running on all the workshop PCs.
- Like ssh, there is already a check_http service defined and it automatically
applies to the http-servers group. (Note, you can add additional groups of hosts
for any service check if you wish). So, you need to update the "http-servers" entry
in the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg to include all the workshop
PCs running http (i.e. Apache Web Server).
- See the previous exercise and make the appropriate change to do this. If you have
any questions ask your instructor for help.
5. Check that SNMP is running on the classroom PCs.
- First you will need to add in the appropriate service check for SNMP in the file
/etc/nagios3/conf.d/services_nagios2.cfg. This is where Nagios is impressive. There
are hundreds, if not thousands, of service checks available via the various Nagios
sites on the web. You can see what plugins are installed by Ubuntu in the nagios3
package that we've installed by looking in the following directory:
# ls /usr/lib/nagios/plugins
As you'll see there is already a check_snmp plugin available to us. If you are
interested in the options the plugin takes you can execute the plugin from the
command line by typing:
# /usr/lib/nagios/plugins/check_snmp
to see what options are available, etc. You can use the check_snmp plugin and
Nagios to create very complex or specific system checks.
- Now to see all the various service/host checks that have been created using the
check_snmp plugin you can look in /etc/nagios-plugins/config/snmp.cfg. You will
see that there are a lot of preconfigured checks using snmp, including:
snmp_load
snmp_cpustats
snmp_procname
snmp_disk
snmp_mem
snmp_swap
snmp_procs
snmp_users
snmp_mem2
snmp_swap2
snmp_mem3
snmp_swap3
snmp_disk2
snmp_tcpopen
snmp_tcpstats
snmp_bgpstate
check_netapp_uptime
check_netapp_cpuload
check_netapp_numdisks
check_compaq_thermalCondition
And, even better, you can create additional service checks quite easily.
For the case of verifying that snmpd (the SNMP service on Linux) is running we
need to ask SNMP a question. If we don't get an answer, then Nagios can assume
that the SNMP service is down on that host. When you use service checks such as
check_http, check_ssh and check_telnet this is what they are doing as well.
- In our case, let's create a new service check and call it "check_system". This
service check will connect with the specified host, use the private community
string we have defined in class and ask a question of snmp on that host. In this
case we'll ask about the System Description, or the OID "sysDescr.0".
- To do this start by editing the file /etc/nagios-plugins/config/snmp.cfg:
# vi /etc/nagios-plugins/config/snmp.cfg
At the top (or the bottom, your choice) add the following entry to the file:
# 'check_system' command definition
define command{
command_name check_system
command_line /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -C '$ARG1$' -o sysDescr.0
}
Note that "command_line" is a single line.
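- Now you need to edit the file /etc/nagios3/conf.d/services_nagios2.cfg and add
in this service check. We'll run this check against all our servers in the
classroom, that is, the hostgroup "debian-servers". (By default this hostgroup
contains only localhost. You will add the classroom PCs to it in the
"Create More Host Groups" exercise.)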
- Edit the file /etc/nagios3/conf.d/services_nagios2.cfg
# vi /etc/nagios3/conf.d/services_nagios2.cfg
At the bottom of the file add the following definition:
# check that snmp is up on all servers
define service {
hostgroup_name debian-servers
service_description SNMP
check_command check_system!s3cr3t
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}
Note that we have included our private community string here vs. hard-coding
it in the snmp.cfg file earlier.
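
In a check_command, the text after the "!" is handed to the command
definition as $ARG1$, and $HOSTADDRESS$ is filled in from the host's address
directive. The sketch below only imitates that substitution with sed so you
can see the command line Nagios ends up running. expand_command is a
hypothetical helper, and the address is a placeholder from the 192.0.2.0/24
documentation range.

```shell
#!/bin/sh
# Imitate Nagios macro expansion for $HOSTADDRESS$ and $ARG1$.
# $1 = command_line template, $2 = host address, $3 = first argument
# (values containing "|" or "&" would confuse sed; fine for a demo)
expand_command() {
    printf '%s\n' "$1" |
        sed -e 's|\$HOSTADDRESS\$|'"$2"'|g' -e 's|\$ARG1\$|'"$3"'|g'
}

# The command_line from the check_system definition above.
template="/usr/lib/nagios/plugins/check_snmp -H '\$HOSTADDRESS\$' -C '\$ARG1\$' -o sysDescr.0"

# What "check_system!s3cr3t" becomes for a host at 192.0.2.11:
expand_command "$template" 192.0.2.11 s3cr3t
```

You can paste the printed command into a root shell, using a real classroom
address, to run the same check by hand before Nagios does.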
- Now verify that your changes are correct and restart Nagios.
- If you click on the Service Detail menu choice in the web interface you should
see the SNMP check appear.
PART V
Create More Host Groups
-----------------------------------------------------------------------------
1. Update /etc/nagios3/conf.d/hostgroups_nagios2.cfg
- For the following exercises it will be very useful if we have created
or updated the following hostgroups:
debian-servers
routers
switches
If you edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg you
will see an entry for debian-servers that just contains localhost.
Update this entry to include all the classroom PCs, including the
noc (this assumes that you created a "noc" entry in your pcs.cfg
file).
# vi /etc/nagios3/conf.d/hostgroups_nagios2.cfg
Update the entry that says:
# A list of your Debian GNU/Linux servers
define hostgroup {
hostgroup_name debian-servers
alias Debian GNU/Linux Servers
members localhost
}
So that the "members" parameter contains:
members noc,pc1,pc2,pc3,pc4,pc5,pc6,pc7,pc8,pc9,pc10,
pc11,pc12,pc13,pc14,pc15
- Once you have done this, add in two more entries. One for routers and
one for switches. Call these entries "routers" and "switches".
- When you are done be sure to verify your work and restart Nagios.
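
Long "members" lines are easy to mistype. As a small sketch, the function
below prints pcA through pcB as a comma-separated list that you can paste
into a "members" line. pc_members is a hypothetical helper name, and it
assumes your hosts are named pc1, pc2 and so on, as in the list above.

```shell
#!/bin/sh
# Print pc<first>..pc<last> as a comma-separated list.
# $1 = first PC number, $2 = last PC number
pc_members() {
    i=$1
    list=""
    while [ "$i" -le "$2" ]; do
        list="${list:+$list,}pc$i"   # add a comma only between entries
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Example: build the debian-servers members line shown above.
# echo "members noc,$(pc_members 1 15)"
```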
PART V
Extended Host Information ("making your graphs pretty")
-----------------------------------------------------------------------------
1. Update extinfo_nagios2.cfg
- If you would like to use appropriate icons for your defined hosts in
Nagios this is where you do this. We have the three types of devices:
Cisco routers
Cisco switches
Ubuntu servers
There is a fairly large repository of icon images available for you to
use located here:
/usr/share/nagios/htdocs/images/logos/
These were installed by default as dependent packages of the nagios3
package in Ubuntu. In some cases you can find model-specific icons for
your hardware, but to make things simpler we will use the following
icons for our hardware:
/usr/share/nagios/htdocs/images/logos/base/debian.*
/usr/share/nagios/htdocs/images/logos/cook/router.*
/usr/share/nagios/htdocs/images/logos/cook/switch.*
- The next step is to edit the file /etc/nagios3/conf.d/extinfo_nagios2.cfg
and tell Nagios what image you would like to use to represent your devices.
# vi /etc/nagios3/conf.d/extinfo_nagios2.cfg
Here is what an entry for your routers looks like (there is already
an entry for debian-servers that will work as is).
define hostextinfo {
hostgroup_name routers
icon_image cook/router.png
icon_image_alt Cisco Routers (2811)
vrml_image router.png
statusmap_image cook/router.gd2
}
Now add an entry for your switches. Once you are done check your
work and restart Nagios. Take a look at the Status Map in the web interface.
It should be much nicer.
PART VI
Create Service Groups
-----------------------------------------------------------------------------
1. Create service groups for ssh and http for each set of pcs.
- The idea here is to create three service groups. Each service group will
be for the group of PCs that are connected to the routers pc1-5-gw,
pc6-10-gw and pc11-15-gw. We want to see these PCs grouped together
and include the status of their ssh and http services. To do this, create
and edit the file:
# vi /etc/nagios3/conf.d/servicegroups.cfg
Here is a sample of the service group for the router pc1-5-gw:
define servicegroup{
servicegroup_name group 1 services
alias pcs 1-5
members pc1,SSH,pc1,HTTP,pc2,SSH,pc2,HTTP,pc3,SSH,
pc3,HTTP,pc4,SSH,pc4,HTTP,pc5,SSH,pc5,HTTP
}
Add in groups for pcs 6-10 and for pcs 11-15. You can call these service
groups anything you want.
- Save your changes, verify your work and restart Nagios. Now if you click on
the Servicegroup menu items in the Nagios web interface you should see
this information grouped together.
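
A servicegroup "members" line is a flat list of host,service pairs, which
gets long quickly. As a sketch, the function below prints the SSH and HTTP
pairs for a range of PCs in the same order as the sample above. sg_members
is a hypothetical helper name, and it assumes the service descriptions are
exactly "SSH" and "HTTP" as used in the sample.

```shell
#!/bin/sh
# Print host,service pairs (SSH and HTTP) for pc<first>..pc<last>.
# $1 = first PC number, $2 = last PC number
sg_members() {
    i=$1
    list=""
    while [ "$i" -le "$2" ]; do
        list="${list:+$list,}pc$i,SSH,pc$i,HTTP"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Example: members line for the PCs behind pc6-10-gw.
# echo "members $(sg_members 6 10)"
```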
PART VII
Configure Guest Access to the Nagios Web Interface
-----------------------------------------------------------------------------
1. Edit /etc/nagios3/cgi.cfg to give r/o guest access.
- By default Nagios is configured to give full r/w access via the Nagios
web interface to the user nagiosadmin. You can change the name of this
user, add other users, change how you authenticate users, what users
have access to what resources and more via the cgi.cfg file.
- First, let's create a "guest" user and password in the htpasswd.users
file.
# htpasswd /etc/nagios3/htpasswd.users guest
You can use any password you want (or none). A password of "guest" is
not a bad choice.
- Next, edit the file /etc/nagios3/cgi.cfg and look for what type
of access has been given to the nagiosadmin user. By default
you will see the following directives (note, there are comments between
each directive):
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
Now let's tell Nagios to allow the "guest" user some access to
information via the web interface. You can choose whatever you would
like, but what is pretty typical is this:
authorized_for_system_information=nagiosadmin,guest
authorized_for_configuration_information=nagiosadmin,guest
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin,guest
authorized_for_all_hosts=nagiosadmin,guest
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
- Once you make the changes, save the file cgi.cfg, verify your
work and restart Nagios.
- To see if you can log in as the "guest" user you may need to clear
the cookies in your web browser. You will not notice any difference
in the web interface. The difference is that a number of items that
are available via the web interface (forcing a service/host check,
scheduling checks, comments, etc.) will not work for the guest
user.
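
If you would rather not edit each directive by hand, the change can be
sketched with sed. The function below appends ",guest" to the four read-only
directives listed above and writes the result to standard output, leaving
the original file untouched. add_guest is a hypothetical helper name, and
the sketch assumes "guest" has not already been added to those lines.

```shell
#!/bin/sh
# Append ",guest" to the read-only authorization directives in a cgi.cfg.
# $1 = path to cgi.cfg; the modified file is printed, not saved.
add_guest() {
    sed -E '/^authorized_for_(system_information|configuration_information|all_services|all_hosts)=/ s/$/,guest/' "$1"
}

# Example: write a candidate file, then compare it with the original.
# add_guest /etc/nagios3/cgi.cfg > /tmp/cgi.cfg.new
# diff /etc/nagios3/cgi.cfg /tmp/cgi.cfg.new
```

Check the diff before copying the new file into place, so that the three
command directives still list only nagiosadmin.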
UPCOMING
New Commands, Updating Contact Information, Connecting Nagios to RT (tickets)
-----------------------------------------------------------------------------
During the ticket management sessions later in the week we will be working on
these items to allow Nagios to automatically create tickets in RT when certain
events take place.
Last update 24 Feb 2010 by HA