Nagios Installation and Configuration

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

Exercises Part I
----------------

0. Log in to your PC or open a terminal window as the sysadm user.

1. You may need to install Nagios version 3. You would do this as root, or
   as the sysadm user with the "sudo" command. As sysadm:

   $ sudo apt-get install nagios3

   Unless you already have an MTA installed, nagios3 will install postfix
   as a dependency. Select the "Internet Site" option. (If you wanted to
   use a different MTA, you would install it before nagios3.)

   You will be prompted for the nagiosadmin password. Give it the normal
   workshop password.

   To get the documentation in /usr/share/doc/nagios3-doc/html/ (which can
   also be read via the Nagios web interface), do:

   $ sudo apt-get install nagios3-doc

2. Look at the file which contains the password. The password is stored
   as a hash, not in clear text.

   $ cat /etc/nagios3/htpasswd.users

3. You should already have a working Nagios!

   - Open a browser, and go to your machine like this:

     http://pcN.ws.nsrc.org/nagios3/

   - At the login prompt, log in as:

     user: nagiosadmin
     pass: (the workshop password you set in step 1)

   Browse to the "Host Detail" page to see what's already configured.

4. Let's look at the configuration layout...
   But, first, let's become the root user on your machine:

   $ sudo bash

   # cd /etc/nagios3
   # ls -l

   -rw-r--r-- 1 root root    1882 2008-12-18 13:42 apache2.conf
   -rw-r--r-- 1 root root   10524 2008-12-18 13:44 cgi.cfg
   -rw-r--r-- 1 root root    2429 2008-12-18 13:44 commands.cfg
   drwxr-xr-x 2 root root    4096 2009-02-14 12:33 conf.d
   -rw-r--r-- 1 root root      26 2009-02-14 12:36 htpasswd.users
   -rw-r--r-- 1 root root   42539 2008-12-18 13:44 nagios.cfg
   -rw-r----- 1 root nagios  1293 2008-12-18 13:42 resource.cfg
   drwxr-xr-x 2 root root    4096 2009-02-14 12:32 stylesheets

   # cd conf.d
   # ls -l

   -rw-r--r-- 1 root root 1695 2008-12-18 13:42 contacts_nagios2.cfg
   -rw-r--r-- 1 root root  418 2008-12-18 13:42 extinfo_nagios2.cfg
   -rw-r--r-- 1 root root 1152 2008-12-18 13:42 generic-host_nagios2.cfg
   -rw-r--r-- 1 root root 1803 2008-12-18 13:42 generic-service_nagios2.cfg
   -rw-r--r-- 1 root root  210 2009-02-14 12:33 host-gateway_nagios3.cfg
   -rw-r--r-- 1 root root  976 2008-12-18 13:42 hostgroups_nagios2.cfg
   -rw-r--r-- 1 root root 2167 2008-12-18 13:42 localhost_nagios2.cfg
   -rw-r--r-- 1 root root 1005 2008-12-18 13:42 services_nagios2.cfg
   -rw-r--r-- 1 root root 1609 2008-12-18 13:42 timeperiods_nagios2.cfg

   Notice that the package installs files with "nagios2" in their name.
   This is because they are the same files as were used for the Nagios
   version 2 Debian package. However, there was a change made to the
   host-gateway configuration file, so this has a new name.

5. You have a config which is already monitoring your own system
   (localhost_nagios2.cfg) and your upstream default gateway
   (host-gateway_nagios3.cfg).

   Have a look at the config file for the default gateway: it's very
   simple. (Note: tab completion is useful here.
   Type cat host-g then hit tab; the filename will be filled in for you)

   # cat host-gateway_nagios3.cfg

   # a host definition for the gateway of the default route
   define host {
           host_name   gateway
           alias       Default Gateway
           address     10.10.0.254
           use         generic-host
           }


PART II
Configuring Equipment
-----------------------------------------------------------------------------

0. Order of configuration

   Conceptually we will build our configuration files starting with the
   "nearest" device and then moving to the ones further away. By going in
   this order you will already have defined the devices that act as
   parents for other devices.

   Remember to refer to the Network Diagram for our classroom if you get
   confused.

   We have the following instances:

   rtr   (the gateway router: 10.10.0.254)
   sw    (the gateway switch: 10.10.0.253, parent: rtr)
   rtr1  (group 1 router:  10.10.0.201, parent: sw)
   rtr2  (group 2 router:  10.10.0.202, parent: sw)
   rtr3  (group 3 router:  10.10.0.203, parent: sw)
   rtr4  (group 4 router:  10.10.0.204, parent: sw)
   rtr5  (group 5 router:  10.10.0.205, parent: sw)
   rtr6  (group 6 router:  10.10.0.206, parent: sw)
   rtr7  (group 7 router:  10.10.0.207, parent: sw)
   rtr8  (group 8 router:  10.10.0.208, parent: sw)
   rtr9  (group 9 router:  10.10.0.209, parent: sw)
   rtr10 (group 10 router: 10.10.0.210, parent: sw)
   pc1   (10.10.0.1,  parent: sw)
   pc2   (10.10.0.2,  parent: sw)
   ...
   pc29  (10.10.0.29, parent: sw)
   pc30  (10.10.0.30, parent: sw)
   ...
   pc40  (10.10.0.40, parent: sw)
   s1    (10.10.0.241, parent: sw)
   s2    (10.10.0.242, parent: sw)
   s3    (10.10.0.243, parent: sw)
   s4    (10.10.0.244, parent: sw)
   noc   (10.10.0.250, parent: sw)
   ap1   (10.10.0.251, parent: sw)
   ap2   (10.10.0.252, parent: sw)

   We recommend grouping these items in the files:

   routers.cfg   (rtr, rtr1...rtr10)
   switches.cfg  (sw)
   pcs.cfg       (pc1...pc30, s1, s2, noc, ap1, ap2)

1.
   First we need to tell Nagios to monitor the gateway router for our
   classroom, which is 10.10.0.254:

   # cd /etc/nagios3/conf.d/

   Create the entry for the gateway router like this:

   # editor routers.cfg

   define host {
           use         generic-host
           host_name   rtr
           alias       Gateway Router
           address     10.10.0.254
           }

   In the same file create the entries for the group routers:

   define host {
           use         generic-host
           host_name   rtrX
           alias       Group X Router
           address     10.10.0.20X
           parents     sw
           }

   ... and replace 'X' in the definition above with the router number.
   Repeat this for rtr2, rtr3, rtr4 ... up to rtr10. (For rtr10 the
   address is 10.10.0.210.)

   Note that the entry for "sw", our gateway switch, has not yet been
   created. That is next.

   Exit and save this file.

2. Create a file called switches.cfg and add an entry for this item:

   # editor switches.cfg

   define host {
           use         generic-host
           host_name   sw
           alias       Backbone Switch
           address     10.10.0.253
           parents     rtr
           }

   At this point Nagios is configured to monitor whether our core hosts
   (the parents) are up on our classroom network. Your next steps are to
   add in the individual hosts such as the classroom virtual PC images on
   your table (for example for group 1, pc1 - 6, for group 2, pc7 - 12,
   etc.), the Wireless Access Points (ap1 and ap2), the servers s1, s2 and
   the noc.

   Be sure you add in a proper "parents" entry for each host. Remember, if
   you don't understand the parent relations in our network you can review
   the logical network diagram on the wiki! Note the Nagios parent bullet
   points in the slides, under "Nagios Parent Relationships".

STEPS 2a - 2c SHOULD BE REPEATED WHENEVER YOU UPDATE THE CONFIGURATION!

2a. Verify that your configuration files are OK:

   # nagios3 -v /etc/nagios3/nagios.cfg

   ... You should get some warnings like:

   Warning: Host 'rtr' has no services associated with it!
   Warning: Host 'sw' has no services associated with it!
   etc....
   ...
   Total Warnings: N
   Total Errors:   0

   Things look okay - No serious problems were detected during the check.
   Nagios is saying that it's unusual to monitor a device just for its
   existence on the network, without also monitoring some service.

2b. Reload/Restart Nagios

   # /etc/init.d/nagios3 stop
   # /etc/init.d/nagios3 start

   or

   # service nagios3 restart

2c. Go to the web interface (http://pcN.ws.nsrc.org/nagios3) and check
   that the hosts you just added are now visible in the interface. Click
   on the "Host Detail" item on the left of the Nagios screen to see this.
   You may see it in "PENDING" status until the check is carried out.

   HINT: You will be doing this a lot. If you do it all on one line, like
   this, then you can hit cursor-up and rerun all in one go:

   nagios3 -v /etc/nagios3/nagios.cfg && service nagios3 restart

   The '&&' ensures that the restart only happens if the config is valid.

3. Create entries for the classroom PCs

   Now that we have our routers and switches defined it is quite easy to
   create entries for all our PCs. Think about the parent relationships:
   if you do not understand the parent relationship, refer back to the
   classroom network diagram!

   Below are three sample entries: one for the NOC, one for pc1 and one
   for pc6. You should be able to use this example to create entries for
   all classroom PCs plus the NOC. We could put these entries into
   separate files, but as our network is small we'll use a single file
   called pcs.cfg.

   NOTE! You do not add in an entry for your own PC or router. This has
   already been defined in the file
   /etc/nagios3/conf.d/localhost_nagios2.cfg. This definition is what
   defines the Nagios network viewpoint. So, when you come to the spot
   where you might add an entry for your PC you should skip this and go on
   to the next PC in the list.
   # editor pcs.cfg

   # Our classroom NOC
   define host {
           use         generic-host
           host_name   noc
           alias       Workshop NOC machine
           address     10.10.0.250
           parents     sw
           }

   # PCs
   define host {
           use         generic-host
           host_name   pc1
           alias       pc1
           address     10.10.0.1
           parents     sw
           }

   define host {
           use         generic-host
           host_name   pc6
           alias       pc6
           address     10.10.0.6
           parents     sw
           }

   Pay attention to the parent entries and the IP addresses.

   Take the three entries above and now expand this to create the
   remaining entries for the PCs in your group. That is, if you are in
   group 1, fill in for PCs 2 through 5 (remember to SKIP your own PC!).

   Exit and save the file pcs.cfg

   As before, repeat steps 2a-2c to verify your configuration, correct any
   errors, and activate it.

4. Look at your Nagios instance on the web. Note that "Status Map" gives
   you a graphical view of the parent-child relationships you have
   defined.


PART III
Configure Service check for the classroom NOC
-----------------------------------------------------------------------------

0. Configuring

   Now that we have our hardware configured we can start telling Nagios
   what services to monitor on the configured hardware, how to group the
   hardware in interesting ways, how to group services, etc.

1. Associate a service check for our classroom NOC

   # editor hostgroups_nagios2.cfg

   - Find the hostgroup named "ssh-servers". In the members section of the
     definition change the line:

     members localhost

     to

     members localhost,noc

   Exit and save the file.

   Verify that your changes are OK:

   # nagios3 -v /etc/nagios3/nagios.cfg

   Restart Nagios to see the new service association with your host:

   # /etc/init.d/nagios3 restart

   Click on the "Service Detail" link in the Nagios web interface to see
   your new entry.


PART IV
Defining Services for all PCs
-----------------------------------------------------------------------------

0. For services, the default normal_check_interval is 5 (minutes) in
   generic-service_nagios2.cfg.
   You may wish to change this to 1 to speed up how quickly service issues
   are detected, at least in the workshop.

1. Determine what services to define for what devices

   - This is core to how you use Nagios and network monitoring tools in
     general. So far we are simply using ping to verify that physical
     hosts are up on our network and we have started monitoring a single
     service on a single host (your PC). The next step is to decide what
     services you wish to monitor for each host in the classroom.

   - In this particular class we have:

     routers:  running ssh and snmp
     switches: running telnet and possibly ssh as well as snmp
     pcs:      All PCs are running ssh and http and should be running snmp
               The NOC is currently running an snmp daemon

   So, let's configure Nagios to check for these services for these
   devices.

2. Verify that SSH is running on the routers and workshop PC images

   - In the file services_nagios2.cfg there is already an entry for the
     SSH service check, so you do not need to create it. Instead, you
     simply need to re-define the "ssh-servers" entry in the file
     /etc/nagios3/conf.d/hostgroups_nagios2.cfg. The initial entry in the
     file looked like:

     # A list of your ssh-accessible servers
     define hostgroup {
             hostgroup_name  ssh-servers
             alias           SSH servers
             members         localhost,noc
             }

     What do you think you should change? Correct, the "members" line. You
     should add in entries for all the classroom PCs, routers and the
     switches that run ssh. With this information and the network diagram
     you should be able to complete this entry.

     The entry will look something like this:

     define hostgroup {
             hostgroup_name  ssh-servers
             alias           SSH servers
             members         localhost,pc1,pc2,pc3,...,pc6,ap1,ap2,s1,s2,noc
             }

     Note: leave in "localhost" - this is your PC and represents Nagios'
     network point of view. So, for instance, if you are on "pc3" you
     would not include "pc3" in the list of all the classroom PCs, as it
     is represented by the "localhost" entry.
     The "members" entry will be a long line and will likely wrap on the
     screen.

     Remember to include all the PCs on your table and the routers that
     you have defined. Do not include any entries if they are not already
     defined in pcs.cfg, switches.cfg or routers.cfg.

   - Once you are done, run the pre-flight check:

     # nagios3 -v /etc/nagios3/nagios.cfg

     If everything looks good, then restart Nagios

     # /etc/init.d/nagios3 stop
     # /etc/init.d/nagios3 start

     and view your changes in the Nagios web interface.

   To continue with hostgroups you can add additional groups for later
   use, such as all our virtual routers. Go ahead and edit the file
   hostgroups_nagios2.cfg again:

   # editor hostgroups_nagios2.cfg

   and add the following to the end of the file:

   # A list of our virtual routers
   define hostgroup {
           hostgroup_name  cisco7200
           alias           Cisco 7200 Routers
           members         rtr1,rtr2,rtr3,rtr4,rtr5,rtr6,rtr7,rtr8,rtr9,rtr10
           }

   Save and exit from the file. Verify that everything is OK:

   # nagios3 -v /etc/nagios3/nagios.cfg

   If everything looks good, then restart Nagios

   # /etc/init.d/nagios3 stop
   # /etc/init.d/nagios3 start

3. Check that http is running on all the classroom PCs.

   - This is almost identical to the previous exercise. Just make the
     change to the HTTP service, adding in each PC (no routers or
     switches). Remember, you don't need to add your machine as it is
     already defined as "localhost". Find the definition in
     hostgroups_nagios2.cfg:

     define hostgroup {
             hostgroup_name  http-servers
             alias           HTTP servers
             members         localhost
             }

     and after localhost, add all the PCs in your group.
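
   If typing the "members" line by hand gets tedious, the list can also
   be built with a small shell function. The following is only a sketch,
   not part of the workshop materials: the function name members_list is
   made up for this example, and it assumes the PCs in your group are
   numbered consecutively (for example pc1 to pc6). It prints "localhost"
   followed by every PC in the range except your own, which is already
   covered by the "localhost" entry.

```shell
#!/bin/sh
# members_list FIRST LAST OWN  (hypothetical helper, for illustration only)
# Prints a comma-separated list: localhost, then pcFIRST..pcLAST,
# leaving out OWN (the PC you are sitting at).
members_list() {
    first=$1
    last=$2
    own=$3
    list="localhost"
    i=$first
    while [ "$i" -le "$last" ]; do
        # Skip your own PC: it is already represented by "localhost".
        if [ "pc$i" != "$own" ]; then
            list="$list,pc$i"
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Example: group with pc1..pc6, and you are working on pc3.
members_list 1 6 pc3
```

   Paste the printed list after the word "members" in the http-servers
   definition, then repeat the pre-flight check and restart as before.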