SANOG 16 - Nagios Installation and Configuration

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

PART I
------

1. Start by enabling Nagios

   * In /etc/rc.conf add the following line:

     nagios_enable="YES"

   Configuration templates are available in /usr/local/etc/nagios as
   *.cfg-sample files. Copy them to *.cfg files where required and edit
   to suit your needs. Documentation is available in HTML form in
   /usr/local/www/nagios/docs.

   Before we can use Nagios, we need to configure it -- we will do this
   in Part II.

2. Configure Apache for Nagios

   * Create /usr/local/etc/apache22/Includes/nagios.conf

   In the file, add:

     <Directory /usr/local/www/nagios>
       Order deny,allow
       Allow from all
       Options ExecCGI
       AllowOverride AuthConfig
     </Directory>

     ScriptAlias /nagios/cgi-bin/ /usr/local/www/nagios/cgi-bin/
     Alias /nagios/ /usr/local/www/nagios/

   Save this file and exit.

3. Create the Web user password file:

     # htpasswd -c /usr/local/etc/nagios/htpasswd.users nagiosadmin
     New password:
     Re-type new password:

   We suggest you use your standard user password used in class.

   Now, create a .htaccess file to ask for a password when opening the
   Nagios page:

     # vi /usr/local/www/nagios/.htaccess

   In the file, add:

     AuthName "Nagios Access"
     AuthType Basic
     AuthUserFile /usr/local/etc/nagios/htpasswd.users
     require valid-user

   Save the file, and exit.

4. The web interface of Nagios should be ready at this point, but most
   views won't work since Nagios has not been configured yet!

   - Open a browser, and go to

     http://wsXX.ws3.conference.sanog.org/nagios/

   - At the login prompt, login as:

     user: nagiosadmin
     pass: [the password you set in step 3]

   Now we need to configure Nagios.

PART II
Configuring Equipment
-----------------------------------------------------------------------------

1. Let's configure Nagios

     # cd /usr/local/etc/nagios/
     # cp cgi.cfg-sample cgi.cfg                  <- Web module config
     # cp resource.cfg-sample resource.cfg        <- Nagios internal config
     # cp nagios.cfg-sample nagios.cfg            <- Nagios main config

     # cd /usr/local/etc/nagios/objects/
     # cp commands.cfg-sample commands.cfg        <- plugin configuration
     # cp contacts.cfg-sample contacts.cfg        <- contact people
     # cp templates.cfg-sample templates.cfg      <- predefined objects
     # cp timeperiods.cfg-sample timeperiods.cfg  <- timeperiods
     # cp localhost.cfg-sample localhost.cfg      <- a sample config

   This is the most basic and minimal configuration -- we have taken all
   the default configuration files and enabled them. The last file,
   "localhost.cfg", defines a monitoring configuration for your own
   PC (wsXX).

2. Let's verify that nagios is happy with the configuration:

     # cd /usr/local/etc/nagios
     # nagios -v nagios.cfg

   You should see:

     ... output ...

     Total Warnings: 0
     Total Errors:   0

     Things look okay - No serious problems were detected during the
     pre-flight check

   We are ready to start nagios!

     # /usr/local/etc/rc.d/nagios start

3. You can now go back to the web interface

     http://wsXX.ws3.conference.sanog.org/nagios/

   ... over the next few minutes, Nagios will update the status for the
   services on the localhost (wsXX).

   You can check out the "Hostgroup grid" and "Hostgroup overview" options
   in the left menu, then click on "localhost" to get the details.

   Look at the file:

     /usr/local/etc/nagios/objects/localhost.cfg

   ... and try to understand what is being monitored, by comparing what
   you see in the "localhost" view in the Nagios web interface with the
   .cfg file.

   We need to make a small change to the file:

     /usr/local/etc/nagios/nagios.cfg

   ... so it will automatically read all .cfg files from the
   /usr/local/etc/nagios/objects directory, and we don't have to edit
   nagios.cfg every time we add one.
   So edit /usr/local/etc/nagios/nagios.cfg, and make the following
   changes:

   COMMENT the 4 lines like this:

     cfg_file=/usr/local/etc/nagios/objects/commands.cfg
     cfg_file=/usr/local/etc/nagios/objects/contacts.cfg
     cfg_file=/usr/local/etc/nagios/objects/timeperiods.cfg
     cfg_file=/usr/local/etc/nagios/objects/templates.cfg

   Comment = add '#' at the beginning, so they look like this:

     # cfg_file=/usr/local/etc/nagios/objects/commands.cfg
     # cfg_file=/usr/local/etc/nagios/objects/contacts.cfg
     # cfg_file=/usr/local/etc/nagios/objects/timeperiods.cfg
     # cfg_file=/usr/local/etc/nagios/objects/templates.cfg

   Do the same for

     cfg_file=/usr/local/etc/nagios/objects/localhost.cfg

   ... so it becomes:

     # cfg_file=/usr/local/etc/nagios/objects/localhost.cfg

   ... and add another line:

     cfg_dir=/usr/local/etc/nagios/objects

   ... Now save the file and exit.

   One last change:

     # cd /usr/local/etc/nagios/objects/
     # mv localhost.cfg main.cfg

   main.cfg is a nicer name than localhost.cfg, since we're going to be
   adding new hosts and parameters to it.

   Test that nagios is happy:

     # nagios -v /usr/local/etc/nagios/nagios.cfg

   If Nagios complains, double check your changes, and if it is still a
   problem, ask one of the instructors for help.

   Finally, restart Nagios:

     # /usr/local/etc/rc.d/nagios restart

4. Let's start monitoring another computer in our classroom:

   - Pick any other WS in the class, which you will monitor.

     # cd /usr/local/etc/nagios/objects
     # vi ws-all.cfg

     define host {
         use         freebsd-server
         host_name   wsYY
         alias       WS YY in WS3
         address     _______________   [wsYY's IP address here]
     }

   Note: YY is *another* machine in the class, not your own.

   ... Save and quit

5. Let's create a new hostgroup for the occasion, and add our host to it

   - Let's add the hostgroup to the "main.cfg" file:

     # cd /usr/local/etc/nagios/objects/
     # vi main.cfg

   Find the section called "Define an optional hostgroup for FreeBSD
   machines", and just under it, add:

     define hostgroup {
         hostgroup_name  classroom
         alias           All WS in the class
         members         wsYY
     }

6. Now let's associate some services to that host

   Still in the file "main.cfg", find the section called:

     "Define a service to check SSH on the local machine"

   and change the line:

     host_name       localhost

   to

     hostgroup_name  freebsd-servers, classroom

   Save the file and exit

7. Verify that your configuration file is OK:

     # nagios -v /usr/local/etc/nagios/nagios.cfg

   ... You should get:

     Total Warnings: 0
     Total Errors:   0

     Things look okay - No serious problems were detected during the
     pre-flight check

8. Reload/Restart Nagios

     # /usr/local/etc/rc.d/nagios restart

9. Go to the web interface (http://wsXX.ws3.conference.sanog.org/nagios)
   and check the host you just added.

10. Add ALL the PCs (WS1 - WS15) in your classroom.

   - Remember to verify the configuration file!
   - I suggest that you create a single config file called "ws-all.cfg"
     to do this, and put all the hosts in it.
   - You will repeat step 4 for each machine.
   - When finished, remember to add all the hosts into the "classroom"
     group in the file main.cfg. The format of the members statement is:

       members wsXX,wsYY,wsZZ,...

11. Reload/Restart Nagios

     # /usr/local/etc/rc.d/nagios restart

   - Take a look at http://wsXX.ws3.conference.sanog.org/nagios to see
     your changes.
   - Click on the "Status Map" link to see how things look.

PART III
Adding Services
-----------------------------------------------------------------------------

1. Determine what services to add for what devices

   - This is core to how you use Nagios and network monitoring tools in
     general. So far we are simply checking SSH to see if the machines
     are up on our network. The next step is to decide what services you
     wish to monitor for each host.

   - In this particular class we have:

     pcs: All wsXX are running ssh, http and imap/pop
          All student pcs are running an snmp daemon

   So, let's configure Nagios to check for all of these services for
   these devices.

2. Verify that HTTP is running on the classroom PCs

   - In the file main.cfg there is already an entry for the HTTP service
     check, so you do not need to create one. Instead, you simply need to
     change "host_name localhost" for that service to
     "hostgroup_name classroom", just like we did in step 6 of Part II.

     So make this change in the "main.cfg" file -- find the section
     "Define a service to check HTTP on the local machine", and update
     the line:

       host_name       localhost

     to

       hostgroup_name  classroom

     And save the file.

     This tells Nagios to check the HTTP service on ALL hosts in the
     hostgroup "classroom", rather than only on the single host
     "localhost".

   - Once you are done, run the pre-flight check:

     # nagios -v /usr/local/etc/nagios/nagios.cfg

     If everything looks good, then restart Nagios and see your changes
     in the Nagios web interface.

3. Check that all hosts answer to ping.

   - As with HTTP, there is already a check_ping service defined and it
     automatically applies to the freebsd-servers group. (Note: you can
     add additional groups of hosts for any service check if you wish.)
     So, you need to update the "PING" service definition in the main.cfg
     file, and make it use the hostgroup_name classroom, as we did in the
     previous step.

   - See the previous exercise and make the appropriate change to do
     this. If you have any questions ask your instructor for help.

4. Let's add IMAP and POP monitoring

   The *commands* for check_pop and check_imap are already configured, so
   all we need to do is create *services* for them.

   First, edit the main.cfg file, and at the end, add the service
   definition for check_pop:

     define service {
         use                  local-service
         hostgroup_name       classroom
         service_description  POP
         check_command        check_pop
     }

   Check the configuration (nagios -v ...) and restart Nagios, then go to
   the web interface.

   Now add an IMAP check in the same fashion (create a service definition
   that uses the "check_imap" command, etc.). Don't forget to update the
   service_description!

5. One last change...

   In the file main.cfg, REMOVE all the lines like:

     notifications_enabled 0

   (there should be 2 lines like this -- delete them)
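
Appendix: generating the ws-all.cfg host entries

Step 10 of Part II asks for one host definition per classroom PC plus a
matching "members" line. Typing fifteen near-identical blocks by hand is
error-prone, so the following is a minimal sketch of a helper script, not
part of the original exercise. Its assumptions: the script name gen-ws.sh
is hypothetical; hosts are assumed to be named ws1 through ws15 with no
zero padding; and each "address" is filled with the host's name under
ws3.conference.sanog.org instead of the IP address the exercise asks for.
Nagios accepts a name in "address", but checks then depend on DNS, so
replace these with the real IP addresses if your instructor provides them.

```shell
#!/bin/sh
# Sketch: print Nagios host definitions for the classroom PCs, or the
# matching "members" line for the "classroom" hostgroup.
# ASSUMPTIONS (not from the exercise): hosts are named ws1..ws15 without
# zero padding, and each name resolves under DOMAIN.
DOMAIN="ws3.conference.sanog.org"
COUNT=15

# Print one "define host" block per classroom PC.
gen_hosts() {
    i=1
    while [ "$i" -le "$COUNT" ]; do
        cat <<EOF
define host {
    use         freebsd-server
    host_name   ws$i
    alias       WS $i in WS3
    address     ws$i.$DOMAIN
}

EOF
        i=$((i + 1))
    done
}

# Print the comma-separated members line for the hostgroup.
gen_members() {
    members=""
    i=1
    while [ "$i" -le "$COUNT" ]; do
        # Prepend a comma only when the list is already non-empty.
        members="${members:+$members,}ws$i"
        i=$((i + 1))
    done
    printf 'members %s\n' "$members"
}

# Default to printing host definitions; "members" prints the members line.
case "${1:-hosts}" in
    hosts)   gen_hosts ;;
    members) gen_members ;;
esac
```

One possible way to use it: run "sh gen-ws.sh hosts" and redirect the
output to /usr/local/etc/nagios/objects/ws-all.cfg, then run
"sh gen-ws.sh members" and paste the result into the "classroom"
hostgroup in main.cfg. As always, run
"nagios -v /usr/local/etc/nagios/nagios.cfg" before restarting Nagios.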