File exercises-nagios.html, 22.1 KB (added by hervey, 9 years ago)

Nagios exercises in HTML

1<html>
2<head>
3<title>Network Monitoring and Management: Nagios Version 3 Exercises</title>
4</head>
5<body>
6
7<font size="5">
8<b>Nagios Exercises</b>
9</font>
10<br />
11<font size="4">
12<b>Network Monitoring and Management Workshop</b>
13</font>
14
15<pre>
16<font size="3">
17<b>PART I</b>
18-----------------------------------------------------------------------------
19
201. Install Nagios version 3
21
22    # apt-get install nagios3
23
242. Create the Web user password file:
25
26    # htpasswd -c /etc/nagios3/htpasswd.users nagiosadmin
27
28New password:         <type a password>
29Re-type new password: <type password again>
30
31   We suggest you use your standard user password used in class.
32
33
343. You should already have a working Nagios!
35
36    - Open a browser, and go to
37
38    http://localhost/nagios3/
39
40    - At the login prompt, log in as:
41
42        user: nagiosadmin
43        pass: <the password you chose>
44
454. Let's look at the interface together...
46
47    # cd /etc/nagios3/
48
49    # ls -l
50    -rw-r--r-- 1 root root    1882 2008-12-18 13:42 apache2.conf
51    -rw-r--r-- 1 root root   10524 2008-12-18 13:44 cgi.cfg
52    -rw-r--r-- 1 root root    2429 2008-12-18 13:44 commands.cfg
53    drwxr-xr-x 2 root root    4096 2009-02-14 12:33 conf.d
54    -rw-r--r-- 1 root root      26 2009-02-14 12:36 htpasswd.users
55    -rw-r--r-- 1 root root   42539 2008-12-18 13:44 nagios.cfg
56    -rw-r----- 1 root nagios  1293 2008-12-18 13:42 resource.cfg
57    drwxr-xr-x 2 root root    4096 2009-02-14 12:32 stylesheets
58   
59    # ls -l conf.d/
60
61    -rw-r--r-- 1 root root 1695 2008-12-18 13:42 contacts_nagios2.cfg
62    -rw-r--r-- 1 root root  418 2008-12-18 13:42 extinfo_nagios2.cfg
63    -rw-r--r-- 1 root root 1152 2008-12-18 13:42 generic-host_nagios2.cfg
64    -rw-r--r-- 1 root root 1803 2008-12-18 13:42 generic-service_nagios2.cfg
65    -rw-r--r-- 1 root root  210 2009-02-14 12:33 host-gateway_nagios3.cfg
66    -rw-r--r-- 1 root root  976 2008-12-18 13:42 hostgroups_nagios2.cfg
67    -rw-r--r-- 1 root root 2167 2008-12-18 13:42 localhost_nagios2.cfg
68    -rw-r--r-- 1 root root 1005 2008-12-18 13:42 services_nagios2.cfg
69    -rw-r--r-- 1 root root 1609 2008-12-18 13:42 timeperiods_nagios2.cfg
70
71    Notice that the package has not renamed the files in the conf.d
72    directory: most are the same files used by the Nagios version 2
73    Ubuntu package. Only the host-gateway configuration file was updated,
74    so only it has been renamed.
75
76<b>PART II</b>
77Configuring Equipment
78-----------------------------------------------------------------------------
79
801. According to what we saw in class, let's add a new host
81
82    - Pick any PC in the room.
83
84    # cd /etc/nagios3/conf.d/
85
86    # vi pcX.cfg        (Where X is some number)
87
88define host {
89    use         generic-host
90    host_name   pcX
91    alias       PC X at Network Design Workshop
92    address     _______________       [pcX's IP address here]
93}
94
95    ... Save and quit
96
972. Let's create a new hostgroup for the occasion, and add our host
98   to it
99
100    - Edit the file hostgroups_nagios2.cfg and add a new group:
101
102    # vi hostgroups_nagios2.cfg
103
104define hostgroup {
105    hostgroup_name  servers
106    alias           Network Design PCs
107    members         pcX
108}
109
1103. Now let's associate some services to that host
111
112    # vi services_nagios2.cfg
113
114    - Find the section called "check that ssh services are running",
115      and change the line:
116
117hostgroup_name                  ssh-servers
118
119    to
120
121hostgroup_name                  ssh-servers, servers
122
123
124
1254. Verify that your configuration file is OK:
126
127    # nagios3 -v /etc/nagios3/nagios.cfg
128
129    ... You should get :
130
131Total Warnings: 0
132Total Errors:   0
133
134Things look okay - No serious problems were detected during the check.
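
If you would rather let the shell read that summary for you, here is a
minimal sketch. The function name preflight_ok is our own invention (it is
not part of Nagios), and it assumes the pre-flight output contains a
"Total Errors:" line in the format shown above.

```shell
# preflight_ok: read "nagios3 -v" output on stdin and succeed only when
# the "Total Errors:" summary line reports 0. (Hypothetical helper.)
preflight_ok() {
    awk '/^Total Errors:/ { seen = 1; errs = $3 }
         END { exit !(seen && errs == 0) }'
}
```

One possible way to use it, so that Nagios is only restarted after a clean
check:

    # if nagios3 -v /etc/nagios3/nagios.cfg | preflight_ok; then /etc/init.d/nagios3 restart; fi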
135
136
1375. Reload/Restart Nagios
138
139    # /etc/init.d/nagios3 restart
140
1416. Go to the web interface (http://localhost/nagios3) and check the host
142   you just added
143
144
1457. Add ALL the PCs in the classroom
146
147    - Remember to verify the configuration file!
148
149    - I suggest that you create a single config file called pcs.cfg
150      to do this.
151
152    - You will repeat steps 1, 2 and 3 from above. When you edit the
153      file hostgroups_nagios2.cfg to update the members of the servers
154      group the format of the members statement is:
155
156      members       pcX,pcY,pcZ,...
157
158    - If you do not know the names of all the PCs in the classroom or
159      their IP addresses, refer to the classroom Network Diagram, available
160      either in the classroom or on the class web site:
161
162      http://nsrc.org/workshops/2010/apricot/
163
164      *Also available for now at http://noc/diagram
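
Typing the same host block for every PC gets tedious. The sketch below
shows one way to generate the entries instead. The function name make_host
is hypothetical, and the address used in the example (192.0.2.11) is only a
placeholder: take the real addresses from the network diagram.

```shell
# make_host NAME ALIAS ADDRESS
# Print one Nagios host definition in the same layout as the pcX example.
# (Hypothetical helper, not part of Nagios.)
make_host() {
    printf 'define host {\n'
    printf '    use         generic-host\n'
    printf '    host_name   %s\n' "$1"
    printf '    alias       %s\n' "$2"
    printf '    address     %s\n' "$3"
    printf '}\n\n'
}
```

For example, to append an entry for pc1 to pcs.cfg (placeholder address):

    # make_host pc1 "PC 1 at Network Design Workshop" 192.0.2.11 >> /etc/nagios3/conf.d/pcs.cfg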
165
1668. Add the routers and switches in your classroom
167
168    - Create files called "routers.cfg" and "switches.cfg" in
169      /etc/nagios3/conf.d
170   
171    - In the routers file you need to add 4 entries. Here is the initial
172      entry for the gateway router for the classroom:
173
174define host {
175    use         generic-host
176    host_name   bb-gw
177    alias       gw router
178    address     169.223.142.1
179}
180
181      add in entries for the other three routers.
182
183    - There are four switches. Do the same in the switches.cfg file.
184
185    - Remember to look at the network diagram if you do not know their
186      names or IP addresses.
187
188    - Use the Nagios "pre-flight" check to verify that your configuration
189      is correct:
190
191    # nagios3 -v /etc/nagios3/nagios.cfg
192
193    - You may see some warnings as there are no services defined for these
194      new entries. This is OK and we will be taking care of this later.
195   
1969. Reload/Restart Nagios
197
198    # /etc/init.d/nagios3 restart
199
200    - Take a look at http://localhost/nagios3 to see your changes.
201
202    - Click on the "Status Map" link to see how things look.
203
204<b>PART III</b>
205Defining Parents
206-----------------------------------------------------------------------------
207
2081. Define parents for your hardware devices
209
210   - Remember that Nagios is smart about what to check based on the state of
211     your network. This "smartness" is largely driven by the concept of
212     parent relationships. Each device in our network (except for the classroom
213     gateway router) has a parent device. You need to define what that device is
214     for each pc, router and switch in the files pcs.cfg, switches.cfg and
215     routers.cfg.
216
217   - This is <i>extremely</i> simple. To get you started, here is an updated entry
218     for pcX, which has switchY as its parent, in the file pcs.cfg:
219
220define host {
221    use         generic-host
222    host_name   pcX
223    alias       PC X at Network Design Workshop
224    address     _______________       [pcX's IP address here]
225    parents     switchY
226}
227
228   - Note, use the hostname, not the IP address for parents entries.
229
230   - Repeat this process for all the devices you have defined. If you do not know
231     the name of the parent device, or are confused about the network layout for
232     the classroom remember to use the network diagram:
233
234     http://noc/diagram
235
236   - Once you are done be sure to do:
237
238   # nagios3 -v /etc/nagios3/nagios.cfg
239   
240     to check on the status of your work.
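
A common slip here is a parents entry that names a host you never defined
(a typo such as "swtich1"). The pre-flight check should complain about
this as well, but the sketch below lists such names directly. The function
name undefined_parents is hypothetical, and it assumes one directive per
line with multiple parents separated by commas and no spaces.

```shell
# undefined_parents FILE...
# Print every name used in a "parents" directive that has no matching
# "host_name" directive in the given files. (Hypothetical helper.)
undefined_parents() {
    awk '
        $1 == "host_name" { defined[$2] = 1 }
        $1 == "parents"   { n = split($2, p, ","); for (i = n; i >= 1; i--) wanted[p[i]] = 1 }
        END { for (h in wanted) if (!(h in defined)) print h }
    ' "$@"
}
```

Run it over all your config files; no output means every parent is defined:

    # undefined_parents /etc/nagios3/conf.d/*.cfg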
241
2422. Restart Nagios and review the Status Map
243
244   # /etc/init.d/nagios3 restart
245
246   - Now click on the Status Map link again. It should look quite different!
247
248<b>PART IV</b>
249Defining Services
250-----------------------------------------------------------------------------
251
2521. Determine what services to define for what devices
253
254   - This is core to how you use Nagios and network monitoring tools in
255     general. So far we are simply using ping to verify that physical hosts
256     are up on our network. The next step is to decide what services you wish
257     to monitor for each host.
258
259   - In this particular class we have:
260
261     routers:  4 running ssh
262     switches: 3 run ssh and telnet, 1 runs just telnet
263     pcs:      All pcs are running ssh and http
264               All student pcs (15 of them) are running snmp
265             
266     So, let's configure Nagios to check for all of these services for these
267     devices.
268
2692. Check that telnet is running on the workshop switches.
270
271   - You will need to edit the file /etc/nagios3/conf.d/services_nagios2.cfg
272     to first define the "check_telnet" service check and the group of hosts
273     to which it will apply.
274
275   - Edit the file services_nagios2.cfg:
276
277   # vi /etc/nagios3/conf.d/services_nagios2.cfg
278
279   At the bottom of the file add in the new service definition. It will look
280   like this:
281
282# check that telnet is running
283define service {
284        hostgroup_name                  telnet-servers
285        service_description             Telnet
286        check_command                   check_telnet
287        use                             generic-service
288        notification_interval           0 ; set > 0 if you want to be renotified
289}
290
291   - By default Nagios (on Ubuntu) is pre-configured with web, ssh and ping
292     service definitions. It turns out that, once we are completely done, you
293     may not need the ping service definition - but don't remove it yet!
294
295   - Notice the parameter that says:
296
297     hostgroup_name                    telnet-servers
298
299     We need to create this before we try to restart Nagios. Edit the file
300     /etc/nagios3/conf.d/hostgroups_nagios2.cfg and at the bottom of the
301     file add the following entry:
302
303# A list of your telnet-accessible devices (older switches)
304define hostgroup {
305        hostgroup_name  telnet-servers
306                alias           Telnet servers
307                members         bb-sw,pc1-5-sw,pc6-10-sw,pc11-15-sw
308        }
309
310     Note the "members" line. These names must match what you used for the
311     host_name directive when you defined your switches in the switches.cfg
312     file.
313
314   - Save your changes and check your configuration:
315
316   # nagios3 -v /etc/nagios3/nagios.cfg
317
318   - Restart Nagios and see if you notice the changes you've made. Note that
319     the actual check of the telnet service will most likely be in a "pending"
320     state at first.
321
3223. Verify that SSH is running on the routers and workshop PCs
323
324   - In the file services_nagios2.cfg there is already an entry for the SSH
325     service check, so you do not need to create one. Instead, you
326     simply need to re-define the "ssh-servers" entry in the file
327     /etc/nagios3/conf.d/hostgroups_nagios2.cfg. The initial entry in the file
328     looks like:
329
330# A list of your ssh-accessible servers
331define hostgroup {
332        hostgroup_name  ssh-servers
333                alias           SSH servers
334                members         localhost
335        }
336
337     What do you think you should change? Correct, the "members" line. You should
338     remove "localhost" and add in entries for all the classroom pcs, routers and
339     the three switches that run ssh. The one switch that <i>does not</i> run ssh
340     is "bb-sw"... With this information and the network diagram you should be able
341     to complete this entry.
342
343    - Once you are done, run the pre-flight check:
344
345    # nagios3 -v /etc/nagios3/nagios.cfg
346
347    If everything looks good, then restart Nagios and see your changes in the
348    Nagios web interface.
349
3504. Check that http is running on all the workshop PCs.
351
352    - Like ssh, there is already a check_http service defined and it automatically
353      applies to the http-servers group. (Note, you can add additional groups of hosts
354      for any service check if you wish). So, you need to update the "http-servers" entry
355      in the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg to include all the workshop
356      PCs running http (i.e. Apache Web Server).
357
358    - See the previous exercise and make the appropriate change to do this. If you have
359      any questions ask your instructor for help.
360
3615. Check that SNMP is running on the classroom PCs.
362
363    - First you will need to add in the appropriate service check for SNMP in the file
364      /etc/nagios3/conf.d/services_nagios2.cfg. This is where Nagios is impressive. There
365      are hundreds, if not thousands, of service checks available via the various Nagios
366      sites on the web. You can see what plugins are installed by Ubuntu in the nagios3
367      package that we've installed by looking in the following directory:
368
369    # ls /usr/lib/nagios/plugins
370
371      As you'll see there is already a check_snmp plugin available to us. If you are
372      interested in the options the plugin takes you can execute the plugin from the
373      command line by typing:
374
375    # /usr/lib/nagios/plugins/check_snmp
376
377      to see what options are available, etc. You can use the check_snmp plugin and
378      Nagios to create very complex or specific system checks.
379
380    - Now to see all the various service/host checks that have been created using the
381      check_snmp plugin you can look in /etc/nagios-plugins/config/snmp.cfg. You will
382      see that there are a <i>lot</i> of preconfigured checks using snmp, including:
383
384      snmp_load
385      snmp_cpustats
386      snmp_procname
387      snmp_disk
388      snmp_mem
389      snmp_swap
390      snmp_procs
391      snmp_users
392      snmp_mem2
393      snmp_swap2
394      snmp_mem3
395      snmp_swap3
396      snmp_disk2
397      snmp_tcpopen
398      snmp_tcpstats
399      snmp_bgpstate
400      check_netapp_uptime
401      check_netapp_cpuload
402      check_netapp_numdisks
403      check_compaq_thermalCondition
404     
405      And, even better, you can create additional service checks quite easily.
406      For the case of verifying that snmpd (the SNMP service on Linux) is running we
407      need to ask SNMP a question. If we don't get an answer, then Nagios can assume
408      that the SNMP service is down on that host. When you use service checks such as
409      check_http, check_ssh and check_telnet this is what they are doing as well.
410
411    - In our case, let's create a new service check and call it "check_system". This
412      service check will connect with the specified host, use the private community
413      string we have defined in class and ask a question of snmp on that host - in this
414      case we'll ask about the System Description, or the OID "sysDescr.0".
415
416    - To do this start by editing the file /etc/nagios-plugins/config/snmp.cfg:
417
418    # vi /etc/nagios-plugins/config/snmp.cfg
419
420      At the top (or the bottom, your choice) add the following entry to the file:
421
422# 'check_system' command definition
423define command{
424       command_name    check_system
425       command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -C
426'$ARG1$' -o sysDescr.0
427        }
428
429      Note that "command_line" is a single line.
430
431    - Now you need to edit the file /etc/nagios3/conf.d/services_nagios2.cfg and add
432      in this service check. We'll run this check against all our servers in the
433      classroom, that is, the hostgroup "debian-servers".
434
435    - Edit the file /etc/nagios3/conf.d/services_nagios2.cfg
436
437    # vi /etc/nagios3/conf.d/services_nagios2.cfg
438
439      At the bottom of the file add the following definition:
440
441# check that snmp is up on all servers
442define service {
443        hostgroup_name                  debian-servers
444        service_description             SNMP
445        check_command                   check_system!s3cr3t
446        use                             generic-service
447        notification_interval           0 ; set > 0 if you want to be renotified
448}
449
450      Note that we have included our private community string here rather than
451      hard-coding it in the snmp.cfg file earlier.
452
453    - Now verify that your changes are correct and restart Nagios.
454
455    - If you click on the Service Detail menu choice in the web interface you should
456      see the SNMP check appear.
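
If the SNMP check does not behave as expected, it can help to see what
Nagios actually runs. The sketch below mimics the macro substitution for
the check_system command defined above: $HOSTADDRESS$ becomes the address
of the host and $ARG1$ becomes whatever follows the "!" in check_command.
The function name check_system_cmd and the address 192.0.2.11 are
placeholders of ours. The function only prints the command line, it does
not run it.

```shell
# check_system_cmd ADDRESS COMMUNITY
# Print the command line that the check_system definition expands to for
# one host. (Hypothetical helper; it prints the command, nothing more.)
check_system_cmd() {
    printf "/usr/lib/nagios/plugins/check_snmp -H '%s' -C '%s' -o sysDescr.0\n" "$1" "$2"
}
```

For example:

    # check_system_cmd 192.0.2.11 s3cr3t

Paste the printed line at the prompt to run the plugin by hand against a
real host address.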
457
458<b>PART V</b>
459Create More Host Groups
460-----------------------------------------------------------------------------
461
4621. Update /etc/nagios3/conf.d/hostgroups_nagios2.cfg
463
464    - For the following exercises it will be very useful if we have created
465      or updated the following hostgroups:
466
467      debian-servers
468      routers
469      switches
470 
471      If you edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg you
472      will see an entry for debian-servers that just contains localhost.
473      Update this entry to include all the classroom PCs, including the
474      noc (this assumes that you created a "noc" entry in your pcs.cfg
475      file).
476
477    # vi /etc/nagios3/conf.d/hostgroups_nagios2.cfg
478
479     Update the entry that says:
480
481
482# A list of your Debian GNU/Linux servers
483define hostgroup {
484        hostgroup_name  debian-servers
485                alias           Debian GNU/Linux Servers
486                members         localhost
487        }
488     
489      So that the "members" parameter contains:
490
491                members         noc,pc1,pc2,pc3,pc4,pc5,pc6,pc7,pc8,pc9,pc10,
492                                pc11,pc12,pc13,pc14,pc15
493
494      - Once you have done this, add in two more entries. One for routers and
495        one for switches. Call these entries "routers" and "switches".
496
497      - When you are done be sure to verify your work and restart Nagios.
498   
499
500<b>PART V</b>
501Extended Host Information ("making your graphs pretty")
502-----------------------------------------------------------------------------
503
5041. Update extinfo_nagios2.cfg
505
506    - If you would like to use appropriate icons for your defined hosts in
507      Nagios this is where you do this. We have the three types of devices:
508
509      Cisco routers
510      Cisco switches
511      Ubuntu servers
512
513      There is a fairly large repository of icon images available for you to
514      use located here:
515
516      /usr/share/nagios/htdocs/images/logos/
517
518      These were installed by default as dependent packages of the nagios3
519      package in Ubuntu. In some cases you can find model-specific icons for
520      your hardware, but to make things simpler we will use the following
521      icons for our hardware:
522
523      /usr/share/nagios/htdocs/images/logos/base/debian.*
524      /usr/share/nagios/htdocs/images/logos/cook/router.*
525      /usr/share/nagios/htdocs/images/logos/cook/switch.*
526
527    - The next step is to edit the file /etc/nagios3/conf.d/extinfo_nagios2.cfg
528      and tell Nagios what image you would like to use to represent your devices.
529
530    # vi /etc/nagios3/conf.d/extinfo_nagios2.cfg
531
532      Here is what an entry for your routers looks like (there is already
533      an entry for debian-servers that will work as is).
534
535define hostextinfo {
536        hostgroup_name   routers
537        icon_image       cook/router.png
538        icon_image_alt   Cisco Routers (2811)
539        vrml_image       router.png
540        statusmap_image  cook/router.gd2
541}
542
543      Now add an entry for your switches. Once you are done check your
544      work and restart Nagios. Take a look at the Status Map in the web interface.
545      It should be much nicer.     
546
547<b>PART VI</b>
548Create Service Groups
549-----------------------------------------------------------------------------
550
5511. Create service groups for ssh and http for each set of pcs.
552
553   - The idea here is to create three service groups. Each service group will
554     be for the group of PCs that are connected to the routers pc1-5-gw,
555     pc6-10-gw and pc11-15-gw. We want to see these PCs grouped together
556     and include status of their ssh and http services. To do this, create
557     and edit the file:
558
559   # vi /etc/nagios3/conf.d/servicegroups.cfg
560
561     Here is a sample of the service group for the router pc1-5-gw:
562
563define servicegroup{
564        servicegroup_name       group 1 services
565        alias                   pcs 1-5
566        members                 pc1,SSH,pc1,HTTP,pc2,SSH,pc2,HTTP,pc3,SSH,
567                                pc3,HTTP,pc4,SSH,pc4,HTTP,pc5,SSH,pc5,HTTP
568        }
569
570      Add in groups for pcs 6-10 and for pcs 11-15. You can call these service
571      groups anything you want.
572
573    - Save your changes, verify your work and restart Nagios. Now if you click on
574      the Servicegroup menu items in the Nagios web interface you should see
575      this information grouped together.
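
The members line of a service group is a flat list of host,service pairs,
which is easy to mistype. Here is a small sketch that builds it for a range
of PCs. The function name sg_members is hypothetical, and it assumes your
hosts are named pcN and that the service descriptions are exactly "SSH" and
"HTTP", as in the sample above.

```shell
# sg_members FIRST LAST
# Print a servicegroup "members" value covering the SSH and HTTP services
# of pcFIRST through pcLAST. (Hypothetical helper.)
sg_members() {
    i=$1
    out=""
    while [ "$i" -le "$2" ]; do
        for svc in SSH HTTP; do
            # ${out:+$out,} adds a comma only when out is already non-empty
            out="${out:+$out,}pc$i,$svc"
        done
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}
```

For example, "sg_members 6 10" prints the value to paste after "members"
in the service group for pcs 6-10.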
576
577
578<b>PART VII</b>
579Configure Guest Access to the Nagios Web Interface
580-----------------------------------------------------------------------------
581
5821. Edit /etc/nagios3/cgi.cfg to give r/o guest access.
583
584    - By default Nagios is configured to give full r/w access via the Nagios
585      web interface to the user nagiosadmin. You can change the name of this
586      user, add other users, change how you authenticate users, what users
587      have access to what resources and more via the cgi.cfg file.
588
589    - First, let's create a "guest" user and password in the htpasswd.users
590      file.
591
592    # htpasswd /etc/nagios3/htpasswd.users guest
593
594      You can use any password you want (or none). A password of "guest" is
595      not a bad choice.
596
597    - Next, edit the file /etc/nagios3/cgi.cfg and look for what type
598      of access has been given to the nagiosadmin user. By default
599      you will see the following directives (note, there are comments between
600      each directive):
601
602      authorized_for_system_information=nagiosadmin
603      authorized_for_configuration_information=nagiosadmin
604      authorized_for_system_commands=nagiosadmin
605      authorized_for_all_services=nagiosadmin
606      authorized_for_all_hosts=nagiosadmin
607      authorized_for_all_service_commands=nagiosadmin
608      authorized_for_all_host_commands=nagiosadmin
609
610      Now let's tell Nagios to allow the "guest" user some access to
611      information via the web interface. You can choose whatever you would
612      like, but what is pretty typical is this:
613
614      authorized_for_system_information=nagiosadmin,guest
615      authorized_for_configuration_information=nagiosadmin,guest
616      authorized_for_system_commands=nagiosadmin
617      authorized_for_all_services=nagiosadmin,guest
618      authorized_for_all_hosts=nagiosadmin,guest
619      authorized_for_all_service_commands=nagiosadmin
620      authorized_for_all_host_commands=nagiosadmin
621
622    - Once you make the changes, save the file cgi.cfg, verify your
623      work and restart Nagios.
624
625    - To see if you can log in as the "guest" user you may need to clear
626      the cookies in your web browser. You will not notice any difference
627      in the web interface. The difference is that a number of items that
628      are available via the web interface (forcing a service/host check,
629      scheduling checks, comments, etc.) will not work for the guest
630      user.
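
If you prefer not to edit each directive by hand, the change to cgi.cfg
described above can be sketched as a filter. The function name add_guest is
hypothetical. It assumes each directive sits on its own line, starts in the
first column, and has no trailing spaces. Run it only once, since a second
pass would append ",guest" again.

```shell
# add_guest: read cgi.cfg on stdin and append ",guest" to the four
# read-only directives, leaving the *_commands directives untouched.
# (Hypothetical helper.)
add_guest() {
    sed -E '/^authorized_for_(system_information|configuration_information|all_services|all_hosts)=/ s/$/,guest/'
}
```

One way to try it without touching the live file (the output path is just
an example):

    # cat /etc/nagios3/cgi.cfg | add_guest > /tmp/cgi.cfg.new

Compare the two files before copying the new one into place.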
631
632<b>UPCOMING</b>
633New Commands, Updating Contact Information, Connecting Nagios to RT (tickets)
634-----------------------------------------------------------------------------
635
636During the ticket management sessions later in the week we will be working on
637these items to allow Nagios to automatically create tickets in RT when certain
638events take place.
639</font>
640</pre>
641
642<font size="1">
643Last update 24 Feb 2010 by HA
644</font>
645
646</body>
647</html>