% Nagios Installation and Configuration
%

# Introduction

## Goals

* Optional exercises for Nagios

## Notes

* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "rtrX>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

# Exercises

# PART IX - Optional Exercises

## 1. Check that Nagios is running

As opposed to just checking that a web server is running on the classroom PCs,
you could also check that the nagios3 service is available, by requesting the
/nagios3/ path. This means passing extra options to the check_http plugin.

For a description of the available options, type this:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# /usr/lib/nagios/plugins/check_http                                    (short help)
# /usr/lib/nagios/plugins/check_http --help                             (detailed help)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can also browse the online Nagios documentation or search the web
for information on check_http. You can even run the plugin by hand to
perform a one-shot service check:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# /usr/lib/nagios/plugins/check_http -H localhost -u /nagios3/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
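
When you run a plugin by hand like this, Nagios-style plugins report their result through the exit status as well as the text they print: by the standard plugin conventions 0 means OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN. The sketch below shows one way to turn that status into a readable name. The helper `state_name` is our own illustration and is not part of Nagios.

```shell
#!/bin/sh
# state_name is a hypothetical helper (not shipped with Nagios): it maps a
# plugin exit status to the state name shown in the web interface.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT-OF-RANGE" ;;   # not a valid plugin exit status
    esac
}

# Example use after a one-shot check (assumes the plugin path used above):
#   /usr/lib/nagios/plugins/check_http -H localhost -u /nagios3/
#   echo "state: $(state_name $?)"
```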

So the goal is to configure nagios to call check_http in this way.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# (hint, /etc/nagios-plugins/config/http.cfg)

define command{
        command_name    check_http_url
        command_line    /usr/lib/nagios/plugins/check_http -H '$HOSTADDRESS$' -u '$ARG1$'
        }

# (hint, /etc/nagios3/conf.d/services_nagios2.cfg)

define service {
        hostgroup_name                  nagios-servers
        service_description             NAGIOS
        check_command                   check_http_url!/nagios3/
        use                             generic-service
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You'll also need to create a hostgroup called nagios-servers to
link to this service check. (hint, /etc/nagios3/conf.d/hostgroups_nagios2.cfg)

Once you have done this, check that Nagios warns you about failing
authentication (because it's trying to fetch the page without providing
the username/password). There's an extra parameter you can pass to
check_http to provide that info, so we need to define a new command
with an additional argument:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
define command{
        command_name    check_http_url_auth
        command_line    /usr/lib/nagios/plugins/check_http -H '$HOSTADDRESS$' -u '$ARG1$' -a '$ARG2$'
        }
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

And you invoke it:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check_command                   check_http_url_auth!/nagios3/!nagiosadmin:password
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
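
To see how the pieces of `check_command` end up on the plugin's command line, it can help to imitate the substitution by hand. The sketch below is only a rough imitation of what Nagios does, not its actual implementation. The helper `expand_check` and the address 10.10.0.250 are made up for illustration, and it assumes at least one argument after the command name and no `|`, `&` or `\` characters in the values.

```shell
#!/bin/sh
# expand_check is a hypothetical helper that roughly mimics how Nagios fills
# in a command_line: the check_command value is split on "!" and the pieces
# after the command name become $ARG1$ and $ARG2$.
# Usage: expand_check <command_line template> <host address> <check_command>
expand_check() {
    template=$1
    address=$2
    # Drop the command name (everything up to the first "!").
    args=${3#*!}
    arg1=${args%%!*}
    # ARG2 is whatever follows the next "!", if there is one.
    case "$args" in
        *!*) arg2=${args#*!} ;;
        *)   arg2= ;;
    esac
    printf '%s\n' "$template" | sed \
        -e 's|\$HOSTADDRESS\$|'"$address"'|g' \
        -e 's|\$ARG1\$|'"$arg1"'|g' \
        -e 's|\$ARG2\$|'"$arg2"'|g'
}

# Example: the check_http_url_auth definition above, for a host whose
# address is assumed to be 10.10.0.250.
expand_check \
    "/usr/lib/nagios/plugins/check_http -H '\$HOSTADDRESS\$' -u '\$ARG1\$' -a '\$ARG2\$'" \
    "10.10.0.250" \
    'check_http_url_auth!/nagios3/!nagiosadmin:password'
```

The printed line is the command you could paste into a root shell to reproduce the check by hand.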

WARNING: in the tradition of "Debian Knows Best", their definition of the
check_http command in /etc/nagios-plugins/config/http.cfg
is *not* the same as that recommended in the nagios3 documentation.
It is missing $ARG1$, so any parameters you pass to check_http are
ignored. So you might think you are monitoring /nagios3/ but actually
you are monitoring the root page (/)!

This is why we had to make a new command definition "check_http_url".
You could make a more specific one like "check_nagios", or you could
modify the Debian/Ubuntu check_http definition to fit the standard usage.
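
Before relying on a packaged command definition, you may want to confirm whether it actually uses the argument you plan to pass. The sketch below is one possible way to do that. The helper `command_uses_arg1` is hypothetical, and it assumes that `command_name` appears before `command_line` inside each definition, as it does in the examples in this exercise.

```shell
#!/bin/sh
# command_uses_arg1 is a hypothetical helper: it succeeds (exit 0) if the
# definition of the named command in the given file references $ARG1$.
# Assumes command_name comes before command_line within each definition.
# Usage: command_uses_arg1 <config file> <command_name>
command_uses_arg1() {
    awk -v name="$2" '
        $1 == "command_name" { current = $2 }
        $1 == "command_line" && current == name && index($0, "$ARG1$") { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Example use (file path as given in the text above):
#   if command_uses_arg1 /etc/nagios-plugins/config/http.cfg check_http; then
#       echo "check_http passes ARG1 to the plugin"
#   else
#       echo "check_http ignores its arguments"
#   fi
```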

## 2. Check that SNMP is running on the classroom NOC

This exercise will not work if you did not complete the installation of additional
SNMP MIBs at the start of the week and configure /etc/snmp/snmp.conf properly.
Please refer to the original snmp exercises if you are unsure.

First you will need to add in the appropriate service check for SNMP in the file
/etc/nagios3/conf.d/services_nagios2.cfg. This is where Nagios is impressive. There
are hundreds, if not thousands, of service checks available via the various Nagios
sites on the web. You can see what plugins are installed by Ubuntu in the nagios3
package that we've installed by looking in the following directory:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ls /usr/lib/nagios/plugins
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As you'll see there is already a check_snmp plugin available to us. If you are
interested in the options the plugin takes you can execute the plugin from the
command line by typing:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# /usr/lib/nagios/plugins/check_snmp                                    (short help)
# /usr/lib/nagios/plugins/check_snmp --help                             (detailed help)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

to see what options are available, etc. You can use the check_snmp plugin and
Nagios to create very complex or specific system checks.
Now to see all the various service/host checks that have been created using the
check_snmp plugin you can look in /etc/nagios-plugins/config/snmp.cfg. You will
see that there are a lot of preconfigured checks using snmp, including:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      snmp_load
      snmp_cpustats
      snmp_procname
      snmp_disk
      snmp_mem
      snmp_swap
      snmp_procs
      snmp_users
      snmp_mem2
      snmp_swap2
      snmp_mem3
      snmp_swap3
      snmp_disk2
      snmp_tcpopen
      snmp_tcpstats
      snmp_bgpstate
      check_netapp_uptime
      check_netapp_cupuload
      check_netapp_numdisks
      check_compaq_thermalCondition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

And, even better, you can create additional service checks quite easily.
For the case of verifying that snmpd (the SNMP service on Linux) is running we
need to ask SNMP a question. If we don't get an answer, then Nagios can assume
that the SNMP service is down on that host. When you use service checks such as
check_http, check_ssh and check_telnet this is what they are doing as well.

In our case, let's create a new service check and call it "check_system". This
service check will connect with the specified host, use the private community
string we have defined in class and ask a question of snmp on that host. In this
case we'll ask about the System Description, or the OID "sysDescr.0".

To do this start by editing the file /etc/nagios-plugins/config/snmp.cfg:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# editor /etc/nagios-plugins/config/snmp.cfg
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At the top (or the bottom, your choice) add the following entry to the file:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 'check_system' command definition
define command{
        command_name    check_system
        command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -C '$ARG1$' -o sysDescr.0
        }
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

COPY and PASTE this rather than typing it by hand. Note that "command_line"
is a single line: if your editor wraps it when you paste, you may have to
manually "join" the pieces so that they are one line.

Now you need to edit the file /etc/nagios3/conf.d/services_nagios2.cfg and add
in this service check. We'll run this check against all the SNMP-enabled
devices in the classroom, using a new hostgroup called "snmp-servers" which
we will define in the next step.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# editor /etc/nagios3/conf.d/services_nagios2.cfg
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At the bottom of the file add the following definition:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# check that snmp is up on all servers
define service {
        hostgroup_name                  snmp-servers
        service_description             SNMP
        check_command                   check_system!xxxxxx
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The "xxxxxx" is the community string previously (or to be) defined in class.
You must change it to that community string or this check will not work.

Note that we pass our own community string here as an argument, rather than
hard-coding it in the snmp.cfg file earlier.

Now we must create the "snmp-servers" group in our hostgroups_nagios2.cfg file.
Edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg and go to the end of the
file. Add in the following hostgroup definition:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# A list of snmp-enabled devices on which we wish to run the snmp service check
define hostgroup {
        hostgroup_name       snmp-servers
        alias                snmp servers
        members              noc,localhost,pc1,pc2,pc3,pc4...pc36,rtr1,rtr2,rtr3...rtr9
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Note that for "members" you can add in all PCs and routers as they should all
have snmp up and running at this time. Remember to EXCLUDE your own PC and use
localhost instead.
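
Typing a long "members" list by hand is error-prone, so you may prefer to generate it. The sketch below is one way to do so. The helper `snmp_members` is hypothetical, and the example assumes 36 PCs, 9 routers and that you are sitting at pc5. Adjust these numbers to match your classroom.

```shell
#!/bin/sh
# snmp_members is a hypothetical helper that prints a "members" value:
# noc, localhost, pc1..pcN (skipping your own PC) and rtr1..rtrM.
# Usage: snmp_members <number of PCs> <number of routers> <own PC number>
snmp_members() {
    list="noc,localhost"
    i=1
    while [ "$i" -le "$1" ]; do
        # Your own PC is monitored as "localhost", so leave it out.
        if [ "$i" -ne "$3" ]; then
            list="$list,pc$i"
        fi
        i=$((i + 1))
    done
    i=1
    while [ "$i" -le "$2" ]; do
        list="$list,rtr$i"
        i=$((i + 1))
    done
    echo "$list"
}

# Example: 36 PCs and 9 routers, run from pc5 (all three values are assumptions).
snmp_members 36 9 5
```

Paste the printed line after the word "members" in the hostgroup definition. Every name in the list must already exist as a host in your Nagios configuration.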

Now verify that your changes are correct and restart Nagios.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# nagios3 -v /etc/nagios3/nagios.cfg
# service nagios3 restart
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Defect / Bug in Ubuntu 12.04 LTS**

The net-snmp 5.6.x package appears to not install one of the IANA mibs (IANAifType-MIB).
This causes a MIB error which, in turn, causes the snmp check plugin to fail. To fix
this problem do the following (as root):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# cd /usr/share/mibs
# wget http://www.iana.org/assignments/ianaiftype-mib/ianaiftype-mib
# mv ianaiftype-mib ianaiftype-mib.my
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now you can continue.

If you click on the Service Detail menu choice in the web interface you should see
the SNMP check appear for the noc host, or for any other hosts you may have
included on the "members" line above.

## 3. Check other settings using SNMP

The real purpose for check_snmp is to poll devices for their status. It can
be used, for example, to check that power supplies and fans are functioning
normally.

In order to do this, you will need to find the OID(s) of interest and the
values which you want to be alerted on for warning and critical status.

The following example checks the power supply status of a Netgear 72xx
series switch with dual power supplies running 8.x firmware.  Nagios doesn't
care which file each definition goes in, but some locations are suggested.

~~~
# This could go in your switches.cfg or in services_nagios2.cfg

define service {
        hostgroup_name                  netgear72xx-8x-switches
        service_description             PSUs
        check_command                   check_netgear72xx_8x_power_dual!<community>
        use                             generic-service
}

# This could go in /etc/nagios-plugins/config/netgear-8x.cfg

define command{
        command_name    check_netgear72xx_8x_power_dual
        command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' \
          -o .1.3.6.1.4.1.4526.10.43.1.7.1.3.0,.1.3.6.1.4.1.4526.10.43.1.7.1.3.1 \
          -C '$ARG1$' -u 'PSU1,PSU2' -w @5:5,@5:5 -c @2:2,@2:2 -l "PSU status "
}
~~~

You'd also create a hostgroup "netgear72xx-8x-switches" and make the switches
members of this group, so that Nagios runs this check on those devices.

Notice that the `-o` option contains the two OIDs we want to poll, and the
`-w` and `-c` options give the values to check for.  This makes use of a
feature of check_snmp that is not well documented:

* `-w <x>:<y>` gives a warning if the value is *not* between x and y (inclusive)
* `-w @<x>:<y>` gives a warning if the value *is* between x and y (inclusive)

The `-c` option accepts the same syntax for the critical threshold.

The [MIB](http://www.downloads.netgear.com/files/GDC/GSM7224V2/gsm72xxv2-8.0.1.29-mibs.tar.bz2)
(fastpath_boxservices.my) contains the following definitions:

~~~
    boxServicesPowSupplyItemState OBJECT-TYPE
             SYNTAX      INTEGER {
                                  operational(1),
                                  failed(2),
                                  powering(3),
                                  notpowering(4),
                                  notpresent(5)
                                 }
         MAX-ACCESS  read-only
         STATUS      current
         DESCRIPTION
                     "The status of power supply"
         ::= { boxServicesPowSuppliesEntry 3 }
~~~

Therefore, we get a warning if the status is `notpresent(5)`, and a critical
error if the status is `failed(2)`.
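
The threshold logic can be checked without a switch to hand. The sketch below reimplements only the simple integer `x:y` and `@x:y` forms used in this exercise. It is not the plugin's own code, and the helper names `range_alerts` and `psu_state` are ours.

```shell
#!/bin/sh
# range_alerts is a hypothetical helper: it succeeds (exit 0) when an integer
# value should raise an alert for a threshold of the form "x:y" or "@x:y".
# Other threshold forms accepted by the plugins are not handled here.
range_alerts() {
    value=$1
    range=$2
    case "$range" in
        @*) inside_alerts=1; range=${range#@} ;;
        *)  inside_alerts=0 ;;
    esac
    low=${range%%:*}
    high=${range#*:}
    if [ "$value" -ge "$low" ] && [ "$value" -le "$high" ]; then
        in_range=1
    else
        in_range=0
    fi
    # "@x:y" alerts inside the range, plain "x:y" alerts outside it.
    [ "$in_range" -eq "$inside_alerts" ]
}

# psu_state applies the thresholds from the example: -w @5:5 -c @2:2
# A critical match takes priority over a warning match.
psu_state() {
    if range_alerts "$1" "@2:2"; then
        echo "CRITICAL"
    elif range_alerts "$1" "@5:5"; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# Print the resulting state for each value defined in the MIB.
for state in 1 2 3 4 5; do
    echo "$state -> $(psu_state "$state")"
done
```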

Note: `notpowering(4)` means that the PSU is good but the device is being
powered by the other PSU.  This is not an error.

You should be able to adapt this recipe to other types of equipment, and for
checking fan status and temperature, by adjusting the OIDs and values
appropriately.

The OID .1.3.6.1.4.1.4526.10.43.1.7.1.3 comes from:

~~~
netgear                 OBJECT IDENTIFIER ::= { enterprises 4526 }

ng7000managedswitch     OBJECT IDENTIFIER ::= { netgear 10 }

    fastPathBoxServices MODULE-IDENTITY
           LAST-UPDATED "200802220000Z" -- 22 Feb 2008 12:00:00 GMT
           ORGANIZATION "Netgear"
           CONTACT-INFO
           ""...
      ::= { ng7000managedswitch 43 }

    boxServicesGroup    OBJECT IDENTIFIER ::= { fastPathBoxServices 1 }

    boxServicesPowSuppliesTable OBJECT-TYPE
         SYNTAX SEQUENCE OF BoxServicesPowSuppliesEntry
         MAX-ACCESS  not-accessible
         STATUS      current
         DESCRIPTION
                     "Power supply"
         ::= { boxServicesGroup 7 }

    boxServicesPowSuppliesEntry OBJECT-TYPE
         SYNTAX      BoxServicesPowSuppliesEntry
         MAX-ACCESS  not-accessible
         STATUS      current
         DESCRIPTION
                     "Box Services Power Supply Entry"
         INDEX { boxServicesPowSupplyIndex }
         ::= { boxServicesPowSuppliesTable 1 }

    BoxServicesPowSuppliesEntry ::= SEQUENCE {
          boxServicesPowSupplyIndex
              Integer32,
          boxServicesPowSupplyItemType
              INTEGER,
          boxServicesPowSupplyItemState
              INTEGER
          }

    boxServicesPowSupplyIndex OBJECT-TYPE
         SYNTAX      Integer32 (0..2147483647)
         MAX-ACCESS  read-only
         STATUS      current
         DESCRIPTION
                     "Unique index of power supply table entry"
         ::= { boxServicesPowSuppliesEntry 1 }
~~~
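
Reading the excerpt from the top down, each `::= { parent N }` line appends one number to the parent's OID. The sketch below simply replays that chain in shell variables named after the MIB objects. The only value not taken from the excerpt is the standard `enterprises` prefix, .1.3.6.1.4.1.

```shell
#!/bin/sh
# Each assignment mirrors one "::= { parent N }" line from the MIB excerpt.
enterprises=".1.3.6.1.4.1"                      # standard prefix, not in the excerpt
netgear="${enterprises}.4526"
ng7000managedswitch="${netgear}.10"
fastPathBoxServices="${ng7000managedswitch}.43"
boxServicesGroup="${fastPathBoxServices}.1"
boxServicesPowSuppliesTable="${boxServicesGroup}.7"
boxServicesPowSuppliesEntry="${boxServicesPowSuppliesTable}.1"
boxServicesPowSupplyItemState="${boxServicesPowSuppliesEntry}.3"

# The table index (0 or 1 on a dual-PSU switch) is appended last.
echo "PSU1: ${boxServicesPowSupplyItemState}.0"
echo "PSU2: ${boxServicesPowSupplyItemState}.1"
```

The two printed OIDs are the ones passed to `-o` in the command definition earlier in this exercise.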

A device with only one power supply connected reports under
`FASTPATH-BOXSERVICES-PRIVATE-MIB::boxServicesPowSuppliesTable`:

~~~
.1.3.6.1.4.1.4526.10.43.1.7.1.1.0 = INTEGER: 0
.1.3.6.1.4.1.4526.10.43.1.7.1.1.1 = INTEGER: 1
.1.3.6.1.4.1.4526.10.43.1.7.1.2.0 = INTEGER: fixed(1)
.1.3.6.1.4.1.4526.10.43.1.7.1.2.1 = INTEGER: removable(2)
.1.3.6.1.4.1.4526.10.43.1.7.1.3.0 = INTEGER: operational(1)
.1.3.6.1.4.1.4526.10.43.1.7.1.3.1 = INTEGER: notpresent(5)
~~~

or with translation of OIDs:

~~~
FASTPATH-BOXSERVICES-PRIVATE-MIB::boxServicesPowSupplyIndex.0 = INTEGER: 0
FASTPATH-BOXSERVICES-PRIVATE-MIB::boxServicesPowSupplyIndex.1 = INTEGER: 1
FASTPATH-BOXSERVICES-PRIVATE-MIB::boxServicesPowSupplyItemType.0 = INTEGER: fixed(1)
FASTPATH-BOXSERVICES-PRIVATE-MIB::boxServicesPowSupplyItemType.1 = INTEGER: removable(2)
FASTPATH-BOXSERVICES-PRIVATE-MIB::boxServicesPowSupplyItemState.0 = INTEGER: operational(1)
FASTPATH-BOXSERVICES-PRIVATE-MIB::boxServicesPowSupplyItemState.1 = INTEGER: notpresent(5)
~~~
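
When adapting this recipe to other equipment, it helps to pick out just the status column from output like the above before choosing thresholds. The sketch below is one possible filter for the numeric form of the output. The helper `psu_states` is hypothetical, and the sample input is the output shown above.

```shell
#!/bin/sh
# psu_states is a hypothetical helper: it reads numeric walk output on stdin
# and prints "<index> <state>" for each boxServicesPowSupplyItemState row
# (column 3 of the power supply table).
psu_states() {
    awk '
        index($1, ".1.3.6.1.4.1.4526.10.43.1.7.1.3.") == 1 {
            n = split($1, parts, ".")   # the last OID component is the index
            print parts[n], $NF         # the last field is the reported state
        }
    '
}

# Sample input: the output shown above for a device with one PSU connected.
psu_states <<'EOF'
.1.3.6.1.4.1.4526.10.43.1.7.1.1.0 = INTEGER: 0
.1.3.6.1.4.1.4526.10.43.1.7.1.1.1 = INTEGER: 1
.1.3.6.1.4.1.4526.10.43.1.7.1.2.0 = INTEGER: fixed(1)
.1.3.6.1.4.1.4526.10.43.1.7.1.2.1 = INTEGER: removable(2)
.1.3.6.1.4.1.4526.10.43.1.7.1.3.0 = INTEGER: operational(1)
.1.3.6.1.4.1.4526.10.43.1.7.1.3.1 = INTEGER: notpresent(5)
EOF
```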