Nagios Installation and Configuration

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

Exercises Part I
----------------

0. Log in to your PC or open a terminal window as the sysadm user.

1. You may need to install Nagios version 3. Do this either as root or as
   the sysadm user with the "sudo" command. As sysadm:

    $ sudo apt-get install nagios3

   Unless you already have an MTA installed, nagios3 will install
   postfix as a dependency. Select the "Internet Site" option. (If you
   want to use a different MTA, install it before nagios3.)

   You will be prompted for the nagiosadmin password. Give it the normal
   workshop password.

   To get the documentation in /usr/share/doc/nagios3-doc/html/ (which
   can also be read via the Nagios web interface), do:

    $ sudo apt-get install nagios3-doc


2. Look at the file which contains the password. The password is stored
   as a hash, not in clear text:

    $ cat /etc/nagios3/htpasswd.users


3. You should already have a working Nagios!

    - Open a browser, and go to your machine like this (N is the number
      of your PC):

    http://pcN.ws.nsrc.org/nagios3/

    - At the login prompt, log in as:

        user: nagiosadmin
        pass: <CLASS PASSWORD>

    Browse to the "Host Detail" page to see what's already configured.


4. Let's look at the configuration layout... But, first, let's become the root
   user on your machine:

    $ sudo bash

    # cd /etc/nagios3
    # ls -l

    -rw-r--r-- 1 root root    1882 2008-12-18 13:42 apache2.conf
    -rw-r--r-- 1 root root   10524 2008-12-18 13:44 cgi.cfg
    -rw-r--r-- 1 root root    2429 2008-12-18 13:44 commands.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:33 conf.d
    -rw-r--r-- 1 root root      26 2009-02-14 12:36 htpasswd.users
    -rw-r--r-- 1 root root   42539 2008-12-18 13:44 nagios.cfg
    -rw-r----- 1 root nagios  1293 2008-12-18 13:42 resource.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:32 stylesheets

    # cd conf.d
    # ls -l

    -rw-r--r-- 1 root root 1695 2008-12-18 13:42 contacts_nagios2.cfg
    -rw-r--r-- 1 root root  418 2008-12-18 13:42 extinfo_nagios2.cfg
    -rw-r--r-- 1 root root 1152 2008-12-18 13:42 generic-host_nagios2.cfg
    -rw-r--r-- 1 root root 1803 2008-12-18 13:42 generic-service_nagios2.cfg
    -rw-r--r-- 1 root root  210 2009-02-14 12:33 host-gateway_nagios3.cfg
    -rw-r--r-- 1 root root  976 2008-12-18 13:42 hostgroups_nagios2.cfg
    -rw-r--r-- 1 root root 2167 2008-12-18 13:42 localhost_nagios2.cfg
    -rw-r--r-- 1 root root 1005 2008-12-18 13:42 services_nagios2.cfg
    -rw-r--r-- 1 root root 1609 2008-12-18 13:42 timeperiods_nagios2.cfg

    Notice that the package installs files with "nagios2" in their name.
    This is because they are the same files as were used for the Nagios
    version 2 Debian package. However there was a change made to the
    host-gateway configuration file, so this has a new name.


5. You have a config which is already monitoring your own system
   (localhost_nagios2.cfg) and your upstream default gateway
   (host-gateway_nagios3.cfg).

   Have a look at the config file for the default gateway: it's very simple.
   (Note: tab completion is useful here. Type cat host-g then hit tab; the
   filename will be filled in for you)

    # cat host-gateway_nagios3.cfg

    # a host definition for the gateway of the default route
    define host {
            host_name   gateway
            alias       Default Gateway
            address     10.10.0.254
            use         generic-host
            }



PART II
Configuring Equipment
-----------------------------------------------------------------------------

0. Order of configuration

Conceptually we will build our configuration files starting with the
"nearest" devices and then moving on to those further away.

By going in this order you will already have defined the devices that act
as parents for other devices.

Remember to refer to the Network Diagram for our classroom if you get confused.

We have the following devices:

rtr     (the gateway router: 10.10.0.254)
sw      (the gateway switch: 10.10.0.253, parent: rtr)

rtr1    (group 1 router: 10.10.0.201, parent: sw)
rtr2    (group 2 router: 10.10.0.202, parent: sw)
rtr3    (group 3 router: 10.10.0.203, parent: sw)
rtr4    (group 4 router: 10.10.0.204, parent: sw)
rtr5    (group 5 router: 10.10.0.205, parent: sw)
rtr6    (group 6 router: 10.10.0.206, parent: sw)
rtr7    (group 7 router: 10.10.0.207, parent: sw)
rtr8    (group 8 router: 10.10.0.208, parent: sw)
rtr9    (group 9 router: 10.10.0.209, parent: sw)
rtr10   (group 10 router: 10.10.0.210, parent: sw)

pc1     (10.10.0.1, parent: sw)
pc2     (10.10.0.2, parent: sw)
...
pc29    (10.10.0.29, parent: sw)
pc30    (10.10.0.30, parent: sw)
...
pc40    (10.10.0.40, parent: sw)

s1      (10.10.0.241, parent: sw)
s2      (10.10.0.242, parent: sw)
s3      (10.10.0.243, parent: sw)
s4      (10.10.0.244, parent: sw)

noc     (10.10.0.250, parent: sw)

ap1     (10.10.0.251, parent: sw)
ap2     (10.10.0.252, parent: sw)

We recommend grouping these items in the files:

routers.cfg     (rtr, rtr1...rtr10)
switches.cfg    (sw)
pcs.cfg         (pc1...pc30, s1, s2, noc, ap1, ap2)


1. First we need to tell Nagios to monitor the gateway router for
   our classroom, which is 10.10.0.254:

   # cd /etc/nagios3/conf.d/

Create the entry for the gateway router like this:

   # editor routers.cfg

define host {
    use         generic-host
    host_name   rtr
    alias       Gateway Router
    address     10.10.0.254
}

In the same file create the 10 entries for the group routers:

define host {
    use         generic-host
    host_name   rtrX
    alias       Group X Router
    address     10.10.0.20X
    parents     sw
}

... replacing 'X' in the definition above with the router number (1 - 10).
Note that rtr10 has the address 10.10.0.210, not 10.10.0.2010.

Repeat this for rtr2, rtr3, rtr4 ... up to rtr10.

Note that the entry for "sw", our gateway switch, has not yet been created.
That is next.

Save the file and exit.
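If you would rather not type ten nearly identical blocks by hand, a shell
loop can print them for you. This is only a sketch: gen_routers is a made-up
helper name (not a Nagios command), and it assumes the rtrN / 10.10.0.20N
numbering from the device list in step 0. Review the output before putting
it into routers.cfg.

```shell
#!/bin/sh
# gen_routers COUNT: print host definitions for rtr1 .. rtrCOUNT.
# Hypothetical helper; assumes router N has the address 10.10.0.(200+N).
gen_routers() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        cat <<EOF
define host {
    use         generic-host
    host_name   rtr$i
    alias       Group $i Router
    address     10.10.0.$((200 + i))
    parents     sw
}

EOF
        i=$((i + 1))
    done
}

# Print the ten definitions to the screen for review.
gen_routers 10
```

Once the output looks right, it can be appended to the file (as root, in
/etc/nagios3/conf.d) with: gen_routers 10 >> routers.cfg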


2. Create a file called switches.cfg and add an entry for the switch:

   # editor switches.cfg

define host {
    use         generic-host
    host_name   sw
    alias       Backbone Switch
    address     10.10.0.253
    parents     rtr
}

At this point Nagios is configured to monitor whether our core hosts (the
parents) are up on our classroom network. Your next steps are to add in the
individual hosts such as the classroom virtual PC images on your table (for
example for group 1, pc1 - 6, for group 2, pc7 - 12, etc.), the Wireless
Access Points (ap1 and ap2), the servers s1, s2 and the noc.

Be sure you add in a proper "parents" entry for each host.

Remember, if you don't understand the parent relations in our network you can
review the logical network diagram on the wiki!

Also note the bullet points on Nagios parents in the slides, under:

                "Nagios Parent Relationships"


STEPS 2a - 2c SHOULD BE REPEATED WHENEVER YOU UPDATE THE CONFIGURATION!


2a. Verify that your configuration files are OK:

    # nagios3 -v /etc/nagios3/nagios.cfg

    ... You should get some warnings like:

    Warning: Host 'rtr' has no services associated with it!
    Warning: Host 'sw' has no services associated with it!
    etc....
    ...
    Total Warnings: N
    Total Errors:   0

    Things look okay - No serious problems were detected during the check.

    The warnings mean that Nagios considers it unusual to monitor a device
    just for its existence on the network, without also monitoring some
    service.


2b. Reload/Restart Nagios

    # /etc/init.d/nagios3 stop
    # /etc/init.d/nagios3 start

    or

    # service nagios3 restart

2c. Go to the web interface (http://pcN.ws.nsrc.org/nagios3) and check that
    the hosts you just added are now visible in the interface. Click on the
    "Host Detail" item on the left of the Nagios screen to see this. You may
    see them in "PENDING" status until the check is carried out.


HINT: You will be doing this a lot. If you do it all on one line, like this,
then you can hit cursor-up and rerun all in one go:

    # nagios3 -v /etc/nagios3/nagios.cfg && service nagios3 restart

The '&&' ensures that the restart only happens if the config is valid.
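To see how '&&' behaves without touching Nagios, you can try it with
stand-in commands. In this sketch check_ok and check_bad are made-up
functions that simply succeed or fail, playing the role of a passing or
failing "nagios3 -v" run; verify_then is also a made-up name.

```shell
#!/bin/sh
# Stand-ins for a config check that passes (exit 0) or fails (exit 1).
check_ok()  { return 0; }
check_bad() { return 1; }

# verify_then CHECK: run CHECK; the echo only runs if CHECK succeeded.
verify_then() {
    "$1" && echo "restart would run"
}

verify_then check_ok                            # prints: restart would run
verify_then check_bad || echo "restart skipped" # prints: restart skipped
```

The second line of output comes from the '||' branch: the failed check
stopped the '&&' chain before the restart step was reached.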


3. Create entries for the classroom PCs

Now that we have our routers and switches defined it is quite easy to create
entries for all our PCs. Think about the parent relationships. If you do not
understand a parent relationship, refer back to the classroom network
diagram!

Below are three sample entries: one for the NOC, one for pc1 and one for
pc6. You should be able to use these examples to create entries for all
classroom PCs plus the NOC.

We could put these entries into separate files, but as our network is small
we'll use a single file called pcs.cfg.

NOTE! You do not add in an entry for your own PC or router. This has already
been defined in the file /etc/nagios3/conf.d/localhost_nagios2.cfg. This
definition is what defines the Nagios network viewpoint. So, when you come to
the spot where you might add an entry for your PC you should skip this and go
on to the next PC in the list.

    # editor pcs.cfg

# Our classroom NOC

define host {
    use         generic-host
    host_name   noc
    alias       Workshop NOC machine
    address     10.10.0.250
    parents     sw
}

# PCs

define host {
    use         generic-host
    host_name   pc1
    alias       pc1
    address     10.10.0.1
    parents     sw
}

define host {
    use         generic-host
    host_name   pc6
    alias       pc6
    address     10.10.0.6
    parents     sw
}

Pay attention to the parent entries and the IP addresses.

Take the three entries above and now expand this to create the remaining
entries for the PCs in your group. That is, if you are in group 1, fill in
for PCs 2 through 5 (remember to SKIP your own PC!).


Save the file pcs.cfg and exit.

As before, repeat steps 2a-2c to verify your configuration, correct any
errors, and activate it.
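The PC entries differ only in their number, so they can also be printed by a
short loop that leaves out your own machine. This is a sketch under
assumptions: gen_pcs is a made-up helper name, and it assumes that pcN has
the address 10.10.0.N and the parent sw, as in the sample entries.

```shell
#!/bin/sh
# gen_pcs FIRST LAST SKIP: print host definitions for pcFIRST .. pcLAST,
# leaving out pcSKIP (your own PC, which is already "localhost").
# Hypothetical helper; assumes pcN lives at 10.10.0.N with parent sw.
gen_pcs() {
    first=$1
    last=$2
    skip=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        if [ "$i" -ne "$skip" ]; then
            cat <<EOF
define host {
    use         generic-host
    host_name   pc$i
    alias       pc$i
    address     10.10.0.$i
    parents     sw
}

EOF
        fi
        i=$((i + 1))
    done
}

# Example: group 1 (pc1 to pc6), with Nagios running on pc3.
gen_pcs 1 6 3
```

Check the output against the network diagram before adding it to pcs.cfg,
and avoid duplicating any host you have already typed in by hand.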

4. Look at your Nagios instance on the web. Note that "Status Map" gives
   you a graphical view of the parent-child relationships you have defined.


PART III
Configure Service check for the classroom NOC
-----------------------------------------------------------------------------

0. Configuring

Now that we have our hardware configured we can start telling Nagios what
services to monitor on the configured hardware, how to group the hardware in
interesting ways, how to group services, etc.

1. Associate a service check with our classroom NOC

    # editor hostgroups_nagios2.cfg

    - Find the hostgroup named "ssh-servers". In the members section of the
      definition change the line:

members                 localhost

    to

members                 localhost,noc

Save the file and exit.

Verify that your changes are OK:

    # nagios3 -v /etc/nagios3/nagios.cfg

Restart Nagios to see the new service association with your host:

    # /etc/init.d/nagios3 restart

Click on the "Service Detail" link in the Nagios web interface to see your new entry.


PART IV
Defining Services for all PCs
-----------------------------------------------------------------------------

0. For services, the default normal_check_interval is 5 (minutes) in
   generic-service_nagios2.cfg. You may wish to change this to 1 to speed up
   how quickly service issues are detected, at least in the workshop.
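You can make this change in the editor, or with sed. The sketch below is an
illustration only: set_check_interval is a made-up helper name, the sample
line is invented for the demonstration, and it assumes the directive and its
value sit on one line separated by whitespace. It uses "sed -i" as provided
by GNU sed. Try it on a scratch file first, as shown.

```shell
#!/bin/sh
# set_check_interval FILE MINUTES: on lines that start with the
# normal_check_interval directive, replace the first number with MINUTES.
# Hypothetical helper; assumes GNU sed (for -i) and a "name value" layout.
set_check_interval() {
    sed -i "/^[[:space:]]*normal_check_interval[[:space:]]/ s/[0-9][0-9]*/$2/" "$1"
}

# Demonstrate on a scratch file rather than the live configuration.
demo=$(mktemp)
printf '        normal_check_interval           5       ; sample line\n' > "$demo"
set_check_interval "$demo" 1
cat "$demo"        # the value on the sample line is now 1
rm -f "$demo"
```

Remember that any change to the real file only takes effect after the
verify and restart steps (2a - 2c).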

1. Determine what services to define for what devices

   - This is core to how you use Nagios and network monitoring tools in
     general. So far we are simply using ping to verify that physical hosts
     are up on our network and we have started monitoring a single service
     (SSH) on a single additional host (the NOC). The next step is to decide
     what services you wish to monitor for each host in the classroom.

   - In this particular class we have:

     routers:  running ssh and snmp
     switches: running telnet and possibly ssh as well as snmp
     pcs:      All PCs are running ssh and http and should be running snmp
               The NOC is currently running an snmp daemon

     So, let's configure Nagios to check for these services for these
     devices.

2. Verify that SSH is running on the routers and workshop PC images

   - In the file services_nagios2.cfg there is already an entry for the SSH
     service check, so you do not need to create one. Instead, you
     simply need to re-define the "ssh-servers" entry in the file
     /etc/nagios3/conf.d/hostgroups_nagios2.cfg. After Part III the entry
     in the file looks like this:

# A list of your ssh-accessible servers
define hostgroup {
        hostgroup_name  ssh-servers
        alias           SSH servers
        members         localhost,noc
        }

     What do you think you should change?

     Correct, the "members" line.

     You should add in entries for all the classroom pcs, routers and the
     switches that run ssh. With this information and the network diagram
     you should be able to complete this entry.

     The entry will look something like this:

define hostgroup {
        hostgroup_name  ssh-servers
        alias           SSH servers
        members         localhost,pc1,pc2,pc3,...,pc6,ap1,ap2,s1,s2,noc
        }

     Note: leave in "localhost" - This is your PC and represents Nagios'
     network point of view. So, for instance, if you are on "pc3" you would
     not include "pc3" in the list of all the classroom pcs as it is
     represented by the "localhost" entry.

     The "members" entry will be a long line and will likely wrap on the
     screen.

     Remember to include all the PCs on your table and the routers that
     you have defined. Do not include any entries if they are not
     already defined in pcs.cfg, switches.cfg or routers.cfg.
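     A quick way to catch names that are not defined yet is to search the
     config files for a matching host_name line. This is a sketch only:
     undefined_members is a made-up helper name, and it assumes each host
     is declared with "host_name NAME" on a line of its own, as in the
     examples in this exercise.

```shell
#!/bin/sh
# undefined_members MEMBERS DIR: print each name from the comma-separated
# MEMBERS list that has no "host_name NAME" line in any DIR/*.cfg file.
# Hypothetical helper, not part of Nagios.
undefined_members() {
    for name in $(printf '%s' "$1" | tr ',' ' '); do
        if ! grep -qsE "^[[:space:]]*host_name[[:space:]]+$name([[:space:]]|;|\$)" "$2"/*.cfg; then
            echo "$name"
        fi
    done
}

# Demonstrate with a scratch directory that defines pc1 but not pc2.
demo=$(mktemp -d)
printf 'define host {\n    host_name   pc1\n}\n' > "$demo/pcs.cfg"
undefined_members "pc1,pc2" "$demo"    # prints: pc2
rm -rf "$demo"
```

     Against the real configuration you would pass your own members list
     and /etc/nagios3/conf.d as the directory. No output means every name
     was found. The pre-flight check remains the authoritative test.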

    - Once you are done, run the pre-flight check:

    # nagios3 -v /etc/nagios3/nagios.cfg

    If everything looks good, then restart Nagios

    # /etc/init.d/nagios3 stop
    # /etc/init.d/nagios3 start

    and view your changes in the Nagios web interface.

To continue with hostgroups you can add additional groups for later use,
such as all our virtual routers.

Go ahead and edit the file hostgroups_nagios2.cfg again:

    # editor hostgroups_nagios2.cfg

and add the following to the end of the file:

# A list of our virtual routers
define hostgroup {
        hostgroup_name  cisco7200
        alias           Cisco 7200 Routers
        members         rtr1,rtr2,rtr3,rtr4,rtr5,rtr6,rtr7,rtr8,rtr9,rtr10
        }

Save and exit from the file. Verify that everything is OK:

    # nagios3 -v /etc/nagios3/nagios.cfg

    If everything looks good, then restart Nagios

    # /etc/init.d/nagios3 stop
    # /etc/init.d/nagios3 start

3. Check that http is running on all the classroom PCs.

    - This is almost identical to the previous exercise. Just make the
      change to the "http-servers" hostgroup, adding in each PC (no routers
      or switches). Remember, you don't need to add your machine as it is
      already defined as "localhost".

      Find the definition in hostgroups_nagios2.cfg:

define hostgroup {
        hostgroup_name  http-servers
        alias           HTTP servers
        members         localhost
        }

      and after localhost, add all the PCs in your group.
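      If you want to avoid typos in the long members line, a small loop can
      build it. This is a sketch only: members_line is a made-up helper
      name, and the group range and PC number in the example call are
      placeholders for your own.

```shell
#!/bin/sh
# members_line FIRST LAST SKIP: print "localhost,pcFIRST,...,pcLAST",
# leaving out pcSKIP (your own PC, already covered by "localhost").
# Hypothetical helper, not part of Nagios.
members_line() {
    list=localhost
    i=$1
    while [ "$i" -le "$2" ]; do
        if [ "$i" -ne "$3" ]; then
            list="$list,pc$i"
        fi
        i=$((i + 1))
    done
    echo "$list"
}

# Example: group 1 (pc1 to pc6), with Nagios running on pc3.
members_line 1 6 3    # prints: localhost,pc1,pc2,pc4,pc5,pc6
```

      Paste the result after the word "members" in the http-servers
      definition, then repeat the verify and restart steps.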