File exercises-nagios-I-III-basic.txt, 11.7 KB (added by b.candler, 6 years ago)
Nagios Installation and Configuration

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

PART I
----------------

1. Log in to your virtual machine as the sysadm user.

2. Install Nagios Version 3
---------------------------

        $ sudo apt-get install nagios3 nagios3-doc

During installation you will be prompted for the "Nagios web administration password:" - This
will be for the Nagios user "nagiosadmin". When prompted, enter the password you are using
for your sysadm account.

Note: if you have not already done so, you may be asked to configure
the Postfix Mail Transport Agent during the Nagios installation process.
Just accept the default "Internet Site".

3. See Initial Nagios Configuration
------------------------------------

Open a browser, and go to your machine like this:

        http://pcN.ws.nsrc.org/nagios3/

At the login prompt, log in as:

        User Name: nagiosadmin
        Password:  <CLASS PASSWORD>

Click on the "Hosts" link on the left of the initial Nagios page to see what has
already been configured.

4. Update the File hostgroups_nagios2.cfg
-----------------------------------------

        $ cd /etc/nagios3/conf.d
        $ sudo editor hostgroups_nagios2.cfg

Go to the bottom of the file and add the following entry (we STRONGLY encourage you
to COPY and PASTE!):


define hostgroup {
        hostgroup_name  ping-servers
        alias           Pingable servers
        members         rtrX
        }

Where "rtrX" is the router for your group. That is, if you are in group 1, then
replace "rtrX" with "rtr1". Now save and exit from the file.


5. Add Routers, PCs and Switches
--------------------------------

We will create three files, routers.cfg, switches.cfg and pcs.cfg and make
entries for the hardware in our classroom.

6a. Creating the switches.cfg file
----------------------------------

        $ cd /etc/nagios3/conf.d                                (just to be sure)
        $ sudo editor switches.cfg

In this file add the following entry (COPY and PASTE!):

define host {
    use         generic-host
    host_name   sw
    alias       Backbone Switch
    address     10.10.0.253
}

Save the file and exit.

6b. Creating the "routers.cfg" file
-----------------------------------

We have up to 10 routers in total: rtr1-rtr9 and gw-rtr. We also have
1 or 2 wireless Access Points (ap1, ap2). We will define entries for some of
these. If any of these devices do not exist in your workshop, then do not
include them. Remember, COPY and PASTE!

        $ sudo editor routers.cfg


define host {
    use         generic-host
    host_name   gw-rtr
    alias       Classroom Gateway Router
    address     10.10.0.254
}

define host {
    use         generic-host
    host_name   rtr1
    alias       Group 1 Gateway Router
    address     10.10.1.254
}

define host {
    use         generic-host
    host_name   rtr2
    alias       Group 2 Gateway Router
    address     10.10.2.254
}

# Note: you do not need to add definitions for all routers now - you can
# always come back and add the rest later!

define host {
    use         generic-host
    host_name   ap1
    alias       Wireless Access Point 1
    address     10.10.0.251
}

define host {
    use         generic-host
    host_name   ap2
    alias       Wireless Access Point 2
    address     10.10.0.252
}


Now save the file and exit the editor.
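
If you would rather not type every router entry by hand, the pattern above
can be generated. This is only a sketch: the function name gen_router_hosts
is made up for this example, and it assumes that group router N always has
the address 10.10.N.254, as in the rtr1 and rtr2 entries.

```shell
# Print a "define host" block for each group router rtr1..rtrN.
# Assumption: router N has the address 10.10.N.254 (as for rtr1 and rtr2).
gen_router_hosts() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'define host {\n'
        printf '    use         generic-host\n'
        printf '    host_name   rtr%s\n' "$i"
        printf '    alias       Group %s Gateway Router\n' "$i"
        printf '    address     10.10.%s.254\n' "$i"
        printf '}\n\n'
        i=$((i + 1))
    done
}
```

Running "gen_router_hosts 3" prints entries for rtr1 to rtr3 on the screen.
Check the output before pasting it into routers.cfg, and do not paste an
entry for a router that is already defined there.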


6c. Creating the pcs.cfg File
-----------------------------

Now we will create entries for some of the Virtual Machines in our classroom.
Below we give you the first few entries. You should complete the file with as
many PCs as you wish to add. We recommend that you add at least the 4 PCs
that are members of your group, an entry for the classroom NOC, and
at least one PC from another group (remember to COPY and PASTE!):

        $ sudo editor pcs.cfg


define host {
    use         generic-host
    host_name   noc
    alias       Workshop NOC machine
    address     10.10.0.250
}

#
# Group 1
#

define host {
    use         generic-host
    host_name   pc1
    alias       pc1
    address     10.10.1.1
}

define host {
    use         generic-host
    host_name   pc2
    alias       pc2
    address     10.10.1.2
}

#
# Another PC (example only!)
#

define host {
    use         generic-host
    host_name   pc20
    alias       pc20
    address     10.10.5.20
}

You can save and exit from the file now. You can add more PC entries later.
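
The PC entries follow a regular pattern too, so they can be generated in the
same way. The sketch below is an illustration only: gen_pc_hosts is a
made-up name, and the addressing rule (four PCs per group, pcN at
10.10.<group>.N) is an assumption inferred from pc1 = 10.10.1.1 and
pc20 = 10.10.5.20. Check it against your classroom network diagram.

```shell
# Print a "define host" block for pcFIRST..pcLAST.
# Assumption: four PCs per group, and pcN has the address 10.10.<group>.N.
gen_pc_hosts() {
    n=$1
    last=$2
    while [ "$n" -le "$last" ]; do
        group=$(( (n - 1) / 4 + 1 ))    # pc1-pc4 -> 1, pc5-pc8 -> 2, ...
        printf 'define host {\n'
        printf '    use         generic-host\n'
        printf '    host_name   pc%s\n' "$n"
        printf '    alias       pc%s\n' "$n"
        printf '    address     10.10.%s.%s\n' "$group" "$n"
        printf '}\n\n'
        n=$((n + 1))
    done
}
```

For example, "gen_pc_hosts 1 4" prints the four entries for group 1. As
before, review the output before adding it to pcs.cfg.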


STEPS 7a - 7c SHOULD BE REPEATED WHENEVER YOU UPDATE THE CONFIGURATION!
=======================================================================

7a. Verify that your configuration files are OK
-----------------------------------------------

        $ sudo nagios3 -v /etc/nagios3/nagios.cfg


You will get some warnings like the ones below. You can ignore them for
now.

Checking services...
        Checked 7 services.
Checking hosts...
Warning: Host 'gw-rtr' has no services associated with it!
Warning: Host 'rtr1' has no services associated with it!
Warning: Host 'rtr2' has no services associated with it!

etc....
...
Total Warnings: N
Total Errors:   0

Things look okay - No serious problems were detected during the check.

With these warnings Nagios is saying that it's unusual to monitor a device
just for its existence on the network, without also monitoring some service.
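
The check output gets long once many hosts are defined. As a convenience,
a small filter can reduce it to the lines that matter. This is a sketch
only: summarise_check is a made-up name, and it relies on the output format
shown above ("Warning:" at the start of a line, plus the two "Total" lines).

```shell
# Read "nagios3 -v" output on standard input, print the two "Total"
# lines and count the lines that start with "Warning:".
summarise_check() {
    awk '
        /^Warning:/ { warnings++ }
        /^Total (Warnings|Errors):/ { print }
        END { printf "Warning lines seen: %d\n", warnings }
    '
}
```

It would be used like this:

        $ sudo nagios3 -v /etc/nagios3/nagios.cfg | summarise_check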


7b. Reload/Restart Nagios
-------------------------

        $ sudo service nagios3 restart

HINT: You will be doing this a lot. If you do it all on one line, like this,
then you can use arrow-up and call back the command:

        $ sudo nagios3 -v /etc/nagios3/nagios.cfg && sudo /etc/init.d/nagios3 restart

The '&&' ensures that the restart only happens if the config is valid.
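
The same idea can be wrapped in a shell function so that a failed check
also prints a clear message. This is a sketch, not part of Nagios: the
function name nagios_reload and the variables VERIFY_CMD and RESTART_CMD
are made up for this example. Their defaults are the two commands used above.

```shell
# Hypothetical helper: check the configuration and restart Nagios only
# if the check succeeds (the same logic as the '&&' one-liner).
# VERIFY_CMD and RESTART_CMD are illustrative and can be overridden.
VERIFY_CMD="${VERIFY_CMD:-sudo nagios3 -v /etc/nagios3/nagios.cfg}"
RESTART_CMD="${RESTART_CMD:-sudo service nagios3 restart}"

nagios_reload() {
    if $VERIFY_CMD; then
        $RESTART_CMD
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}
```

After pasting this into your shell (or your ~/.bashrc), typing
"nagios_reload" runs both steps.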


7c. Verify via the Web Interface
--------------------------------

Go to the web interface (http://pcN.ws.nsrc.org/nagios3) and check that the hosts
you just added are now visible in the interface. Click on the "Hosts" item on the
left of the Nagios screen to see this. You may see them in "PENDING" status until
the checks are carried out.


8. View Status Map
--------------------

Go to http://pcN.ws.nsrc.org/nagios3

Click on the "Map" item on the left. You should see all your hosts with the Nagios
process in the middle. The "?" icons appear because we have not told Nagios what type
of host each item is (router, switch, AP, PC running Linux, etc.).



PART II
Configure Service check for the classroom NOC
-----------------------------------------------------------------------------

0. Configuring

Now that we have our hardware configured we can start telling Nagios what services to monitor
on the configured hardware, how to group the hardware in interesting ways, how to group
services, etc.

1. Associate a service check with our classroom NOC

    $ sudo editor hostgroups_nagios2.cfg

    - Find the hostgroup named "ssh-servers". In the members section of the definition
      change the line:

members                 localhost

    to

members                 localhost,noc

Save the file and exit.

Verify that your changes are OK:

        $ sudo nagios3 -v /etc/nagios3/nagios.cfg

Restart Nagios to see the new service association with your host:

        $ sudo service nagios3 restart

In the Nagios web interface, find the "Services" link (left menu), and click
on it.

You should be able to find your recent change:

    noc  SSH      PENDING ...
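
The edit in this part (appending a host to the "members" line of one
hostgroup) can also be scripted. This is a sketch only: the name
add_hostgroup_member is made up, and it assumes that "hostgroup_name"
appears before "members" inside each definition, as it does in the entries
shown in these exercises. It reads the configuration on standard input and
prints the result, so the original file is not touched.

```shell
# Append a host to the "members" line of one hostgroup.
# $1 = hostgroup name, $2 = host to add; config is read from stdin.
# Assumption: "hostgroup_name" comes before "members" in each definition.
add_hostgroup_member() {
    awk -v group="$1" -v host="$2" '
        $1 == "hostgroup_name"      { in_group = ($2 == group) }
        in_group && $1 == "members" { $0 = $0 "," host }
        $1 == "}"                   { in_group = 0 }
        { print }
    '
}
```

For example:

        $ add_hostgroup_member ssh-servers noc < hostgroups_nagios2.cfg

prints the file with "noc" added to the ssh-servers members. Compare the
output with the original before replacing anything.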



PART III
Defining Services for all PCs
-----------------------------------------------------------------------------

Note: The default normal_check_interval is 5 (minutes) for checking services.
   This is defined in "generic-service_nagios2.cfg". You may wish to change
   this to 1 (1 minute) to speed up how quickly service issues are detected,
   at least during this workshop.
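
The interval can be changed in an editor, or with a small filter like the
sketch below. The name set_check_interval is made up for this example. It
assumes the setting appears on a line of its own, with the directive name
followed by the number. It prints the modified configuration and does not
change the file itself.

```shell
# Rewrite the normal_check_interval value in a config read from stdin.
# $1 = new interval in minutes; the result goes to standard output.
set_check_interval() {
    awk -v new="$1" '
        $1 == "normal_check_interval" { sub(/[0-9]+/, new) }
        { print }
    '
}
```

For example, run this in /etc/nagios3/conf.d to preview the change:

        $ set_check_interval 1 < generic-service_nagios2.cfg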

1. Determine what services to define for what devices

   - This is a central concept in using Nagios and network monitoring tools
     in general. So far we are simply using ping to verify that physical hosts
     are up on our network and we have started monitoring a single service
     (SSH) on your own PC and the classroom NOC. The next step is to decide
     what services (web server, SSH, etc.) you wish to monitor for each host
     in the classroom.

   - In this particular class we have:

     routers:  running ssh and snmp
     switches: running telnet and possibly ssh as well as snmp
     pcs:      All PCs are running ssh and http and should be running snmp
               The NOC is currently running an snmp daemon

     So, let's configure Nagios to check for these services on these
     devices.

2.) Verify that SSH is running on the routers and workshop PC images

   - In the file "services_nagios2.cfg" there is already an entry for the SSH
     service check, so you do not need to create one. Instead, you
     simply need to re-define the "ssh-servers" entry in the file
     /etc/nagios3/conf.d/hostgroups_nagios2.cfg. The initial entry in the file
     looked like:

# A list of your ssh-accessible servers
define hostgroup {
        hostgroup_name  ssh-servers
        alias           SSH servers
        members         localhost
        }

     What do you think you should change? Correct, the "members" line. You
     should add in entries for all the classroom pcs, routers and the
     switches that run ssh. With this information and the network diagram
     you should be able to complete this entry.

     The entry will look something like this:

define hostgroup {
        hostgroup_name  ssh-servers
        alias           SSH servers
        members         localhost,pc1,pc2,...,ap1,noc,rtr1,rtr2,...,gw-rtr
        }

     Note: do not remove "localhost" - This is your PC and represents
     Nagios' network point of view. So, for instance, if you are on "pc3"
     you would NOT list "pc3" in the list of all the classroom pcs as
     it is represented by the "localhost" entry.

     The "members" entry will be a long line and will likely wrap on the
     screen. If you want to continue the entries on a new line, end the
     line with "\" so that Nagios treats the next line as part of the entry.
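
     A continued "members" entry might look like the sketch below. It uses
     only hosts defined earlier in these exercises, so adjust the names to
     match your own files. It assumes that Nagios joins the two lines at
     the "\", which is the behaviour described above:

```
define hostgroup {
        hostgroup_name  ssh-servers
        alias           SSH servers
        members         localhost,pc1,pc2,noc,\
                        rtr1,rtr2,gw-rtr
        }
```

     Nothing may follow the "\" on the line, not even a space.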

     Remember to include all the PCs and routers that you have defined in
     the files "pcs.cfg", "switches.cfg" and "routers.cfg". Only add entries
     from these files (i.e.: don't add "pc8" in your hostgroup list if "pc8"
     isn't defined in "pcs.cfg" as well).

   - Once you are done, run the pre-flight check and restart Nagios:

        $ sudo nagios3 -v /etc/nagios3/nagios.cfg && sudo /etc/init.d/nagios3 restart

     ... and view your changes in the Nagios web interface.
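
If the pre-flight check complains about a host, a quick way to test the rule
above (only list hosts that are actually defined) is a helper like this
sketch. The name undefined_members is made up for this example. It skips
"localhost" on the assumption that it is defined elsewhere in the default
configuration.

```shell
# Print every name in a comma-separated members list that has no
# "host_name" entry in the given host definition files.
# $1 = members list; remaining arguments = config files to search.
undefined_members() {
    members=$1
    shift
    defined=$(awk '$1 == "host_name" { print $2 }' "$@")
    echo "$members" | tr ',' '\n' | while read -r m; do
        # Assumption: localhost is defined outside these files.
        [ "$m" = "localhost" ] && continue
        echo "$defined" | grep -qxF -- "$m" || echo "$m"
    done
}
```

For example, in /etc/nagios3/conf.d:

        $ undefined_members localhost,pc1,pc8,noc pcs.cfg switches.cfg routers.cfg

prints the names (if any) that still need a host definition.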

To continue with hostgroups you can add additional groups for later use, such as all our virtual
routers. Go ahead and edit the file hostgroups_nagios2.cfg again:

     $ sudo editor hostgroups_nagios2.cfg

and add the following to the end of the file (COPY and PASTE this):

# A list of our virtual routers

define hostgroup {
        hostgroup_name  routers
        alias           Cisco 7200 Routers
        members         rtr1,rtr2,...
        }

Only list the routers you have defined in "routers.cfg".

Save and exit from the file. Verify that everything is OK:

    $ sudo nagios3 -v /etc/nagios3/nagios.cfg

If everything looks good, then restart Nagios:

    $ sudo service nagios3 restart

3.) Check that http is running on all the classroom PCs.

    - This is almost identical to the previous exercise. Just make the same
      change to the hostgroup for the HTTP service, adding in each PC (no
      routers or switches). Remember, you don't need to add your machine as
      it is already defined as "localhost". Look for this hostgroup in the
      file hostgroups_nagios2.cfg and update the "members" line appropriately.

      If you have questions or are confused please ask an instructor for help.