Network Monitoring and Management

Cacti, Nagios and Smokeping Ticket Creation with Request Tracker
----------------------------------------------------------------

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

At this point in the week you should have Cacti, Nagios and Smokeping
installed on your PCs. These exercises show you how to set up each
of these programs to send alerts to the RT (Request Tracker) ticketing
system to generate tickets.


Exercises Part I
----------------

0. Log in to your PC or open a terminal window as the sysadm user.

1. Verify that you have configured rt-mailgate to work with your MTA
---------------------------------------------------------------------

Open the file /etc/aliases:

        $ sudo editor /etc/aliases

In the file /etc/aliases you should have the following two lines:

net-comment: "|/usr/bin/rt-mailgate --queue net --action comment --url http://localhost/rt/"
net:        "|/usr/bin/rt-mailgate --queue net --action correspond --url http://localhost/rt/"

If these lines are not in /etc/aliases, then be sure to add them. When you are done save
the file and exit. Then you need to tell the MTA (Mail Transfer Agent) that there are some
new aliases to be used:

        $ sudo newaliases

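If you would like to confirm the aliases without opening an editor, the check
can be scripted. The following is only a sketch: check_aliases is a made-up
helper name (not part of RT or your MTA), and it assumes each alias sits on a
single line that starts with the alias name.

```shell
# Sketch: report whether both RT aliases are present in an aliases file.
# check_aliases is a hypothetical helper, not a command shipped with RT.
check_aliases() {
    file="${1:-/etc/aliases}"
    for name in net net-comment; do
        # Look for "name:" at the start of a line that pipes to rt-mailgate
        if grep -q "^${name}:.*rt-mailgate" "$file"; then
            echo "ok: ${name}"
        else
            echo "MISSING: ${name}"
        fi
    done
}
```

After loading the function into your shell, run "check_aliases" with no
argument to check /etc/aliases. Any alias reported as MISSING needs to be
added to the file, followed by "sudo newaliases".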

2. Configure Smokeping
----------------------

In the file:

        /etc/smokeping/config.d/Alerts

you can tell Smokeping where alert outputs should go. Edit the file:

        $ sudo vi /etc/smokeping/config.d/Alerts

And update the top of the file to be:

        *** Alerts ***
        to = net@localhost
        from = smokealert@localhost

At the end of the file, add another alert like this:

        +anydelay
        type = rtt
        # in milliseconds
        pattern = >1
        comment = Just for testing

Be sure that all text is flush left in the file.

Now save the file and exit.

Notice the pattern in this alert. It means that an alert will be triggered
as soon as a sample measurement has "ANY" delay, that is, more than one
millisecond. This is just for testing. In reality, you will want to create
an alert based on your observed baseline, for example one that triggers if
your DNS servers' delay suddenly goes from under 10 ms to over 100 ms.

Next, be sure you have this test alert defined for some of your Targets.
You can turn on alerts either by defining alerts for a probe in
the /etc/smokeping/config.d/Probes file, or on individual Targets
entries.

In our case let's edit the Targets file and turn on alerts for our
DNS Latency checks.

        $ sudo vi /etc/smokeping/config.d/Targets

Find (or add if necessary) the following section in the file:

        +DNS
        probe = DNS
                ...

Now let's add an entry for a global DNS server that responds recursively.

        ++GoogleA
        menu = 8.8.8.8
        title = DNS Latency for google-public-dns-a.google.com
        host = google-public-dns-a.google.com
        alerts = anydelay

Notice the line that says, "alerts = anydelay".

So, in summary, you should have the following section near the bottom of
your Targets file:

        +DNS
        probe = DNS
        menu = DNS Latency
        title = DNS Latency Probes

        ++GoogleA
        menu = 8.8.8.8
        title = DNS Latency for google-public-dns-a.google.com
        host = google-public-dns-a.google.com
        alerts = anydelay

(items should be flush left in the file).

Save and exit from the file, then restart smokeping:

        $ sudo service smokeping restart

Now check RT to see if you have received anything from Smokeping. It may take up to 5 minutes
for a new ticket to appear.

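If no ticket appears, one thing worth ruling out is an alert name used in the
Targets file that is not defined in the Alerts file. The sketch below
cross-checks the two files. check_smokeping_alerts is a made-up helper name,
and the sketch assumes alert definitions look like "+name" on their own line
and that Targets entries use "alerts = name" (or a comma-separated list).

```shell
# Sketch: list each alert name used in Targets and say whether Alerts defines it.
# check_smokeping_alerts is a hypothetical helper, not part of Smokeping.
check_smokeping_alerts() {
    alerts_file="${1:-/etc/smokeping/config.d/Alerts}"
    targets_file="${2:-/etc/smokeping/config.d/Targets}"
    # Pull the value of every "alerts = ..." line and split comma-separated lists
    sed -n 's/^[[:space:]]*alerts[[:space:]]*=[[:space:]]*//p' "$targets_file" |
        tr ',' '\n' | tr -d ' \t' | sort -u |
    while read -r name; do
        [ -n "$name" ] || continue
        # An alert is defined by a line holding just "+name"
        if grep -q "^[[:space:]]*+${name}[[:space:]]*\$" "$alerts_file"; then
            echo "defined: ${name}"
        else
            echo "UNDEFINED: ${name}"
        fi
    done
}
```

Run "check_smokeping_alerts" with no arguments to use the two file locations
from this exercise. For the configuration above you would expect to see
anydelay reported as defined.
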
NOTE: If you have not already configured the DNS Latency checks for Smokeping you may need to
edit the file /etc/smokeping/config.d/Probes and add in the entry for DNS like this:

        $ sudo vi /etc/smokeping/config.d/Probes

And, at the bottom of the file add:

        + DNS
        binary = /usr/bin/dig
        pings = 5
        step = 180
        lookup = www.nsrc.org

Save and exit from the file and restart Smokeping:

        $ sudo service smokeping restart


3. Nagios and Request Tracker Ticket Creation
----------------------------------------------

Configuring RT and Nagios so that alerts from Nagios automatically
create tickets requires a few steps:

* Create a proper contact entry for Nagios in
  /etc/nagios3/conf.d/contacts_nagios2.cfg

* Create the proper command in Nagios to use the rt-mailgate
  interface. The command is defined in /etc/nagios3/commands.cfg

These next two items should already be done in RT if you have
finished the RT exercises.

* Install the rt-mailgate software and configure it properly
  in your /etc/aliases file for your MTA in use.

* Configure the appropriate queues in RT to receive emails
  passed to it from Nagios via the rt-mailgate software.


4. Configure a Contact in Nagios
---------------------------------

   - Edit the file /etc/nagios3/conf.d/contacts_nagios2.cfg

   $ sudo bash
   # vi /etc/nagios3/conf.d/contacts_nagios2.cfg

   - In this file we will first add a new contact name under
     the default root contact entry. The new contact should
     look like this:

define contact{
        contact_name                    net
        alias                           RT Alert Queue
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    c
        host_notification_options       d
        service_notification_commands   notify-service-ticket-by-email
        host_notification_commands      notify-host-ticket-by-email
        email                           net@localhost
        }

   - _DO NOT_ remove the "root" contact_name entry! This entry goes
     below the "root" contact.

   - The service_notification_options value of "c" means only notify once a
     service is considered "critical" by Nagios (i.e. down). The
     host_notification_options value of "d" means down. By specifying only
     "c" and "d", notifications will not be sent for other states.

   - Note the email address in use, "net@localhost". This is important
     as this was previously defined for RT.

   - Now we must create a Contact Group that contains this contact.
     We will call this group "tickets". Do this at the end of the file:

define contactgroup{
        contactgroup_name       tickets
        alias                   email to ticket system for RT
        members                 net,root
        }

   - You could leave off "root" as a member, but we've left this on to
     have another user that receives email to help us troubleshoot if
     there are issues.

   - Now that your contact has been created you need to create the commands
     that were referenced in the initial contact creation above. These are
     "notify-service-ticket-by-email" and "notify-host-ticket-by-email".


5. Update Nagios Commands
-------------------------

   - To create the notify-service-ticket-by-email and notify-host-ticket-by-email
     commands we need to edit the file /etc/nagios3/commands.cfg.

   # vi /etc/nagios3/commands.cfg

   - In this file you already have two command definitions that we are using. These are
     called notify-host-by-email and notify-service-by-email. We are going to add two
     new commands.

   - We _strongly_ suggest that you COPY and PASTE the text below. It is almost impossible
     to type it without errors.

   - Put these two new entries _BELOW_ the current notify-host-by-email and notify-service-by-email
     command entries. Do not remove the old ones.

   - NOTE: The commands below do not contain line breaks. Each command_line is a single line.
     Be aware of this as COPY and PASTE between some editors and environments may insert
     line breaks.

################################################################
# Additional commands created for network management workshop #
################################################################

# 'notify-host-ticket-by-email' command definition
define command{
        command_name    notify-host-ticket-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
        }

# 'notify-service-ticket-by-email' command definition
define command{
        command_name    notify-service-ticket-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
        }

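To get a feel for what the host notification email will contain, you can run
the printf part of the command by hand. This is only an illustration: the
values below are made-up samples standing in for the Nagios macros
($NOTIFICATIONTYPE$, $HOSTNAME$ and so on), which Nagios fills in when it
sends a real notification, and preview_host_notification is a made-up name.

```shell
# Sketch: print the body of a host notification using sample values.
# All values here are invented; Nagios substitutes the real ones at run time.
preview_host_notification() {
    ntype="PROBLEM"
    host="s1"
    state="DOWN"
    address="10.10.0.241"
    info="CRITICAL - Host Unreachable"
    when="Thu Jun 2 10:00:00 UTC 2011"
    # Same printf "%b" form as in the command_line: %b expands the \n escapes
    printf "%b" "***** Nagios *****\n\nNotification Type: ${ntype}\nHost: ${host}\nState: ${state}\nAddress: ${address}\nInfo: ${info}\n\nDate/Time: ${when}\n"
}
preview_host_notification
```

In the real command this text is piped to /usr/bin/mail and sent to
$CONTACTEMAIL$, which for the "net" contact is net@localhost.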

6. Choose a Service to Monitor with RT Tickets
----------------------------------------------

   - The final step is to tell Nagios that you wish to notify the contact group "tickets" for a
     particular service. If you look in /etc/nagios3/conf.d/generic-service_nagios2.cfg the
     default contact_groups is "admins". To override this for a service, edit the file
     /etc/nagios3/conf.d/services_nagios2.cfg and add a contact_groups entry to one of the
     service definitions.

   - To send email to generate tickets in RT if HTTP goes down on a box you would edit the
     HTTP service check so that it looks like this:

# check that web services are running
define service {
        hostgroup_name                  http-servers
        service_description             HTTP
        check_command                   check_http
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
        contact_groups                  tickets
}

     Note the additional item that we now have, "contact_groups". You can do this for other
     entries as well if you wish.

   - When you are done, save the file and exit.

   - Now restart Nagios to verify your changes are correct.

   # /etc/init.d/nagios3 stop
   # /etc/init.d/nagios3 start

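If Nagios complains when it starts, one thing to rule out is a notification
command that is named in a contact but never defined, for example because of
a typing mistake in a command_name. The sketch below cross-checks the two
files used in these exercises. check_notify_commands is a made-up helper
name, and the sketch assumes one directive per line, as in the examples above.

```shell
# Sketch: confirm every notification command named in the contacts file
# has a matching command_name in the commands file.
# check_notify_commands is a hypothetical helper, not part of Nagios.
check_notify_commands() {
    contacts="${1:-/etc/nagios3/conf.d/contacts_nagios2.cfg}"
    commands="${2:-/etc/nagios3/commands.cfg}"
    # Collect the values of service_/host_notification_commands lines
    awk '$1 ~ /^(service|host)_notification_commands$/ { print $2 }' "$contacts" |
        tr ',' '\n' | sort -u |
    while read -r cmd; do
        [ -n "$cmd" ] || continue
        if grep -q "^[[:space:]]*command_name[[:space:]][[:space:]]*${cmd}[[:space:]]*\$" "$commands"; then
            echo "defined: ${cmd}"
        else
            echo "UNDEFINED: ${cmd}"
        fi
    done
}
```

Nagios can also check its whole configuration for you. On the systems used
in this workshop that would be "nagios3 -v /etc/nagios3/nagios.cfg", run as
root.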

7. Generate RT Tickets for Hosts
--------------------------------

   - To do this you must either specify "contact_groups tickets" for individual host
     definitions, or you must update the template file for all hosts and change the
     default contact_groups entry to tickets. This file is generic-host_nagios2.cfg.

   - If you wish to do this go ahead. Tickets will be generated if a host goes down
     and you have specified the contact_groups for that host as being "tickets".

8. See Nagios Tickets in RT
---------------------------

To verify that your changes have worked, we can run the HTTP check against one of
our servers that is not running HTTP. Let's pick the second Mac Mini in our class
or the box known as "s1.ws.nsrc.org" (see the network diagram for details).

If you do not have an entry for this machine, add one to the file where your PCs
are defined. If this is in a file called pcs.cfg you would do:

        # vi /etc/nagios3/conf.d/pcs.cfg

In this file add (or verify you have) an entry that looks like this:

define host {
    use         generic-host
    host_name   s1
    alias       s1
    address     10.10.0.241
    parents     sw
}

Save and exit from the file.

Now edit the file named /etc/nagios3/conf.d/hostgroups_nagios2.cfg and add s1 to the hostgroup
for HTTP service checks:

        # vi /etc/nagios3/conf.d/hostgroups_nagios2.cfg

Look for the "hostgroup_name http-servers" entry and update it so that it looks like this:


# A list of your web servers
define hostgroup {
        hostgroup_name  http-servers
        alias           HTTP servers
        members         localhost,pc1,pc2,pc3,pc4,pc5,pc6,pc7,pc8,pc9,pc10,pc11,pc12,pc13,pc14,pc15,pc16,pc17,pc18,pc19,pc20,pc21,pc22,pc23,pc24,pc25,pc26,pc28,pc29,pc30,pc31,pc32,pc35,pc37,pc39,s1
        }


_REMEMBER_ that the line with all the "members" must not have any line breaks. Notice that "s1"
has been entered on the end of the line.

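A members list that has been split across lines by accident usually leaves a
"members" line that ends in a comma. The sketch below looks for that.
find_broken_members is a made-up helper name, and the check is only a rough
test: it will not catch every possible way the line could be broken.

```shell
# Sketch: print any "members" line that ends with a comma, with its line number.
# find_broken_members is a hypothetical helper, not part of Nagios.
find_broken_members() {
    file="${1:-/etc/nagios3/conf.d/hostgroups_nagios2.cfg}"
    # -n prints line numbers; a trailing comma suggests the list was cut short
    grep -n '^[[:space:]]*members[[:space:]].*,[[:space:]]*$' "$file"
}
```

If the function prints nothing, no "members" line in the file ends with a
comma.
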
Now save the file and exit and restart Nagios:

        # service nagios3 stop
        # service nagios3 start


   - It will take a while (up to 10 minutes) for Nagios to report that HTTP is
     "critical", but once that happens a new ticket should appear in your RT instance
     in the net queue generated by Nagios.

   - To see this, go to http://pcX.ws.nsrc.org/rt/ and log in as Username "sysadmin"
     with the password you chose when you created the RT sysadmin account. The new
     ticket should appear in the "10 newest unowned tickets" box on the main page
     you see after logging in to RT.

9. Configure Cacti to send emails to net@localhost to generate tickets in RT
----------------------------------------------------------------------------

If you have not installed the Plugin Architecture for Cacti, then please be sure to
attempt this exercise last.

You can view how this works by logging in on the Cacti instance running on the noc
box, as this has the Cacti Plugin Architecture installed along with the two plugins
called "Settings" and "Threshold".

To see how Cacti can generate a ticket first go to:

        http://noc.ws.nsrc.org/cacti/

Log in as "admin" (system password). Then do:

        * Click on the Console tab (upper-left)
        * Click on "Settings" (lower-left)
        * Click on the "Mail / DNS" tab (upper-right)
        * Verify that the fields for email are properly filled in:
                - Test Email                    (sysadm or net @ localhost)
                - Mail Services                 (PHP Mail() Function)
                - From Email Address            (cacti@localhost)
                - From Name                     (Cacti System Monitor)
                - SMTP Hostname                 (localhost)
                - SMTP Port                     (25)

Now we need to create a threshold that we'll use to trigger an email that, in turn, will
create a ticket in RT:

        * Click on "Thresholds" (middle-left)
        * Click on the "Add" option (upper-right)
        * Select a Host (localhost, for example)
        * Select a Graph (Processes)
        * Select the Data Source (proc)
        * Click on the "create" button

Now you will be presented with a detailed screen where you can specify what should
happen if the threshold is reached. Verify or do the following:

        * Threshold Name:               Something Descriptive
        * Verify that "Threshold Enabled" is checked
        * Threshold Type:               High / Low Values (for Processes)
        * High Threshold:               50 (this will cause the threshold to trip)
        * Breach Duration:              5 minutes (this will give us a ticket in 5 to 10 minutes)
        * Data Type:                    Exact Value
        * Re-Alert Cycle:               Never
        * Extra Alert Emails:           net@localhost,sysadm@localhost

This will send an email to net@localhost within 5 to 10 minutes. This will create a
new ticket in RT. In addition an email will go to sysadm@localhost. You can view the
email as the sysadm user by doing:

        $ mutt -f /var/mail/sysadm

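If you only want to know whether the alert email has arrived, you can count
the messages in the mailbox instead of opening mutt. The sketch below
assumes /var/mail/sysadm is a plain mbox file, where every message starts
with a line beginning "From ". The helper names count_mail and list_subjects
are made up for this example.

```shell
# Sketch: quick look at an mbox mailbox without opening a mail reader.
# count_mail and list_subjects are hypothetical helpers.
count_mail() {
    mbox="${1:-/var/mail/sysadm}"
    # In mbox format each message starts with a line beginning "From "
    grep -c '^From ' "$mbox"
}

list_subjects() {
    mbox="${1:-/var/mail/sysadm}"
    # Prints every line starting "Subject: "; normally one per message
    grep '^Subject: ' "$mbox"
}
```

Run "count_mail" before and after the threshold trips. If the number goes
up, "list_subjects" should show the subject line of the new message.
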
You can create all types of threshold states that can be tripped, which will result in
ticket creation. Feel free to play around with the Cacti instance on the noc box to create
new thresholds. You can see if they are working by logging in on the noc instance of
Request Tracker (RT) at:

        http://noc.ws.nsrc.org/rt/

The username is "sysadm" and the password is the class password.


+-----+
Last update 2jun2011
Hervey Allen