File exercises-tickets-cacti-nagios-smokeping.txt, 15.1 KB (added by hervey, 8 years ago)

Network Monitoring and Management

Cacti, Nagios and Smokeping Ticket Creation with Request Tracker
----------------------------------------------------------------

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands preceded by other prompts (e.g. "RTR-GW>" or "mysql>")
imply that you are executing commands on remote equipment, or within
another program.

Exercises
---------

At this point in the week you should have Cacti, Nagios and Smokeping
installed on your PCs. These exercises show you how to set up each
of these programs to send alerts to the RT (Request Tracker) ticketing
system to generate tickets.


Exercises Part I
----------------

0. Log in to your PC or open a terminal window as the sysadm user.

1. Verify that you have configured rt-mailgate to work with your MTA
---------------------------------------------------------------------

Open the file /etc/aliases:

$ sudo editor /etc/aliases

The file /etc/aliases should contain the following two lines:

net-comment: "|/usr/bin/rt-mailgate --queue net --action comment --url http://localhost/rt/"
net: "|/usr/bin/rt-mailgate --queue net --action correspond --url http://localhost/rt/"

If these lines are not in /etc/aliases, add them. When you are done, save
the file and exit. Then tell the MTA (Mail Transfer Agent) that there are
new aliases to be used:

$ sudo newaliases
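
If you want to double-check the result, the shell function below is a minimal
sketch (check_rt_aliases is a hypothetical name, not an RT tool). It assumes the
alias lines are written exactly in the form shown above and only checks that each
one pipes into /usr/bin/rt-mailgate with queue "net" and the right action; it does
not test actual mail delivery.

```shell
# check_rt_aliases is a hypothetical helper, not part of RT.
# It reports whether each RT alias pipes mail into rt-mailgate with the
# expected queue and action. Usage: check_rt_aliases /etc/aliases
check_rt_aliases() {
    local file="${1:-/etc/aliases}" status=0 pair name action
    # name:action pairs, matching the two alias lines shown above
    for pair in "net:correspond" "net-comment:comment"; do
        name="${pair%%:*}"
        action="${pair#*:}"
        # [|] matches the literal pipe character that starts the alias command
        if grep -Eq "^${name}:[[:space:]]*\"[|]/usr/bin/rt-mailgate .*--queue net .*--action ${action}" "$file"; then
            echo "OK: ${name}"
        else
            echo "MISSING: ${name} (rt-mailgate --action ${action})"
            status=1
        fi
    done
    return "$status"
}
```

Paste the function into your shell and run "check_rt_aliases /etc/aliases"; it prints
one OK or MISSING line per alias and returns a non-zero status if either is missing.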


2. Configure Smokeping
----------------------

The file /etc/smokeping/config.d/Alerts tells Smokeping where alert
messages should go. Edit the file:

$ sudo vi /etc/smokeping/config.d/Alerts

Update the top of the file so that it reads:

*** Alerts ***
to = net@localhost
from = smokealert@localhost

At the end of the file, add another alert like this:

+anydelay
type = rtt
# in milliseconds
pattern = >1
comment = Just for testing

Be sure that all text is flush left in the file.

Now save the file and exit.

Notice the pattern in this alert. It means that an alert will be triggered
as soon as a sample measurement has "ANY" delay, that is, more than one
millisecond. This is just for testing. In reality, you will want to base
your alerts on your observed baseline; for example, alert if your DNS
servers' delay suddenly goes from under 10 ms to over 100 ms.
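
To make the pattern semantics concrete, here is a simplified model written as a
shell function (rtt_pattern_matches is hypothetical, and this is not Smokeping's
own code). It assumes each sample is the median round-trip time of one round of
pings, in milliseconds, and, as we understand Smokeping's matching, applies the
last comparison in the pattern to the newest sample. Only comma-separated ">N" and
"<N" comparisons are supported; real Smokeping patterns allow more syntax.

```shell
# Simplified model (not Smokeping's code) of matching an rtt alert pattern.
# $1: pattern, e.g. ">1" or "<10,>100"
# remaining args: RTT samples in ms, oldest first
# Returns 0 if the pattern matches the newest samples, 1 if not,
# 2 if the pattern uses syntax this sketch does not handle.
rtt_pattern_matches() {
    local pattern="$1"
    shift
    printf '%s\n' "$@" | awk -v pattern="$pattern" '
        { rtt[NR] = $1 }
        END {
            n = split(pattern, cond, ",")
            if (NR < n) exit 1                 # not enough samples yet
            for (i = 1; i <= n; i++) {
                value = rtt[NR - n + i] + 0    # last condition -> newest sample
                op = substr(cond[i], 1, 1)
                limit = substr(cond[i], 2) + 0
                if (op == ">") {
                    if (!(value > limit)) exit 1
                } else if (op == "<") {
                    if (!(value < limit)) exit 1
                } else {
                    exit 2
                }
            }
            exit 0
        }'
}
```

In this model, rtt_pattern_matches ">1" 0.9 1.4 succeeds because the newest sample,
1.4 ms, is above 1 ms; a baseline-style pattern such as "<10,>100" only matches when
a fast sample is immediately followed by a slow one.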

Next, be sure this test alert is enabled for some of your Targets.
You can turn on alerts either for a probe, in the
/etc/smokeping/config.d/Probes file, or for individual Targets
entries.

In our case, let's edit the Targets file and turn on alerts for our
DNS Latency checks.

$ sudo vi /etc/smokeping/config.d/Targets

Find (or add if necessary) the following section in the file:

+DNS
probe = DNS
...

Now add an entry for a global DNS server that answers recursive queries:

++GoogleA
menu = 8.8.8.8
title = DNS Latency for google-public-dns-a.google.com
host = google-public-dns-a.google.com
alerts = anydelay

Notice the line that says "alerts = anydelay".

In summary, you should have the following section near the bottom of
your Targets file:

+DNS
probe = DNS
menu = DNS Latency
title = DNS Latency Probes

++GoogleA
menu = 8.8.8.8
title = DNS Latency for google-public-dns-a.google.com
host = google-public-dns-a.google.com
alerts = anydelay

(All items should be flush left in the file.)
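
A quick way to confirm that a target carries the alert is sketched below
(target_alerts is a hypothetical helper). It assumes settings are written flush
left as "key = value", exactly as above, and treats every line starting with "+"
as the start of a new section.

```shell
# Hypothetical helper: print the "alerts" setting of one ++Target section
# of a Smokeping Targets file.
# Usage: target_alerts /etc/smokeping/config.d/Targets GoogleA
target_alerts() {
    local file="$1" target="$2"
    awk -v want="++$target" '
        /^\+/ { in_section = ($1 == want) }   # any + or ++ line starts a new section
        in_section && $1 == "alerts" && $2 == "=" { print $3 }
    ' "$file"
}
```

With the Targets section above, running

$ target_alerts /etc/smokeping/config.d/Targets GoogleA

should print "anydelay"; empty output means the alert is not enabled for that target.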

Save the file, exit, and restart Smokeping:

$ sudo service smokeping restart

Now check RT to see if you have received anything from Smokeping. It may take up to 5 minutes
for a new ticket to appear.

NOTE: If you have not already configured the DNS Latency checks for Smokeping, you may need to
edit the file /etc/smokeping/config.d/Probes and add an entry for DNS:

$ sudo vi /etc/smokeping/config.d/Probes

At the bottom of the file, add:

+ DNS
binary = /usr/bin/dig
pings = 5
step = 180
lookup = www.nsrc.org

Save the file, exit, and restart Smokeping:

$ sudo service smokeping restart
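
You can reproduce roughly what this probe measures by hand. As we understand it,
the DNS probe runs dig against each target host, looks up the name given in
"lookup", and records the query time that dig reports. The function below
(dns_query_ms, hypothetical) extracts that value from dig's output; it assumes dig
prints a line of the form ";; Query time: N msec".

```shell
# Hypothetical helper: read dig output on stdin and print the query time in ms.
# Returns non-zero if no "Query time" line was found (e.g. the query timed out).
dns_query_ms() {
    awk '/^;; Query time:/ { print $4; found = 1 } END { exit !found }'
}
```

For example, to time one lookup of www.nsrc.org against Google's resolver:

$ dig @8.8.8.8 www.nsrc.org | dns_query_ms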


3. Nagios and Request Tracker Ticket Creation
----------------------------------------------

Configuring RT and Nagios so that alerts from Nagios automatically
create tickets requires a few steps:

* Create a proper contact entry for Nagios in
/etc/nagios3/conf.d/contacts_nagios2.cfg

* Create the proper commands in Nagios to send mail to the rt-mailgate
interface. These commands are defined in /etc/nagios3/commands.cfg

The next two items should already be done if you have
finished the RT exercises:

* Install the rt-mailgate software and configure it properly
in the /etc/aliases file of the MTA you are using.

* Configure the appropriate queues in RT to receive emails
passed to them from Nagios via the rt-mailgate software.
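
Once rt-mailgate and the RT queue are in place, you can test the mail path on its
own before involving Nagios. The sketch below (send_rt_test_ticket is a hypothetical
helper) sends one message to net@localhost with /usr/bin/mail, the same program the
Nagios notification commands use; the MAIL_CMD variable is an addition of this
sketch so the mail program can be swapped out.

```shell
# Hypothetical helper: send one test message along the path Nagios will use:
# mail -> MTA -> /etc/aliases -> rt-mailgate -> RT queue "net".
# MAIL_CMD defaults to /usr/bin/mail.
send_rt_test_ticket() {
    local to="${1:-net@localhost}"
    local subject
    subject="RT test ticket from $(uname -n) at $(date '+%Y-%m-%d %H:%M')"
    printf 'This is a test message for the net queue.\n' |
        "${MAIL_CMD:-/usr/bin/mail}" -s "$subject" "$to"
}
```

Run send_rt_test_ticket as the sysadm user; a new ticket should then appear in the
net queue in RT. If it does not, check the aliases and the MTA before moving on to
Nagios.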


4. Configure a Contact in Nagios
--------------------------------

- Edit the file /etc/nagios3/conf.d/contacts_nagios2.cfg

$ sudo bash
# vi /etc/nagios3/conf.d/contacts_nagios2.cfg

- In this file, first add a new contact below the default "root"
contact entry. The new contact should look like this:

define contact{
contact_name net
alias RT Alert Queue
service_notification_period 24x7
host_notification_period 24x7
service_notification_options c
host_notification_options d
service_notification_commands notify-service-ticket-by-email
host_notification_commands notify-host-ticket-by-email
email net@localhost
}

- _DO NOT_ remove the "root" contact_name entry! The new entry goes
below the "root" contact.

- The service_notification_options value of "c" means notify only when a
service is considered "critical" by Nagios (i.e. down). The
host_notification_options value of "d" means notify only when a host is
down. By specifying only "c" and "d", notifications will not be sent for
other states.

- Note the email address in use, "net@localhost". This is important,
as this address was previously defined for RT in /etc/aliases.

- Now create a Contact Group that contains this contact. We will call
this group "tickets". Add it at the end of the file:

define contactgroup{
contactgroup_name tickets
alias email to ticket system for RT
members net,root
}

- You could leave off "root" as a member, but we've left it in so that
another user also receives the email, which helps with troubleshooting if
there are issues.

- Now that your contact has been created, you need to create the commands
referenced in the contact definition above: "notify-service-ticket-by-email"
and "notify-host-ticket-by-email".
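
The effect of the notification options can be summarized as a small lookup. The
function below is a simplified model (nagios_would_notify is hypothetical, not part
of Nagios): it maps a state to its option letter (w, u, c, r for services; d, u, r
for hosts) and checks the contact's option list. It ignores everything else Nagios
considers, such as notification periods, flapping (f) and scheduled downtime (s).
One consequence worth knowing: because "r" (recovery) is not listed for the "net"
contact, RT will not receive a message when the service or host comes back up.

```shell
# Simplified model of Nagios notification_options filtering.
# $1: comma-separated option letters, e.g. "c" or "w,u,c,r"
# $2: state name: WARNING, UNKNOWN, CRITICAL, DOWN, UNREACHABLE, OK or UP
nagios_would_notify() {
    local options="$1" state="$2" letter
    case "$state" in
        WARNING)     letter=w ;;
        UNKNOWN)     letter=u ;;
        CRITICAL)    letter=c ;;
        DOWN)        letter=d ;;
        UNREACHABLE) letter=u ;;
        OK|UP)       letter=r ;;   # returning to OK/UP is a recovery notification
        *)           return 2 ;;
    esac
    # Wrap in commas so "c" cannot accidentally match inside another token
    case ",$options," in
        *",$letter,"*) return 0 ;;
        *)             return 1 ;;
    esac
}
```

For the contact above, nagios_would_notify c CRITICAL succeeds, while
nagios_would_notify c OK fails, which is why no recovery message reaches RT.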


5. Update Nagios Commands
-------------------------

- To create the notify-service-ticket-by-email and notify-host-ticket-by-email
commands, edit the file /etc/nagios3/commands.cfg:

# vi /etc/nagios3/commands.cfg

- This file already contains two command definitions that we are using,
notify-host-by-email and notify-service-by-email. We are going to add two
new commands.

- We _strongly_ suggest that you COPY and PASTE the text below. It is almost impossible
to type it without errors.

- Put these two new entries _BELOW_ the current notify-host-by-email and notify-service-by-email
command entries. Do not remove the old ones.

- NOTE: Each command_line below is a single line with no line breaks. Be aware of this, as
COPY and PASTE between some editors and environments may insert line breaks.

################################################################
# Additional commands created for network management workshop #
################################################################

# 'notify-host-ticket-by-email' command definition
define command{
command_name notify-host-ticket-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}

# 'notify-service-ticket-by-email' command definition
define command{
command_name notify-service-ticket-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
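
To see roughly what RT will receive, you can render the host notification by hand.
Nagios replaces each $MACRO$ with a real value before running the command line; in
the sketch below (preview_host_ticket is hypothetical), the macros are replaced by
function arguments, and the result is printed instead of being piped to
/usr/bin/mail. The sample values in the usage line are made up for illustration.

```shell
# Hypothetical helper: preview the subject and body that
# notify-host-ticket-by-email would send.
# Arguments stand in for the Nagios macros:
#   $1 NOTIFICATIONTYPE  $2 HOSTNAME  $3 HOSTSTATE
#   $4 HOSTADDRESS       $5 HOSTOUTPUT  $6 LONGDATETIME
preview_host_ticket() {
    local type="$1" host="$2" state="$3" address="$4" info="$5" when="$6"
    # Subject line, as passed to mail -s
    printf 'Subject: ** %s Host Alert: %s is %s **\n\n' "$type" "$host" "$state"
    # Body: "%b" expands the \n sequences, as in the command_line above
    printf '%b' "***** Nagios *****\n\nNotification Type: $type\nHost: $host\nState: $state\nAddress: $address\nInfo: $info\n\nDate/Time: $when\n"
}
```

For example:

$ preview_host_ticket PROBLEM s1 DOWN 10.10.0.241 "CRITICAL - Host Unreachable" "2011-06-02 10:00"

The subject line becomes the RT ticket subject, and the body becomes the ticket's
first message.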


6. Choose a Service to Monitor with RT Tickets
----------------------------------------------

- The final step is to tell Nagios that you wish to notify the contact group "tickets" for a
particular service. If you look in /etc/nagios3/conf.d/generic-service_nagios2.cfg, the
default contact_groups is "admins". To override this for a service, edit the file
/etc/nagios3/conf.d/services_nagios2.cfg and add a contact_groups entry to one of the
service definitions.

- For example, to send email that generates tickets in RT if HTTP goes down on a box,
edit the HTTP service check so that it looks like this:

# check that web services are running
define service {
hostgroup_name http-servers
service_description HTTP
check_command check_http
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
contact_groups tickets
}

Note the additional item, "contact_groups". You can add this to other service
entries as well if you wish.

- When you are done, save the file and exit.

- Now restart Nagios to verify that your changes are correct.

# /etc/init.d/nagios3 stop
# /etc/init.d/nagios3 start
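
Nagios can also check its configuration without restarting:
"nagios3 -v /etc/nagios3/nagios.cfg" runs the pre-flight check and reports any
errors. The sketch below (restart_nagios_if_valid is a hypothetical helper) runs that
check first and only stops and starts Nagios if it passes, so a typo does not leave
Nagios stopped. The NAGIOS_BIN and NAGIOS_INIT variables are assumptions of this
sketch, added so the commands can be overridden on other installations.

```shell
# Hypothetical helper: verify the Nagios configuration, then restart.
# $1: main config file (default /etc/nagios3/nagios.cfg)
restart_nagios_if_valid() {
    local bin="${NAGIOS_BIN:-nagios3}"
    local init="${NAGIOS_INIT:-/etc/init.d/nagios3}"
    local cfg="${1:-/etc/nagios3/nagios.cfg}"
    if "$bin" -v "$cfg" >/dev/null; then
        # Configuration is valid: same stop/start sequence as above
        "$init" stop && "$init" start
    else
        echo "Configuration errors found; run: $bin -v $cfg" >&2
        return 1
    fi
}
```

Run it as root. If it reports errors, run the suggested "nagios3 -v" command to see
the full list of problems, fix them, and try again.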


7. Generate RT Tickets for Hosts
--------------------------------

- To do this, you must either specify "contact_groups tickets" in individual host
definitions, or update the template for all hosts and change its default
contact_groups entry to tickets. This template is in the file generic-host_nagios2.cfg.

- If you wish to do this, go ahead. Tickets will be generated if a host goes down
and you have specified "tickets" as the contact_groups for that host.

8. See Nagios Tickets in RT
---------------------------

To verify that your changes work, monitor HTTP on one of our servers that is not
running a web server. Let's pick the second Mac Mini in our class, also known as
"s1.ws.nsrc.org" (see the network diagram for details).

If you do not have an entry for this machine, add one to the file where your PCs
are defined. If this is a file called pcs.cfg, you would do:

# vi /etc/nagios3/conf.d/pcs.cfg

In this file, add (or verify that you have) an entry that looks like this:

define host {
use generic-host
host_name s1
alias s1
address 10.10.0.241
parents sw
}

Save the file and exit.

Now edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg and add s1 to the hostgroup
for HTTP service checks:

# vi /etc/nagios3/conf.d/hostgroups_nagios2.cfg

Look for the "hostgroup_name http-servers" entry and update it so that it looks like this:

# A list of your web servers
define hostgroup {
hostgroup_name http-servers
alias HTTP servers
members localhost,pc1,pc2,pc3,pc4,pc5,pc6,pc7,pc8,pc9,pc10,pc11,pc12,pc13,pc14,pc15,pc16,pc17,pc18,pc19,pc20,pc21,pc22,pc23,pc24,pc25,pc26,pc28,pc29,pc30,pc31,pc32,pc35,pc37,pc39,s1
}

_REMEMBER_ that the "members" line must not contain any line breaks. Notice that "s1"
has been added at the end of the line.
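
To confirm that s1 really is in the http-servers group, you can list the group's
members. The function below (hostgroup_members, hypothetical) is a minimal awk-based
sketch that assumes hostgroup_name appears before members inside each definition and
that the members line is a single line, as required above.

```shell
# Hypothetical helper: print the members of one hostgroup, one per line.
# Usage: hostgroup_members /etc/nagios3/conf.d/hostgroups_nagios2.cfg http-servers
hostgroup_members() {
    local file="$1" group="$2"
    awk -v want="$group" '
        /^[ \t]*define[ \t]/ { name = "" }        # a new object definition starts
        $1 == "hostgroup_name" { name = $2 }
        name == want && $1 == "members" {
            sub(/^[ \t]*members[ \t]+/, "")       # keep only the comma-separated list
            n = split($0, m, ",")
            for (i = 1; i <= n; i++) {
                gsub(/[ \t]/, "", m[i])
                if (m[i] != "") print m[i]
            }
        }
    ' "$file"
}
```

For example, this prints a confirmation only if s1 is listed:

# hostgroup_members /etc/nagios3/conf.d/hostgroups_nagios2.cfg http-servers | grep -qx s1 && echo "s1 is in http-servers"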

Now save the file, exit, and restart Nagios:

# service nagios3 stop
# service nagios3 start


- It will take a while (up to 10 minutes) for Nagios to report that HTTP is
"critical", but once that happens, a new ticket generated by Nagios should appear
in the net queue of your RT instance.

- To see it, go to http://pcX.ws.nsrc.org/rt/ and log in with the username "sysadmin"
and the password you chose when you created the RT sysadmin account. The new
ticket should appear in the "10 newest unowned tickets" box on the main RT page
after you log in.

9. Configure Cacti to send emails to net@localhost to generate tickets in RT
----------------------------------------------------------------------------

If you have not installed the Plugin Architecture for Cacti, then please be sure to
attempt this exercise last.

You can see how this works by logging in to the Cacti instance running on the noc
box, as it has the Cacti Plugin Architecture installed along with the two plugins
"Settings" and "Threshold".

To see how Cacti can generate a ticket, first go to:

http://noc.ws.nsrc.org/cacti/

Log in as "admin" (system password). Then:

* Click on the Console tab (upper-left)
* Click on "Settings" (lower-left)
* Click on the "Mail / DNS" tab (upper-right)
* Verify that the email fields are properly filled in:
  - Test Email (sysadm@localhost or net@localhost)
  - Mail Services (PHP Mail() Function)
  - From Email Address (cacti@localhost)
  - From Name (Cacti System Monitor)
  - SMTP Hostname (localhost)
  - SMTP Port (25)

Now create a threshold that will trigger an email that, in turn, will
create a ticket in RT:

* Click on "Thresholds" (middle-left)
* Click on the "Add" option (upper-right)
* Select a Host (localhost, for example)
* Select a Graph (Processes)
* Select the Data Source (proc)
* Click on the "create" button

You will now be presented with a detailed screen where you can specify what should
happen if the threshold is reached. Verify or do the following:

* Threshold Name: Something Descriptive
* Verify that "Threshold Enabled" is checked
* Threshold Type: High / Low Values (for Processes)
* High Threshold: 50 (this will cause the threshold to trip)
* Breach Duration: 5 minutes (this will give us a ticket in 5 to 10 minutes)
* Data Type: Exact Value
* Re-Alert Cycle: Never
* Extra Alert Emails: net@localhost,sysadm@localhost

This will send an email to net@localhost within 5 to 10 minutes, which will create a
new ticket in RT. In addition, an email will go to sysadm@localhost. You can view that
email as the sysadm user by running:

$ mutt -f /var/mail/sysadm
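
The interaction between High Threshold and Breach Duration can be modeled roughly as
follows. This is a simplified sketch (breach_alert_sample is hypothetical, not the
Threshold plugin's code): it assumes Cacti's default 5-minute polling interval and
treats the breach duration as the number of minutes the value must stay above the
threshold, counted in whole polling cycles, before an alert is sent.

```shell
# Simplified model (not the Threshold plugin's code) of High Threshold + Breach Duration.
# Arguments: high_threshold breach_minutes poll_minutes value...
# Prints the 1-based index of the sample at which the alert would fire;
# returns non-zero if the values never stay above the threshold long enough.
breach_alert_sample() {
    local high="$1" breach="$2" poll="$3"
    shift 3
    printf '%s\n' "$@" | awk -v high="$high" -v breach="$breach" -v poll="$poll" '
        {
            # count consecutive polls above the threshold; reset on any normal value
            if ($1 + 0 > high + 0) run++; else run = 0
            if (run * poll >= breach) { print NR; found = 1; exit }
        }
        END { exit !found }'
}
```

In this model, with the settings above (High Threshold 50, Breach Duration 5 minutes,
5-minute polling), a single poll above 50 processes is enough: breach_alert_sample 50 5 5 40 60
prints 2, because the second sample is the first one above 50. A longer breach
duration, such as 15 minutes, would require three consecutive polls above 50.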

You can create many kinds of thresholds whose breach will result in ticket
creation. Feel free to experiment with the Cacti instance on the noc box to create
new thresholds. You can see whether they are working by logging in to the noc
instance of Request Tracker (RT) at:

http://noc.ws.nsrc.org/rt/

The username is "sysadm" and the password is the class password.


+-----+
Last update 2jun2011
Hervey Allen