Bash Scripting and Nagios

APRICOT 2008: Network Management Workshop

Part 1: Create a Nagios Plug-in
Part 2: Set Up Nagios Notification for our New Plug-in
Part 3: Update Our Plug-in for any Number of Hosts

Part 1

Create a Nagios Plug-in

We want to create a plug-in for Nagios using a shell script. The shell script is going to do the following:

  • Ping some group of servers. You will define this group.
  • If one server does not respond, the plug-in returns a warning.
  • If two or more servers do not respond, the plug-in returns a critical alert.

We only want to run this service on the localhost. It does not make sense to define it as running on another host.

In order to create a plug-in for Nagios there are several steps you must take. These are (not necessarily in any order):

  • Create your script (bash, perl, etc.)
  • Create the service definition.
  • Create a configuration file for the new service definition (i.e., "plug-in").
  • Reload Nagios to see the changes.

Your shell script will exit with one of three possible values:
  • 0: Everything is fine
  • 1: Warning
  • 2: Critical

You can also define an exit value of 3 for your script, which Nagios will see as "unknown" (indeterminate). This usually implies that the plug-in was not able to run as expected. What would be an example of this?
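As a rough illustration of these exit values, the sketch below maps a count of unreachable hosts to a status. The helper name status_for is hypothetical. It returns 3 when the count is missing or not a number, which is one case where a plug-in could not run as expected.

```shell
#!/bin/bash
# Exit values that Nagios understands.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# status_for (hypothetical helper): print the Nagios status that matches
# a given number of unreachable hosts.
status_for() {
    case "$1" in
        ''|*[!0-9]*) echo "$STATE_UNKNOWN" ;;   # missing or not a number
        0)           echo "$STATE_OK" ;;        # every host answered
        1)           echo "$STATE_WARNING" ;;   # exactly one host is down
        *)           echo "$STATE_CRITICAL" ;;  # two or more hosts are down
    esac
}
```

For example, status_for 1 prints 1 (warning) and status_for abc prints 3 (unknown).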

To start, here is the service definition for our plug-in. We will call our plug-in "check_internet":

Place this in the file /etc/nagios2/conf.d/localhost_nagios2.cfg:

# Define a new service...

define service {
        use                             generic-service
        host_name                       localhost
        service_description             External Host Group Check
        check_command                   check_internet!!!!!
}

Notice the machines we have defined: each "!" in check_command introduces one argument, and each argument is one machine. You can define any other set of machines you want. The way our plug-in will be defined, you will be able to check up to 5 hosts. This limit is completely arbitrary and makes our exercise easier to do. If you wanted, you could define the configuration for your new plug-in so that a single argument is passed, and then use the "for i" clause in the script to parse each item on the command line.
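The plug-in definition described above could look something like the sketch below, which writes a command definition from the shell using a here-document. The paths are assumptions: on a Debian-style install, command definitions usually live under /etc/nagios-plugins/config/ and plug-in scripts under /usr/lib/nagios/plugins/, but check your own system. The sketch writes to a scratch directory (CONF_DIR, a hypothetical variable) so you can try it without root.

```shell
#!/bin/bash
# Sketch only: write a command definition for check_internet to a scratch
# directory so it can be reviewed before being copied into place.
# CONF_DIR and the plug-in path below are assumptions, not fixed locations.
CONF_DIR=${CONF_DIR:-$(mktemp -d)}

# The quoted 'EOF' stops the shell from expanding $ARG1$ and friends.
# Nagios fills them in from the "!"-separated values in check_command.
cat > "$CONF_DIR/check_internet.cfg" <<'EOF'
define command {
        command_name    check_internet
        command_line    /usr/lib/nagios/plugins/check_internet $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$
}
EOF

echo "Wrote $CONF_DIR/check_internet.cfg"
```

Five $ARGn$ macros match the five "!" separators in the service definition. An argument slot that is left empty simply passes nothing to the script.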

If you are interested, here is an example from the Nagios documentation:

Now let's describe the script in more detail. We want you to do the following:

  • Use ping to ping each machine only once. Be sure to set the timeout on the ping to be long enough in case it takes a while for a machine to respond.
  • Ping each machine listed in your service definition above, up to 5 machines (remember, this is arbitrary).
  • If all machines respond, exit your script with a value of zero (0).
  • If all machines except one respond, exit your script with a value of one (1).
  • If two or more machines do not respond, exit your script with a value of two (2).
  • You can choose to echo a message that will appear in your Nagios interface.
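
The steps above can be sketched as a single shell function. This is one possible approach, not the only one. It assumes the Linux iputils version of ping, where -c sets the number of packets and -W sets the number of seconds to wait for a reply. The function name check_internet and the message wording are our own choices.

```shell
#!/bin/bash
# check_internet sketch: ping each host once, then report one Nagios status.
# Assumes Linux iputils ping (-c = packet count, -W = seconds to wait).

check_internet() {
    local down=0 checked=0 seen=0 host

    for host in "$@"; do
        [ "$seen" -ge 5 ] && break       # arbitrary limit of five arguments
        seen=$((seen + 1))
        [ -z "$host" ] && continue       # skip unused argument slots
        checked=$((checked + 1))
        if ! ping -c 1 -W 5 "$host" > /dev/null 2>&1; then
            down=$((down + 1))
        fi
    done

    # The echoed text is what shows up in the Nagios interface.
    if [ "$checked" -eq 0 ]; then
        echo "UNKNOWN: no hosts given"
        return 3
    elif [ "$down" -eq 0 ]; then
        echo "OK: all $checked hosts responded"
        return 0
    elif [ "$down" -eq 1 ]; then
        echo "WARNING: 1 of $checked hosts did not respond"
        return 1
    else
        echo "CRITICAL: $down of $checked hosts did not respond"
        return 2
    fi
}
```

Saved as a plug-in file, the script would end by calling check_internet "$@" and then exit $? so that Nagios receives the status as the exit value.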

To do this you will need to understand some basics of ping and shell scripting.

The files that you need to create or edit for all this to work are:
  • Service Definition for the Localhost
  • Plug-in Parameter Definition file for the new Service Definition
  • The script file for your new Nagios plug-in

If you are paying attention the files are here. :-)

Part 2

Set Up Nagios Notification for our New Plug-in

Now that we have a new plug-in, let's set up Nagios to send an email to an account that you specify if the new plug-in generates an alert. That is, if two or more machines are down, someone needs to be notified.

To do this you need to do the following in Nagios:

  • Create a contact
  • Place the contact in a contact group
  • Place the contact group in your device configuration file
  • Define how many times and/or how often you want the members of the contact group in the device configuration file to receive the alert.

To start, look at the file /etc/nagios2/conf.d/contacts_nagios2.cfg and see how it is formatted. You will want to create a new contact and then a new contact group. If you need an example there is one here.
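
As a hedged illustration, a contact and contact group might look like the definitions below, written to a scratch file from the shell so you can review them first. Every name and the email address are hypothetical placeholders. The sketch also assumes that a 24x7 time period and the notify-by-email and host-notify-by-email commands are already defined in your configuration; check your own files before relying on them.

```shell
#!/bin/bash
# Sketch only: write a contact and a contact group to a scratch file.
# All names and the email address are hypothetical placeholders.
# Assumes 24x7, notify-by-email and host-notify-by-email already exist
# in your Nagios configuration.
OUT=${OUT:-$(mktemp)}

cat > "$OUT" <<'EOF'
define contact {
        contact_name                    netadmin
        alias                           Network Admin
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,r
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           netadmin@example.com
}

define contactgroup {
        contactgroup_name               internet-admins
        alias                           Internet Check Admins
        members                         netadmin
}
EOF

echo "Wrote $OUT"
```

After reviewing the file, you would merge these definitions into contacts_nagios2.cfg and name the group with the contact_groups directive in the host or service definition. The notification_interval directive in that same definition controls how often the alert is repeated.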

Once the contact group has been created, you need to specify that it will receive notifications for the services being monitored on or by localhost. You do this by editing the file /etc/nagios2/conf.d/localhost_nagios2.cfg and updating the "host" section at the top of the file.

Once you have done this, be sure to restart the Nagios process so that it reads the changes:
# /etc/init.d/nagios2 stop
# /etc/init.d/nagios2 start

Part 3

Update Our Plug-in for any Number of Hosts

Update your script and the check_internet plug-in configuration to allow for any number of hosts to be specified for checking. This means that you need to change the following two files:

  • Plug-in Parameter Definition file for the new Service Definition
  • The script file for your new Nagios plug-in

The plug-in parameter definition file should list only a single command-line parameter as being passed.

The script needs to use the "for i in" construct, or just "for i", to walk through the values on the command line and act on each one. You can find an example of this here:
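
A minimal sketch of the "for i" construct, under the same assumption as before that ping is the Linux iputils version (-c for packet count, -W for seconds to wait). The function name check_all and the message wording are hypothetical. Written without an "in" list, "for i" walks through the positional parameters, just like for i in "$@".

```shell
#!/bin/bash
# Sketch for any number of hosts.
# Assumes Linux iputils ping (-c = packet count, -W = seconds to wait).

check_all() {
    local down=0 failed="" i

    if [ $# -eq 0 ]; then
        echo "UNKNOWN: no hosts given"
        return 3
    fi

    # "for i" with no "in" list loops over every argument passed in.
    for i; do
        if ! ping -c 1 -W 5 "$i" > /dev/null 2>&1; then
            down=$((down + 1))
            failed="$failed $i"          # remember which hosts were silent
        fi
    done

    case "$down" in
        0) echo "OK: all $# hosts responded";     return 0 ;;
        1) echo "WARNING: no reply from$failed";  return 1 ;;
        *) echo "CRITICAL: no reply from$failed"; return 2 ;;
    esac
}
```

As with the fixed-size version, a plug-in file would finish with check_all "$@" followed by exit $?. If the command definition passes only $ARG1$, one possible arrangement is to put a space-separated list of hosts after a single "!" in check_command, since the shell that runs the command line splits the list into separate arguments.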


Last modified: Sun Feb 24 14:10:14 CST 2008