Network Monitoring and Management

Smokeping
---------

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>") 
  imply that you are executing commands on remote equipment, or within 
  another program.

Exercises
----------

0. Log in to your PC or open a terminal window as the sysadm user.

Once you are logged in you can continue with these exercises.

1. Install Smokeping

    $ sudo apt-get install smokeping

2. Initial Configuration

    $ cd /etc/smokeping/config.d
    $ ls -l

    -rwxr-xr-x 1 root root  578 2010-02-26 01:55 Alerts
    -rwxr-xr-x 1 root root  237 2010-02-26 01:55 Database
    -rwxr-xr-x 1 root root  413 2010-02-26 05:40 General
    -rwxr-xr-x 1 root root  271 2010-02-26 01:55 pathnames
    -rwxr-xr-x 1 root root  859 2010-02-26 01:55 Presentation
    -rwxr-xr-x 1 root root  116 2010-02-26 01:55 Probes
    -rwxr-xr-x 1 root root  155 2010-02-26 01:55 Slaves
    -rwxr-xr-x 1 root root 8990 2010-02-26 06:30 Targets
 
    $ sudo vi General

        Change the following lines:

        owner    = NOC
        contact  = sysadm@localhost
        cgiurl   = http://localhost/cgi-bin/smokeping.cgi
        mailhost = localhost

        Save the file and exit. Now let's restart the
        Smokeping service to verify that no mistakes have been made
        before going any further:

        $ sudo service smokeping restart

2. Configure monitoring of devices

        The majority of your time and work configuring Smokeping
        will be done in the file /etc/smokeping/config.d/Targets.
        
        For this class please do the following:

        Use the default FPing probe to check:

      - all the student PCs
      - classroom NOC
      - switches
      - routers
      
    You can use the classroom Network Diagram on the classroom wiki to 
    figure out addresses for each item, etc.

    ###############################################################################
    Please note - a complete set of configuration files is available on the class
    web server at http://noc.ws.nsrc.org/configs 
    
    We strongly suggest you review these files to see what a complete set of 
    Smokeping configuration files looks like. To review the Targets file the direct
    link is:
    
	http://noc/configs/etc/smokeping/config.d/Targets  
    
    ###############################################################################
    
    Create some hierarchy in the Smokeping menu for your
    checks, such as:

	+Local

	menu = Network Monitoring and Management 
	title = NOC Server for Network Monitoring Class

	++LocalMachine

	menu = The NOC@NetManage
	title = The NOC@NetManage
	host = localhost

    #
    # Classroom PCs
    #

	++PCs

    +++pc1
    menu = pc1
    title = pc1
    host = pc1

    +++pc2
    menu = pc2
    title = pc2
    host = pc2
        
    Save the file and restart Smokeping:

        $ sudo service smokeping restart

    Go to your browser and check the Smokeping page:

	http://pcN.ws.nsrc.org/cgi-bin/smokeping.cgi

    If everything is looking OK, continue adding:

	++Routers

	+++rtr
	menu = rtr
	title = Gateway Router
	host = rtr

	+++rtr1
	menu = rtr1
	title = Router 1, Group 1
	host = rtr1

    ++Switch

    +++sw
    menu = sw
    title = Backbone Switch
    host = sw
    
        ...

    Save the file, restart smokeping, and check
    your browser again.
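
    Typing one stanza per PC gets repetitive. As a rough sketch (not part
    of Smokeping itself), a small shell function can print the stanzas for
    pc1 through pcN so you can review them and paste them under "++PCs" in
    Targets. The function name gen_pc_targets is hypothetical, and the
    sketch assumes your PCs are named pc1, pc2, and so on.

```shell
# Print one Smokeping target stanza per classroom PC (pc1 .. pcN).
# gen_pc_targets is a hypothetical helper, not a Smokeping command.
gen_pc_targets() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        # Each stanza mirrors the hand-written pc1/pc2 entries above.
        printf '+++pc%s\nmenu = pc%s\ntitle = pc%s\nhost = pc%s\n\n' \
            "$i" "$i" "$i" "$i"
        i=$((i + 1))
    done
}

# Example: print stanzas for two PCs and review them before pasting.
gen_pc_targets 2
```

    Review the output before adding it to Targets. The script only prints
    text and does not modify any file.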

3. Add new probes

    The current entry in Probes is fine, but if you wish to
    use additional Smokeping checks you can add them here
    and specify their default behavior. You can also do
    this in the Targets file if you wish.

    Here is an example of a Probes file that would specify
    what to use to check for HTTP and DNS latency as well as
    the FPing probe that is used for ping latency:

       $ sudo vi Probes

        *** Probes ***

        + FPing

        binary = /usr/bin/fping

        + EchoPingHttp

        + DNS
        binary = /usr/bin/dig
        pings = 5
        step = 180
        lookup = www.nsrc.org

	Save the file.

4. Add HTTP latency checks

    Now edit your Targets again:

	$ sudo vi Targets
	
    Add a check for HTTP latency for all the classroom PCs. 
    This will mean adding another category, such as:

    ++HTTPServers
	probe = EchoPingHttp
	menu = HTTP Response
    title = HTTP Response Student PCs

	+++pc1
	menu = pc1
	title = PC1 HTTP Response Time
	host = pc1

	+++pc2
	menu = pc2
	title = PC2 HTTP Response Time
	host = pc2
	
	...

    If you have time, consider checking some machines that are
    external to our classroom and the conference (your organization's
    website, a popular web page, etc.).

5. Add DNS Latency Checks

    You can check internal names, external names, or both using
    the DNS latency probe.

    Add a menu hierarchy for DNS Latency. Check an external address
    (nsrc.org) and an internal address (noc). This will look something
    like this (in Targets):

        +DNS 
        probe = DNS
        menu = External DNS Check
        title = DNS Latency

        ++nsrc
        host = nsrc.org

        ++noc
        host = noc.mgmt

    Save your changes to the file Targets and exit.

    Restart Smokeping to see the changes:

    $ sudo service smokeping restart
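
    Before trusting the new graphs, it can help to run dig by hand and see
    roughly what query times to expect. The sketch below pulls the number
    out of the ";; Query time:" line that dig prints. The helper name
    dig_query_ms is hypothetical, and the figure dig reports is not
    necessarily identical to what the Smokeping DNS probe records.

```shell
# Extract the query time in milliseconds from dig output read on stdin.
# dig_query_ms is a hypothetical helper name.
dig_query_ms() {
    # dig prints a line of the form ";; Query time: 23 msec";
    # the fourth whitespace-separated field is the number.
    awk '/^;; Query time:/ { print $4 }'
}

# Typical use on a machine with network access:
#   dig www.nsrc.org | dig_query_ms
# Offline demonstration using a captured line of dig output:
printf ';; Query time: 23 msec\n' | dig_query_ms
```

    Running the lookup a few times gives you a rough baseline to compare
    against the DNS Latency graphs.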

    Look at additional Smokeping probes and consider implementing
    some of them:

        http://oss.oetiker.ch/smokeping/probe/index.en.html

    Explaining every syntactical detail of the file
    /etc/smokeping/config.d/Targets would require several
    pages, so we will go through some examples in class. You can
    also refer to the Smokeping configuration files that are in use
    on the classroom NOC box by going to:

        http://noc/configs/etc/smokeping
        http://noc/configs/etc/smokeping/config.d


6. Send Smokeping alerts

    $ sudo vi Alerts

    Update the top of the file where it says:

        *** Alerts ***
        to = alertee@address.somewhere
        from = smokealert@company.xy

    to include a proper "to" and "from" field for your server.
    Something like:

        *** Alerts ***
        to = sysadm@localhost
        from = smokeping-alert@localhost

    If you have installed RT, you can instead send your alerts
    to an existing RT queue:

        *** Alerts ***
        to = net@localhost

    At the end of the file, add another alert like this:

    +anydelay
    type = rtt
    # in milliseconds
    pattern = >1
    comment = Just for testing

    Notice the pattern in this alert. It means that an alert will be triggered
    as soon as a sample measurement has "ANY" delay, that is, more than one
    millisecond. This is just for testing. In reality, you will want to create
    an alert based on your observed baseline. For example, you might alert
    when your DNS servers' delay suddenly goes from under 10 ms to over 100 ms.
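
    To make the baseline idea concrete, here is a sketch that prints an rtt
    alert stanza matching a run of samples under a baseline followed by a
    run of samples over a spike threshold. It assumes the comma-separated
    pattern syntax described in the Smokeping documentation; the helper
    name gen_rtt_alert, its argument order, and the alert name "dnsspike"
    are hypothetical.

```shell
# Print an rtt alert stanza: N samples under a baseline followed by
# N samples over a spike threshold (both in milliseconds).
# gen_rtt_alert and its argument order are hypothetical.
gen_rtt_alert() {
    name=$1 baseline=$2 spike=$3 samples=$4
    pattern=""
    i=0
    while [ "$i" -lt "$samples" ]; do
        pattern="${pattern}<${baseline},"
        i=$((i + 1))
    done
    i=0
    while [ "$i" -lt "$samples" ]; do
        pattern="${pattern}>${spike},"
        i=$((i + 1))
    done
    # ${pattern%,} strips the trailing comma left by the loops.
    printf '+%s\ntype = rtt\n# in milliseconds\npattern = %s\ncomment = rtt rose from under %s ms to over %s ms\n' \
        "$name" "${pattern%,}" "$baseline" "$spike"
}

# Example: three samples under 10 ms followed by three over 100 ms.
gen_rtt_alert dnsspike 10 100 3
```

    Pick the two thresholds from what your own graphs show. The values
    10 and 100 simply repeat the example in the text.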

    Next, be sure you have this test alert defined for some of your Targets.
    You can either turn on alerts by defining alerts for a probe in
    the /etc/smokeping/config.d/Probes file, or by individual Targets
    entries. 

    In our case let's edit the Targets file and turn on alerts for our
    DNS Latency checks. 

    $ sudo vi /etc/smokeping/config.d/Targets

    Find the following section in the file:

        +DNS
        probe = DNS
        menu = External DNS Check
        title = DNS Latency

        ++nsrc
        host = nsrc.org

    Then add the following "alerts" line to the "++nsrc" entry:

        ++nsrc
        host = nsrc.org
        alerts = anydelay

    Save and exit from the file, then restart smokeping:

    $ sudo service smokeping restart

    Check your e-mail with mutt

    $ mutt

    (or check your RT queues)

    After 5 minutes, check whether you have received any alerts.

6. MultiHost Graphs

    Once you have defined a group of hosts under a single probe type in your
    /etc/smokeping/config.d/Targets file, then you can create a single graph
    that will show you the results of all smokeping tests for all hosts that
    you define. This has the advantage of letting you quickly compare, for
    example, a group of hosts that you are monitoring with the FPing probe.

    The MultiHost graph function in Smokeping is extremely picky - pay close
    attention.

    To create a MultiHost graph first edit the file Targets:

    $ sudo vi Targets

    If you had a section for the FPing probe defined that looked like this
    (this is an example only - your Targets file may look different):

        +Local
        menu = Local
        title = Local Network

        ++LocalMachine
        menu = Local Machine
        title = This host
        host = localhost

        ++pc1
        menu = pc1
        title = pc1
        host = pc1

        ++pc2
        menu = pc2
        title = pc2
        host = pc2

        ++pc3
        menu = pc3
        title = pc3
        host = pc3

    Right now smokeping displays the FPing probe results for each defined
    host in a separate graph. If you wish to see the results in a
    single graph with multiple lines, add the following after the last
    FPing probe host definition:

        +MultiHostPCs
        menu = MultiHost Ping
        title = Consolidated Ping Response Time
        host = /Local/LocalMachine /Local/pc1 /Local/pc2 /Local/pc3

    (Note: if the lines get too long, you can have multiple lines for the
    "host" entry by using the "\" character to indicate another line - ask about
    this if you are unsure!)
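
    Because the "host" line must list the full path of every target, it is
    easy to mistype. As a sketch, the shell function below builds the line
    from a parent path and a list of target names. The name
    gen_multihost_line is hypothetical, and the sketch assumes all targets
    sit directly under the same parent section.

```shell
# Build the "host =" line for a MultiHost graph from a parent path and
# a list of target names. gen_multihost_line is a hypothetical helper.
gen_multihost_line() {
    parent=$1
    shift
    line="host ="
    for name in "$@"; do
        line="$line $parent/$name"
    done
    printf '%s\n' "$line"
}

gen_multihost_line /Local LocalMachine pc1 pc2 pc3
# prints: host = /Local/LocalMachine /Local/pc1 /Local/pc2 /Local/pc3
```

    Each name you pass must match a section name in Targets exactly,
    including upper and lower case.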

    Now save and exit the file Targets and restart smokeping:

    $ sudo service smokeping restart

    You should see a new graph under the "MultiHost Ping" menu in your
    smokeping web interface. This graph will have different color lines
    for each host you have defined.


7. Slave instances - only done if we have the time.

    This is a description only for informational purposes in case you wish
    to attempt this type of configuration once the workshop is over.

    The idea behind this is that you can run multiple smokeping instances 
    at multiple locations that are monitoring the same hosts and/or services
    as your master instance. The slaves will send their results to the 
    master server and you will see these results side-by-side with your
    local results. This allows you to view how users outside your network
    see your services and hosts.

    This can be a powerful tool for resolving service and host issues that
    may be difficult to troubleshoot if you only have local data.

    Graphically this looks like this:

          [slave 1]     [slave 2]      [slave 3]
                |             |              |
                +-------+     |     +--------+
                        |     |     |
                        v     v     v
                        +---------------+
                        |    master     |
                        +---------------+

    You can see an example of this data here:

    http://oss.oetiker.ch/smokeping-demo/

    Look at the various graph groups and notice that many of the graphs
    have multiple lines with the color code chart listing items such as
    "median RTT from mipsrv01" - These are not MultiHost graphs, but rather
    graphs with data from external smokeping servers.

    To configure a smokeping master/slave server you can see the documentation
    here:

    http://oss.oetiker.ch/smokeping/doc/smokeping_master_slave.en.html

    In addition, a sample set of steps for configuring this is available in
    the file sample-smokeping-master-slave.txt which is available under the 
    attachments listing on the agenda page of the class wiki.