nagios


I use revision control systems for almost all of my software development, deployment, and server configuration. When using Subversion there is, technically, no difference between a working copy on my personal or development machines and a working copy on a production server. (Yes, you need a working copy on the production server, because exports cannot be updated that easily.) However, modifying checked-out code or config files directly on a production server can cause problems with the next round of updates. Sometimes you just don’t notice that there are conflicts that need to be resolved.

To make sure I notice when a colleague or I have fallen back into the bad habit of changing things directly on a production server instead of checking changes in to the repository, I have created a Nagios plugin called “check_svnstatus”.

Click here to download the plugin and to see an example configuration
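The plugin’s own code is only available through the download above. As a rough illustration of the idea, here is a minimal sketch, not the actual check_svnstatus code: it runs `svn status` on a working copy and maps the result to the standard Nagios plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The function names `svn_status_verdict` and `check_svn_wc`, the status texts, and the choice to treat conflicts as CRITICAL are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch in the spirit of check_svnstatus; NOT the author's plugin.
# Nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

# Reads `svn status` output on stdin, prints a one-line Nagios status and
# returns the matching exit code. Only the first status column is inspected.
svn_status_verdict() {
  local modified=0 conflicted=0 line
  while IFS= read -r line; do
    case "$line" in
      C*) conflicted=$((conflicted + 1)) ;;
      # Modified, added, deleted, replaced, missing, unversioned, obstructed.
      # Drop '?'* if your working copies legitimately contain untracked files.
      M*|A*|D*|R*|'!'*|'?'*|'~'*) modified=$((modified + 1)) ;;
    esac
  done
  if [ "$conflicted" -gt 0 ]; then
    echo "SVNSTATUS CRITICAL - $conflicted conflicted, $modified modified item(s)"
    return 2
  elif [ "$modified" -gt 0 ]; then
    echo "SVNSTATUS WARNING - $modified locally modified item(s)"
    return 1
  fi
  echo "SVNSTATUS OK - working copy is clean"
  return 0
}

# Runs `svn status` on the given working copy and classifies the output.
check_svn_wc() {
  local wc="${1:?usage: check_svn_wc WORKING_COPY}" out
  if ! out=$(svn status "$wc" 2>&1); then
    echo "SVNSTATUS UNKNOWN - svn status failed for $wc"
    return 3
  fi
  printf '%s\n' "$out" | svn_status_verdict
}
```

To use this as a plugin, end the script with `check_svn_wc "$1"; exit $?` and point a Nagios (or NRPE) command definition at it, passing the working copy path as the first argument.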

Since the upgrade to version 3.2.2, Nagios no longer updates the host alias macro ($HOSTALIAS$) when the configuration is reloaded. Macros are the variables used in commands such as notification commands; they fill in, for example, the details you receive by e-mail when there is a problem with your hosts or services.

As long as you don’t change your host alias information, you will never notice. I did change it, and I found myself scratching my head for some time, especially since the same information does get updated in the web interface.

I found quite a few forum discussions about this, but none seemed to offer a practical solution for larger environments.

After some pondering, I came up with these lines:

service nagios stop
cp /usr/local/nagios/var/retention.dat /usr/local/nagios/var/retention.bak
grep -v '^alias' /usr/local/nagios/var/retention.bak > /usr/local/nagios/var/retention.dat
service nagios start

They do the trick in my installations.

Update: Here’s a brief summary of the error and what the script does to work around it.

When Nagios parses the configuration and finds a new host, it loads the alias field into memory. When Nagios reloads the configuration, the alias that is already in memory does not get updated. When Nagios stops (or before it reloads the configuration), the alias information in memory is written to the retention.dat file. When you start Nagios from a stopped state, it loads the alias from retention.dat rather than from the configuration files; the configuration files are only used if the alias is missing from retention.dat. So my solution does the following:

  1. Stop Nagios
  2. Make a backup copy of the retention.dat file (Note: I’m copying the file instead of renaming it to make sure that it will still have the same owner/group and permissions when I write to the original file in the next step)
  3. Strip all lines starting with “alias” from the backup file and overwrite the original retention.dat file with this data
  4. Start Nagios

You may have to adjust the locations of the retention.dat and backup files. You may also need to change the commands that stop and start the Nagios daemon.
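If you apply the workaround on several servers, it can help to keep the path and the stop/start commands in one place. The following is a minimal sketch of the same four steps, assuming the default path used above and `service nagios stop/start`; the function names are hypothetical, and unlike the lines above the backup is written to retention.dat.bak instead of retention.bak.

```shell
#!/usr/bin/env bash
# Sketch of the alias workaround with paths and commands collected in one place.
# RETENTION_FILE and all function names are assumptions; adjust to your setup.
RETENTION_FILE="${RETENTION_FILE:-/usr/local/nagios/var/retention.dat}"

nagios_stop()  { service nagios stop; }
nagios_start() { service nagios start; }

# Copies FILE to FILE.bak (copying keeps FILE's owner, group and permissions),
# then rewrites FILE from the backup without the lines starting with "alias".
strip_retention_aliases() {
  local file="$1" rc=0
  cp -- "$file" "$file.bak" || return 1
  grep -v '^alias' -- "$file.bak" > "$file" || rc=$?
  # grep exits 1 when it selects no lines; only 2 or higher is a real error.
  [ "$rc" -le 1 ]
}

# Stops Nagios, strips the cached aliases and starts Nagios again.
refresh_host_aliases() {
  nagios_stop || return 1
  if ! strip_retention_aliases "$RETENTION_FILE"; then
    echo "could not rewrite $RETENTION_FILE; restoring it from the backup" >&2
    cp -- "$RETENTION_FILE.bak" "$RETENTION_FILE"
  fi
  nagios_start
}
```

Calling `refresh_host_aliases` (as root, or as a user allowed to stop and start Nagios) performs the whole sequence; if rewriting fails, the original retention data is restored before Nagios is started again.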

The Nagios Statusmap is one of those features that get a lot of attention when you first set up your monitoring server, but looking back after a while, most people notice that they don’t really use it at all.

When it comes to daily monitoring, I never found it very useful either, but it has always served one important purpose for me: when I add new hosts or networks, the Statusmap reveals whether I got all my parent/child relationships right. And since I work in a dynamic and growing environment, I add a lot of hosts on a regular basis.
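Parent/child relationships come from the `parents` directive in the host definitions. As a text-based cross-check to go along with the map, here is a minimal sketch (the function name is hypothetical) that prints one “child -> parent” line per relationship from your object config files. It assumes one directive per line and a closing brace on its own line, and it skips templates that have no `host_name`.

```shell
#!/usr/bin/env bash
# Prints "child -> parent" for every host definition in the given Nagios object
# config files; hosts without parents are printed as "host -> (none)".
# Assumes one directive per line; inline comments are not handled.
list_host_parents() {
  awk '
    # Start of a host definition ("define host {" or "define host{").
    /^[ \t]*define[ \t]+host[ \t]*[{]/ { inhost = 1; name = ""; parents = ""; next }
    inhost && $1 == "host_name" { name = $2 }
    inhost && $1 == "parents"   { $1 = ""; parents = $0 }
    # End of the definition: print one line per parent.
    inhost && /^[ \t]*[}]/ {
      gsub(/[ \t]/, "", parents)
      if (name != "") {
        if (parents == "") print name " -> (none)"
        else { n = split(parents, p, ","); for (i = 1; i <= n; i++) print name " -> " p[i] }
      }
      inhost = 0
    }
  ' "$@"
}
```

For example, `list_host_parents /usr/local/nagios/etc/objects/*.cfg | sort` gives a list you can compare against the Statusmap after adding new hosts (the path is only an example).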

One thing always annoyed me about the Statusmap: when you exclude certain host groups from the map, only the icons of those hosts are removed. The map still shows their status in green or red, and with over a hundred hosts it is still very hard to identify individual hosts.

Today I stumbled across a patch for the Statusmap on the Nagios Exchange that addresses this very issue. It really excludes the hosts from the map, i.e. the map is redrawn as if the excluded hosts didn’t exist.

Here’s an example:

original Statusmap (before the update)

improved Statusmap (after the update)

If you would like to patch your Nagios installation, proceed as follows. I’m assuming that you have built Nagios yourself. I have tested this with Nagios 3.2.3, the most recent version.

Before you begin, cd to the cgi folder inside your Nagios source download folder, e.g. ~/downloads/nagios-3.2.3/cgi

curl "http://exchange.nagios.org/components/com_mtree/attachment.php?link_id=1807&cf_id=24" > statusmap.diff
patch statusmap.c statusmap.diff
make statusmap.cgi
cp statusmap.cgi /usr/local/nagios/sbin/
cd /usr/local/nagios/sbin/
chmod g+w statusmap.cgi
chown nagios:nagios statusmap.cgi
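A slightly more cautious variant of these steps checks that the diff applies cleanly before touching statusmap.c, and keeps the previous statusmap.cgi as a backup. This is a sketch under assumptions: the function names are hypothetical, the `.orig` suffix for the backup is arbitrary, and `patch --dry-run --forward` is the GNU/BSD patch option set.

```shell
#!/usr/bin/env bash
# Applies DIFF to TARGET only if a dry run succeeds; otherwise leaves TARGET alone.
safe_patch() {
  local target="$1" diff="$2"
  # --forward skips patches that look reversed or already applied instead of
  # offering to reverse them.
  if ! patch --dry-run --forward "$target" "$diff" > /dev/null; then
    echo "$diff does not apply cleanly to $target; nothing changed" >&2
    return 1
  fi
  patch --forward "$target" "$diff" > /dev/null
}

# Installs a rebuilt CGI into DEST (default: the sbin folder used above),
# keeping the previous binary as <name>.orig; owner and mode as in the steps above.
install_cgi() {
  local cgi="$1" dest="${2:-/usr/local/nagios/sbin}"
  if [ -e "$dest/$cgi" ]; then
    cp -p -- "$dest/$cgi" "$dest/$cgi.orig" || return 1
  fi
  cp -- "$cgi" "$dest/" || return 1
  chmod g+w "$dest/$cgi" && chown nagios:nagios "$dest/$cgi"
}
```

Inside the cgi folder, `safe_patch statusmap.c statusmap.diff && make statusmap.cgi && install_cgi statusmap.cgi` replaces the patch, make, and copy steps; if the new map misbehaves, copy statusmap.cgi.orig back.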

I use Nagios to monitor my own servers (and my clients’ and employers’ servers, too), and I am giving the new Nagios V-Shell a test run.

Why? I don’t actually care much that it’s written in PHP and generates valid XHTML. What interests me is that it doesn’t use frames: I’m hoping the interface works nicely on BlackBerries and other smartphones. Yes, there are frontends that were designed with small screens in mind, but they are either too focused on iPhone or Android with heavy use of JavaScript, or they lack important features such as the ability to acknowledge host or service states.

I have yet to test V-Shell with multiple users, or with users other than “nagiosadmin”, but here is the bottom line for today:

  • Installation is very easy and straightforward
  • V-Shell works fairly well on the BlackBerry: slow, but easy to use and intuitive; the views and controls are very close to the original Nagios Core web interface
  • Two things didn’t work well at first, but I managed to fix them:

Host icons

I use host icons. In the HTML parts of the web interface they’re mainly just pleasing to the eye, but they are very useful when you look at the Statusmap. I maintain my own set of icon files in the Nagios Core web interface, but unfortunately V-Shell doesn’t automatically link the host icons to that location (even though it relies on the Core interface for host and service commands). My solution:

  1. Remove the logos that shipped with V-Shell: rm -rf /usr/local/vshell/views/images/logos
  2. Create a symbolic link to the Core logos folder: cd /usr/local/vshell/views/images ; ln -s /usr/local/nagios/share/images/logos
  3. Add the following lines to the vshell_apache.conf file, right before the line that says </Directory>:
### for the logos
Options +FollowSymLinks
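The same three steps can be scripted so that they are safe to re-run after a V-Shell upgrade. This is a minimal sketch with hypothetical function names: instead of deleting the shipped logos it moves them aside to logos.vshell-orig (an assumption, not part of the steps above), and it only adds the Apache lines if the first of them is not already in the file. The location of vshell_apache.conf differs between installations, so it is passed in as a parameter.

```shell
#!/usr/bin/env bash
# Moves V-Shell's bundled logos aside and links the Core logos folder in their
# place. Defaults mirror the paths used above; re-running is a no-op.
link_core_logos() {
  local vshell_images="${1:-/usr/local/vshell/views/images}"
  local core_logos="${2:-/usr/local/nagios/share/images/logos}"
  local target="$vshell_images/logos"
  [ -L "$target" ] && return 0          # already linked
  if [ -e "$target" ]; then
    mv -- "$target" "$target.vshell-orig" || return 1
  fi
  ln -s -- "$core_logos" "$target"
}

# Inserts the given lines right before the first </Directory> line of CONF,
# unless the first given line is already present. If CONF has no </Directory>
# line, nothing is inserted. CONF keeps its owner and permissions.
add_before_directory_close() {
  local conf="$1" tmp rc=0
  shift
  grep -qxF -- "$1" "$conf" && return 0
  tmp=$(mktemp) || return 1
  awk -v block="$(printf '%s\n' "$@")" '
    !done && /^[ \t]*<\/Directory>/ { printf "%s\n", block; done = 1 }
    { print }
  ' "$conf" > "$tmp" && cat "$tmp" > "$conf" || rc=1
  rm -f "$tmp"
  return "$rc"
}
```

For example: `link_core_logos && add_before_directory_close /path/to/vshell_apache.conf '### for the logos' 'Options +FollowSymLinks'`, followed by `apachectl configtest` and an Apache reload.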

NagiosGrapher

I still use the old NagiosGrapher. I know I should move to something newer, but I love my historical data, and there are so many other things on my to-do lists at the moment. The links to the graphs don’t work from V-Shell, but this time the blame lies with NagiosGrapher rather than V-Shell. Anyway, I wanted a solution, and here it is:

  1. Copy the NagiosGrapher CGIs (graphs.cgi, rrd2-graph.cgi, and rrd2-system.cgi) from your Nagios Core CGI folder to the V-Shell folder
  2. Add the following lines to the vshell_apache.conf file, right before the line that says </Directory>:
### for the nagios_grapher
Options +ExecCGI
AddHandler cgi-script cgi
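Step 1 is easy to get slightly wrong (a missing file, or a copy that loses its execute bit). Here is a minimal sketch of that copy step with a check for each of the three CGIs. The function name is hypothetical; the default source folder is the Core sbin folder used in the Statusmap instructions, and the V-Shell folder default is an assumption, so pass your own paths if they differ.

```shell
#!/usr/bin/env bash
# Copies the three NagiosGrapher CGIs from the Core CGI folder into the V-Shell
# folder. Returns non-zero if any of them is missing or cannot be copied.
copy_grapher_cgis() {
  local core_cgi="${1:-/usr/local/nagios/sbin}" vshell_dir="${2:-/usr/local/vshell}"
  local cgi missing=0
  for cgi in graphs.cgi rrd2-graph.cgi rrd2-system.cgi; do
    if [ ! -f "$core_cgi/$cgi" ]; then
      echo "missing: $core_cgi/$cgi" >&2
      missing=1
      continue
    fi
    # -p keeps mode and timestamps (and ownership when run as root),
    # so the copies stay executable.
    cp -p -- "$core_cgi/$cgi" "$vshell_dir/" || return 1
  done
  return "$missing"
}
```

After copying, add the two Apache lines from step 2, run `apachectl configtest`, and reload Apache.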