Hi All,
First of all, my apologies for the long time away; I have been completing my thesis on virtualization. In my day job, I’ve spent the last 3 weeks deploying a Nagios solution to multiple customer locations to monitor Windows servers. Here I will post how I did it, the issues that arose and how I dealt with them, so you won’t have to work it out yourself.

First of all, you will need to forward ports not only to the remote server, but also to the local server.
Your local Nagios box needs TCP ports 5666-5667, 1248-1249 and 12489 forwarded to its IP address.
The remote server needs the same ports forwarded to it as well. On Netgear routers you need to create the service and then forward it through the firewall; on Drayteks the setting is hidden under “NAT -> Port Redirection”, and so on.
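
To confirm the forwards actually work, you can probe the ports from the Nagios box. The following is only a rough sketch: it assumes netcat (nc) is installed, and the function names are my own invention. Bear in mind that a port shows as closed until something is listening behind it, so 12489 will only report open once NSClient++ is running on the Windows server.

```shell
#!/bin/sh
# Sketch: probe the TCP ports named in this post from the Nagios box.
# Function names are hypothetical; "nc" (netcat) is assumed to be installed.

# Print the ports from the post, one per line (ranges written out in full).
nagios_ports() {
    printf '%s\n' 5666 5667 1248 1249 12489
}

# Try a TCP connection to host $1 on port $2, giving up after 3 seconds.
check_port() {
    if nc -z -w 3 "$1" "$2" >/dev/null 2>&1; then
        echo "port $2 on $1: open"
    else
        echo "port $2 on $1: closed or filtered"
    fi
}

# Probe every port in the list on the host given as $1.
check_all_ports() {
    for port in $(nagios_ports); do
        check_port "$1" "$port"
    done
}
```

Save it as a script, add a line such as check_all_ports 194.168.4.100 at the bottom (using the external IP of the remote site), and run it with sh.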

Once your ports are forwarded correctly, we need to look at how the Nagios solution works. Basically, you need to install Linux (the distribution is your choice) onto a system that will act as your monitoring box. It does not have to be powerful, although running it on an Atom processor is not advisable!

To give you an idea of how “low spec” I’m talking, my Nagios solution runs on a 5 year old desktop PC with 1GB of RAM and an 80GB IDE hard disk.

Once you have installed your Linux of choice (I used Ubuntu 9.04 for this operation, but normally I’m firmly a CentOS/RHEL man), you will need to install Nagios. This is very simple: follow the instructions on the Nagios website and make sure that all the dependencies are met during make, make install etc.

Once Nagios is configured, browse to http://itsipaddress/nagios and log in with nagiosadmin as the user and the password you set a moment ago; you should be greeted with the Nagios web page. This is now your working Nagios solution.

The best way to add hosts to Nagios is via the config files, so you will need to do a large amount of this through SSH or a terminal to make sure it’s done correctly (no surprise there).
Nagios (on my Ubuntu 9.04 server) is installed by default at /usr/local/nagios. Within here there are many folders, but the one we are interested in is the “etc” folder (/usr/local/nagios/etc). This is where the config files, or “.cfg” files, are stored. The one we need to take great care of is nagios.cfg, as this is the master file that controls everything else. If you open it up with vi (“vi nagios.cfg”), you will see lots of options, many of which are commented out, e.g.

#cfg_file=/usr/local/nagios/etc/objects/windows.cfg

To enable the monitoring of Windows servers, you will need to uncomment this option by removing the leading comment character (in vi, press “i” to begin editing, then press ESC and type :wq! followed by Enter to save the file). To reload Nagios (on Ubuntu in our case), use “service nagios restart”. This is the same on CentOS.
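
If you would rather not do the edit by hand, the uncomment-and-restart step can be scripted. This is just a sketch under a few assumptions: the function names are made up, the commented line is assumed to read exactly “#cfg_file=<path>”, and the Nagios binary is assumed to live at /usr/local/nagios/bin/nagios (the usual place for a source install). The -v flag asks Nagios to verify the configuration, so the restart only happens if the files parse cleanly.

```shell
#!/bin/sh
# Sketch: uncomment one "#cfg_file=<path>" line, then verify and restart.
# Function names are hypothetical; paths assume a default source install.

# $1 = path to nagios.cfg
# $2 = object file path exactly as it appears after "#cfg_file="
enable_cfg_file() {
    cfg="$1"
    target="$2"
    # Write the edited copy to a temp file, then copy it back over the
    # original so the original file's owner and permissions are kept.
    sed "s|^#cfg_file=${target}\$|cfg_file=${target}|" "$cfg" > "${cfg}.tmp" \
        && cat "${cfg}.tmp" > "$cfg" \
        && rm -f "${cfg}.tmp"
}

# Check the whole configuration first; restart only if the check passes.
verify_and_restart() {
    /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg \
        && service nagios restart
}
```

You would call enable_cfg_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/objects/windows.cfg and then verify_and_restart, both as root.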

Within the folder “objects” there is a file called “windows.cfg”; this is what the line we just uncommented refers to. By adding hosts to this file, you add hosts that are going to be monitored. If you open it up with vi again, you will see an example host.

A bit of advice here: if you are going to be monitoring a lot of hosts (servers to you and me), you don’t want to add them all to the windows.cfg file, as it will get VERY messy very quickly. It is best to keep windows.cfg to a bare minimum and use a separate cfg file per server, which I will explain later. In windows.cfg, you should remove everything except:

# Define a hostgroup for Windows machines
# All hosts that use the windows-server template will automatically be a member of this group

define hostgroup{
hostgroup_name  windows-servers ; The name of the hostgroup
alias           Windows Servers ; Long name of the group
}

This just sets the hostgroup that will be used later when adding other Windows servers. You will see in nagios.cfg (/usr/local/nagios/etc/nagios.cfg) that the line cfg_file=/usr/local/nagios/etc/objects/windows.cfg points directly at the file we just edited. By creating one .cfg file per server and adding another cfg_file line like the one above for each, you can add as many servers as you want without cluttering your organisational structure. For example, I have the directory “Servers” within objects, in which I store all my server configs. I then link them back into nagios.cfg so the files are read and the hosts processed, like so:

# You can specify individual object config files as shown below:

cfg_file=/usr/local/nagios/etc/objects/commands.cfg

cfg_file=/usr/local/nagios/etc/objects/contacts.cfg

cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg

cfg_file=/usr/local/nagios/etc/objects/templates.cfg

# Definitions for monitoring the local (Linux) host

#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

# Definitions for monitoring a Windows machine

cfg_file=/usr/local/nagios/etc/objects/windows.cfg

cfg_file=/usr/local/nagios/etc/objects/Servers/Accountants.cfg

cfg_file=/usr/local/nagios/etc/objects/Servers/SystemX.cfg

#cfg_file=/usr/local/nagios/etc/objects/Servers/SamPC.cfg


The files “SystemX.cfg”, “SamPC.cfg” and “Accountants.cfg” each describe one server I set Nagios up to monitor: I created a separate cfg file for it and tied it into nagios.cfg (SamPC.cfg is commented out above, so it is not currently being read). It’s that simple.
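
Adding the look-up line for each new server is easy to script as well. The sketch below (the function name is hypothetical) appends a cfg_file line to nagios.cfg only if that exact line is not already there, so running it twice does no harm.

```shell
#!/bin/sh
# Sketch: register a per-server object file in nagios.cfg, once only.
# The function name is hypothetical.

# $1 = path to nagios.cfg
# $2 = full path of the per-server .cfg file
add_cfg_file() {
    cfg="$1"
    line="cfg_file=$2"
    # -x matches whole lines only, -F treats the text literally (no regex).
    if grep -qxF "$line" "$cfg"; then
        echo "already present: $line"
    else
        printf '%s\n' "$line" >> "$cfg"
        echo "added: $line"
    fi
}
```

For example: add_cfg_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/objects/Servers/SystemX.cfg. Nagios also has a cfg_dir= option that reads every .cfg file in a directory, which may suit you better if you end up with a lot of servers; I have stuck with one cfg_file line per server here.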

Now onto the actual config files themselves.

As mentioned before, each host you want to monitor should have its own config file, to keep things manageable and make it easy to add or remove hosts without risk of damaging others. A sample host configuration file is as follows:

#
# Client Y’s Server – Configuration file
#

define host{
use                 windows-server
host_name           ClientServer
alias               Client Y’s Server
address             194.168.4.100
}

define service{
use            generic-service
host_name        ClientServer
service_description    Uptime
check_command        check_nt!UPTIME
}

define service{
use            generic-service
host_name        ClientServer
service_description    CPU Load
check_command        check_nt!CPULOAD!-l 5,80,90
}

define service{
use            generic-service
host_name        ClientServer
service_description    Memory Usage
check_command        check_nt!MEMUSE!-w 80 -c 90
}

define service{
use            generic-service
host_name        ClientServer
service_description    C:\ Drive Space
check_command        check_nt!USEDDISKSPACE!-l c -w 80 -c 90
}

define service{
use            generic-service
host_name        ClientServer
service_description    Explorer.exe
check_command        check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
}

The first stanza refers to the actual host we are setting up to monitor. The “use” line refers to the windows-server template, which automatically puts the host into the windows-servers hostgroup we left in windows.cfg. The host_name is a name given to the server we wish to monitor; it can contain spaces but no special characters, i.e. “Client server” is fine, “Client(Server)” is not. The alias is a free-text description of the server; the rest of the config does not use it to look the host up (the service definitions refer to host_name instead). The important field is the “address” field, which holds the external IP address of the client’s server you wish to monitor. This is the internet-facing IP of the site where you forwarded port 12489 through to the server; in our example, www.whatismyip.com shows it as 194.168.4.100. It is imperative this is correct.
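
Since every server gets a file of the same shape, you may want to generate the skeleton rather than copy and paste it. Here is a minimal sketch: the function name and argument order are my own, and it only writes the host stanza plus the Uptime service, leaving you to add the other services by hand.

```shell
#!/bin/sh
# Sketch: write a starter per-server config file.
# The function name and argument order are hypothetical.

# $1 = host_name, $2 = alias, $3 = external IP address, $4 = output file
write_host_cfg() {
    name="$1"
    alias_text="$2"
    address="$3"
    out="$4"
    cat > "$out" <<EOF
define host{
    use        windows-server
    host_name  $name
    alias      $alias_text
    address    $address
}

define service{
    use                  generic-service
    host_name            $name
    service_description  Uptime
    check_command        check_nt!UPTIME
}
EOF
}
```

For the example above you would run write_host_cfg ClientServer "Client Y's Server" 194.168.4.100 ClientServer.cfg from a script, then move the file into your Servers directory.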

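Once the host has been defined, you can create as many “services” as you like. In our example, I have set monitoring on the uptime of the server and on its CPU and memory usage (notice the -w and -c flags, which set the Warning and Critical thresholds, shown as yellow and red in the web interface). The CPU check passes its thresholds through -l instead: “5,80,90” means a 5 minute average with a warning at 80% and critical at 90%. The stock windows.cfg also checks the “NSClient++” version (the executable which runs on the Windows server) with check_nt!CLIENTVERSION, which you can copy across if you want it.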

I am also monitoring disk space usage on C:\, along with the state of the explorer.exe process. The process name can be changed to anything you like, such as:

define service{
use            generic-service
host_name        ClientServer
service_description    SQL Server
check_command        check_nt!PROCSTATE!-d SHOWALL -l Sqlservr.exe
}

This will monitor the status of the SQL Server process, and so on. Once the configuration files are done and the ports are forwarded, you can start the Nagios service on your server by typing “service nagios start” (or “service nagios restart” if it is already running) on the command line. You can access the web interface at “http://ipaddressofserver/nagios” using the username “nagiosadmin” and the password you set earlier. Now all you need to do is install NSClient++ and configure its NSC.ini file on the server(s) you wish to monitor.

To make things easier, find out beforehand the external IP address of the site where the Nagios server is located. You can do this, again, by visiting www.whatismyip.com from that site.

Now, go onto the server you wish to monitor and open up a web browser such as Firefox and go to the URL:

http://sourceforge.net/project/downloading.php?group_id=131326&filename=NSClient%2B%2B-0.3.6-Win32.msi

Install the program using the defaults, as they will all change during the configuration anyway. Once the program is installed, go to “Start -> Run”, type services.msc and press Enter. Find the service NSClient++ and double click on it. On the second tab along, “Log On”, check the box “Allow service to interact with Desktop” and click Apply. Now go to C:\Program Files\NSClient++ and edit the file NSC.ini. This is where we make the final configuration changes to ensure the host can talk properly to the Nagios box.

Once you have the file open, remove the semi-colons from (i.e. uncomment) all the .dll lines except “CheckWMI.dll” and “RemoteConfiguration.dll”. Next, scroll down and edit the “allowed_hosts=” field so it contains the external IP address of the site the Nagios server is on (we found it out earlier). Once this has been done, save the file. Now go to services.msc again and make sure that “NSClient++” is started (restart it if it was already running, so it picks up the changes).
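
Before heading to the web interface, it can help to run a check by hand from the Nagios box, which tells you straight away whether the port forward and allowed_hosts are right. The sketch below shows how a check_command line maps onto the plugin call. It assumes the stock check_nt command definition from commands.cfg (“$USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$”) and plugins installed in /usr/local/nagios/libexec; if yours differ, adjust to suit. The function name is made up.

```shell
#!/bin/sh
# Sketch: turn a check_command value from a service stanza into the
# command line Nagios would run, assuming the stock check_nt definition.
# The function name is hypothetical.

# $1 = host address, $2 = check_command value such as check_nt!UPTIME
# The body runs in a subshell, so the IFS and globbing changes stay local.
check_nt_cmdline() (
    host="$1"
    IFS='!'        # fields of a check_command are separated by "!"
    set -f         # no filename globbing while splitting
    set -- $2      # $1 = check_nt, $2 = $ARG1$, $3 = $ARG2$ (may be absent)
    echo "/usr/local/nagios/libexec/check_nt -H $host -p 12489 -v $2${3:+ $3}"
)
```

For example, check_nt_cmdline 194.168.4.100 'check_nt!CPULOAD!-l 5,80,90' prints /usr/local/nagios/libexec/check_nt -H 194.168.4.100 -p 12489 -v CPULOAD -l 5,80,90, which you can then paste and run. If that command times out or is refused, look at the port forwarding and the allowed_hosts line before blaming Nagios.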

If it is, you should now go to “http://serverip/nagios”, log in, and be able to see the host you have just added. If you click on the traffic lights next to the host, you should be able to see the service health status, from green through amber all the way to red. This allows you to monitor the health of remote servers quickly and easily. Next post, I will talk about how we can get it to work with multiple servers behind the same external IP using a method other than port forwarding.

Sam