Hello all,

No, I am not dead – I have just moved into management… I’ll let you come up with the jokes!

Today I’m going to write a technical document on how to monitor the age of a file to ensure that it is newer than a given threshold, for example to make sure that file ‘X’ is less than 5 days old. This came up during my day because I wanted to make sure that the diary I use at home (running on WordPress) is backed up to a remote location successfully once a week. So it pays to be in monitoring today!

Crontab

Firstly, I set up my crontab entry:

[root@rhelserver log]# crontab -l
0 23 * * 0 mysqldump --single-transaction -u sam --password=removed wpblog | gzip > "/media/nfs2/Backups/Diary/Blog-$(date '+\%Y\%m\%d').sql.gz" && echo "Backup completed" > /var/log/diary-backup
[root@rhelserver log]#

Here we are essentially running a mysqldump against the MySQL database behind my WordPress installation (wpblog) and storing the result on a remote NFS mount point as a .gz file, with the date in the file name so I can roll back if needed. (In a crontab, any % in the command has to be escaped as \%, otherwise cron treats it as a newline.)

I am also creating a new file in /var/log called ‘diary-backup’. Why? Because my plugin will be executed by the nagios user, and I don’t really want to give it access to my nfs2 share (plus, it is a hassle that I don’t have time to play with). So I’m creating a file in /var/log that I’m going to chmod 755 so that nagios can read it and scrutinize the file age (644 would also be enough, since nagios only needs read access). As the file is written after the backup job, its age is a real-world representation of the age of the .gz file.
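The marker-file idea above can be sketched as a small wrapper, so that the marker is only refreshed when the dump really succeeded. This is a rough sketch, not what I actually run: `backup_with_marker` is a made-up helper name, and it sticks to plain POSIX sh because cron normally runs jobs with /bin/sh.

```shell
#!/bin/sh
# backup_with_marker OUTPUT_GZ MARKER_FILE DUMP_COMMAND [ARGS...]
# Hypothetical helper: OUTPUT_GZ must end in ".gz"; the uncompressed dump
# is written next to it first and then compressed in place.
backup_with_marker() {
    output_gz=$1
    marker_file=$2
    shift 2
    raw_dump="${output_gz%.gz}"

    # Each step runs only if the previous one succeeded, so the marker is
    # refreshed only after both the dump and the compression worked.
    if "$@" > "$raw_dump" && gzip -f "$raw_dump"; then
        echo "Backup completed" > "$marker_file"
    else
        # Do not leave a partial dump or archive behind.
        rm -f "$raw_dump" "$output_gz"
        return 1
    fi
}

# Example invocation with the values from this post (not run here):
# backup_with_marker "/media/nfs2/Backups/Diary/Blog-$(date '+%Y%m%d').sql.gz" \
#     /var/log/diary-backup \
#     mysqldump --single-transaction -u sam --password=removed wpblog
```

If the dump fails, the marker keeps its old timestamp, so the file age check eventually goes critical instead of reporting a backup that never happened.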

Opsview plugin

For this exercise, I used the ‘check_file_age’ plugin that ships with Opsview. However, the standard output was rather annoying and not very humanised. For example:

root@opsview-monitor:/usr/local/nagios/libexec# ./check_file_age -w 691199 -c 691200 /home/sam/.bash_history
FILE_AGE OK: /home/sam/.bash_history is 425197 seconds old and 434 bytes

This isn’t very useful to me, as I am not a computer and can’t work out if 425,000 seconds is a good thing or a bad thing 🙂 So, I modified the check_file_age plugin with the help of this guide (http://www.krzywanski.net/archives/429). Essentially, replace the line:

print "FILE_AGE $result: $opt_f is $age seconds old and $size bytes\n";

with

my $days = $age/86400;
$days = sprintf("%.1f", $days);
print "FILE_AGE $result: $opt_f is $age seconds ($days days) old and $size bytes\n";

So that we output the age in days as well as in seconds. Next, I tested my command locally on my WordPress server:

[root@rhelserver log]# su - nagios
[nagios@rhelserver ~]$ cd /usr/local/nagios/libexec/
[nagios@rhelserver libexec]$ ./check_file_age -c 691200 /var/log/diary-backup
FILE_AGE OK: /var/log/diary-backup is 961 seconds (0.0 days) old and 17 bytes

(If you’re curious, 691200 seconds is 8 days.) So here we can see that the nagios user has access to the file in question, and we are getting data in a usable format, i.e. days as well as seconds.
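The seconds and days arithmetic above is easy to get wrong by hand, so here is a small sketch of it in shell. The function names `days_to_seconds` and `seconds_to_days` are made up for illustration, and awk is assumed to be available for the one-decimal rounding that the plugin does with sprintf.

```shell
#!/bin/sh
# Seconds in one day, the same constant used in the modified plugin line.
SECONDS_PER_DAY=86400

# days_to_seconds DAYS -> prints a threshold in seconds (whole days only).
days_to_seconds() {
    echo $(( $1 * SECONDS_PER_DAY ))
}

# seconds_to_days SECONDS -> prints the age in days with one decimal place,
# mirroring the sprintf("%.1f", ...) call in the plugin.
seconds_to_days() {
    awk -v secs="$1" -v per_day="$SECONDS_PER_DAY" \
        'BEGIN { printf "%.1f\n", secs / per_day }'
}

days_to_seconds 8        # the -c threshold used in this post
seconds_to_days 425197   # the age from the .bash_history example
```

The first call prints 691200, which is the value passed to -c, and the second gives the .bash_history age in days rather than seconds.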

Next, we need to create the NRPE entry so that the command above can be executed remotely by the Opsview monitoring server. Doing this is very simple: just add a line similar to the one below to your /usr/local/nagios/etc/nrpe_local/override.cfg file (if this doesn’t exist, just create it):

[nagios@rhelserver libexec]$ tail -n1 /usr/local/nagios/etc/nrpe_local/override.cfg
command[diary_backup]=/usr/local/nagios/libexec/check_file_age -c 691200 /var/log/diary-backup

The ‘diary_backup’ element is the command we will be executing from Opsview. Finally, give the opsview-agent a bounce to apply the changes:

[nagios@rhelserver libexec]$ exit
logout
[root@rhelserver log]# /etc/init.d/opsview-agent restart
NRPE stopped
NRPE started
[root@rhelserver log]#
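The NRPE steps above (add the line, then bounce the agent) can also be scripted so that running them twice does not duplicate the entry. This is only a sketch: `ensure_line` is a hypothetical helper, and the example uses the `command[...]` directive form from the standard NRPE configuration.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Hypothetical helper: appends LINE to FILE unless an identical line is
# already present. Creates FILE if it does not exist yet.
ensure_line() {
    cfg_file=$1
    cfg_line=$2
    touch "$cfg_file" || return 1
    # -x matches whole lines only, -F treats the brackets as plain text.
    grep -qxF -- "$cfg_line" "$cfg_file" || printf '%s\n' "$cfg_line" >> "$cfg_file"
}

# Example with the paths from this post (not run here):
# ensure_line /usr/local/nagios/etc/nrpe_local/override.cfg \
#     'command[diary_backup]=/usr/local/nagios/libexec/check_file_age -c 691200 /var/log/diary-backup'
# /etc/init.d/opsview-agent restart
```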

We can now test this locally:

[root@rhelserver log]# cd /usr/local/nagios/libexec/
[root@rhelserver libexec]# ./check_nrpe -H localhost -c diary_backup
FILE_AGE WARNING: /var/log/diary-backup is 1272 seconds (0.0 days) old and 17 bytes
[root@rhelserver libexec]#

Voilà, it’s working.
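Besides the text, the check also returns an exit code, and that is what the monitoring server uses to decide the state. A small sketch of the standard Nagios plugin mapping follows; `nagios_state` is a made-up function name, and anything outside 0 to 2 is simply reported as UNKNOWN here.

```shell
#!/bin/sh
# nagios_state EXIT_CODE -> prints the state name for a plugin exit code.
# Standard plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
# Simplification in this sketch: any other value is also shown as UNKNOWN.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Example with the command from this post (not run here):
# /usr/local/nagios/libexec/check_nrpe -H localhost -c diary_backup
# rc=$?
# echo "Exit code $rc means $(nagios_state "$rc")"
```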

Bring it all together in the GUI

So lastly, we need to log in to the Opsview GUI and bring this all together. Firstly, create a new service check with the plugin set to ‘check_nrpe’ and the arguments set to ‘-H $HOSTADDRESS$ -c diary_backup’. Then, add this to your host (the WordPress server in my example). Finally, give it a reload and it will now be running and monitoring your backup.

There are then hundreds of things you can do. For example, you can be notified when it goes critical or warning (ignore the WARNING in the test output above: I didn’t set a -w flag, so the plugin fell back to its default warning threshold, whoops), or show it in a keyword (Monitoring > Keywords), as I have done at home.

Conclusion

So there you have it – I am now monitoring my diary backup cron job with Opsview to make sure it completes every week. You can use this for anything: logs, files, logins, you name it. Happy hunting!