Hello all,
No, I am not dead. I have just moved into management... I’ll let you come up with the jokes!
Today I’m going to write up how to monitor the age of a file to ensure that it is newer than a given threshold, for example making sure that file ‘X’ is less than 5 days old. This came up because I wanted to make sure that the diary I keep at home (running on WordPress) is backed up to a remote location successfully once a week, so it pays to be in monitoring today!
Crontab
Firstly, I set up my crontab entry:
[root@rhelserver log]# crontab -l
# In a crontab, % must be escaped as \%. The marker file is only written if the gzip pipeline exits cleanly.
0 23 * * 0 mysqldump --single-transaction -u sam --password=removed wpblog | gzip > "/media/nfs2/Backups/Diary/Blog-$(date '+\%Y\%m\%d').sql.gz" && echo "Backup completed" > /var/log/diary-backup
[root@rhelserver log]#
Here we are essentially running mysqldump against the MySQL database behind my WordPress installation (wpblog) and storing the result on a remote NFS mount point as a .gz file, with the date in the file name (so I can roll back if needed).
Also, I am creating a new file in /var/log called ‘diary-backup’. Why? Because my plugin will be executed by the nagios user, and I don’t really want to give it access to my nfs2 share (plus, it is a hassle that I don’t have time to play with). So I’m creating a file in /var/log that I’m going to chmod 755 so that nagios can access it and check its age. Because that file is created after the backup job, its age is a real-world stand-in for the age of the .gz file.
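One thing worth guarding against is the marker file being refreshed even when the dump itself fails. Below is a minimal sketch of a wrapper that only writes the marker once the whole pipeline has succeeded. The function name backup_with_marker and its argument order are made up for illustration and are not part of my actual setup; it assumes bash and gzip are available.

```shell
#!/usr/bin/env bash
# Sketch only: backup_with_marker is a hypothetical helper, not an Opsview tool.
# Usage: backup_with_marker OUTPUT_FILE MARKER_FILE COMMAND [ARGS...]
backup_with_marker() {
    local out_file=$1 marker=$2
    shift 2
    # Run the pipeline in a subshell with pipefail so that a failing dump
    # command (not just a failing gzip) makes the whole pipeline fail.
    if ! ( set -o pipefail; "$@" | gzip > "$out_file" ); then
        rm -f "$out_file"   # do not leave a truncated archive behind
        return 1
    fi
    # Only reached on success, so the marker's age reflects the last good backup.
    echo "Backup completed" > "$marker"
}

# Example call using the dump command from the crontab:
# backup_with_marker "/media/nfs2/Backups/Diary/Blog-$(date '+%Y%m%d').sql.gz" \
#     /var/log/diary-backup \
#     mysqldump --single-transaction -u sam --password=removed wpblog
```

Saved as a script and called from cron, this would also sidestep the need to escape % characters inside the crontab itself.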
Opsview plugin
For this exercise, I used the ‘check_file_age’ plugin that ships with Opsview. However, the standard output was rather annoying and not very humanised, for example:
root@opsview-monitor:/usr/local/nagios/libexec# ./check_file_age -w 691199 -c 691200 /home/sam/.bash_history
FILE_AGE OK: /home/sam/.bash_history is 425197 seconds old and 434 bytes
This isn’t very useful to me, as I am not a computer and can’t work out whether 425,000 seconds is a good thing or a bad thing 🙂 So I modified the check_file_age plugin with the help of this guide: http://www.krzywanski.net/archives/429. Essentially, replace the line:
print "FILE_AGE $result: $opt_f is $age seconds old and $size bytes\n";
with
my $days = $age/86400;
$days = sprintf("%.1f", $days);
print "FILE_AGE $result: $opt_f is $age seconds ($days days) old and $size bytes\n";
So that the output shows ‘days’ alongside seconds. Next, I tested the command locally on my WordPress server:
[root@rhelserver log]# su - nagios
[nagios@rhelserver ~]$ cd /usr/local/nagios/libexec/
[nagios@rhelserver libexec]$ ./check_file_age -c 691200 /var/log/diary-backup
FILE_AGE OK: /var/log/diary-backup is 961 seconds (0.0 days) old and 17 bytes
(If you’re curious, 691200 seconds is 8 days.) So here we can see that the nagios user has access to the file in question, and we are getting data in a usable format, i.e. days as well as seconds.
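If you would rather not do the threshold arithmetic in your head, here is a small sketch of both conversions in shell. The helper names days_to_seconds and seconds_to_days are hypothetical, purely for illustration; the second one mirrors the one-decimal rounding of the sprintf call in the modified plugin and assumes awk is installed.

```shell
#!/usr/bin/env bash
# Convert whole days to seconds, for use as a -w or -c threshold.
days_to_seconds() {
    echo $(( $1 * 86400 ))
}

# Convert seconds to days with one decimal place, like sprintf("%.1f", ...).
seconds_to_days() {
    awk -v s="$1" 'BEGIN { printf "%.1f\n", s / 86400 }'
}

days_to_seconds 8        # prints 691200
seconds_to_days 425197   # prints 4.9
seconds_to_days 961      # prints 0.0
```

So a critical threshold of 8 days gives the 691200 used here, and a file 961 seconds old rounds down to 0.0 days.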
Next, we need to create the NRPE entry so that the command above can be executed remotely by the Opsview monitoring server. Doing this is very simple: just add a line similar to the one below to your /usr/local/nagios/etc/nrpe_local/override.cfg file (if it doesn’t exist, just create it):
[nagios@rhelserver libexec]$ tail -n1 /usr/local/nagios/etc/nrpe_local/override.cfg
command[diary_backup]=/usr/local/nagios/libexec/check_file_age -c 691200 /var/log/diary-backup
The ‘diary_backup’ element is the name of the command we will be executing from Opsview. Finally, give the opsview-agent a bounce to apply the changes:
[nagios@rhelserver libexec]$ exit
logout
[root@rhelserver log]# /etc/init.d/opsview-agent restart
NRPE stopped
NRPE started
[root@rhelserver log]#
We can now test this locally:
[root@rhelserver log]# cd /usr/local/nagios/libexec/
[root@rhelserver libexec]# ./check_nrpe -H localhost -c diary_backup
FILE_AGE WARNING: /var/log/diary-backup is 1272 seconds (0.0 days) old and 17 bytes
[root@rhelserver libexec]#
Voilà, it’s working.
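To make it clearer what the check is doing under the hood, here is a rough shell sketch of the same logic: work out the file’s age from its modification time, compare it against a warning and a critical threshold, and return the usual Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). This is not the plugin’s actual source. The function name file_age_status is hypothetical, and it assumes GNU stat (the -c %Y option).

```shell
#!/usr/bin/env bash
# Illustrative re-implementation only; the real check_file_age is a Perl plugin.
# Usage: file_age_status FILE WARN_SECONDS CRIT_SECONDS
file_age_status() {
    local file=$1 warn=$2 crit=$3
    if [ ! -e "$file" ]; then
        echo "FILE_AGE CRITICAL: File not found - $file"
        return 2
    fi
    local now mtime age
    now=$(date +%s)
    mtime=$(stat -c %Y "$file")   # GNU stat: modification time in epoch seconds
    age=$(( now - mtime ))
    if [ "$age" -gt "$crit" ]; then
        echo "FILE_AGE CRITICAL: $file is $age seconds old"
        return 2
    elif [ "$age" -gt "$warn" ]; then
        echo "FILE_AGE WARNING: $file is $age seconds old"
        return 1
    fi
    echo "FILE_AGE OK: $file is $age seconds old"
    return 0
}

# Example: warn after 7 days, go critical after 8 days.
# file_age_status /var/log/diary-backup 604800 691200
```

The exit code is what the monitoring server acts on; the printed line is just the human-readable status text.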
Bring it all together in the GUI
So lastly, we need to log in to the Opsview GUI and bring this all together. Firstly, create a new service check with ‘check_nrpe’ as the plugin and ‘-H $HOSTADDRESS$ -c diary_backup’ as the arguments. Then add it to your host (the WordPress server in my example). Finally, give Opsview a reload and it will start monitoring your backup.
There are then hundreds of things you can do, for example be notified when the check goes critical or warning (ignore the warning above, I didn’t set a -w flag, whoops), or show it in a keyword (Monitoring > Keywords) as I have done at home.
Conclusion
So there you have it: I am now using Opsview to monitor my diary backup cron job and make sure it completes every week. You can use this for anything: logs, files, logins, you name it. Happy hunting!