So recently I’ve again been plagued by this mother-of-all NRPE messages:
NRPE: Unable to read output
Whilst trying to get the afore-blogged landscape_monitor.pl script to run via NRPE. NRPE, for those who aren’t familiar, is the Nagios Remote Plugin Executor, and it is used to call scripts on a remote server/end-point. For example, I have check_nrpe on my Opsview master; it connects to a server which has a tape drive running BackupExec jobs each night.
Using NRPE, I can run a script locally on that server, get the output “All is good” or “Everything’s gone to hell”, and have this output sent back to my Opsview master for collation, alerting, etc. This is different to the usual polling approach of standard monitoring as we think of it.
In my example, I wanted to use check_nrpe from the client side to connect to my Opsview master and run landscape_monitor.pl with 2 variables: the client’s hostname and a performance metric, e.g. load average, db connections, etc.
To do this, we have to do the following (this is true for any NRPE command):
1. On the system where we wish to run the local command, i.e. landscape_monitor.pl, we have to (after installing NRPE / Opsview Agent) create a file in /usr/local/nagios/etc/nrpe_local. This can be called anything; I like to call mine overrides.cfg.
2. In this file, we have to add our entries specifying the name of our command (what will be called remotely), which script will be run, and any variables that will be passed to it, as below:
#########################################################
# Additional NRPE config file Opsview
#########################################################
check_command[get_rest]=/usr/local/nagios/libexec/landscape_monitor.pl $ARG1$ $ARG2$
As you can see, we have called our command “get_rest”, and this command calls our landscape_monitor.pl script living in libexec with 2 arguments (ARG1 and ARG2; note that they must be dollar-wrapped).
3. Next, open up the nrpe.cfg and ensure you have the following lines in there somewhere (in the Opsview agent they are enabled by default):
dont_blame_nrpe=1
include_dir=/usr/local/nagios/etc/nrpe_local
These basically say “allow arguments with NRPE”, and “look in /usr/local/nagios/etc/nrpe_local for .cfg files containing new commands”.
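Both settings are easy to miss if they are commented out, so here is a minimal sketch of a check for them. The function name check_nrpe_cfg is made up for illustration, and the path in the usage line assumes nrpe.cfg sits next to the nrpe_local directory; adjust it to wherever your agent keeps the file.

```shell
# Hypothetical helper (not part of NRPE or Opsview): report whether an
# nrpe.cfg allows command arguments and includes a directory of extra
# command definitions. Commented-out lines do not count.
check_nrpe_cfg() {
    cfg="$1"
    rc=0
    # dont_blame_nrpe must be set to 1 on an active (uncommented) line
    if grep -Eq '^[[:space:]]*dont_blame_nrpe[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$cfg"; then
        echo "OK: command arguments are allowed"
    else
        echo "MISSING: dont_blame_nrpe=1"
        rc=1
    fi
    # include_dir must be set to something on an active line
    if grep -Eq '^[[:space:]]*include_dir[[:space:]]*=' "$cfg"; then
        echo "OK: include_dir is set"
    else
        echo "MISSING: include_dir"
        rc=1
    fi
    return "$rc"
}
```

Calling it as check_nrpe_cfg /usr/local/nagios/etc/nrpe.cfg (assumed path) prints one line per setting and returns non-zero if either is missing.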
4. And that is pretty much all the configuration we need to do on the local side (if you’re running iptables, don’t forget to allow TCP 5666 inbound!).
5. Next, run the command we are going to be calling (landscape_monitor.pl) just to make sure it actually works:
nagios@ip-10-36-193-144:~$ cd /usr/local/nagios/libexec/
nagios@ip-10-36-193-144:/usr/local/nagios/libexec$ ./landscape_monitor.pl opsview load1
0.01
nagios@ip-10-36-193-144:/usr/local/nagios/libexec$
As we can see – we get a value out, 0.01 – this is load1 for the host “opsview”. So the script is working – huzzah. Next, we need to test calling this from check_nrpe:
nagios@ip-10-36-193-144:/usr/local/nagios/libexec$ ./check_nrpe -H localhost -c get_rest opsview load1
NRPE: Unable to read output
Uh-oh, what is this?
This is where I descended into madness, trying to figure out why on earth this wasn’t working: we have defined ARG1 and ARG2, the script runs as nagios, “check_nrpe -H localhost” works, etc. After sifting through strace, I stumbled across:
13433 08:48:27.390359 execve("/bin/sh", ["sh", "-c", "/usr/local/nagios/libexec/landscape_monitor.pl "], ["LESSOPEN=| /usr/bin/lesspipe %s", "SUDO_GID=1000", "MAIL=/var/mail/root", "USER=root", "SHLVL=1", "HOME=/root", "SUDO_UID=1000", "LOGNAME=root", "_=/usr/bin/strace", "TERM=xterm", "USERNAME=root", "PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin", "LANG=en_US.UTF-8", "LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.d"..., "SUDO_COMMAND=/bin/su", "SHELL=/bin/bash", "SUDO_USER=ubuntu", "LESSCLOSE=/usr/bin/lesspipe %s %s", "PWD=/home/ubuntu", "NRPE_MULTILINESUPPORT=1", "NRPE_PROGRAMVERSION=2.14"]) = 0
13433 08:48:27.391616 brk(0) = 0x1e0e000
And as we can see, the command passed to sh has no arguments after the script name. Puzzling! I went back to landscape_monitor.pl and ran it without arguments, and it failed in the same way. So it seems that the arguments are not getting to the script, even though in nrpe.cfg we explicitly said to allow arguments.
The answer is blindingly simple (and very annoyingly so!): we simply need to add “-a” before our arguments!
If I run my check_nrpe again, with -a as below, it works:
nagios@ip-10-36-193-144:/usr/local/nagios/libexec$ ./check_nrpe -H localhost -c get_rest -a opsview load1
0.09
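To see why the strace showed the script with nothing after it, here is a toy model of the macro substitution. It is only an illustration and not NRPE’s actual code: the function name expand_macros is made up, and it assumes that when no arguments arrive, each $ARGn$ placeholder is simply replaced with nothing.

```shell
# Toy model only, not NRPE's real implementation: replace $ARG1$ and
# $ARG2$ in a command definition with the supplied arguments. Missing
# arguments become empty strings. Arguments containing '|', '&' or '\'
# would confuse sed, so keep the inputs simple.
expand_macros() {
    template="$1"
    arg1="${2-}"
    arg2="${3-}"
    printf '%s\n' "$template" |
        sed -e 's|\$ARG1\$|'"$arg1"'|g' -e 's|\$ARG2\$|'"$arg2"'|g'
}

definition='/usr/local/nagios/libexec/landscape_monitor.pl $ARG1$ $ARG2$'

# Without -a, no arguments reach the daemon: only the script path
# (plus trailing whitespace) is left, much like the strace output.
expand_macros "$definition"

# With "-a opsview load1", both placeholders are filled in.
expand_macros "$definition" opsview load1
```

The second call prints the script path followed by “opsview load1”, which is the command line we ran by hand in step 5.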
It really is that simple! So next time you are having issues, check:
1. If you’re using arguments, that you have specified the -a.
2. If you are not using arguments, try and run the script locally first – as nagios.
3. Check iptables – TCP 5666.
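Since the error message itself gives no hints, one option is to wrap the check output in a small helper that prints this checklist whenever the message appears. This is only a sketch: explain_nrpe_output is a made-up name, and it matches on nothing but the exact message text shown earlier.

```shell
# Hypothetical helper (not part of NRPE): echo the check output, and
# append the checklist when the generic error message shows up.
explain_nrpe_output() {
    output="$1"
    case "$output" in
        *"NRPE: Unable to read output"*)
            printf '%s\n' "$output"
            echo "  1. Using arguments? Put them after -a on the check_nrpe command line."
            echo "  2. Run the script locally as the nagios user to confirm it works."
            echo "  3. Check that iptables allows TCP 5666."
            ;;
        *)
            # Anything else is passed through untouched
            printf '%s\n' "$output"
            ;;
    esac
}
```

It could be used as explain_nrpe_output "$(./check_nrpe -H localhost -c get_rest -a opsview load1)", which prints the metric as normal when the check works.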
Overall, a very powerful framework, but its error messages aren’t the best! 🙂