Recently I’ve been on a bit of a tear with my infrastructure, moving from Apache to Nginx and migrating to new hardware (I moved from my beloved 25 kg Fractal Define XL to a new mATX box that is 25% of the size... I call it ‘wife-friendly infrastructure’!).
In my infrastructure of many ridiculous things, I use Opsview to monitor server temperatures (CPU/HDD/RAM), free space on my logical volumes, SMART status, RAID status and a few other things (systemd service status, etc.). I then use Splunk Light to parse and display information gathered from the logs of my web applications (ownCloud, Opsview, etc.) and also the logs forwarded from my router, which handles port forwarding into the LAN (so I can see all the naughty port scanners... tsk tsk).
One thing I was always curious about was how I could get Splunk to analyse and interpret data generated by the Nagios or Monitoring Plugins run by software such as Opsview, Nagios, Icinga 2, or pretty much any other monitoring tool out there.
This blog will show you, at a basic level, how to take that plugin secret sauce and smother your Splunk in it, so you can analyse everything! (Oh, and how to create some funky graphics, because I like colours... mmm, pretty.)
1. Configuration
Firstly, let’s look at the command line. Scary, I know.
Splunk scripts can be stored in one of four locations (as per the UI):
Now on my server, $SPLUNK_HOME is “/opt/splunk” (if you are lazy, install ‘locate’, run ‘updatedb’ and then run ‘locate “/bin/scripts”’ to find the path).
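The lazy route looks something like this (a sketch assuming a Debian-flavoured box where ‘locate’ lives in the ‘mlocate’ package – adjust the install command for your distro; output trimmed):

root@server:~# apt-get install mlocate
root@server:~# updatedb
root@server:~# locate "/bin/scripts"
/opt/splunk/bin/scripts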
Splunk expects the scripts you want to run (Nagios / Monitoring plugins) to be ‘in’ these directories. In Opsview and Nagios variants, the plugins live in /usr/local/nagios/libexec – so you can either copy the scripts from one directory to the other, or you can do the preferred option and symlink those bad boys:
root@server:/home/sam# cd /opt/splunk/bin/scripts/
root@server:/opt/splunk/bin/scripts# ls
total 1MB
-r--r--r-- 1 splunk splunk 1MB Dec 9 12:07 readme.txt
drwxr-xr-x 4 splunk splunk 1MB Jan 15 12:09 ..
drwxr-xr-x 2 splunk splunk 1MB May 19 20:20 .
root@server:/opt/splunk/bin/scripts# ln -s /usr/local/nagios/libexec/check_raid .
root@server:/opt/splunk/bin/scripts# ls
total 1MB
-r--r--r-- 1 splunk splunk 1MB Dec 9 12:07 readme.txt
drwxr-xr-x 4 splunk splunk 1MB Jan 15 12:09 ..
lrwxrwxrwx 1 root root 1MB May 19 19:41 check_raid -> /usr/local/nagios/libexec/check_raid
drwxr-xr-x 2 splunk splunk 1MB May 19 20:20 .
root@server:/opt/splunk/bin/scripts#
This gives you the benefit of having only one version of each plugin on the system, so your results will be the same between monitoring tools (you never know!), and it also makes maintenance a lot simpler.
Now that the plugins are ‘in’ the Splunk directory, you can begin to work with them at the GUI level.
2. Data inputs
On your Splunk system, navigate to ‘Data Inputs’ via the hamburger menu icon in the top left:
Within this section you can choose to add ‘Files and Directories’, TCP/UDP ports, and also SCRIPTS! That looks interesting, right? Let’s click on it.
In here you will see all of your existing scripts that Splunk is collecting and parsing the output of, and also a big, ominous ‘New’ button, deceptively named as that’s what we need to click on to add a new script input. Crazy, right? *click*
In the first step, we need to give Splunk the source. Click on the ‘Script path’ dropdown and choose ‘bin/scripts’, then select the plugin you want to configure.
In the example below, I am going to show you the configuration needed to take ‘check_lm_sensors’ data (temperature data grepped from the ‘sensors’ command output) and add it into Splunk. I’ll be using the temperature of my /dev/sda hard drive as an example.
The command must be entered as it would be run on the command line, i.e. on the CLI I can run:
./check_lm_sensors --sanitize --high sdaTemp\=55,65
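If you’ve not poked at plugin output before, it follows the standard Nagios plugin layout of ‘STATUS text | perfdata’, where the perfdata is ‘label=value;warn;crit’. So the above prints something along these lines (an illustrative sketch – the exact wording varies by plugin version and your sensors):

OK - sdaTemp=30 | sdaTemp=30;55;65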
Therefore in the UI ‘Command:’ section I need to enter:
$SPLUNK_HOME/bin/scripts/check_lm_sensors --sanitize --high sdaTemp\=55,65
Next, as I want somewhat granular data, I’m going to run this plugin every 60 seconds (i.e. get a temperature data point every 60 seconds). Finally, give the ‘Source name override’ a unique value, e.g. ‘sdaTemp’. Next!
Next we need to classify the data. I took a simple approach to the temperature-based data I’m gathering and simply called the source type ‘nagiosplugin’. After configuring the source type, click ‘Review’ and then complete the wizard, after which you will be presented with the following screen:
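For the terminal-dwellers: under the hood, the GUI is simply writing this configuration into an inputs.conf file. A rough sketch of the equivalent stanza is below (the exact file it lands in depends on the app context you chose, e.g. $SPLUNK_HOME/etc/apps/search/local/inputs.conf):

[script://$SPLUNK_HOME/bin/scripts/check_lm_sensors --sanitize --high sdaTemp=55,65]
interval = 60
sourcetype = nagiosplugin
source = sdaTemp
disabled = 0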
The best thing to do from here is click on that ‘Start searching’ button, which will take you to a prefiltered view that you can begin to parse and filter further.
3. Extract
Firstly, have a nosy at the log data generated:
In order to create pretty pictures from this data, we need to parse it and turn the temperature value into a custom field. To do this, click on ‘Extract New Fields’ in the bottom left:
Then click on an example row of data and click ‘Next’ (ignore the colours on mine, I’ve already done these steps):
In the next screen, click on ‘Regular Expression’ and click ‘Next’. On the screen after that, drag your mouse over the temperature information, after which a popup will appear asking for a field name:
Select the ‘30’ and call it ‘temperature’, select the ‘sdaTemp’ and call it ‘temperaturesource’, and then click ‘Next’, ‘Next’ and ‘Finish’.
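Under the hood, the wizard is just generating a regular expression for you. If you’d rather do the extraction inline in a search, a rough equivalent using the ‘rex’ command would be something like the below (my guess at the pattern – the wizard builds its own from your sample data):

index=* sourcetype=nagiosplugin host=server | rex "(?<temperaturesource>\w+Temp)=(?<temperature>\d+)"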
4. Create reports
Now that we have our parsed data, let’s open up a search and begin to carve it up. To get the ‘latest’ value, use the ‘stats’ command as below:
index=* sourcetype=nagiosplugin host=server sdaTemp | stats latest(sdaTemp) as sdaTemp
This will show you a single number as below:
We can then graph this by appending ‘| gauge sdaTemp’ to the search, giving us:
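For reference, the full search now reads:

index=* sourcetype=nagiosplugin host=server sdaTemp | stats latest(sdaTemp) as sdaTemp | gauge sdaTemp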
We can then pimp this up by changing the format using the ‘Format’ drop down, adding a label, a background colour range (e.g. 0–45°C is green, 45–60°C is yellow, 60°C+ is red), and more. Finally, click ‘Save as’ in the top right, and select ‘Report’:
Once saved, your report will look something similar to mine below. You can view all of your reports via the ‘Reports’ menu option at the top:
5. Create dashboards
Finally, to bring it all together, click on the ‘Dashboards’ tab at the top of your Splunk GUI, and once loaded click ‘Create new dashboard’ which will prompt for a name and other options.
We are going to create a simple dashboard by adding all of our reports onto a single screen (think of reports as widgets or dashlets). To do this, select ‘Add Panel > New from report’ as below:
Simply click on your report and click ‘Add to dashboard’ and voilà, it’s added! Now repeat this step X number of times depending on how many reports you have, and before you know it you’ve got your own Nagios-plugin-data-based dashboard (that’s a mouthful):
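As an aside, if you ever want to keep these in version control, each dashboard is just Simple XML under the hood (viewable via the dashboard’s ‘Edit Source’ option). A minimal sketch of a one-panel dashboard referencing a saved report – ‘sdaTemp latest’ is a hypothetical report name here – looks roughly like:

<dashboard>
  <label>Home Server Health</label>
  <row>
    <panel>
      <single>
        <search ref="sdaTemp latest"></search>
      </single>
    </panel>
  </row>
</dashboard>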
You can also graph these values historically using timechart and the search query:
index=* sourcetype=nagiosplugin host=server *Temp | timechart span=1m avg(temperature) by temperaturesource
Which gives us:
Now, go forth and blend Nagios plugins into your log based world.
PS: An update to the blog – since writing it, I found out that you can get sparklines on the ‘number’ reports above by using ‘timechart’ as the query and selecting ‘Single value’ in the drop down, as below. This allows you to select ‘Show sparkline: Yes’ in the ‘Format’ section. Very cool!
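In other words, the query becomes something along the lines of the below (a sketch built from the fields we extracted earlier), with ‘Single value’ chosen as the visualisation:

index=* sourcetype=nagiosplugin host=server sdaTemp | timechart span=1m latest(sdaTemp)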