Building on my previous blog on SLA monitoring with Nagios / Opsview (http://www.everybodyhertz.co.uk/host-group-availability/), which looks at measuring availability at a “group” level, where a group is a customer, department, etc., I next want to look at another figure that a lot of MSPs will want to see in a different way to their standard view: how many devices per customer? In other words, how many devices should I charge the customer for? Or, flipping it on its head: at the end of the month, how much is my bill going to be?

At the moment in Opsview we can look at the number of hosts in a host group using a view similar to below:

Which is all well and good: we can see we have 6 “Databases” (Oracle, MySQL, etc.), and 3 devices that live in the “Customers” host group. However, we can’t alert based upon these values, nor can we graph them. Until now! *dun dun dunnnn*.

Using my highly limited development skills (asking people “How do you do this?”) I’ve created a plugin called check_opsview_hostcount that takes in a host group and outputs the number of hosts that live in that group, in Nagios “perf data” format. This is important, as it means we can use the output in a graph. In short, we can now graph, report, etc. based upon the host group total.
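To give a feel for what “perf data format” means, here is a rough sketch in Python. This is not the actual plugin source: the function name, the wording of the status text and the “hosts” label are all my own assumptions. The only fixed part is the Nagios convention of human-readable text, then a pipe, then label=value;warn;crit;min;max.

```python
def format_plugin_output(hostgroup, count, warn=None, crit=None):
    """Build a one-line Nagios-style result: status text, a pipe, then perf data.

    Hypothetical helper for illustration; the real check_opsview_hostcount
    output wording may differ.
    """
    # Perf data fields are label=value;warn;crit;min;max.
    # Unused thresholds are left as empty fields; max is omitted entirely.
    warn_field = "" if warn is None else str(warn)
    crit_field = "" if crit is None else str(crit)
    perfdata = f"hosts={count};{warn_field};{crit_field};0"
    return f"{hostgroup} contains {count} hosts | {perfdata}"


print(format_plugin_output("Customers", 3, warn=8, crit=9))
# prints: Customers contains 3 hosts | hosts=3;8;9;0
```

Everything after the pipe is what the graphing side picks up, which is why the number has to be emitted in that shape rather than just buried in the status text.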

What we can also do is use the “warning/critical” flags (-w/-c) to alert us when we go over a certain number. This is particularly useful if, say, your MSP will charge you twice as much once you go over 1000 hosts. By using this new plugin, you can be alerted!
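For the curious, the threshold logic can be sketched roughly as below. Again, this is an illustration and not the plugin's real code: the function names are made up, and I am assuming that “over” means strictly greater than the threshold. The exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL) are the standard Nagios plugin ones.

```python
import argparse

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_args(argv):
    """Parse the flags mentioned in this post (hypothetical reimplementation)."""
    parser = argparse.ArgumentParser(prog="check_opsview_hostcount")
    parser.add_argument("--hostgroup", required=True)
    parser.add_argument("-w", dest="warning", type=int, default=None)
    parser.add_argument("-c", dest="critical", type=int, default=None)
    return parser.parse_args(argv)


def evaluate(count, warning=None, critical=None):
    """Return the status for a host count; critical is checked first."""
    if critical is not None and count > critical:
        return CRITICAL
    if warning is not None and count > warning:
        return WARNING
    return OK


args = parse_args(["--hostgroup", "Customers", "-w", "8", "-c", "9"])
for count in (8, 9, 10):
    # 8 -> 0 (OK), 9 -> 1 (WARNING), 10 -> 2 (CRITICAL)
    print(count, evaluate(count, args.warning, args.critical))
```

A real plugin would finish by passing that status to sys.exit(), since the exit code is what Nagios / Opsview uses to decide the state of the service check.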

Bringing it together

So here are a few cool things we can do now that we have this plugin.

1. Show host counts as service checks

By using the “Multiple attributes” option in Opsview, I can create one service check called “Hosts:” and have it use a syntax of:

check_opsview_hostcount --hostgroup "%HOSTGROUPATTR%"

This means that each time I add an attribute to the host this check is applied against, a new service check is added automatically. What does this look like? Well, I only created one check, and I only apply it to my host once (Hosts, as below):

However, each time I add an attribute, it will create a new check. For example, I have added 7 attributes of the type “HOSTGROUPATTR” to my “dummy host”, with the values of my host groups:

I now get 7 checks created as below (this will work with any service check scenario, on any version of Opsview):

This makes my setup very neat and tidy. Each time I want to modify the check, I edit it once, not 10000 times.
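If it helps to picture what is going on, the expansion behaves roughly like the Python below. This is only a mental model and not how Opsview implements it internally: the expand_checks function and the “Hosts: <value>” naming are my own assumptions. The command template is the one shown earlier.

```python
CHECK_TEMPLATE = 'check_opsview_hostcount --hostgroup "%HOSTGROUPATTR%"'


def expand_checks(template, attribute_name, values, service_name="Hosts"):
    """Produce one concrete command per attribute value (illustrative only)."""
    placeholder = f"%{attribute_name}%"
    return {
        f"{service_name}: {value}": template.replace(placeholder, value)
        for value in values
    }


# One template, two attribute values, two service checks.
checks = expand_checks(CHECK_TEMPLATE, "HOSTGROUPATTR", ["Databases", "Customers"])
for name, command in checks.items():
    print(f"{name} -> {command}")
```

Changing the template changes every generated check in one go, which is the whole point of doing it this way.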

2. Show my results in a pie chart

Here we can see the split of “customers vs. total”, so if we have one customer using up a lot of our host count, then maybe it’s time to offload them to a separate system / dedicated server, etc.
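The numbers behind the pie chart are just each customer's share of the total. A quick sketch, with made-up customer names and counts purely for illustration:

```python
def host_share(counts):
    """Return each group's share of the total host count as a percentage."""
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero when nothing is monitored yet.
        return {name: 0.0 for name in counts}
    return {name: round(100 * count / total, 1) for name, count in counts.items()}


# Hypothetical figures, not taken from a real system.
example_counts = {"Customer A": 60, "Customer B": 25, "Customer C": 15}
print(host_share(example_counts))
# prints: {'Customer A': 60.0, 'Customer B': 25.0, 'Customer C': 15.0}
```

In this example Customer A accounts for 60% of everything monitored, which is exactly the sort of thing the pie chart makes obvious at a glance.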

3. Use performance gauges

One of my personal favourites is the performance gauge. Here we can set our thresholds so that if a customer’s host count goes above 8 it’s a warning, and above 9 it’s a critical. This gives us another great “at a glance” view into our operations:

4. Bring it all together

Finally, we can bring together our:

  • Host count – How many devices are we monitoring for each customer?
  • Host group SLA – What’s the availability for our customer, for the duration given (1 week, 2 weeks, etc.)?
  • Pie chart – Of our 100 monitored devices, how many are from a single customer? Also, what did our customer breakdown look like 1 month ago, 1 year ago, etc.?

Conclusion and notes

So there we have it. We can now monitor SLAs, look at customer device counts and alert on them if they get too high, display them graphically, use them in reports, etc. I think these are really rather useful tools for MSPs to have in their arsenal.

Note: If you want a copy of the plugin, feel free to leave a message / comment on here, or connect with me on LinkedIn using the “social button” on the left-hand side. Ta!