Opsview is a monitoring solution that customers and community users install on physical servers, virtual machines, or in the cloud (for example, on AWS) to monitor every aspect of their IT estate, from VMware and Oracle to Microsoft Windows and Cisco networks. This article gives users and prospective users an overview of how an Opsview system is designed, what the terminology means, and why we configure it differently in certain situations.

1. Architectural overview of Opsview

In its most basic form, Opsview consists of a master server and a database. The master server does the monitoring (polling, receiving traps, NetFlow, querying via WMI, and so on) and returns the data, which is stored in the MySQL database. The dashboards, reports and other features then analyse that data and display it for users (alerting aside).
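
To make that poll-and-store split concrete, here is a deliberately simplified sketch of the loop a master performs. None of this is Opsview code: a ping stands in for the many plugin types a real master runs, and SQLite stands in for the MySQL database.

```python
# Toy illustration of a master's poll-and-store loop (NOT Opsview's code).
import sqlite3
import subprocess
import time

HOSTS = ["192.0.2.10", "192.0.2.11"]  # example addresses (RFC 5737)

db = sqlite3.connect("results.db")    # SQLite standing in for MySQL
db.execute("""CREATE TABLE IF NOT EXISTS results
              (host TEXT, checked_at REAL, status TEXT)""")

for host in HOSTS:
    # One ping with a short timeout; exit code 0 means the host answered.
    rc = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                        capture_output=True).returncode
    status = "UP" if rc == 0 else "DOWN"
    db.execute("INSERT INTO results VALUES (?, ?, ?)",
               (host, time.time(), status))

db.commit()
# Dashboards and reports would then read from this stored data.
```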

A standalone master can generally monitor around 300 servers (depending on hardware, load, service checks per host, and so on). Sometimes, though, the master becomes “overloaded” – too many hosts being monitored by the one server – so users can deploy slave servers to move that workload onto secondary hardware. A simple example: 250 hosts on the master and 250 on the slave, with the slave monitoring 250 Windows servers and the master monitoring 250 Linux servers, as sketched below.
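
As a back-of-the-envelope illustration of that split – the numbers and the round-robin policy here are purely illustrative, not how Opsview actually assigns hosts:

```python
# Illustrative only: evenly distribute hosts across monitoring servers.
def split_hosts(hosts, servers):
    """Round-robin hosts across servers; returns {server: [hosts]}."""
    assignment = {s: [] for s in servers}
    for i, host in enumerate(hosts):
        assignment[servers[i % len(servers)]].append(host)
    return assignment

hosts = [f"host{n}" for n in range(500)]
plan = split_hosts(hosts, ["master", "slave1"])
print({s: len(h) for s, h in plan.items()})  # {'master': 250, 'slave1': 250}
```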

This collection of the Windows servers’ data is done by the slave server itself (configured via the master server, which pushes the changes/instructions out). The slave collects the data – C:\ usage, CPU usage, service status, and so on – and passes the results back to the master, so that the master can store them in the database (ODW, the Opsview Data Warehouse) and use them for dashboards, reports, etc. A simple example of a master server and a slave cluster (the same concept, but with 2 slave nodes in high availability) and the ‘splitting of hosts’ between monitoring instances can be seen here:

This “passing” of data from the slave servers to the master is done via an SSH tunnel set up using key exchange, so you can be sure that the results are secure, especially if they travel over a WAN link or the internet. The slaves connect to the master on the SSH port (TCP 22), which would need to be opened on the firewall, although with a little tweaking it can be moved to almost any port. The slaves can connect via a forward SSH tunnel, a reverse SSH tunnel, or “Passive”, as outlined below:
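
As a rough sketch of the mechanism, this is approximately what a reverse tunnel looks like at the command level – the common choice when the slave sits behind NAT and cannot accept inbound connections. The hostnames, ports and key path are placeholders, and Opsview configures and supervises its own tunnels, so treat this purely as an illustration:

```python
# Illustration of the tunnel mechanism only (placeholder hosts/ports/keys);
# Opsview sets up and supervises its own tunnels.
import subprocess

MASTER = "opsview@master.example.com"  # placeholder account and host

# Reverse tunnel, run FROM the slave: the slave dials out on TCP 22 and
# offers port 22000 on the master, through which the master can reach the
# slave even though no inbound connection to the slave is possible.
subprocess.run([
    "ssh", "-N",                    # no remote command: tunnel only
    "-i", "/path/to/private_key",   # key-based auth, no passwords
    "-R", "22000:localhost:22",     # master's port 22000 -> slave's sshd
    MASTER,
])  # blocks while the tunnel is up
```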

2. Potential Architectures

Now that we’ve covered the terms and technology, we can dive into architecture. Broadly, a deployment falls into one of two camps – cloud hybrid or on-premise – and the three scenarios below cover both, with and without slaves.

Scenario 1 – Using a cloud provider

We could architect Opsview in the cloud (PaaS/IaaS) using a provider such as Amazon Web Services. In this scenario, we would install the Opsview master in the cloud (for example, in a VPC on Amazon AWS) and access it via the internet/WAN. We would then install a slave server at each location – one at Customer A and one at Customer B, for example – and have the “A slave” monitor the entire Customer A infrastructure and the “B slave” monitor all of the Customer B infrastructure. These two slaves would then pass their results back over the SSH tunnel to the Opsview master in the cloud, for use in dashboards, reports, alerts, etc. This requires a server at each location, with connectivity to the Opsview master in the cloud.

 

Scenario 2 – Using an On-Premise solution

We could also use a scenario similar to the above, but avoid the cloud and use entirely on-premise hardware. In this scenario, we would have the Opsview master at our HQ/data centre and a slave server at Customer A and Customer B, again with an SSH tunnel between HQ and the two slave servers. This may even require no ports to be opened, depending on the network configuration (a VPN spanning the locations, etc.).

Again, we would split the server monitoring, so the Opsview master monitors the HQ/data centre servers and the Customer A/B slaves monitor the Customer A/B hardware respectively – and users would be able to log in to the master at the HQ location via the GUI and see the data of both sites in a single pane of glass.

 

Scenario 3 – Using an On-Premise solution without slaves

If we have a large network, or we aren’t allowed to install a slave on premise, the third scenario is simply to open ports 5666/5667 into the customer’s network and add the hosts directly to our Opsview system using the GUI on the master, with the master connecting to each device itself rather than via a slave. This requires direct connectivity between the Opsview master and each monitored device on port 5666 (NRPE), plus 5667 (NSCA) if passive results are submitted.
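
For instance, with the standard NRPE agent on the monitored device and the check_nrpe plugin on the master, an active check boils down to something like the following (the address and check name are placeholders, and the check itself must be defined on the agent side in nrpe.cfg):

```python
# Placeholder host and check name; requires the check_nrpe plugin on the
# master and the NRPE agent listening on the monitored device.
import subprocess

result = subprocess.run(
    ["check_nrpe",
     "-H", "203.0.113.25",   # monitored device, reached through the firewall
     "-p", "5666",           # default NRPE port
     "-c", "check_load"],    # command defined in the agent's nrpe.cfg
    capture_output=True, text=True,
)
print(result.stdout)      # e.g. "OK - load average: 0.04, 0.10, 0.09 ..."
print(result.returncode)  # Nagios convention: 0=OK 1=WARN 2=CRIT 3=UNKNOWN
```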

3. Closing Thoughts

There are numerous ways of installing and architecting Opsview – one of which is sure to fit your environment. Provided the basics are understood – the master/slave concept, how the two connect, and so on – you should be up and running, monitoring your entire estate or your entire customer base, in next to no time.

The benefits of using a slave as opposed to port forwarding are:

  • Only one port to forward (SSH)
  • Generally only one TCP connection for the router to track
  • Ability to run auto discovery from the slave server at the remote location
  • Slave servers offload workload, allowing your monitoring system to scale to large estates