I recently had temperature issues with my server. Long story short: the Molex connector for my fan controller fell out, and the server got rather warm rather quickly. Oops!

The problem was easily rectified, but I wanted to add more depth to the temperature monitoring of my server in Opsview. To do this, I used lm_sensors to read the temperatures, which I can then turn into service checks (check the site for a blog on how to do this).

The problem I had, however, was that there were two ‘temp1’ sensors, and it wasn't obvious what they were:

root@server:/media# sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +36.0°C (high = +82.0°C, crit = +100.0°C)
Core 1: +35.0°C (high = +82.0°C, crit = +100.0°C)
Core 2: +39.0°C (high = +82.0°C, crit = +100.0°C)
Core 3: +34.0°C (high = +82.0°C, crit = +100.0°C)

it8718-isa-0290
Adapter: ISA adapter
in0: +1.28 V (min = +0.00 V, max = +4.08 V)
in1: +1.86 V (min = +0.00 V, max = +4.08 V)
in2: +3.25 V (min = +0.00 V, max = +4.08 V)
+5V: +2.88 V (min = +0.00 V, max = +4.08 V)
in4: +0.64 V (min = +0.00 V, max = +4.08 V)
in5: +0.08 V (min = +0.00 V, max = +4.08 V)
in6: +0.11 V (min = +0.00 V, max = +4.08 V)
in7: +3.07 V (min = +0.00 V, max = +4.08 V)
Vbat: +3.28 V
fan1: 1268 RPM (min = 0 RPM)
fan2: 0 RPM (min = 0 RPM)
fan3: 1962 RPM (min = 10 RPM)
fan4: 0 RPM (min = 10 RPM)
temp1: +39.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
temp2: +29.0°C (low = +127.0°C, high = +60.0°C) sensor = thermal diode
temp3: -2.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
intrusion0: ALARM

nouveau-pci-0100
Adapter: PCI adapter
fan1: 0 RPM
temp1: +63.0°C (high = +95.0°C, hyst = +3.0°C)
(crit = +115.0°C, hyst = +2.0°C)
(emerg = +130.0°C, hyst = +10.0°C)
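To see at a glance which chips report a reading with the same name, the output can be filtered with a little awk. This is only a rough sketch: `chips_with_label` is a helper name I made up, and it assumes the layout shown above (chip name on the first line of each block, blocks separated by blank lines).

```shell
#!/bin/sh
# chips_with_label LABEL: read `sensors` output on stdin and print the name
# of every chip whose block contains a reading called LABEL.
# (Hypothetical helper, not part of lm_sensors.)
chips_with_label() {
    awk -v label="$1" '
        # A blank line ends the current chip block.
        /^[[:space:]]*$/ { chip = ""; next }
        # The first non-blank line of a block is the chip name.
        chip == ""       { chip = $0; next }
        # Reading lines start with the label followed by a colon.
        index($0, label ":") == 1 { print chip }
    '
}

# Typical use on a live system (needs lm_sensors installed):
#   sensors | chips_with_label temp1
```

Against the output above, this should list it8718-isa-0290 and nouveau-pci-0100 for temp1.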

It turns out renaming these sensors is rather easy! First, note the name of the chip that reports the sensors. In my case, I wanted to relabel ‘temp1’ and ‘temp2’ on it8718-isa-0290 as the DIMM1 and DIMM2 temperatures. To do this, I added a new file in /etc/sensors.d/ called ‘mobo’ (you can call it anything you like) containing the following lines:

root@server:/media# cat /etc/sensors.d/mobo
chip "it8718-isa-0290"
label temp1 "DIMM1 Temperature"
label temp2 "DIMM2 Temperature"
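If you would rather try the labels out before touching /etc/sensors.d/, something like the following might work. It is a sketch, not what I actually ran: `write_label_conf` is a made-up helper, the labels are written as they appear in the renamed `sensors` output, and it relies on the `-c` option of `sensors` to read an alternative config file.

```shell
#!/bin/sh
# write_label_conf FILE: write the label overrides for the it8718 chip to FILE.
# (Hypothetical helper; chip and feature names are the ones from this post.)
write_label_conf() {
    cat > "$1" <<'EOF'
chip "it8718-isa-0290"
    label temp1 "DIMM1 Temperature"
    label temp2 "DIMM2 Temperature"
EOF
}

# Write to a scratch file first rather than straight into /etc/sensors.d/.
conf=$(mktemp)
write_label_conf "$conf"

# If lm_sensors is installed, preview the result for just this chip.
# -c points sensors at the scratch config instead of the system one.
if command -v sensors >/dev/null 2>&1; then
    sensors -c "$conf" it8718-isa-0290 || true
fi
```

Once the preview looks right, the file can be installed with something like `sudo install -m 644 "$conf" /etc/sensors.d/mobo`, so that non-root users can still read it.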

Now, when I run ‘sensors’, I get the new labels in the output:

it8718-isa-0290
Adapter: ISA adapter
in0:                +1.28 V  (min =  +0.00 V, max =  +4.08 V)
in1:                +1.86 V  (min =  +0.00 V, max =  +4.08 V)
in2:                +3.25 V  (min =  +0.00 V, max =  +4.08 V)
+5V:                +2.88 V  (min =  +0.00 V, max =  +4.08 V)
in4:                +0.64 V  (min =  +0.00 V, max =  +4.08 V)
in5:                +0.08 V  (min =  +0.00 V, max =  +4.08 V)
in6:                +0.11 V  (min =  +0.00 V, max =  +4.08 V)
in7:                +3.07 V  (min =  +0.00 V, max =  +4.08 V)
Vbat:               +3.28 V
fan1:              1268 RPM  (min =    0 RPM)
fan2:                 0 RPM  (min =    0 RPM)
fan3:              1962 RPM  (min =   10 RPM)
fan4:                 0 RPM  (min =   10 RPM)
DIMM1 Temperature:  +39.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
DIMM2 Temperature:  +29.0°C  (low  = +127.0°C, high = +60.0°C)  sensor = thermal diode
temp3:               -2.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor

And that's that. Now when I run my checks via Opsview, I can be sure I'm getting the temperatures from my DIMMs and not from a northbridge sensor or something else:

root@server:/home/sam# sudo /usr/local/nagios/libexec/check_lm_sensors-3.1.1/check_lm_sensors --sanitize --high DIMM1Temperature=70,85
LM_SENSORS OK - DIMM1Temperature=39.0|DIMM1Temperature=39.0;70;85;;
root@server:/home/sam# sudo /usr/local/nagios/libexec/check_lm_sensors-3.1.1/check_lm_sensors --sanitize --high DIMM2Temperature=70,85
LM_SENSORS OK - DIMM2Temperature=29.0|DIMM2Temperature=29.0;70;85;;
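Note that the name passed to --high has no space in it, while the label shown by ‘sensors’ does. Judging by the output above, --sanitize strips the spaces from sensor names; I'm assuming that is all it does. Under that assumption, the threshold argument could be built from the label like this (both function names are made up for illustration):

```shell
#!/bin/sh
# sanitized_name LABEL: strip spaces from a sensors label, mimicking what
# --sanitize appears to do (assumption based on the check output above).
sanitized_name() {
    printf '%s\n' "$1" | tr -d ' '
}

# high_arg LABEL WARN CRIT: build the NAME=WARN,CRIT string for --high.
# (Hypothetical helper, not part of check_lm_sensors.)
high_arg() {
    printf '%s=%s,%s\n' "$(sanitized_name "$1")" "$2" "$3"
}

high_arg "DIMM1 Temperature" 70 85   # prints DIMM1Temperature=70,85
```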

Cool, eh?