Has anyone ever seen a load average like that on a Linux box? Our central mail servers suffered under a staggering load from 22:30 CET until 07:25 CET this morning, when I had to reset them.

It turns out the Exim servers were flat out trying to contact an LDAP server that had crashed and rebooted itself via a hardware watchdog. To be fair, it wasn't really the Exim MTA but rather NSS LDAP that hadn't noticed the dead connection via the load balancer. It is a bummer that the OpenLDAP libraries don't time out on operations...
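To illustrate the failure mode in general terms, here is a minimal Python sketch of a client that gives up on a silent peer instead of waiting forever. It is not how NSS LDAP, Exim, or the OpenLDAP libraries work internally. The names `read_with_timeout` and `start_silent_server` are made up for this example, and the silent server is only a stand-in for a connection that looks alive but never answers.

```python
import socket
import threading


def read_with_timeout(host, port, timeout_s):
    """Return the bytes received, or None if the peer stays silent too long."""
    # create_connection applies the timeout to the connect call and leaves
    # it set on the socket, so the recv below is bounded as well.
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        try:
            return conn.recv(1024)
        except socket.timeout:
            # The peer accepted the connection but never answered.
            return None


def start_silent_server():
    """Hypothetical stand-in for a hung peer: accepts, then never replies."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    srv.listen(1)
    held = []  # keep accepted sockets referenced so they stay open

    def accept_once():
        conn, _ = srv.accept()
        held.append(conn)  # hold the connection open and send nothing

    threading.Thread(target=accept_once, daemon=True).start()
    return srv, held
```

Without the `timeout` argument, the `recv` call would block for as long as the connection appears open, which is roughly the situation our mail servers ended up in.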
