LDAP crashing 2005-06-02 - By Tim Edwards
Hi,
We've had a long series of problems with our main network server and LDAP. It all started on the 5th of May, when this server ('app') became unresponsive and appeared to have hung: its services weren't responding across the network, and neither ping nor ssh worked. When I tried to log in as root on the console, the system would hang for ages after I entered the password. Eventually the console would reset back to the login prompt, IIRC. It didn't even respond to ctrl+alt+del. In the end it had to be rebooted with the power switch; it came up normally and things appeared to be working.
The curious thing about all this was that it would degrade over a few minutes. At first anything to do with user IDs would hang, e.g. ps -ef would hang, whereas ls would behave normally. It would then appear to hang completely, with the symptoms described above. This is what made me think of LDAP.
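A rough way to watch for this symptom is to time a user lookup and give up after a few seconds instead of hanging along with it. The sketch below is only an illustration: it assumes the box resolves users through the name service (and that LDAP is one of the sources consulted), and the helper name `timed_lookup` and the 5-second timeout are made up for the example. To exercise LDAP rather than /etc/passwd, pass a user name that exists only in LDAP.

```python
import pwd
import sys
import threading
import time


def timed_lookup(lookup, timeout=5.0):
    """Run lookup() in a daemon thread.

    Returns (finished, seconds): finished is False if the call had not
    returned within `timeout` seconds. `timed_lookup` is a hypothetical
    helper for this sketch, not part of any library.
    """
    done = threading.Event()

    def worker():
        try:
            lookup()
        except KeyError:
            # Unknown user: the name service still answered, so not a hang.
            pass
        finally:
            done.set()

    start = time.monotonic()
    # Daemon thread, so a stuck lookup cannot keep the script alive at exit.
    threading.Thread(target=worker, daemon=True).start()
    finished = done.wait(timeout)
    return finished, time.monotonic() - start


if __name__ == "__main__":
    # Assumption: the user name is given on the command line and should be
    # one that lives only in LDAP; otherwise local files may answer first.
    name = sys.argv[1] if len(sys.argv) > 1 else "root"
    ok, secs = timed_lookup(lambda: pwd.getpwnam(name))
    if ok:
        print(f"lookup of {name!r} returned after {secs:.2f}s")
    else:
        print(f"lookup of {name!r} still hanging after {secs:.2f}s")
```

Run from cron on a machine other than the affected server, something like this would at least record when lookups start to stall, which may help line the hangs up with entries in the LDAP server's logs.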
The same thing happened once every morning for the next 3 mornings (including a Saturday). On the following Monday the server was rebooted, but the problem appeared to have got worse: it recurred approximately every hour that morning. In desperation (this was really disrupting everyone's work) I stopped LDAP and moved it to a completely separate machine, then changed the ldap entry in our DNS to point to the new machine. After that everything (including app) worked perfectly for over 3 weeks.
This week we created an LDAP slave on another machine and moved the LDAP master server back to the app server. We immediately began to have very similar problems: anything to do with user IDs or relying on authentication wouldn't work for several minutes, then would come good, then stop working again. It didn't crash the machine outright this time, and I suspect this is to do with the latest kernel update: the server is now on 2.4.21-32, whereas it was on 2.4.21-27 when we had the problems previously.
I changed our DNS to point only to the LDAP on the slave server, and once again everything is running perfectly. In fact I've left LDAP running and replicating from the app server so that I can make changes to users' details if needed, and so far this hasn't caused any hangs or crashes on that server.
Any ideas what could be causing this? Both boxes run RHEL3 with the same level of updates, very similar configs and the same hardware.
Thanks -- Tim Edwards
-- Taroon-list mailing list Taroon-list@(protected) http://www.redhat.com/mailman/listinfo/taroon-list