System freezing 2003-05-01 - By Winston Gutkowski
Hi all,
I posted a missive to this forum on the 27th March about problems we were having with our servers freezing up. I include the original text below for anyone who missed it.
Since then I have managed to duplicate the problem on one of our test machines, and had the luck to have both test and production servers go down on the same night at the same time. It would appear that the culprit job is *not* the cron.daily as I first assumed, but a script written by me which runs at very nearly the same time.
It is meant to create backups of files from the same or another machine by piping the result of a tar to an untar. As stated before, the problem is not consistent, but periodically it locks up the machine.
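For anyone unfamiliar with the technique, a tar-to-untar pipe typically looks something like the sketch below. This is only an illustration of the general idea, not the script in question: the function name `copy_tree` and its two arguments are hypothetical, and it covers only the same-machine case.

```shell
#!/bin/sh
# Hypothetical helper (not the original script): copy a directory tree by
# piping a tar archive straight into a second tar that unpacks it.
copy_tree() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # The first tar writes the archive to stdout (-f -); the second reads
    # it from stdin and extracts into $dest, preserving permissions (-p).
    (cd "$src" && tar -cf - .) | (cd "$dest" && tar -xpf -)
}
```

Called as `copy_tree /some/source /some/backup`, it recreates the source tree under the destination. For a source on another machine, the first half of the pipe would run under a remote shell of some kind; which one is an assumption about the setup.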
I enclose the script and its configuration file for anyone who would care to try their teeth on it. I am still mystified as to what the error is. The only explanation that should be needed is the "squeeze" command which creates TAB-delimited "word" columns from input (in this case, the config file).
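Since "squeeze" is a local command, here is a rough stand-in for readers who want to follow along without it. It assumes that "words" are simply runs of non-blank characters separated by spaces or tabs; the real command may well behave differently, and the name `squeeze_sketch` is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for "squeeze": collapse runs of spaces/tabs into
# single TABs so that each line becomes TAB-delimited "word" columns.
squeeze_sketch() {
    # Assigning $1 to itself makes awk rebuild the line using OFS.
    awk 'BEGIN { OFS = "\t" } { $1 = $1; print }' "$@"
}
```

For example, `printf 'alpha   beta\tgamma\n' | squeeze_sketch` prints the three words separated by single TAB characters.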
The original message is below, FYI. If anyone out there can help, I will be eternally grateful.
Thanks
Winston Gutkowski
Bulk of message from March 27th:
...
We have 3 generic Intel servers running RH7.2, and 2 of them have major stability problems. They are as follows:
1. Firewall - no problems. This is a small (1GB memory) machine with 2 software RAID 1'd IDE drives, performing firewall and DNS services.
2. Application routing server - constant problems. Medium-sized (2GB memory) machine with 2 software RAID 1'd IDE drives, running Jetty HTTP server serving Java servlets.
3. Database server - intermittent problems. Big server (3GB memory) + disk array: 2 onboard and 6 external SCSI disks, all software RAID 1'd, running Jetty HTTP server and Oracle 8.1.7.
In addition to the above-mentioned software, all machines also run Tripwire intrusion detection software.
On both machines that have problems, the system either freezes or crashes just after 4AM, making me think that it may have something to do with the cron.daily jobs. I found some pages on the Web from people who had problems with machines crashing when makewhatis runs, but supposedly the problem was fixed in 7.1. On the database server I would sometimes get messages saying that the system did not have enough memory to execute a fork() shortly before the system froze (it didn't halt; the console merely stopped responding). On the Application server, the system would simply power itself down (badly; as though someone had simply turned it off at the wall) shortly after 4AM every 3rd or 4th day. ...