Matthew Rich wrote:
> Hello all,
>
> I have a 2-machine cluster of RHEL 5.1 servers (Dell PE 2850s) that
> serve up a whole bunch of web sites. One of them is the primary web
> server, svn repository, and mysql slave, the other is the mysql master
> and backup web server (heartbeat service is used to fail over web
> traffic if the primary web server goes down, and a homebrew solution is
> used to fail over mysql).
>
> 3 times in the last two weeks, the main web server has crashed. I
> suspected hardware until this afternoon, when after disabling what I
> thought was the offending piece of hardware in the BIOS (I'd seen a few
> hda-related errors in /var/log/messages a few weeks back and disabled
> the IDE controller last night after its 3rd crash) the machine stayed up
> all night while the other server -- the primary database server, which
> was now serving up our websites as well -- crashed in the same manner
> today just after noon.
>
> What happens is the "crashed" machine remains pingable but does not
> respond to any other requests, and at the console it only prints memory
> status messages every few seconds but doesn't allow logins (i.e. it looks
> like all processes are stopped). On reboot everything looks hunky dory
> and there's nothing out of the ordinary in /var/log/messages or any
> other log that I could think to examine.
>
> The interesting thing is that the primary database / backup web server
> had never crashed in this manner before today around noon, and it took
> over the web serving last night around 10pm. So it appears to me that
> apache is a likely culprit, since before it just happily ran mysqld
> without any problems.
>
> I apologize if this isn't the right forum to ask about this, but I don't
> believe it to be hardware-related, and since the problem suddenly began
> occurring 2 weeks ago and has occurred with increasing frequency since
> (once two weeks ago, 2 times yesterday, and now once already today) on
> two RHEL 5.1 machines running apache, I thought maybe a RHEL update
> might be at fault. I'm a pretty inexperienced Linux admin and so far
> just looking through logs hasn't turned up anything.
>
> Has anybody else seen anything like this? Any pointers on where to begin
> looking?
>
I have. Most recently it was on Debian, and the problem was a cron job.
Cron is a handy tool, but some of the jobs it runs (e.g. dbupdate) can be
resource-intensive, and I have known that one in particular to run for
more than 24 hours.
Leaving top running can be helpful. So can a cron job (:-)) that logs a
report of what's going on.
Of course, monitoring adds to the load and can increase the likelihood
of failure.
vmstat and iostat can produce useful reports too, and it might be worth
running one of them with its output sent directly to a virtual console.
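
As a rough illustration of the "cron job that logs a report" idea, here
is a minimal Python sketch that appends one line of load and memory
figures to a log file each time it runs. It reads /proc/loadavg and
/proc/meminfo, so it is Linux-only. The log path and the function names
are made up for this example, and it is written conservatively in the
hope that it also runs on an older Python, which I have not tested.
It does very little work per run, so it should add little load itself.

```python
#!/usr/bin/env python
# Sketch only: append one load/memory snapshot per run to a log file.
# LOG_PATH and the function names are assumptions for this example.
import time

LOG_PATH = "/var/log/sysload.log"  # hypothetical location


def parse_meminfo(text):
    """Return {field: integer value} for lines shaped like 'MemFree: 123 kB'."""
    info = {}
    for line in text.splitlines():
        parts = line.split(":", 1)
        if len(parts) != 2:
            continue  # not a 'name: value' line
        fields = parts[1].split()
        if fields and fields[0].isdigit():
            info[parts[0].strip()] = int(fields[0])
    return info


def snapshot_line(loadavg_text, meminfo_text, now=None):
    """Format a single log line from the raw contents of the two /proc files."""
    load1, load5, load15 = loadavg_text.split()[:3]
    mem = parse_meminfo(meminfo_text)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    # -1 marks a field that was missing from /proc/meminfo
    return "%s load=%s/%s/%s MemFree=%dkB SwapFree=%dkB" % (
        stamp, load1, load5, load15,
        mem.get("MemFree", -1), mem.get("SwapFree", -1))


def read_file(path):
    """Read a whole file and always close it."""
    handle = open(path)
    try:
        return handle.read()
    finally:
        handle.close()


if __name__ == "__main__":
    line = snapshot_line(read_file("/proc/loadavg"),
                         read_file("/proc/meminfo"))
    log = open(LOG_PATH, "a")
    try:
        log.write(line + "\n")
    finally:
        log.close()
```

Saved somewhere such as /usr/local/bin/sysload.py (again, a made-up
path) and made executable, it could be run once a minute with a crontab
entry like "* * * * * /usr/local/bin/sysload.py". After the next hang,
the last few lines of the log show whether free memory and swap were
running out beforehand.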
Apache itself isn't a likely culprit, but that doesn't rule out any of
the applications you're running on it.
--
Cheers
John
-- spambait
1aaaaaaa@(protected)
-- Advice
http://webfoot.com/advice/email.top.php
http://www.catb.org/~esr/faqs/smart-questions.html
http://support.microsoft.com/kb/555375
You cannot reply off-list:-)
_______________________________________________
rhelv5-list mailing list
rhelv5-list@(protected)
https://www.redhat.com/mailman/listinfo/rhelv5-list