spontaneous reboots on an RHEL AS 3.0
2005-03-09 - By Antonio Dolcetta
Hi,
I have two RHEL 3.0 AS servers installed with RH Cluster Manager in an active-passive configuration. The cluster is running Apache + MySQL + JBoss.
One of the nodes has some strange problems.
Sometimes the server spontaneously reboots. Nothing gets logged in /var/log/messages, and operation seems normal up to the time of the reboot.
I have set up vmstat to dump usage statistics once per minute to a custom logfile. The last lines before the latest reboot were:
(lines may wrap)
Fri Mar 4 10:57:41 2005  procs                          memory      swap        io      system             cpu
Fri Mar 4 10:57:41 2005  r  b     swpd   free    buff   cache   si   so   bi   bo    in    cs  us  sy  wa  id
Fri Mar 4 10:57:41 2005  6  0  1837152  21308  174336  560292    8    9    9   86   732  4930  17  37   2  44
Fri Mar 4 10:58:41 2005  4  0  1837048  23176  174424  560584    5    0    8   69   247  4272  13  36   1  50
Fri Mar 4 10:59:41 2005  4  0  1837048  22552  174548  560836    1    0    2   66   267  4722  13  36   1  50
Fri Mar 4 11:00:41 2005  2  0  1837048  22404  174616  561028    1    0    1   61   239  4373  12  36   1  51
Fri Mar 4 11:01:41 2005  5  0  1837048  19500  174688  563940    1    0    2   65   291  4477  11  35   2  52
Fri Mar 4 11:02:41 2005  6  0  1837188  21972  174804  560088   11   31   12   96   308  4512  12  36   2  50
Fri Mar 4 11:03:41 2005  3  0  1837224  22100  174888  559776    4    8    4   75   264  4253  12  36   2  51
Fri Mar 4 11:04:41 2005  3  0  1837296  21760  174968  560016    1    3    3   74   259  4326  12  35   1  52
Fri Mar 4 11:05:41 2005  5  0  1837452  21912  175036  560028    2    4    2   74   292  4405  12  36   1  51
Fri Mar 4 11:06:41 2005  4  0  1837876  22216  175172  559180    5   10    5   84   296  4480  12  36   1  50
Fri Mar 4 11:07:41 2005  3  0  1838088  22196  175268  559208    2    6    2   72  1321  5485  15  38   1  45
Fri Mar 4 11:08:41 2005  8  0  1844344  22988  175344  520620  663  300  663  382   889  4743  22  43   5  30
Fri Mar 4 11:09:41 2005  4  0  1844704  22360  175480  520492    3   30    4  103   909  5107  18  38   1  43
Fri Mar 4 11:10:41 2005  5  0  1844808  23064  175640  519524    3   25    3  106   293  4527  13  36   1  50
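A log like this is easier to scan with a small script. The following is a minimal sketch in Python, assuming each log line is a five-token timestamp followed by the sixteen vmstat columns shown above. The function names and the threshold of 100 are hypothetical, chosen only for illustration. It flags the minutes where swap-in (si) or swap-out (so) jumps.

```python
# Column names as they appear in the vmstat header of the log above.
COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
           "bi", "bo", "in", "cs", "us", "sy", "wa", "id"]


def parse_vmstat_line(line):
    """Split one timestamped log line into (timestamp, {column: value}).

    Returns None for the two header lines, whose trailing fields are
    not sixteen integers.
    """
    fields = line.split()
    # The timestamp prefix is five tokens, e.g. "Fri Mar 4 10:57:41 2005".
    stamp, values = " ".join(fields[:5]), fields[5:]
    if len(values) != len(COLUMNS) or not all(v.isdigit() for v in values):
        return None
    return stamp, dict(zip(COLUMNS, map(int, values)))


def swap_spikes(lines, threshold=100):
    """Yield (timestamp, si, so) for samples where si or so exceeds threshold.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    for line in lines:
        parsed = parse_vmstat_line(line)
        if parsed is None:
            continue
        stamp, row = parsed
        if row["si"] > threshold or row["so"] > threshold:
            yield stamp, row["si"], row["so"]


# A few lines copied from the log above, including both header lines.
SAMPLE = [
    "Fri Mar 4 10:57:41 2005 procs memory swap io system cpu",
    "Fri Mar 4 10:57:41 2005 r b swpd free buff cache si so bi bo in cs us sy wa id",
    "Fri Mar 4 11:07:41 2005 3 0 1838088 22196 175268 559208 2 6 2 72 1321 5485 15 38 1 45",
    "Fri Mar 4 11:08:41 2005 8 0 1844344 22988 175344 520620 663 300 663 382 889 4743 22 43 5 30",
    "Fri Mar 4 11:09:41 2005 4 0 1844704 22360 175480 520492 3 30 4 103 909 5107 18 38 1 43",
]

if __name__ == "__main__":
    for stamp, si, so in swap_spikes(SAMPLE):
        print(stamp, "si =", si, "so =", so)
```

On the sample lines, only the 11:08:41 entry crosses the threshold, which matches the burst of swap activity visible in the log two minutes before it ends.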
Yes, the server is under some load, but nothing exceptional. The other node, under the same conditions, does not reboot.
This is the output of free:
             total       used       free     shared    buffers     cached
Mem:       2061676    2039528      22148          0     191508     382560
-/+ buffers/cache:    1465460     596216
Swap:      6291400    1992448    4298952
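The figures in the free output are internally consistent, which can be checked with a little arithmetic. This is only a sketch with the values (in kB) copied from the output above. The variable names are made up for illustration.

```python
# Values in kB, copied from the `free` output above.
mem_total, mem_used, mem_free = 2061676, 2039528, 22148
buffers, cached = 191508, 382560
swap_total, swap_used, swap_free = 6291400, 1992448, 4298952

# The "-/+ buffers/cache" row subtracts buffers and page cache from "used"
# and adds them to "free".
used_by_apps = mem_used - buffers - cached
free_with_cache = mem_free + buffers + cached

if __name__ == "__main__":
    print("used minus buffers/cache:", used_by_apps)
    print("free plus buffers/cache: ", free_with_cache)
    print("fraction of RAM used by applications: %.3f" % (used_by_apps / mem_total))
    print("fraction of swap in use:              %.3f" % (swap_used / swap_total))
```

The two derived numbers match the "-/+ buffers/cache" row (1465460 and 596216). Roughly 71% of RAM is held by applications rather than cache, and a little under a third of the swap space is in use.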
This is the content of /proc/swaps:
Filename                        Type            Size    Used    Priority
/dev/cciss/c0d0p2               partition       2097112 889516  0
/dev/vol01/lvswap (deleted)     partition       2097144 888988  0
/dev/vol01/lvswap2              partition       2097144 213464  0
I recently added the third swap partition, which is why it is used less than the others. I'm not sure what the "(deleted)" part means.
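For anyone scripting against /proc/swaps, the "(deleted)" marker adds an extra whitespace-separated token to the line, so a naive split on columns would shift the fields. A minimal sketch, assuming the layout shown above and reading the last four columns from the right. The function name is hypothetical, and the sketch does not interpret the marker. It only keeps it as part of the name.

```python
def parse_swaps(text):
    """Parse /proc/swaps-style text into a list of dicts.

    The last four columns (type, size, used, priority) are taken from the
    right, so anything before them, including a trailing "(deleted)"
    marker, is kept together as the filename.
    """
    entries = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        kind, size, used, priority = fields[-4:]
        entries.append({
            "name": " ".join(fields[:-4]),
            "type": kind,
            "size_kb": int(size),
            "used_kb": int(used),
            "priority": int(priority),
        })
    return entries


# Content copied from the post above.
SWAPS = """\
Filename Type Size Used Priority
/dev/cciss/c0d0p2 partition 2097112 889516 0
/dev/vol01/lvswap (deleted) partition 2097144 888988 0
/dev/vol01/lvswap2 partition 2097144 213464 0
"""

if __name__ == "__main__":
    entries = parse_swaps(SWAPS)
    for e in entries:
        print(e["name"], e["size_kb"], e["used_kb"])
    print("total size:", sum(e["size_kb"] for e in entries))
    print("total used:", sum(e["used_kb"] for e in entries))
```

The three sizes add up to 6291400 kB, the same swap total that free reports, so all three areas appear to be active, including the one marked "(deleted)". The used total (1991968 kB) differs slightly from the free figure, presumably because the two commands were run at different moments.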
Another strange problem on this host is that ps always shows all threads, even if I don't specify the -m flag, e.g.:
[root@(protected) root]# ps -ef -m | wc
    838    9078   97432
[root@(protected) root]# ps -ef | wc
    838    9077   97429
[root@(protected) root]#
When I run the same commands on the other node, the output without the -m flag is radically different (I get around 140 lines).
What can I do to diagnose the problem further?
Any help appreciated.
Thank you
Antonio
--
Antonio http://gelo.dolcetta.net
--
Taroon-list mailing list
Taroon-list@(protected)
http://www.redhat.com/mailman/listinfo/taroon-list