System hang problem. 2006-10-03 - By Manish Neema
Hi Tom,
Thanks for the reply.
We develop EDA software, and most of our tools are quite memory-hungry. Most of our systems have 2 CPUs, 16GB RAM and 32GB swap; a small percentage of machines have 32GB, 64GB or 128GB of RAM. Our queuing system dispatches as many jobs as there are CPUs on a machine, so at times more than one job turns out to be memory-intensive, causing the machine to crawl or hang. Also, this is an R&D environment, so code running on the machine may occasionally have memory leaks.
Anyway, since users don't know a job's final memory requirement exactly before submitting it, we often see machines hang. I understand that relying on the OOM killer is bad (its heuristics can kill an essentially random process), but believe it or not, it used to work perfectly fine for us on RHEL 3.0 U3, and we actually expect RHEL 3.0 U5 and U7 to exhibit similar OOM kills, since none of the memory overcommit settings seem to help effectively.
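For reference, on Linux 2.6 and later kernels the strict overcommit mode (`vm.overcommit_memory = 2`) refuses new allocations once total committed virtual memory would exceed `CommitLimit`, which is roughly swap + RAM × `vm.overcommit_ratio` / 100 (the ratio defaults to 50). In that mode an over-hungry job generally sees failed allocations instead of the OOM killer choosing a victim. The sketch below only does that arithmetic for the 16GB RAM / 32GB swap machines described above; the helper name is hypothetical, huge-page reservations are ignored, and it makes no claim about how the 2.4-based RHEL 3 kernel behaves.

```python
def commit_limit_gb(ram_gb: float, swap_gb: float, overcommit_ratio: int = 50) -> float:
    """Approximate CommitLimit (in GB) under strict overcommit (vm.overcommit_memory=2).

    Uses the commonly documented formula swap + RAM * overcommit_ratio / 100
    and ignores huge-page reservations for simplicity.
    """
    return swap_gb + ram_gb * overcommit_ratio / 100


if __name__ == "__main__":
    # Typical machine from the post: 16GB RAM, 32GB swap.
    for ratio in (50, 80, 100):
        limit = commit_limit_gb(16, 32, ratio)
        print(f"overcommit_ratio={ratio}: CommitLimit ~ {limit:.1f} GB")
```

With the default ratio the whole machine could commit about 40GB, shared by every job running on it, so two large jobs can still exhaust it between them.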
Are there any /proc knobs that can help limit a process's SIZE? I know limits.conf allows controlling 'RSS', but we need a control for the total SIZE (virtual memory).
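One per-process knob aimed at total size rather than RSS is the `RLIMIT_AS` resource limit, which caps a process's virtual address space (roughly what top reports as SIZE/VIRT); the shell exposes it as `ulimit -v`, in kilobytes, and current pam_limits releases document a matching `as` item for limits.conf. When a process hits the cap, further `mmap`/`brk` calls fail with ENOMEM, so the job itself sees an allocation failure rather than the machine going into OOM. The sketch below assumes a job launcher written in Python; `run_with_as_limit` is a hypothetical helper, and whether a given RHEL 3 kernel honours the limit as expected should be checked before relying on it.

```python
import resource
import subprocess
import sys


def run_with_as_limit(cmd, limit_bytes, **kwargs):
    """Run cmd with its virtual address space (RLIMIT_AS) capped at limit_bytes.

    Extra keyword arguments are passed straight to subprocess.run.
    """

    def _apply_limit():
        # Executed in the child between fork() and exec(), so only the job
        # (and anything it later forks) is limited, not the launcher itself.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(cmd, preexec_fn=_apply_limit, **kwargs)


if __name__ == "__main__":
    two_gib = 2 * 1024 ** 3
    # A child that tries to allocate 8 GiB should fail under a 2 GiB cap.
    result = run_with_as_limit(
        [sys.executable, "-c", "bytearray(8 * 1024 ** 3)"],
        two_gib,
        stderr=subprocess.DEVNULL,
    )
    print("child exit status:", result.returncode)
```

The limit is inherited per process, so a job that forks several workers gives each of them its own cap; it bounds individual processes, not the job's combined footprint.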
I would appreciate any further suggestions...
Thanks! -Manish
-----Original Message-----
From: taroon-list-bounces@(protected) [mailto:taroon-list-bounces@(protected)] On Behalf Of Tom Sightler
Sent: Tuesday, October 03, 2006 7:29 PM
To: Discussion of Red Hat Enterprise Linux 3 (Taroon)
Subject: Re: System hang problem.
On Tue, 2006-10-03 at 15:23 -0700, Manish Neema wrote:
> Any suggestions on how we can prevent the system hang and not have automount
> (and any other root process) die?
Perhaps this is a silly suggestion, but why not just add more memory/swap so the system never needs to invoke the OOM killer? The system will not OOM-kill a process until it is completely out of pages in a given zone. It sounds like you don't have enough memory. If you were relying on the OOM killer to keep your systems running on U3, you still had a problem; a normally running system should not trigger the OOM killer.
If the memory usage is load-driven (for example, a dynamic web server that sees large bursts of traffic), then you need to control the system's memory consumption with the throttling features built into such servers, limiting concurrent connections to a number the system can reasonably service.
Is the system really running out of memory, or is it zone starvation? I've seen cases on large-memory systems (8GB+ of RAM) where the OOM killer kicks in because low memory is exhausted, even though large amounts of memory are still free.
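One quick way to tell the two situations apart on a 32-bit kernel built with high-memory support is to compare the `LowFree` and `HighFree` fields of /proc/meminfo: low-memory starvation shows up as `LowFree` near zero while `HighFree` is still large. The sketch below assumes that layout; the helper names and the 16MB / 1GB thresholds are hypothetical and arbitrary, and on kernels without those fields (for example 64-bit ones) it simply reports no starvation.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict mapping field name -> value (usually kB)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def looks_like_lowmem_starvation(info, low_floor_kb=16 * 1024, high_min_kb=1024 * 1024):
    """Return True if LowFree is nearly exhausted while plenty of HighFree remains.

    LowFree/HighFree only appear on highmem-enabled 32-bit kernels; if they are
    missing this returns False. The thresholds are illustrative, not tuned.
    """
    if "LowFree" not in info or "HighFree" not in info:
        return False
    return info["LowFree"] < low_floor_kb and info["HighFree"] >= high_min_kb


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        info = {}  # not a Linux system, or /proc is unavailable
    print("low-memory starvation suspected:", looks_like_lowmem_starvation(info))
```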
I would suggest describing a little more about your system (hardware, RAM, swap) and application environment if you want more constructive suggestions.
Later, Tom
-- Taroon-list mailing list Taroon-list@(protected) https://www.redhat.com/mailman/listinfo/taroon-list