System hang problem. 2006-10-05 - By Tom Sightler
On Wed, 2006-10-04 at 21:30 -0700, Garrick Staples wrote:
> On Wed, Oct 04, 2006 at 11:15:09PM -0400, Tom Sightler alleged:
> > On Wed, 2006-10-04 at 14:58 -0700, Manish Neema wrote:
> > > It is not uncommon for us to see a process size ~40GB on 16 GB machines.
> > > I know, the simple answer would be to buy machines with more RAM.
> > >
> > > Anyway, I'll ask my last question again. Under the given circumstances,
> > > is there any way for me to limit the jobs based on SIZE (and NOT RSS;
> > > their RAM footprint)?
> >
> > I know of no way to do this off of the top of my head, however, I fail
> > to understand what you think this will do. Isn't lowering the amount of
> > swap effectively doing the same thing?
> >
> > If you limit the size of the process to say, 24GB, then won't the
> > application simply fail and error out when it can no longer allocate
> > memory? How is this different than the OOM killer kicking in on a
> > system with 16GB + 8GB of swap?
> >
> > Can you explain why one behavior is better than the other?
>
> Because the OOM killer sucks? Because half the time the OS locks up,
> and the other half it kills the wrong process?
Right, but he seemed to imply that it did work in his case. Anyway, I can accept that this is a valid reason. I was actually thinking that you would turn the OOM killer off completely and set a reasonable amount of swap, which would likely give you a more reliable system.
I'm not particularly familiar with the option, but it sounds like the address space limit (the "as" item) in /etc/security/limits.conf would do what he wants. I think this is the equivalent of ulimit -v, and that it sets the following:
RLIMIT_AS: The maximum size of the process's virtual memory (address space) in bytes. This limit affects calls to brk(2), mmap(2) and mremap(2), which fail with the error ENOMEM upon exceeding this limit. Also automatic stack expansion will fail (and generate a SIGSEGV that kills the process when no alternate stack has been made available). Since the value is a long, on machines with a 32-bit long either this limit is at most 2 GiB, or this resource is unlimited.
Admittedly this is per-process, which may make it less valuable.
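For what it's worth, here is a minimal sketch of the RLIMIT_AS behavior quoted above, using Python's resource module to cap a child process's address space. The helper name run_with_address_space_limit, the 1 GiB limit, and the 2 GiB allocation are made up for illustration and are not from this thread; it also assumes a Linux system where RLIMIT_AS is enforced.

```python
import resource
import subprocess
import sys

# Hypothetical child workload: try to allocate 2 GiB in one go.
CHILD_SCRIPT = (
    "try:\n"
    "    buf = bytearray(2 * 1024**3)\n"
    "    print('allocated')\n"
    "except MemoryError:\n"
    "    print('MemoryError')\n"
)


def run_with_address_space_limit(argv, limit_bytes):
    """Run argv in a child whose RLIMIT_AS (soft and hard) is limit_bytes."""

    def apply_limit():
        # Runs in the child after fork() and before exec(),
        # so the parent's own limits are left untouched.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(
        argv, preexec_fn=apply_limit, capture_output=True, text=True
    )


if __name__ == "__main__":
    # Cap the child at 1 GiB of address space (illustrative value).
    result = run_with_address_space_limit(
        [sys.executable, "-c", CHILD_SCRIPT], 1024**3
    )
    print(result.stdout.strip())
```

With the limit in place the oversized allocation fails inside the child (the underlying mmap returns ENOMEM, which Python reports as MemoryError) instead of pushing the machine into swap. Note that setrlimit takes bytes, while ulimit -v takes its value in 1024-byte units.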
Later, Tom
-- Taroon-list mailing list Taroon-list@(protected) https://www.redhat.com/mailman/listinfo/taroon-list