NFS Help! Terrible performance with sync, fast performance with async
2006-11-19 - By Stephen John Smoogen
On 11/19/06, Chris Wornell <CWornell@(protected)> wrote:
> I've got a problem that I've spent quite a bit of time on, though I'm not an
> expert at NFS. In summary, operations that require meta-data changes (such
> as file/directory creations/deletions) perform extremely slowly over sync,
> but over 10x faster using async.
>
> I have two systems, connected to a GigE switch using Intel Pro 1000 NICs
> (jumbo frames are currently not enabled on any of the points).
>
> The NFS server is a dual-core Opteron system with 1GB of RAM and 3x300 SAS
> disk RAID-5 on a Perc5/i controller with 256MB battery backed cache (write
> cache is enabled). The file system is ext3. I've configured nfsd to spawn 32
> processes upon startup. I'm using defaults for exporting the NFS shares, no
> changes to rsize or wsize.
>
> The NFS client is a dual Xeon with 4GB of RAM and a single 7200rpm SATA
> disk. Both systems are running RHEL WS 3 Update 8 and kernel
> 2.4.21-47.0.1.ELsmp.
>
> For testing, I'm using bonnie++. The following are some sample test results
> that sum up the problem:
>
> Test on NFS server directly (not NFS loopback)
> -Sequential File Creation: 2976
> -Sequential File Deletion: N/A
> -Random File Creation: 3077
> -Random File Deletion: 9922
>
> NFS test with sync enabled
> -Sequential File Creation: 39
> -Sequential File Deletion: 79
> -Random File Creation: 39
> -Random File Deletion: 65
>
> NFS test with async enabled
> -Sequential File Creation: 575
> -Sequential File Deletion: 1718
> -Random File Creation: 543
> -Random File Deletion: 1228
>
> Based on the local performance of the NFS server, it does not appear the IO
> setup is the culprit. My understanding of the sync operation is that a commit
> happens, which means the NFS server doesn't reply back until the change has
> actually been committed to stable storage. There is something happening
> behind the scenes though which is causing a huge delay before the NFS server
> replies back that the commit was complete.
>
> This question is actually work related and I'm planning to put the NFS
> server into production, but I'd rather not use async, even with a UPS and
> dual PSUs on the server. With the newer nfs-utils, sync is the default
> option as well, so it seems like sync should perform relatively well.
>
> Another question is I don't quite understand how the data corruption
> happens if a power loss occurs on an NFS server using async. Even with sync,
> data transferred over the wire may be lost if the NFS server gets shut down
> before that data is committed. Can anyone go into more detail on how the
> data corruption happens?
>
> Thanks a bunch!
A couple of things:
1) Is this RHEL-3? The server and client in RHEL-3 default to UDP, which is the 'worst' transport for high bandwidth networks. You can find out how the client is mounting things by looking in /proc/mounts. I forget exactly what needs to be done on the server side (RHEL-4 supposedly has a better TCP server but I am not sure). Once you have both server and client on TCP... you should see an improvement.
2) How is your switch set up? A dumb switch should be OK, but some switches will try to do things like 'grouping' packets, which can cause bad performance.
3) Look at using iozone as a second test. It may do a better job of showing where the problems are.
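
For point 1, a rough sketch of the /proc/mounts check in Python. The function name nfs_transports is made up for illustration, and the option spellings are an assumption: some kernels list a bare "udp" or "tcp" in the mount options while others show "proto=udp" or "proto=tcp", so the sketch accepts both and reports "unknown" if it finds neither.

```python
def nfs_transports(mounts_text):
    """Map each NFS mount point to its transport, given /proc/mounts-style text.

    Each line is expected to look like:
        device mount_point fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Only client-side NFS mounts are of interest here.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        transport = "unknown"
        for opt in fields[3].split(","):
            if opt in ("udp", "tcp"):
                # Assumed older style: bare transport name in the options.
                transport = opt
            elif opt.startswith("proto="):
                # Assumed newer style: proto=<transport>.
                transport = opt.split("=", 1)[1]
        result[fields[1]] = transport
    return result


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            mounts = f.read()
    except OSError:
        mounts = ""  # not on Linux, or /proc is not mounted
    for mount_point, transport in sorted(nfs_transports(mounts).items()):
        print(f"{mount_point}: {transport}")
```

Run it on the client. Any mount reported as udp is a candidate for remounting over TCP before repeating the bonnie++ runs.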
--
Stephen J Smoogen. -- CSIRT/Linux System Administrator
How far that little candle throws his beams!
So shines a good deed in a naughty world. = Shakespeare. "The Merchant of Venice"
--
Taroon-list mailing list
Taroon-list@(protected)
https://www.redhat.com/mailman/listinfo/taroon-list