problem with special characters? 2003-12-19 - By Molina, Thomas
Does anyone have a good idea of what might break if we use LANG=C on RHEL WS 3? We currently have client workstations running RH 8 with LANG=C in /etc/sysconfig/i18n and no problems. The critical application for our developers is ClearCase, which requires this setting. Once Rational releases a patch for WS3 we are going to pound on it pretty hard, but LANG=C could be a showstopper if it causes problems. They don't seem interested in doing the work needed to make UTF-8 a viable option any time soon.
-----Original Message-----
From: John Haxby [mailto:jch@(protected)]
Sent: Thursday, December 18, 2003 5:28 PM
To: taroon-list@(protected)
Subject: Re: problem with special characters?
Stephen Smoogen wrote:
> For us, we have done the following for our computing clusters.
>
> LANG="C"
> SUPPORTED="en_US.UTF-8:en_US:en:C"
> SYSFONT="latarcyrheb-sun16"
>
> The speedup was supposedly significant on various word searching items
> and in other tools also (this is 3rd person knowledge.. I just know the
> cluster prima-donnas found it met their needs for speed).. and we don't
> really need anything that God didn't put in ASCII.

I'm regularly surprised by how much non-ASCII stuff I see in the UK. I guess that's why the first letter is an "A" :-)
I tried this:
grep xyzzy <4182 files>
and
LANG=C grep xyzzy <4182 files>
All those files fitted in the buffer cache. My normal locale is en_GB.utf8. The first command took about seven seconds; the second took about a third of a second, roughly a factor of twenty faster.
(Note to self -- stick LANG=C in front of source code grep)
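The comparison above can be repeated with a small bash function. This is a sketch, not the actual test from this message: the name compare_grep_locales is hypothetical, it relies on bash's `time` keyword and TIMEFORMAT variable, and it uses LC_ALL=C rather than LANG=C because LC_ALL takes precedence over LANG (a LANG=C prefix has no effect if LC_ALL is already set).

```shell
#!/usr/bin/env bash
# Sketch only: time the same grep under the current locale and under the
# C locale. compare_grep_locales is a hypothetical name; results depend on
# the grep version and on whether the files are already in the buffer cache.
compare_grep_locales() {
    local pattern=$1
    shift
    # bash-specific: make `time` report only elapsed wall-clock seconds.
    local TIMEFORMAT='%R s elapsed'

    echo "current locale (${LC_ALL:-${LANG:-unset}}):"
    time grep -c -- "$pattern" "$@" > /dev/null

    echo "C locale:"
    # LC_ALL overrides LANG and the individual LC_* variables, so this
    # forces the C locale for this one grep invocation only.
    time LC_ALL=C grep -c -- "$pattern" "$@" > /dev/null
}
```

For example, `compare_grep_locales xyzzy src/*.c`. Run it twice and use the second result, so that both searches read from the buffer cache as in the measurement above.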
I guess UTF-8 processing still has a way to go for a lot of things. Someone on the shrike list mentioned a while ago that grep's slowness is actually a fixable bug (though I don't know the details, and I don't think it's a trivial fix).
However, rather than setting the system locale to something non-UTF-8 and breaking things you don't expect to break, it's probably better to special-case the things that benefit from a non-UTF-8 locale, like grepping 4000-odd files.
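One way to special-case grep without touching /etc/sysconfig/i18n is a shell function that sets the C locale for that single command. A minimal sketch, assuming bash; the name cgrep is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: a per-command locale override instead of a system-wide LANG=C.
# cgrep is a hypothetical name; put the function in ~/.bashrc or similar.
cgrep() {
    # The assignment prefix applies only to this grep process; the calling
    # shell and every other program keep their UTF-8 locale.
    LC_ALL=C grep "$@"
}
```

Usage: `cgrep -rn xyzzy src/`. The trade-off is the one behind the warning above: in the C locale grep treats each byte as one character, so `.` matches only one byte of a multibyte UTF-8 character, and `-i` folds only ASCII letters.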
jch
--
Taroon-list mailing list
Taroon-list@(protected)
http://www.redhat.com/mailman/listinfo/taroon-list