Re: Surrogates and character representation
Alan Watson 24 Jul 2005 18:14 UTC
Okay, thanks for clearing up my misunderstanding.
> but in general using UTF-8 as an internal representation is
> a bad idea.
Using UTF-8 internally for a Scheme on a Plan 9 system is not obviously
a bad idea. Sure, you don't have direct indexing, but you avoid
conversion when you talk to the C library and OS.
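To make the indexing cost concrete, here is a minimal Python sketch of what "no direct indexing" means for UTF-8. The function name utf8_char_at is made up for illustration and is not from any Scheme implementation; it assumes the input is valid UTF-8 and finds the n-th character by scanning from the start, so lookup is linear rather than constant time.

```python
def utf8_char_at(data, index):
    """Return the character at position `index` in UTF-8 encoded `data`.

    Hypothetical helper: assumes `data` is valid UTF-8 and scans from the
    beginning, because byte offset and character offset differ.
    """
    if index < 0:
        raise IndexError(index)
    seen = -1
    n = len(data)
    for pos, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; anything else starts a character.
        if byte & 0xC0 != 0x80:
            seen += 1
            if seen == index:
                end = pos + 1
                while end < n and data[end] & 0xC0 == 0x80:
                    end += 1
                return data[pos:end].decode("utf-8")
    raise IndexError(index)


text = "Astronomía"
encoded = text.encode("utf-8")  # 10 characters, 11 bytes
```

Here encoded[9] is the second byte of "í", not the final "a", whereas utf8_char_at(encoded, 9) walks the bytes and returns the tenth character.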
Using UTF-16 internally doesn't give you direct indexing because of
characters outside the BMP, but it might make sense on Windows boxes for
precisely the same reason.
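A short Python sketch of the same problem in UTF-16, offered only as an illustration: the helper utf16_units is hypothetical, and the two sample characters are my own choice. A character inside the BMP takes one 16-bit code unit, while one outside it takes a surrogate pair, so the code-unit index and the character index drift apart.

```python
def utf16_units(s):
    """Return the UTF-16 code units of `s` as a list of integers (illustrative helper)."""
    data = s.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp_char = "\u00e9"         # U+00E9, inside the BMP
astral_char = "\U0001D11E"  # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

bmp_units = utf16_units(bmp_char)        # one code unit
astral_units = utf16_units(astral_char)  # a high and a low surrogate
```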
Using UTF-32 (UCS-4) internally in these cases would involve translation to talk
to the library and OS and would further make my emacs use about four
times as much memory as it does now (which brings us back to the
representation for infinity).
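The factor of four is easy to check for mostly-ASCII text. The following Python sketch uses an arbitrary sample string of my own (not real editor data) and the "-le" codec variants so that no byte order mark is counted; it only compares encoded sizes, not the full memory footprint of an editor.

```python
# Arbitrary ASCII-only sample standing in for typical source text.
sample = "(define (square x) (* x x))\n" * 1000

# Encoded size in bytes for each representation.
sizes = {enc: len(sample.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

for enc, size in sizes.items():
    print(enc, size)
```

For pure ASCII input the 32-bit encoding takes exactly four times the bytes of UTF-8, and UTF-16 takes twice as many; the ratio shrinks as the proportion of non-ASCII characters grows.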
In general, any single representation is a bad idea in some circumstances.
Regards,
Alan
--
Dr Alan Watson
Centro de Radioastronomía y Astrofísica
Universidad Astronómico Nacional de México