Re: Surrogates and character representation Alan Watson 25 Jul 2005 17:23 UTC
> By the same token, random-access disks are a useless feature, for they
> can be replaced by sequential-access DECtapes that can be rewound and
> selectively rewritten. But at a price.

Files actually provide a fairly close analogy to the commonest means of
representing Unicode strings.

Imagine a file system that implements files as streams of bytes. Now
imagine that you want to read the Nth *line*. The only way to do this is
to read through the file until you have encountered N-1 newlines. This
is like finding the Nth character when using UTF-8 for strings.

Now imagine a file system that implements files as enumerated
random-access records and uses exactly one record for each line. You can
directly read the Nth line. This is like finding the Nth character when
using UTF-32 for strings.

Now imagine a file system that implements files as enumerated
random-access records and uses one or more records for each line. You
can jump to any record, but you cannot tell which line it belongs to
without counting from the start. This is like using UTF-16 for strings.

Regards,

Alan
--
Dr Alan Watson
Centro de Radioastronomía y Astrofísica
Universidad Nacional Autónoma de México
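
The analogy above can be made concrete with a rough Python sketch. It is
only an illustration, not anything proposed in this thread: the function
names nth_utf8, nth_utf32 and nth_utf16 are hypothetical, the input is
assumed to be well-formed, and the 16- and 32-bit forms are assumed to be
little-endian with no byte-order mark.

```python
def nth_utf8(data: bytes, n: int) -> str:
    """Byte stream: scan from the start, counting lead bytes."""
    if n < 0:
        raise IndexError(n)
    count = -1
    start = None
    for i, b in enumerate(data):
        if b & 0xC0 != 0x80:          # not a continuation byte: a character starts here
            count += 1
            if count == n:
                start = i
            elif count == n + 1:
                return data[start:i].decode("utf-8")
    if start is None:
        raise IndexError(n)
    return data[start:].decode("utf-8")


def nth_utf32(data: bytes, n: int) -> str:
    """One fixed-width record per character: jump straight to offset 4*n."""
    if not 0 <= 4 * n < len(data):
        raise IndexError(n)
    return data[4 * n:4 * n + 4].decode("utf-32-le")


def nth_utf16(data: bytes, n: int) -> str:
    """One or two 16-bit records per character: scan again, record by record."""
    if n < 0:
        raise IndexError(n)
    i = 0
    while i < len(data):
        unit = int.from_bytes(data[i:i + 2], "little")
        # A high surrogate means this character occupies two records.
        width = 4 if 0xD800 <= unit <= 0xDBFF else 2
        if n == 0:
            return data[i:i + width].decode("utf-16-le")
        n -= 1
        i += width
    raise IndexError(n)
```

Only nth_utf32 avoids the scan; nth_utf16 steps through fewer, larger
units than nth_utf8, but still has to start from the beginning.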