Re: Surrogates and character representation Per Bothner 24 Jul 2005 18:25 UTC
Tom Emerson wrote:
> Representing strings internally in UTF-8 is a loss though, since you
> lose random access to the string.

Random access to a previously accessed position works just fine - just
use the byte offset. Random access to a position in a string that has
not been previously accessed is not in itself useful.

> For some applications this isn't a big deal, but in general using UTF-8
> as an internal representation is a bad idea.

It's the other way round. Using UTF-8 as an internal representation is
just fine for *applications*. The problem is that certain *API*s have
a concept of indexing into a string, and unfortunately R5RS is one of
them. In itself, indexing of strings is a useless feature, as it can be
replaced by a sequential-access cursor/iterator API - but historically
the Scheme cursor/iterator API uses integers for the "cursor", and
existing code moves the "cursor" forwards by adding 1.
--
	--Per Bothner
xxxxxx@bothner.com   http://per.bothner.com/
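
The byte-offset cursor idea in the post above can be sketched roughly as
follows. This is only an illustration in Python rather than Scheme; the
helper names `next_cursor` and `char_at` are hypothetical and not part of
any SRFI or R5RS API, and the input is assumed to be valid UTF-8.

```python
def next_cursor(buf: bytes, pos: int) -> int:
    """Return the byte offset of the character following the one at pos.

    Hypothetical helper: assumes buf is valid UTF-8 and pos is the offset
    of a character's first byte.
    """
    pos += 1
    # UTF-8 continuation bytes look like 10xxxxxx; skip over them.
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


def char_at(buf: bytes, pos: int) -> str:
    """Decode the single character that starts at byte offset pos."""
    return buf[pos:next_cursor(buf, pos)].decode("utf-8")


text = "naïve 日本語".encode("utf-8")

# Sequential access: walk the string once, remembering each cursor.
cursors = []
pos = 0
while pos < len(text):
    cursors.append(pos)
    pos = next_cursor(text, pos)

# "Random access" to an already visited position just reuses the saved
# byte offset; no scan from the start of the string is needed.
saved = cursors[6]
ch = char_at(text, saved)
```

Note that advancing the cursor means calling `next_cursor`, not adding 1:
for the two-byte "ï" at offset 2 the next cursor is 4, which is exactly
where code written against integer indices goes wrong.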