Re: Surrogates and character representation Ken Dickey 23 Jul 2005 15:05 UTC
On Saturday 23 July 2005 00:19, Thomas Bushnell BSG wrote:
> Tom Emerson <xxxxxx@basistech.com> writes:
> > Surrogate codepoints have a character property. They should be usable
> > in a string, and individually can be considered a character.
>
> This is exactly part of the reason why char=codepoint is such a lose.
> Most code doesn't *want* to see this kind of garbage; it's an encoding
> issue. I want chars where the *computer* takes care of the coding. I
> want chars that are fully-understood characters, not little pieces of
> a character.

This points out a tension underlying this thread. There are two
discussions intertwined here:

[1] The access to and use of Unicode within Scheme (e.g. to process
    internationalized web pages), and

[2] bringing Unicode into Scheme (extending the Symbol & String
    datatypes).

SRFI-75 specifically addresses the second of these goals and (wisely)
states that the first goal is left to another SRFI.

I for one would be satisfied to be able to portably manipulate Unicode
using Scheme source encoded in ASCII (or UTF-8). In particular, I would
be willing to have a separate datatype (or datatypes) and libraries to
accomplish this.

Would anyone care to post a Unicode Encoding & I/O SRFI, so that the
*other* discussion can be moved from this thread to that one?

$0.02,
-KenD