> >> Sixth, is there any way for a scheme implementation to support
> >> characters and strings in additional encodings different from
> >> Unicode and not necessarily subsets of it, and remain compliant?
> >
> > I don't think so, at least not in the way you envision.  I don't think
> > that's necessary or even a good idea, either.  This SRFI effectively
> > hijacks the char and string datatypes and says that the abstractions
> > for accessing them deal in Unicode.  Any representation that allows
> > you to do that---i.e. implement STRING-REF, CHAR->INTEGER, and
> > INTEGER->CHAR and so on in a way compatible with the SRFI is fine,
> > but I believe you're thinking about representations where that's not
> > the case.
>
> Hmmm.  I'm still of the opinion that making the programming
> abstraction more closely match the end-user abstraction (i.e.,
> with glyph=character rather than codepoint=character) is just
> plain better, in many ways.  It gives me the screaming willies
> that under Unicode, strings which to the eye look identical,
> can have different lengths, no codepoint at any particular
> index in common, and sort relative to each other such that
> there are an infinite number of unrelated strings that go
> between them.  To me, it is the codepoint=character model that
> is introducing representation artifacts and the glyph=character
> model comes a lot closer to avoiding them.
>
> But we've been there, and I've talked about that, at length.
> People seem determined to do it this way, and people with
> other languages seem to be doing it mostly this way too. I'm
> convinced that requiring the "wrong" approach in a way that
> outlaws a better one is a wrong thing, but I'm realistic by
> now that nobody else is going to be convinced.
>
> Also, I'm not entirely happy about banning characters and
> character sets that aren't subsets of Unicode.  In the first
> place there are a lot of characters that aren't in Unicode
> and are likely never to be - ask a Chinese person to write
> his own address without using one and you'll begin to see
> the problem.  And in the second place, traditionally the
> characters have been used to describe a lot of non-character
> entities - and while some of these come through in control codes,
> others, including the very useful keystroke-description codes
> from, e.g., MIT Scheme, simply don't.
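
The divergence described above between the glyph view and the codepoint view can be made concrete. Python is used here purely for illustration, since its strings are sequences of Unicode code points, much like the model this SRFI adopts:

```python
import unicodedata

# Two renderings of the same glyph: "é"
precomposed = "\u00e9"    # a single code point, U+00E9
decomposed = "e\u0301"    # 'e' followed by combining acute accent U+0301

# Identical to the eye, but different lengths.
print(len(precomposed), len(decomposed))    # 1 2

# No code point at any index in common.
print(set(precomposed) & set(decomposed))   # set()

# Under naive code-point ordering, infinitely many unrelated
# strings sort between the two spellings of the same glyph:
print(decomposed < "f" < "fa" < precomposed)   # True

# Recovering equality requires explicit normalization (NFC here).
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

A glyph=character (grapheme cluster) model would report length 1 for both spellings and compare them equal without a separate normalization step.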


You may be right and wrong at the same time. Right because Unicode is probably
not the last word on working with non-ASCII thingies. Wrong because, for now, Unicode
is the only serious effort in this direction that has made it as far as a de facto standard.

Apart from that, improving on the simple-minded model ("char = scalar value, string =
vector of chars, upcase etc. are char -> char") might hurt more than it helps.
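
That said, the "upcase is char -> char" part of the simple-minded model already breaks down within Unicode itself. A small Python illustration (again only because Python 3 strings are code-point sequences; the hypothetical Scheme analogue would be CHAR-UPCASE vs. STRING-UPCASE):

```python
# German sharp s has no single uppercase code point:
# it uppercases to the two-character string "SS".
print("ß".upper())                  # SS
print(len("ß"), len("ß".upper()))   # 1 2

# So string upcasing cannot be defined as mapping a
# char -> char function over the string's elements;
# it has to be an operation on whole strings.
print("straße".upper())             # STRASSE
```

This is one reason full Unicode case mapping is specified on strings rather than derived from a per-character function.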

Sebastian.

Btw, I am happy that the R6RSers decided to do some SRFI rounds with the stuff;
some discussion in public is better than none at all.