bear <xxxxxx@sonic.net> writes:
> Particularly, some characters, particularly accented characters,
> have uppercase and lowercase versions which are different numbers of
> codepoints. Thus, in the "codepoint equals character" model, one
> case is a character and the other case -- isn't.
I don't quite understand what you're saying: the locale-independent
case mappings in UnicodeData.txt always map a single scalar value to a
single scalar value.  Sure, it doesn't always do what your locale
expects (as you point out), but this case mapping doesn't require
"multi-codepoint characters."
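
To make the distinction concrete, here is a rough sketch in Python (not Scheme, and not part of the SRFI). Python's str.upper() applies the full case mapping, which can expand one code point into several. The helper simple_upcase_approx is a hypothetical stand-in for a char-to-char mapping: Python's standard library does not expose the simple mappings of UnicodeData.txt directly, so the helper just falls back to the identity whenever the full mapping expands. That is only an approximation, since for some characters the real table names a different single code point.

```python
def simple_upcase_approx(ch: str) -> str:
    """Hypothetical char-to-char upcase: one code point in, one out.

    Approximation only: when Python's full mapping expands to several
    code points, return the input unchanged instead of consulting the
    simple-mapping field of UnicodeData.txt.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    up = ch.upper()
    return up if len(up) == 1 else ch


# Full mapping: U+00DF (sharp s) expands to two code points.
full = "\u00df".upper()

# Char-to-char mapping: always exactly one code point.
single_accented = simple_upcase_approx("\u00e9")   # e with acute
single_sharp_s = simple_upcase_approx("\u00df")    # left unchanged
```

The accented letter maps to a single code point in either model; only the full mapping turns the sharp s into a two-code-point string.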
> Sixth, is there any way for a scheme implementation to support
> characters and strings in additional encodings different from
> unicode and not necessarily subsets of it, and remain compliant?
I don't think so, at least not in the way you envision. I don't think
that's necessary or even a good idea, either. This SRFI effectively
hijacks the char and string datatypes and says that the abstractions
for accessing them deal in Unicode. Any representation that allows
you to do that (i.e. implement STRING-REF, CHAR->INTEGER,
INTEGER->CHAR, and so on in a way compatible with the SRFI) is fine,
but I believe you're thinking about representations where that's not
the case.
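
As an illustration of the point about representations, here is a minimal sketch in Python, assuming a string type that stores UTF-8 bytes internally. The class Utf8String and the functions string_ref, string_length, char_to_integer and integer_to_char are hypothetical names chosen to mirror the Scheme procedures; none of this comes from the SRFI text. The accessors deal only in Unicode scalar values, whatever the bytes underneath look like.

```python
class Utf8String:
    """Hypothetical string type backed by UTF-8 bytes."""

    def __init__(self, text: str):
        self._bytes = text.encode("utf-8")

    def string_length(self) -> int:
        # Count every byte that is not a continuation byte (10xxxxxx).
        return sum(1 for b in self._bytes if (b & 0xC0) != 0x80)

    def string_ref(self, k: int) -> str:
        """Return the k-th scalar value as a one-character string."""
        if k < 0:
            raise IndexError(k)
        index = -1
        start = None
        for pos, b in enumerate(self._bytes):
            if (b & 0xC0) != 0x80:          # start of a new scalar value
                index += 1
                if index == k:
                    start = pos
                elif index == k + 1:
                    return self._bytes[start:pos].decode("utf-8")
        if start is None:
            raise IndexError(k)
        return self._bytes[start:].decode("utf-8")


def char_to_integer(ch: str) -> int:
    return ord(ch)


def integer_to_char(n: int) -> str:
    # Scalar values exclude the surrogate range D800..DFFF.
    if not (0 <= n <= 0x10FFFF) or 0xD800 <= n <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    return chr(n)


s = Utf8String("V\u00f6lker")
second = s.string_ref(1)
```

Indexing is linear-time in this sketch; that is a cost of the chosen representation and does not affect whether the accessors behave compatibly.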
--
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla