Re: character strings versus byte strings
bear 23 Dec 2003 01:47 UTC
For what it's worth, I think that there is a worthwhile
concept in a "Unicode-corresponding character subset":
within each implementation, the set of characters
which can each be represented by a single corresponding
Unicode codepoint. This won't be the same set in every
Scheme implementation.
You can define your FFI's semantics for the case where
strings are composed entirely of the Unicode-corresponding
set, leave un(der)specified what happens when the C side
returns a Unicode character unknown to the Scheme system or
when the Scheme system uses a character unknown to Unicode,
and let the implementors worry about their implementations'
extensions.
There should probably be predicates to ask whether
particular entities (characters or codepoint-numbers) are
part of the Unicode-corresponding character subset for the
given Scheme implementation.
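To make the idea concrete, here is a rough sketch in Python (chosen only for brevity) of what the two predicates and an FFI boundary defined on that subset might look like. Everything in it is made up for illustration: the tiny ASCII-only repertoire, the names `char_unicode_correspondent`, `codepoint_unicode_correspondent`, `to_foreign` and `from_foreign`, and especially the choice to raise an error outside the subset, which is just one of the behaviors an implementation could pick for the unspecified cases.

```python
# Hypothetical repertoire for one imagined Scheme implementation:
# native character codes 0-127 correspond to the same Unicode codepoints
# (ASCII); any other native code is treated as an implementation-specific
# extension character with no single corresponding codepoint.
NATIVE_TO_UNICODE = {code: code for code in range(128)}
UNICODE_TO_NATIVE = {cp: code for code, cp in NATIVE_TO_UNICODE.items()}


def char_unicode_correspondent(native_code):
    """True if this native character has a single corresponding codepoint."""
    return native_code in NATIVE_TO_UNICODE


def codepoint_unicode_correspondent(codepoint):
    """True if this Unicode codepoint corresponds to a native character."""
    return codepoint in UNICODE_TO_NATIVE


def to_foreign(native_codes):
    """Convert a native string (a list of native codes) to codepoints.

    Only defined on the corresponding subset. Raising is an assumption:
    it stands in for whatever the implementation decides to do.
    """
    for code in native_codes:
        if not char_unicode_correspondent(code):
            raise ValueError(f"native character {code} has no codepoint")
    return [NATIVE_TO_UNICODE[code] for code in native_codes]


def from_foreign(codepoints):
    """Convert codepoints returned by the foreign side to native codes."""
    for cp in codepoints:
        if not codepoint_unicode_correspondent(cp):
            raise ValueError(f"codepoint U+{cp:04X} unknown to this system")
    return [UNICODE_TO_NATIVE[cp] for cp in codepoints]
```

With the predicates available, portable code can check a string before it crosses the boundary instead of relying on whatever a particular implementation does with its extensions.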
Bear