Re: character strings versus byte strings
Matthew Flatt 22 Dec 2003 17:23 UTC
At Mon, 22 Dec 2003 09:09:44 -0800, Per Bothner wrote:
> Matthew Flatt wrote:
>
> > * Where "char *" is used for strings (e.g., "expected_explanation" for
> > a type error), define it to be an ASCII or Latin-1 encoding (I
> > prefer the latter).
>
> No, it should be UTF-8.
I think you're right.
> So if I was designing a Scheme dialect for internationalization,
> I'd do away with mutable strings.
That sounds right, too.
So, one straightforward approach is that C code only mutates byte
strings, and string operations in the C API use UTF-8. (I think some
particular encoding has to be chosen, even with the performance
implications.)
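
A rough sketch of what that split could look like from the C side, just
to make it concrete. The names utf8_code_points and byte_string_set are
hypothetical and not part of any proposed API; the first assumes
well-formed, NUL-terminated UTF-8 input.

```c
#include <stddef.h>

/* Hypothetical helper: count the code points in a NUL-terminated
   UTF-8 "char *" by skipping continuation bytes (10xxxxxx).
   Assumes the input is well-formed UTF-8. */
size_t utf8_code_points(const char *s)
{
  size_t n = 0;
  for (; *s; s++) {
    if (((unsigned char)*s & 0xC0) != 0x80)
      n++;
  }
  return n;
}

/* Hypothetical byte-string mutator: C code can write into a byte
   string in place, since no character boundaries are involved.
   The caller is responsible for keeping i in bounds. */
void byte_string_set(unsigned char *bytes, size_t i, unsigned char b)
{
  bytes[i] = b;
}
```

The character count and the byte count differ as soon as a non-ASCII
character appears, which is why in-place mutation is only offered on the
byte-string side in this sketch.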
Matthew