character strings versus byte strings Matthew Flatt (22 Dec 2003 14:16 UTC)
Re: character strings versus byte strings Per Bothner (22 Dec 2003 17:09 UTC)
Re: character strings versus byte strings Matthew Flatt (22 Dec 2003 17:23 UTC)
Re: character strings versus byte strings tb@xxxxxx (22 Dec 2003 20:23 UTC)
Re: character strings versus byte strings Tom Lord (22 Dec 2003 22:36 UTC)
Re: character strings versus byte strings tb@xxxxxx (22 Dec 2003 22:41 UTC)
Re: character strings versus byte strings Shiro Kawai (22 Dec 2003 23:00 UTC)
Re: character strings versus byte strings Michael Sperber (23 Dec 2003 09:36 UTC)

Re: character strings versus byte strings Matthew Flatt 22 Dec 2003 17:23 UTC

At Mon, 22 Dec 2003 09:09:44 -0800, Per Bothner wrote:
> Matthew Flatt wrote:
>
> >  * Where "char *" is used for strings (e.g., "expected_explanation" for
> >    a type error), define it to be an ASCII or Latin-1 encoding (I
> >    prefer the latter).
>
> No, it should be UTF-8.

I think you're right.

> So if I was designing a Scheme dialect for internationalization,
> I'd do away with mutable strings.

That sounds right, too.

So, one straightforward approach is that C code only mutates byte
strings, and string operations in the C API use UTF-8. (I think some
particular encoding has to be chosen, even with the performance
implications.)

Matthew
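
As an illustration of the convention discussed above (a "char *" in the C API holds UTF-8), here is a minimal sketch in C. The names utf8_decode and utf8_char_count are hypothetical and are not taken from any actual Scheme implementation's C API. The sketch only shows what treating "char *" as UTF-8 would involve on the C side: decoding and validating byte sequences rather than assuming one byte per character.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration, not a real API):
   decode one code point from the UTF-8 bytes at s, with n bytes available.
   Returns the number of bytes consumed, or 0 if the input is malformed. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    size_t len, i;
    uint32_t cp, min;
    unsigned char b;

    if (n == 0) return 0;
    b = s[0];
    if (b < 0x80) { *out = b; return 1; }                 /* ASCII */
    else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
    else return 0;                                        /* stray continuation or invalid lead byte */

    if (n < len) return 0;                                /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;              /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    *out = cp;
    return len;
}

/* Hypothetical helper: number of characters (code points) in a
   NUL-terminated UTF-8 "char *" string, or -1 if it is not valid UTF-8. */
static long utf8_char_count(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t n = strlen(str);
    long count = 0;

    while (n > 0) {
        uint32_t cp;
        size_t used = utf8_decode(s, n, &cp);
        if (used == 0) return -1;
        s += used;
        n -= used;
        count++;
    }
    return count;
}
```

Under these assumptions, a Latin-1 encoded string containing a lone 0xE9 byte is rejected as malformed, while the same character encoded in UTF-8 occupies two bytes and counts as one character. Because characters have variable width, replacing one character in place can change the byte length of the string, which is one reason to confine mutation from C to byte strings.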