
the "Unicode Background" section Thomas Lord (21 Jul 2005 22:45 UTC)
Re: the "Unicode Background" section Thomas Bushnell BSG (21 Jul 2005 23:10 UTC)
Re: the "Unicode Background" section Matthew Flatt (21 Jul 2005 23:52 UTC)

At Thu, 21 Jul 2005 15:45:34 -0700, Thomas Lord wrote:
> If CHARs are codepoints, more basic Unicode algorithms translate
> into Scheme cleanly.

I don't see what you mean. Can you provide an example?

> What is gained by forcing surrogates to be unrepresentable as CHAR?

Every string is representable in UTF-8, UTF-16, etc.
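
For illustration, Python's `str` happens to follow the model Thomas argues for, where any code point, including a lone surrogate, can be a character. The sketch below assumes CPython 3 and its built-in codecs. It shows the cost Matthew points to: a string holding a lone surrogate cannot be encoded as UTF-8 or UTF-16, while a string made only of scalar values always can. The helper name `encodes_cleanly` is hypothetical.

```python
def encodes_cleanly(s: str) -> bool:
    """Return True if s can be encoded as both UTF-8 and UTF-16."""
    try:
        s.encode("utf-8")
        s.encode("utf-16")
        return True
    except UnicodeEncodeError:
        # CPython's strict codecs reject lone surrogates ("surrogates not allowed").
        return False

# U+00E9 and U+1F600 are scalar values; U+D800 is a lone high surrogate.
print(encodes_cleanly("caf\u00e9 \U0001F600"))  # True
print(encodes_cleanly("\ud800"))                # False
```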

> What kind of code will I wind up with if I want to iterate over
> a large range of CHAR values?

Two loops: one from 0 to #xD7FF, and one from #xE000 to #x10FFFF.
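
The two-loop iteration can be sketched as a single generator that skips the surrogate block. This is a Python illustration rather than Scheme, and the name `scalar_values` and its optional `lo`/`hi` bounds are hypothetical.

```python
SURROGATE_START, SURROGATE_END = 0xD800, 0xDFFF
MAX_CODE_POINT = 0x10FFFF

def scalar_values(lo: int = 0, hi: int = MAX_CODE_POINT):
    """Yield every Unicode scalar value n with lo <= n <= hi.

    Mirrors the two loops: 0..#xD7FF, then #xE000..#x10FFFF,
    each clipped to the requested [lo, hi] range.
    """
    for start, end in ((0, SURROGATE_START - 1),
                       (SURROGATE_END + 1, MAX_CODE_POINT)):
        yield from range(max(lo, start), min(hi, end) + 1)

# A range straddling the surrogate block skips it entirely.
print([hex(n) for n in scalar_values(0xD7FE, 0xE001)])
# ['0xd7fe', '0xd7ff', '0xe000', '0xe001']
```

Iterating over the full default range visits 1,112,064 values, the number of Unicode scalar values.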

> It's not as if by excluding surrogates we arrive at a CHAR definition
> that is significantly more "linguistic" than if we don't.

True, but we arrive at a definition that is more standards-friendly,
and that's part of the overall compromise.

FWIW: MzScheme originally supported a larger set of characters, mainly
because extra bits are available in my implementation. The resulting bad
experience convinced me to define characters in terms of scalar values
instead.
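
A minimal sketch of what "characters are scalar values" means for a character constructor, assuming a hypothetical `make_char` that plays a role similar to Scheme's `integer->char`: it accepts any code point outside the surrogate block and signals an error for everything else. Both function names are illustrative, not from any existing API.

```python
def is_scalar_value(n: int) -> bool:
    """True if n is a Unicode scalar value: a code point that is not a surrogate."""
    return 0 <= n <= 0x10FFFF and not (0xD800 <= n <= 0xDFFF)

def make_char(n: int) -> str:
    """Hypothetical constructor: build a one-character string only from a scalar value."""
    if not is_scalar_value(n):
        raise ValueError(f"not a Unicode scalar value: {n:#x}")
    return chr(n)

print(make_char(0x41))  # A
```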

Matthew