the "Unicode Background" section

the "Unicode Background" section Thomas Lord (21 Jul 2005 22:45 UTC)

Re: the "Unicode Background" section Thomas Bushnell BSG (21 Jul 2005 23:10 UTC)

Re: the "Unicode Background" section Matthew Flatt (21 Jul 2005 23:52 UTC)

the "Unicode Background" section Thomas Lord 21 Jul 2005 22:45 UTC

The Unicode Background section of the new draft has

  > It is thus appropriate to define Scheme characters as Unicode scalar
  > values, which includes all code points except those designated as
  > surrogates.

That seems wrong-headed to me. Characters should simply
be code points instead.

If CHARs are code points, more of the basic Unicode algorithms
translate cleanly into Scheme.
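
As one illustration of the kind of algorithm that may be meant here (the
choice of example is an assumption, not something the message names),
consider combining a UTF-16 surrogate pair into a single code point. The
sketch below is in Python purely for illustration; the function name
combine_surrogates is hypothetical. Under the draft's scalar-value
definition, the two inputs could not be CHAR values and would have to be
passed around as plain integers.

```python
# Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
# Inputs and output are plain integers; nothing here is an R6RS procedure.

HIGH_FIRST, HIGH_LAST = 0xD800, 0xDBFF   # high (leading) surrogates
LOW_FIRST, LOW_LAST = 0xDC00, 0xDFFF     # low (trailing) surrogates


def combine_surrogates(high, low):
    """Return the supplementary code point encoded by (high, low)."""
    if not HIGH_FIRST <= high <= HIGH_LAST:
        raise ValueError("not a high surrogate: %#x" % high)
    if not LOW_FIRST <= low <= LOW_LAST:
        raise ValueError("not a low surrogate: %#x" % low)
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - HIGH_FIRST) << 10) + (low - LOW_FIRST)
```

For example, combine_surrogates(0xD83D, 0xDE00) yields 0x1F600.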

If CHARs are code points, they correspond to one contiguous range of
integers and so have simple algebraic properties in relation to integers.

What is gained by forcing surrogates to be unrepresentable as CHAR?

What kind of code will I wind up with if I want to iterate over
a large range of CHAR values?  Must R6RS also add new arithmetic
operators that work on CHAR values or their strangely limited integer
values?  Please don't tell me you want to "protect" me from including
surrogates in my iteration.
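
To make the iteration concern concrete, here is a minimal sketch, again in
Python for illustration only. The names code_points and scalar_values are
hypothetical and do not correspond to any R6RS procedure. The first
function treats characters as code points, so a range is just an integer
range; the second follows the draft's scalar-value definition and has to
step around the surrogate block U+D800..U+DFFF.

```python
# Hypothetical helpers contrasting the two definitions of a character range.

SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def code_points(lo, hi):
    """Every code point in [lo, hi], surrogates included."""
    return range(lo, hi + 1)


def scalar_values(lo, hi):
    """Every Unicode scalar value in [lo, hi], skipping the surrogate gap."""
    for n in range(lo, hi + 1):
        if SURROGATE_FIRST <= n <= SURROGATE_LAST:
            continue  # surrogates are code points but not scalar values
        yield n
```

With code points, the successor of U+D7FF is U+D800; with scalar values it
is U+E000, so any arithmetic on character values has to know about the gap.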

It's not as if by excluding surrogates we arrive at a CHAR definition
that is significantly more "linguistic" than if we don't.

-t