Unicode surrogates rgburger@xxxxxx (13 Mar 2006 12:11 UTC)
Re: Unicode surrogates Tom Emerson (13 Mar 2006 13:43 UTC)
Re: Unicode surrogates John Cowan (13 Mar 2006 13:56 UTC)
Re: Unicode surrogates Robby Findler (13 Mar 2006 14:03 UTC)
Re: Unicode surrogates bear (13 Mar 2006 17:04 UTC)
Re: Unicode surrogates Per Bothner (13 Mar 2006 17:13 UTC)
Re: Unicode surrogates bear (22 Mar 2006 23:52 UTC)
Re: Unicode surrogates Tom Emerson (13 Mar 2006 17:16 UTC)

Re: Unicode surrogates Tom Emerson 13 Mar 2006 13:42 UTC

xxxxxx@beckman.com writes:
> The 2005/07/21 draft disallows surrogate code points, namely those between
> #xD800 and #xDFFF inclusive.  In Microsoft Windows NT 4.0 and later, the
> file system and registry use UTF-16LE for encoding names.  They allow bare
> surrogate code points.

The SRFI is describing the *internal* character representation, which
is defined in terms of Unicode Scalar Values. Surrogates are an
artifact of one particular encoding form, UTF-16. It would be the
responsibility of the implementation to generate the appropriate
UTF-16LE encoding for a filename that uses characters outside of the
BMP. The same issue arises on operating systems that use UTF-8 as
the file-system encoding: the conversion happens at the boundary.
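
To make the encoding step concrete, here is a minimal sketch in Python (not Scheme, and not part of the SRFI) of how a single scalar value might be turned into UTF-16LE bytes. The function name utf16le_encode is hypothetical and chosen only for illustration; a real implementation would presumably do this inside its file-system layer.

```python
import struct

def utf16le_encode(scalar: int) -> bytes:
    """Encode one Unicode scalar value as UTF-16LE bytes (illustrative helper)."""
    # Scalar values are 0..0x10FFFF, excluding the surrogate range.
    if not (0 <= scalar <= 0x10FFFF) or 0xD800 <= scalar <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %#x" % scalar)
    if scalar < 0x10000:
        # BMP character: a single 16-bit code unit, little-endian.
        return struct.pack("<H", scalar)
    # Outside the BMP: split the 20-bit offset into a surrogate pair.
    offset = scalar - 0x10000
    high = 0xD800 | (offset >> 10)    # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)   # bottom 10 bits
    return struct.pack("<HH", high, low)
```

With this sketch, a BMP character yields two bytes and a character outside the BMP yields four, with the surrogates appearing only in the encoded output and never as characters in their own right.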

> For example, I can create a file called "\uD802.ss" in Windows.  How
> would I be able to open this file in Scheme with the given proposal?

Well, U+D802 on its own is invalid: it is a high surrogate, and in
UTF-16 it must be followed by a low surrogate (U+DC00 through U+DFFF).
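
As a rough illustration of the pairing rule, the following Python sketch scans a sequence of 16-bit code units and reports the positions of any surrogates that are not part of a well-formed high/low pair. The name find_unpaired_surrogates is hypothetical, and this is only one way such a check could be written.

```python
def find_unpaired_surrogates(units):
    """Return indices of code units that are surrogates without a valid partner."""
    bad = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows directly.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            bad.append(i)
        i += 1
    return bad

# The code units of the file name "\uD802.ss" from the question.
name_units = [0xD802, 0x2E, 0x73, 0x73]
unpaired = find_unpaired_surrogates(name_units)
```

For the file name in question, the check flags the first code unit, which is why the name has no representation as a sequence of scalar values.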

    -tree

--
Tom Emerson                                          Basis Technology Corp.
Software Architect                                 http://www.basistech.com
 "You can't fake quality any more than you can fake a good meal." (W.S.B.)