Re: Encoding Windows reserved characters Derick Eddington (24 Sep 2009 17:45 UTC)
Re: Encoding Windows reserved characters John Cowan (24 Sep 2009 19:22 UTC)

John Cowan wrote:

> I think it is sensible to require encoding of all the characters that
> Windows doesn't allow in path components, viz.  #\<, #\>, #\:, #\|,
> #\?, #\*, and #\x0; through #\x1F;.  Posix allows all but #\x0;, but
> the remainder, though technically permitted, are nothing but nuisances
> in pathname components, as they must be escaped when referred to from
> the shell.  (#\: is an exception, but doesn't show up in Posix filenames
> often either.)
> In addition, the Windows executive treats #\\ and #\/ both as path
> separators, a fact occasionally convenient, although the UI disallows #\/.
> So I'd escape both of them in all cases too.

I have been conflicted about this issue while drafting this SRFI.

One part of me wants to say:

I'm disinclined to make SRFI 103 require encoding any characters except
the four it uses specially, which must be encoded.  However, as the
document says, an implementation may encode any additional characters it
wants.  Always encoding the characters which Windows disallows, or which
are nuisances in shells, may very well be the de facto practice for the
near future.  However, in the more distant future, these characters may
not need encoding, and other OSs and shells may have greater prevalence
than Windows, POSIX, and Bash.  Even if the Windows-disallowed and
shell-nuisance characters were required to be encoded, there could still
exist characters which some file systems need encoded but others do not,
e.g. file systems of OSs other than Windows or POSIX, and so
communicating what characters to encode and coordinating transcoding of
path names would still be required.

Another part of me thinks:

Requiring these to be encoded is not a big deal.  Not requiring encoding
of the characters of other cultures' languages, not requiring encoding
of the non-natural-language symbol characters which I want to explore
using in library names, and promoting progress toward file systems which
can handle all characters, *are* a big deal to me.  But this small set
of Windows-disallowed and shell-nuisance characters probably won't be
common in library names and can be sacrificed if it really helps
portability.  And we can always make a new SRFI in the future which
revises this one to drop the requirement to encode these characters.

The question is: what is the exact set of characters which should be
required to be encoded?  I've heard different descriptions of what
Windows/DOS disallows.  Does it differ between versions?  Which eras of
Microsoft OSs do we want to cater to?  Surely shells differ in which
characters are nuisances?  Which shells should we cater to when choosing
the nuisance characters to encode?
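
To make the candidate set concrete, here is a rough sketch in Python of
what encoding John's list might look like.  It is only an illustration:
the percent-style escape of UTF-8 bytes, the choice of "%" as the escape
character, and the names MUST_ENCODE and encode_component are all
assumptions made for this example, not something the SRFI 103 draft
specifies.

```python
# Hypothetical sketch, not the SRFI 103 algorithm.
# The character set below is taken from John's list; everything else
# (escape syntax, names) is assumed for illustration.

WINDOWS_RESERVED = set('<>:|?*')            # disallowed in Windows path components
SEPARATORS = set('\\/')                     # both act as separators on Windows
CONTROLS = {chr(c) for c in range(0x20)}    # #\x0; through #\x1F;
ESCAPE = '%'                                # assumed escape character, so it must be encoded too

MUST_ENCODE = WINDOWS_RESERVED | SEPARATORS | CONTROLS | {ESCAPE}


def encode_component(name, extra=frozenset()):
    """Escape each character in the encode set as %XX per UTF-8 byte.

    `extra` lets a caller encode additional characters, mirroring the
    idea that an implementation may encode more than the required set.
    """
    to_encode = MUST_ENCODE | set(extra)
    out = []
    for ch in name:
        if ch in to_encode:
            out.extend('%{:02X}'.format(b) for b in ch.encode('utf-8'))
        else:
            out.append(ch)
    return ''.join(out)
```

With this sketch, a component such as "a<b" would become "a%3Cb", while
characters outside the set, including non-ASCII letters, pass through
unchanged.  The open question above is exactly what belongs in
MUST_ENCODE.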

: Derick