Re: Encoding Windows reserved characters Andreas Rottmann 24 Sep 2009 23:17 UTC

John Cowan <xxxxxx@ccil.org> writes:

> Derick Eddington scripsit:
>
>> The question is: what is the exact set of characters which should be
>> required to be encoded?  I've heard different descriptions of what
>> Windows/DOS disallows.  Does it differ between versions?  What eras of
>> Microsoft OSs do we want to cater to?
>
> Microsoft's page[1] and Cygwin's[2] agree perfectly; the first certainly
> should know, and the second has had every reason to find out.  I cannot
> believe that Microsoft, with its obsession with backward compat, would
> ever remove a character from the blacklist (which might break ancient apps
> that don't expect to see them) nor add one (which would make existing
> files unreachable).  So I think the blacklist of #\", #\*, #\:, #\<,
> #\>, #\?, #\|, #\/, #\\, and #\x0; to #\x1F; is a solid one.
>
> The blacklist doubtless arose because COMMAND.COM (and its ancestors, the
> CP/M monitor and various DEC command executives) didn't have any escape
> convention, and so files with those characters couldn't be manipulated
> from the shell.  Consequently, the kernel forbade them, and it still does.
>
> Note that this limitation is specific to Windows, the operating system,
> not any particular file system.  In fact, the Microsoft page specifically
> says that there may be more characters which are forbidden by the file
> system.  But I don't think either VFAT or NTFS applies any restrictions
> of its own -- indeed, the Posix subsystem (which bypasses the Windows
> executive and runs directly on the NT kernel) does not respect the
> blacklist, and can create files which Windows programs cannot process.
>
Additionally, and more annoyingly IMO, Windows disallows several
perfectly innocent-looking names such as "aux", "prn", "con" and "nul"
(at least; "com1" through "com9" and "lpt1" through "lpt9" are reserved
as well), regardless of extension (see [0] for a story with some
historical background). I wonder if SRFI 103 should mention this
horrendous stupidity. I actually ran into this when I named a library
"aux.sls": a fellow Schemer on Windows was unable to check out the
git repository containing this file, and got only obscure error messages.

[0] http://heirloom.sourceforge.net/mailx_aux_c.html

>> Surely, some shells differ in what are nuisance characters?  What shells
>> should be catered to for the nuisance characters to encode?
>
> I wouldn't worry about that.  The fact that these characters are
> painful on Posix systems because of the shell is just lagniappe.
>
+1. Zsh handles completion of such filenames just fine, FWIW:

xxxxxx@delenn:~/tmp% touch 'foo*'
xxxxxx@delenn:~/tmp% ls f<TAB>
xxxxxx@delenn:~/tmp% ls foo\*

Regards, Rotty
--
Andreas Rottmann -- <http://rotty.yi.org/>