Re: upcoming revision, need feedback Vitaly Magerya 11 Jan 2010 02:57 UTC
Derick Eddington wrote:
> I think the pathname component separators do need to be defined.
> [...] if they're undefined, the encoded set would not be clearly,
> precisely, completely specified.

The current draft sets the encoded set to be <a list of chars and the
path separator>. The set of path separators depends on the platform [1],
but the set of encoded characters should not (for portability reasons).
So you must include all the possible separators from all the supported
platforms in the encoded set; after that, specifying each of them
separately serves no purpose.

But in the end this point is of little importance; I will not object
either way.

[1] E.g. Windows uses both forward and back slashes as path separators.

>>> 7) Add #\; to the set of encoded characters, because a directory could be both
>>> in the SCHEME_LIB_PATH sequence and correspond to a library name component.
>>> Such a directory with a name including #\; is unusual but must be supported,
>>> otherwise an unencoded #\; would be misinterpreted in SCHEME_LIB_PATH.
>>
>> I heard that when you strive to fail safety it's best to enumerate
>> allowed things, not the forbidden ones.
>
> I don't think that justifies what you suggest below.

It is generally hard to list all the failure conditions, but easy to
list the success conditions. Let me illustrate: ~ is missing from the
encoded set, even though Windows treats that character specially (e.g.
"PROGRA~1" is a shortcut to the first file starting with "Progra").

Another example is ¥ (U+00A5). When represented in Japanese cp-932 it
maps to #x5C (just as \ does in ASCII), which is treated as a path
separator. Because of this, some programs (e.g. Cygwin) will choke on
filenames containing U+00A5 when cp-932 is your local codepage, even
though U+00A5 itself is perfectly legal. The same applies to ₩ (U+20A9)
in Korean (cp-949), and possibly to more characters.

>> How about "Encode everything
>> except for [a-zA-Z0-9_.-]"? It's safe, short, simple and works for 99%
>> of libraries without any encoding at all.
>
> Other cultures' characters must be usable unencoded, especially since
> the targeted file systems support using them, and we want other
> cultures' use of Scheme to not be discriminated against growing to be
> more than 1% of libraries.

FWIW, using non-ASCII symbols in source files is widely considered bad
manners in my culture. So while I do recognize the value of not needing
to encode these symbols, I won't complain much about the discrimination.

Also note that file system support for localized characters in Windows
is (was?) problematic, since it uses the local codepage in many places.
Because of this, a filename containing a Ukrainian 'і' (U+0456) is not
accessible via an SMB mount from a Windows machine with Russian
settings [2].

[2] Once upon a time this bit a fair share of accountants in Ukraine.