Re: SRFI withdrawn; comments on the possible future Jorgen Schaefer (26 May 2006 15:34 UTC)
Matthew Flatt <xxxxxx@cs.utah.edu> writes:

>  * `string-normalize-nfd', `string-normalize-nfkd',
>    `string-normalize-nfc', and `string-normalize-nfkc', which each
>    accept a string and produce its normalization according to normal
>    form D, KD, C, or KC, respectively.

If the basic concept of the SRFI - a string being a sequence of code
points - does not change, I do think these procedures are useful
(contrary to bear and Alex Shinn). An implementation can still
normalize internally in the "usual case", and if the programmer
enforces a different normalization, that's eir problem.

STRING=? and similar procedures need to define which kind of
normalization they work on (or just "the same normalization for all
arguments"). STRING-DOWNCASE, STRING-APPEND etc. need to define
whether they may normalize their arguments, and if so, which
normalization they return. If the normalization shouldn't be
prescribed, another procedure, STRING-NORMALIZE (or similar), needs to
be added to return the normalization the implementation prefers.

A higher-level string API can (and should) be built on top of the
strings defined in this SRFI.

> The #\newline character
> -----------------------
>
> It is likely that #\newline will be removed from Scheme leaving only
> #\linefeed. Since R6RS will pin down characters to Unicode scalar
> values, the right name for the character is #\linefeed.

I'm always in favor of breaking stuff to get a clean result.

> Another view is that #\newline can serve as an abstraction of the
> end-of-line character sequence which is returned by read-char
> when the end-of-line character sequence is read (be it
> #\linefeed, or #\return, or #\return followed by #\linefeed).
> So even though #\newline and #\linefeed are the same characters,
> Scheme programs might use #\newline to highlight that the
> character is being used to denote the end-of-line sequence. The
> name #\newline would also reinforce the link with the escape
> sequence "\n" in strings.

If #\newline is considered to be some kind of abstraction of the
end-of-line character sequence, please remember that Unicode defines
U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR as canonical new
line code points, to finally get rid of all these distinctions.

> Escape sequences
> ----------------
>     with semi-colon terminator      without terminator
>
>     "A\x42;C" = "ABC"               "A\x42\x43" = "ABC"
>     "\x41;\x42;\x43;" = "ABC"       "\x41\x42\x43" = "ABC"
>     "\x03BB;x.x" = "λx.x"           "\x03BBx.x" = "λx.x"

I agree with bear that the semicolon is a bad choice - why not use the
colon?

    "A\x42:C"
    "\x41:\x42:\x43:"
    "\x03BB:x.x"

> Using less-than and greater-than characters, which are not actual
> brackets, avoids this problem:
>
>     #\x<03BB> = #\λ

Braces have been offered as an alternative:

    #\x{03BB}

> However, they become somewhat more difficult to read when multiple
> escapes appear in a string:
>
>     "\x<41>\x<42>\x<43>" = "ABC"

    "\x{41}\x{42}\x{43}"

> In either case, the trade-off is that Scheme strings are unlikely to be
> compatible with any other language's string syntax. A consequence is
> that there is additional burden on the programmer who must learn yet
> another string and character syntax.

I do think it's good that we don't go with bad decisions made by other
languages just because the decision has been made by them.

> Symbol characters
> -----------------
> [...]
> Meanwhile, the symbol escapes are similar yet not identical to the
> escapes in strings and characters, so there is a potential for mistakes
> if the programmer is not careful. For example one might expect a\nb to
> be a valid symbol, but it is an error.

Why not allow the same escapes in symbols and in strings?

All in all I like the changes you propose (modulo the comments above).
Thanks for the good work!

Regards,
        -- Jorgen

-- 
((email . "xxxxxx@forcix.cx") (www . "http://www.forcix.cx/")
 (gpg   . "1024D/028AF63C")   (irc . "nick forcer on IRCnet"))
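
The point above about STRING=? needing a defined normalization can be
sketched outside Scheme. This is only an illustration, assuming
Python's standard unicodedata module as a stand-in for the proposed
string-normalize-* procedures; the helper name same_under is
hypothetical and not part of the SRFI.

```python
import unicodedata

# Two spellings of "é": one precomposed code point versus
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
composed = "\u00e9"
decomposed = "e\u0301"

# U+FB01 LATIN SMALL LIGATURE FI is only *compatibility*-equivalent to "fi".
ligature = "\ufb01"


def same_under(form, a, b):
    """Compare two strings after normalizing both to the same form.

    `form` is one of "NFD", "NFKD", "NFC", "NFKC". This mirrors the
    "same normalization for all arguments" option for STRING=?.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Compared as raw code point sequences, the two spellings differ.
print(composed == decomposed)                    # False

# Under either canonical form they compare equal.
print(same_under("NFC", composed, decomposed))   # True
print(same_under("NFD", composed, decomposed))   # True

# The ligature only matches "fi" under a compatibility form.
print(same_under("NFC", ligature, "fi"))         # False
print(same_under("NFKC", ligature, "fi"))        # True
```

The choice of form changes the answer in the last two lines, which is
why a comparison procedure has to say which normalization (if any) it
applies.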