Re: the discussion so far bear (20 Jul 2005 17:27 UTC)
On Wed, 20 Jul 2005, John.Cowan wrote:

>Thomas Bushnell BSG scripsit:
>
>> When you provide a function that does almost-the-right-thing, you are
>> encouraging programmers to use it.  The only case where you have
>> identified a value to this function (when implemented as a simple
>> radix comparator on codepoints) is when you have binary search trees
>> which you want to exchange between scheme systems.
>
>I picked that as a counterexample to the claim that there were no such
>use cases.

"There never has been, nor will there ever be, any programming
 language in which it is the least bit difficult to write bad code."
                                                --Lawrence Flon

I think you really can't make doing the wrong thing hard.  The best
you can do is to try to make the right thing as easy as the wrong
thing.

char-upcase and char-downcase can be used to do useful things which
are not wrong.  But char-upcase and char-downcase are examples of
functions that also make the wrong thing easy to do.  I think they can
be forgiven that, if they are first well-documented with adequate and
proper warnings about their scope and usefulness, and if second there
are string-upcase and string-downcase functions that are *NOT* defined
to be simply the result of applying char-*case to each codepoint.

The thing is that char-upcase and char-downcase, even though
restricted to one-to-one character case mappings which do *not*
express the full correct casing behavior in Unicode, are still useful.
But lest someone just map these incomplete casing functions over the
codepoints in order to get (wrong) uppercase or lowercase strings
(that is, the wrong thing they make easy), we must provide
string-casing functions that allow the right thing to be just as easy.

I would suggest this language for char-upcase and char-downcase:

"
   These functions take a character argument and return a character
   result.  If the argument is an uppercase or titlecase letter, and
   there is a single letter which is its lowercase form, char-downcase
   returns that letter.  If the argument is a lowercase or titlecase
   letter, and there is a single letter which is its uppercase form,
   char-upcase returns that letter.  Otherwise, the character returned
   is the same as the argument.  Note that this is an incomplete
   approximation to case conversion; in general, case mappings require
   the context of a string, both in arguments and in result.  See
   string-upcase and string-downcase for more general case conversion
   functions.
"

and this language for string-upcase and string-downcase:

"
   These functions take a string argument and return a string as their
   result.  String-upcase converts a string to uppercase, and
   string-downcase converts a string to lowercase.  If an
   implementation supports locales, the case conversion done by these
   functions will be according to the value of (current-locale).
"

A similar problem arises with string>? and friends.  As defined in the
current draft, these functions are *useful.*  They can be done quickly
and efficiently and without reference to tables, and the ordering is
consistent and predictable.  Unfortunately, they also make the "wrong
thing" w/r/t sorting output for human readability easy.

The solution, of course, is to take pains to make the "right thing"
also easy.  So keep string>? etc. as radix sorts on codepoints, but I
recommend adding the following functions as well, for people who want
to sort output for human readability:

   string-UCA>?
   string-UCA>=?
   string-UCA=?
   string-UCA<=?
   string-UCA<?

with the more-or-less obvious semantics.  String-UCA=?, in particular,
is valuable since it checks to see if the normalized forms of the
strings are equal, without mutating either one and regardless of
whether they are represented using different codepoints.

Finally, I suggest two additional functions:

(set-current-locale! str)

   Takes a string specifying a locale and attempts to set the global
   locale accordingly.  If it succeeds (if the locale is known to the
   system and can be used) it returns #t and changes the locale.  If
   it fails (if the locale is unknown, or the implementation does not
   support changing locales) it returns #f and does not change the
   locale.  Implementations are encouraged but not required to support
   changing locales.  Changing locales, if supported, may change the
   behavior of string-upcase, string-downcase, string-UCA<? and
   friends, etc.

(current-locale)

   A thunk which returns a string specifying the current locale.
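
To make the distinctions above concrete, here is a minimal sketch in
Python rather than Scheme; it is an illustration, not part of the
proposal.  All five helper names (char_upcase, naive_string_upcase,
normalized_equal, set_current_locale, current_locale) are hypothetical
and only modeled on the wording suggested above.  Two assumptions are
worth flagging: NFC is picked as "the normalized form" (the text above
does not name one), and normalized_equal covers only the equality idea
behind string-UCA=?, not UCA collation itself.

```python
import locale
import unicodedata


def char_upcase(ch: str) -> str:
    """Model of the proposed char-upcase: one character in, one out.

    If the full uppercase mapping is not a single character, the
    argument is returned unchanged, as the suggested wording requires.
    """
    upper = ch.upper()
    return upper if len(upper) == 1 else ch


def naive_string_upcase(s: str) -> str:
    """The 'wrong thing made easy': map char_upcase over each codepoint."""
    return "".join(char_upcase(ch) for ch in s)


def normalized_equal(a: str, b: str) -> bool:
    """Compare normalized forms without mutating either argument.

    Assumption: NFC is used.  This is not a UCA comparison.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def set_current_locale(name: str) -> bool:
    """Try to set the process-wide locale; True on success, False otherwise.

    On failure the locale is left unchanged.
    """
    try:
        locale.setlocale(locale.LC_ALL, name)
    except locale.Error:
        return False
    return True


def current_locale() -> str:
    """Return a string naming the current locale setting."""
    return locale.setlocale(locale.LC_ALL)


if __name__ == "__main__":
    word = "Stra\u00dfe"                    # "Strasse" spelled with sharp s
    print(naive_string_upcase(word))        # sharp s is left as it is
    print(word.upper())                     # string-level casing expands it
    print(sorted(["apple", "Zebra"]))       # codepoint order: "Zebra" first
    print(normalized_equal("\u00e9", "e\u0301"))
    print(set_current_locale("C"), current_locale())
```

The sharp s has no single-letter uppercase form, so the per-character
mapping cannot produce the result that string-level casing does.  The
sorted call shows the codepoint ordering that string<? and friends
would give, which is predictable but not what a human reader expects.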