Re: the discussion so far

Re: the discussion so far bear 20 Jul 2005 17:26 UTC

On Wed, 20 Jul 2005, John.Cowan wrote:

>Thomas Bushnell BSG scripsit:
>
>> When you provide a function that does almost-the-right-thing, you are
>> encouraging programmers to use it.  The only case where you have
>> identified a value to this function (when implemented as a simple
>> radix comparator on codepoints) is when you have binary search trees
>> which you want to exchange between scheme systems.
>
>I picked that as a counterexample to the claim that there were no such
>use cases.

"There never has been, nor will there ever be, any
 programming language in which it is the least bit difficult
 to write bad code."
                     --Lawrence Flon

I think you really can't make doing the wrong thing hard.
The best you can do is to try to make the right thing as
easy as the wrong thing.

char-upcase and char-downcase can be used to do useful
things which are not wrong.  But char-upcase and
char-downcase are examples of functions that also make the
wrong thing easy to do.  I think they can be forgiven that,
provided, first, that they are well documented with adequate
and proper warnings about their scope and usefulness, and,
second, that there are string-upcase and string-downcase
functions that are *NOT* defined to be simply the result of
applying char-*case to each codepoint.

The thing is that char-upcase and char-downcase, even
though restricted to one-to-one character case mappings
which do *not* express the full correct casing behavior
in Unicode, are still useful.

But lest someone simply map these incomplete casing
functions over the codepoints of a string to get (wrong)
uppercase or lowercase strings, which is the wrong thing
they make easy, we must provide string-casing functions
that make the right thing just as easy.

I would suggest this language for char-upcase and
char-downcase:

" These functions take a character argument and return a
character result.  If the argument is an uppercase or
titlecase letter, and there is a single letter which is its
lowercase form, char-downcase returns that letter.  If the
argument is a lowercase or titlecase letter, and there is a
single letter which is its uppercase form, char-upcase
returns that letter.  Otherwise, the character returned is
the same as the argument.  Note that this is an incomplete
approximation to case conversion; in general case mappings
require the context of a string, both in arguments and in
result.  See string-upcase and string-downcase for more
general case conversion functions.  "
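
To make the gap concrete, here is a minimal sketch in Python rather than Scheme, chosen only because its built-in str.upper and str.lower apply string-level case mappings. The helpers char_upcase and char_downcase are hypothetical stand-ins for the proposed procedures, not any implementation's API, and they only approximate the wording above: they return the mapped character when the mapping is a single character and otherwise return the argument unchanged, without checking that the argument is a letter.

```python
def char_upcase(c):
    """Hypothetical one-to-one upcase: keep c unless its uppercase form is one character."""
    upper = c.upper()
    return upper if len(upper) == 1 else c


def char_downcase(c):
    """Hypothetical one-to-one downcase: keep c unless its lowercase form is one character."""
    lower = c.lower()
    return lower if len(lower) == 1 else c


# German sharp s has no single-character uppercase form,
# so the result string must be longer than the argument.
word = "stra\u00dfe"
upper_per_char = "".join(char_upcase(c) for c in word)   # sharp s survives unchanged
upper_whole = word.upper()                               # sharp s becomes "SS"

# Greek capital sigma lowercases differently at the end of a word,
# so the mapping depends on context in the argument string.
greek = "\u039f\u0394\u039f\u03a3"
lower_per_char = "".join(char_downcase(c) for c in greek)  # ends in U+03C3
lower_whole = greek.lower()                                # ends in final sigma U+03C2
```

The two per-character results differ from the whole-string results, which is exactly the "wrong thing made easy" that separate string-casing functions are meant to avoid.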

and this language for string-upcase and string-downcase:

" These functions take a string argument and return
a string as their result.  String-upcase converts a string
to uppercase, and string-downcase converts a string to
lowercase.  If an implementation supports locales, the case
conversion done by these functions will be according to the
value of (current-locale). "

A similar problem arises with string>? and friends.  As
defined in the current draft, these functions are *useful.*
They can be done quickly and efficiently and without
reference to tables and the ordering is consistent and
predictable.  Unfortunately, they also make the "wrong
thing" w/r/t sorting output for human readability easy.  The
solution, of course, is to take pains to make the "right
thing" also easy.

So keep string>? etc. as radix sorts on codepoints, but
I recommend adding the following functions as well, for
people who want to sort output for human readability:

string-UCA>?
string-UCA>=?
string-UCA=?
string-UCA<=?
string-UCA<?

These would have the more-or-less obvious semantics,
comparing strings under the Unicode Collation Algorithm
(UCA).  String-UCA=?, in particular, is valuable since it
checks to see if the normalized forms of the strings are
equal, without mutating either one and regardless of whether
they are represented using different codepoints.
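
As a rough illustration of the difference, here is a Python sketch that uses only the standard library. It does not implement the UCA. string_canonical_eq is a hypothetical stand-in for the normalization-insensitive equality described above, and crude_collation_key is an assumed toy key (strip combining marks, fold case) that merely shows how a human-oriented order departs from codepoint order.

```python
import unicodedata


def string_canonical_eq(a, b):
    """Hypothetical equality test: compare NFD forms, leaving both arguments untouched."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def crude_collation_key(s):
    """Toy sort key, NOT the UCA: base letters case-folded first, code points as tie-break."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), decomposed)


precomposed = "\u00e9"      # e with acute accent as one codepoint
combining = "e\u0301"       # e followed by a combining acute accent

words = ["zebra", "Zulu", "apple", "\u00e9clair", "Banana"]
codepoint_order = sorted(words)                          # radix order on codepoints
readable_order = sorted(words, key=crude_collation_key)  # closer to what a reader expects
```

Codepoint order puts every capitalized word before every lowercase one and the accented word last; the toy key interleaves them alphabetically. A real implementation would consult the UCA tables, and a locale tailoring where one applies, instead of this key.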

Finally, I suggest two additional functions:

(set-current-locale! str)

Takes a string specifying a locale and attempts to set the
global locale accordingly.  If it succeeds (if the locale is
known to the system and can be used) it returns #t and
changes the locale.  If it fails (if the locale is unknown,
or the implementation does not support changing locales) it
returns #f and does not change the locale.  Implementations
are encouraged but not required to support changing locales.
Changing locales, if supported, may change the behavior of
string-upcase, string-downcase, string-UCA<? and friends,
etc.

(current-locale)

A thunk which returns a string specifying the current
locale.
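
The intended interaction between the locale procedures and the string-casing functions might look like the following Python sketch. Everything in it is an assumption made for illustration: the set of known locale names, the module-level state, and the single Turkish rule (dotted i uppercases to U+0130) standing in for full locale tailoring.

```python
# Hypothetical set of locales this toy "implementation" knows about.
_KNOWN_LOCALES = {"C", "en", "tr"}
_current_locale = "C"


def set_current_locale(name):
    """Return True and switch locale if name is known; otherwise return False, no change."""
    global _current_locale
    if name not in _KNOWN_LOCALES:
        return False
    _current_locale = name
    return True


def current_locale():
    """Return the string naming the current locale."""
    return _current_locale


def string_upcase(s):
    """Uppercase s according to current_locale(); only one Turkish rule is modeled."""
    if current_locale() == "tr":
        # In Turkish, lowercase dotted i uppercases to capital I with dot above.
        s = s.replace("i", "\u0130")
    return s.upper()


before = string_upcase("istanbul")            # under the default "C" locale
changed = set_current_locale("tr")            # known locale: succeeds
after = string_upcase("istanbul")             # now starts with U+0130
rejected = set_current_locale("xx-unknown")   # unknown locale: fails, locale stays "tr"
```

A failed call leaves the locale untouched, so callers can test the boolean result and fall back without having to restore anything.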