SRFI-13: late laundry list erik hilsdale (17 Nov 1999 07:28 UTC)
Re: SRFI-13: late laundry list d96-mst@xxxxxx (19 Nov 1999 12:00 UTC)

SRFI-13: late laundry list erik hilsdale 17 Nov 1999 07:28 UTC

Hi.  Sorry I'm late to the party.

I've organized this message somewhat strangely.  Rather than going
through each individual procedure, I've tried to figure out the
overarching design goals of the library and treat each in turn.  I've
probably missed a few.  After doing this for awhile I'll get to some
function-specific weirdnesses I noticed.

I'm harping on consistency here because I'm one of those programmers
Olin mentions who can't be bothered to remember all the procedures in
a library, but rather reconstitute the name (and calling convention)
when they need it.  So I'll favor a _consistent_ interface over a
_concice_ interface (though I think the only times I found these to be
in opposition is in string-<index/skip>-right, see below).

---- Naming

The procedure-names in the library obey a very nice naming convention:
[sub]string-<verb/noun>[-ci][!] for the most part.  This is why I was
surprised to see some breaking of these rules.  For consistency, I
would strongly support renaming the following procedure:

	join-strings       ==> string-join

I would also support renaming these two

	capitalize-string  ==> string-capitalize
	capitalize-string! ==> string-capitalize

except that would cause a strange discontinuity with
capitalize-words[!].  So I probably wouldn't rename these two unless I
could convince everybody to drop capitalize-words[!] entirely, which
I'll try to do below when I get to individual procedures.

---- Optional [start end] arguments for single-string procedures

The rule that is almost always followed is that if there is a _single_
string argument to a function, there will be optional start/end
arguments for that string.  There are four exceptions to this rule:

	string?
	string-length
	string-null?
	string-<take/drop>[-right]

Adding optional arguments to the first three is clearly goofy.
However, although adding the optionals to string-<take/drop>[-right]
seems goofy, I'll ask for it anyway.

Because my Scheme doesn't have shared substrings, when I start passing
around start/end pairs to my functions, I do so for efficiency, and I
really do treat the three arguments 's start end' as one:

  (define foo
    (lambda (s start end) ;; one conceptual argument
      ...stuff...))

I don't like to think when I do this, so I'll almost certainly try to do:

    (string-take s start end 10)

As soon as I get the error message, I'll kick myself and rewrite it as
one of the obvious

    (substring s start (+ start 10))
    (values s start (+ start 10))
    (values s start (min (+ start 10) end))

But I don't want to think.  I want string-take to accept my conceptual
shared string that happens to be split up into three arguments.

---- Mandatory 'start2 end2' arguments for multi-string procedures

All procedures that take more than one string, if they accept
start/end pairs for those strings, _require_ all start/end pairs.
That is, none of them is optional.

Except for string-replace, where [start2 end2] is optional. .  This is
a bit strange.  Certainly, substring-compare[-ci] can't have optional
[start2 end2] arguments, since the continuation arguments come last.
But what would be wrong with

    (substring= mystr mystr-start mystr-end "foo")

So it seems to me that the domains of

   substring[-ci]=
   substring[-ci]<>
   substring[-ci]<
   substring[-ci]>
   substring[-ci]<=
   substring[-ci]>=
   substring-<prefix/suffix>-length[-ci]
   substring-<prefix/suffix>[-ci]?

should be changed from

   s1 start1 end1 s2 start2 end2

to

   s1 start1 end1 s2 [start2 end2]

---- Optional [end start] arguments for index and skip

Because the optional [start end] convention is so firmly established
in the library, it hurts me dearly to see [end start] in the
index-right and skip-right procedures.  Olin's defense that the start
argument is almost never specified for these doesn't sway me, and the
fact that I'll get a runtime error when I try it just irritates me.

When I'm using the 's start end' convention to represent my strings,
I'm not going to remember that these two arguments need to be reversed
for these two procedures, and if I do I'll curse the fact that I've
devoted brain cells to the exception.

When I'm not using this convention and just using the optional
arguments 'casually', I at least would be happy to always write a zero
for my start point whenever I want to specify my end point.

---- Overflows

Questions about some overflow cases:

(string-take "foo" 5) ==> "foo" or error?
(string-drop "foo" 5) ==> "" or error?

(string-copy! "xx" 0 "yyy") ==> I assume an error, but am worried by
			        the language
				'the copy is guaranteed to work'.

---- Sharing

I'll cast my vote for the liberal camp.  However, I feel obligated to
get a little wacky about it and ask why we're providing

	string-append/shared
	string-concatenate/shared
	reverse-string-concatenate/shared

If we're breaking R5RS's substring, I see no reason not to also break
R5RS's string-append.  That is, by voting 'liberal' I believe that

	(eq? foo (string-append "" foo))
	(eq? foo (string-concatenate (list "" foo)))
	(eq? foo (reverse-string-concatenate (list "" foo)))

should all be allowed to be true.  The problem with my firm stance is
that I have no idea what the 'non-shared' versions should be named.
string-append/copy?  string-append/fresh?

---- string-for-each, string-iterate

I'm definitely with Lars Arvestad on this one.  String-for-each should
require a left-to-right ordering.  I mainly argue this because I've
now sent approximately five years-worth of undergraduates out into the
world with the factoid that they should use 'for-each' when they care
about evaluation order.  I'm willing to pay the price of an extra
register-register comparison for that clarity.

Well, I think I'm willing to do that, anyway.  I'd be happier if there
were a good, non-'for-each' identifier that would connote
unorderdness.  Sigh.

---- string-null?

I worry (probably needlessly) about people confusing the 'null' lexeme
with the character with ascii value 0 that C programmers use to
terminate their strings.

Or even worse, someone confusing it with the terminal value of a
recursive datatype (grin).

I was going to argue for 'empty-string?', but that would conflict with
the 'string-first' naming convention.  How about 'string-empty?'?

---- capitalize-string[!], capitalize-words[!]

I just don't see a reason for having these procedures in this library,
_especially_ capitalize-words.  They seem much more suited for a
'natural language' sublibrary or the like.  I'm not sure I can argue
cogently why they don't belong, but they stick out a bit to me, where
string-upcase and string-downcase don't.  Capitalize-string[!] I can
see as somewhat useful, but capitalize-words[!], for me, opens the
same kind of bag of worms that string-split does.

---- join-strings

Why no 'prefix grammar?

---- string-parse-start+end

I didn't see anywhere in this spec where this procedure could be used.
In every case where there is an optional [start end] argument pair,
it's the last arguments of the function, and therefore
sring-parse-final-start+end seems to be the only one of this pair that
would be used.

Not that I particularly object to it being in the string-lib-internals
module, I'm just not sure why it's there.

Also, you use the id 'rest' in the description of these two functions
when you should probably use the id 'ARGS'.

---- Votes

So I don't feel left out, here are my votes for the open (and closed)
issues.

STRING-APPEND accepts chars as well as strings?		-- no
Comparison functions n-ary?				-- no
Include STRING-TOKENIZE?				-- can't decide
Include STRING-REDUCE and STRING-REDUCE-RIGHT ?		-- no
SUBSTRING and copying/shared-text semantics:		-- liberal
STRING-ITER vs STRING-ITERATE				-- iterate
-COUNT versus -LENGTH					-- length

--------------------------------------------------

Thanks for taking the time to design this library, Olin.  The fact
that I'm picking on little inconsistencies should reflect that the
design shows such great consistency and elegance.

-erik