Naming conventions, mismatch indices, CHAR/CHAR-SET/PRED,...

Sergei Egorov, 22 Oct 1999 07:09 UTC
This library is the best one I have seen so far
(thanks, Olin!), but I still have some comments
and suggestions.

;;; rationale:

First, I only partially agree with the rationale.
Yes, RNRS's set of string operations is poor, but
there's nothing wrong with its naming conventions. The
choice between string= and string=? is mainly a matter
of taste: the former is consistent with = (=? variants
existed in R2RS to make everybody happy), but the
latter is consistent with char=? and other predicates,
and it is both familiar and standard.

Shared substrings are practically unknown to
the Scheme community and, even if they are
implemented somewhere, I see no reason to
make incompatible changes to the semantics
of SUBSTRING (adding SUBSTRING/SHARED is
definitely a better idea, but why put normal
and /SHARED versions in one SRFI?).

I also oppose the "dropping" of standard Scheme
procedures. Does this mean that to claim
support for SRFI-13 (or SRFI-1) my implementation
must make them unavailable? If not, then why
mention it at all? I believe the whole idea belongs
to a separate "Scheme Request For Non-implementation"
process (SRFN-0: drop TRANSCRIPT-ON and TRANSCRIPT-OFF).

I believe that the <> convention for "not equal" is not
the best choice: it doesn't look right when applied
to domains with no natural order (e.g. MY-RECORD<>).
Other possible choices are ~=, /=, ^=, and != (all
of them are subpar, but if forced to make a choice,
I would pick ~=).

;;; my votes on the Issues:

Add STRING-REDUCE & STRING-REDUCE-RIGHT:  No
STRING-APPEND accepts chars:              No
N-ary comparison functions:               No
Add STRING-TOKENIZE:                      No

;;; CHAR/CHAR-SET/PRED parameters:

In my opinion, this is an example of ad-hoc genericity:
the choice of variants is more or less arbitrary (why
are STRING and CHAR-LIST missing? How can I specify
a case-insensitive (-ci) search for a char?) The whole idea works
only because strings cannot contain char-sets or
procedures (this trick doesn't work with lists or
vectors). I agree that something should be done
to stop the namespace pollution, but there are other
ways, such as regular higher-order procedures. Besides,
the CHAR/CHAR-SET/PRED approach is another slippery slope:
why don't we just define generic sequence procedures?
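To illustrate the higher-order alternative, here is a minimal Python sketch, assuming a single search procedure that always takes a predicate; the names `string_index_by`, `char_is`, `char_is_ci` and `char_in` are hypothetical and not part of SRFI-13. The CHAR and CHAR-SET cases become ordinary predicate builders, and a case-insensitive char search needs no extra variant.

```python
from typing import Callable, Optional

def string_index_by(s: str, pred: Callable[[str], bool],
                    start: int = 0, end: Optional[int] = None) -> Optional[int]:
    """Return the first index i in [start, end) with pred(s[i]) true, else None."""
    if end is None:
        end = len(s)
    for i in range(start, end):
        if pred(s[i]):
            return i
    return None

# Hypothetical predicate builders standing in for the CHAR and CHAR-SET cases.
def char_is(c: str) -> Callable[[str], bool]:
    return lambda x: x == c

def char_is_ci(c: str) -> Callable[[str], bool]:
    # Simplification: case-insensitivity via lower(), not full Unicode folding.
    return lambda x: x.lower() == c.lower()

def char_in(chars: str) -> Callable[[str], bool]:
    members = frozenset(chars)
    return lambda x: x in members
```

For example, `string_index_by("Hello, World", char_is_ci("w"))` finds the `W` at index 7, and any existing predicate such as `str.isupper` can be passed directly, so the set of supported "kinds" of search is open-ended rather than fixed by the library.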

;;; Mismatch index in string-comparison procedures:

Why do we need this mismatch index in each and every comparison
procedure? The fact that it can be returned doesn't necessarily
mean that it should be; if somebody's procedure ends with (string= s1 s2)
and I have to rewrite it, say, using lists instead of strings,
I don't want to scan all the code looking for call sites to
check whether the return value is used only as a boolean (imitating
the mismatch index in my new code may be problematic). I will
have all this imaginary trouble because the INTENTIONS of the
original author were not clear. This, however, didn't stop
the designers of MEM*, ASS* and dozens of other functions,
and in many of those cases I can see some advantages in returning
a non-#t value. But, judging from my modest experience, I believe
that in regular lexicographical string comparison the
mismatch index is NEVER needed. It is definitely less
useful in practice than n-ary string comparison predicates,
and the unfortunate fact is that STRING{>...} returning a
mismatch index cannot be generalized to the n-ary case!
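A small Python sketch can make the n-ary problem concrete. It assumes that a two-argument STRING< returns the index where the comparison is decided (or the length of the first string when it is a proper prefix of the second), and a false value otherwise; `string_lt_index` and `string_lt_nary` are hypothetical names. In a chain of three strings, each adjacent pair may be decided at a different index, so there is no single index the n-ary version could return.

```python
from typing import Optional

def mismatch_index(s1: str, s2: str) -> Optional[int]:
    """First index where s1 and s2 differ; the shorter length if one is a
    proper prefix of the other; None if the strings are equal."""
    n = min(len(s1), len(s2))
    for i in range(n):
        if s1[i] != s2[i]:
            return i
    return None if len(s1) == len(s2) else n

def string_lt_index(s1: str, s2: str) -> Optional[int]:
    """Hypothetical mismatch-index-style STRING<: the index where s1 < s2
    is decided, or None when s1 is not less than s2.
    Note: Python treats 0 as false, so callers must test `is not None`;
    in Scheme any integer, including 0, counts as true."""
    i = mismatch_index(s1, s2)
    if i is None or i == len(s2):  # equal, or s2 is a prefix of s1
        return None
    if i == len(s1):  # s1 is a proper prefix of s2, so s1 < s2
        return i
    return i if s1[i] < s2[i] else None

def string_lt_nary(*strings: str) -> bool:
    """Hypothetical n-ary STRING<: true iff every adjacent pair is ordered.
    Each pair may be decided at a different index, so only a boolean is returned."""
    return all(string_lt_index(a, b) is not None
               for a, b in zip(strings, strings[1:]))
```

For `"ab" < "ac" < "b"` the two pairs are decided at indices 1 and 0 respectively, so the only sensible n-ary result is a plain boolean.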

;;; small stuff:

SUBSTRING-COMPARE{-CI}: is the mismatch index (!?) relative or absolute?
STRING-ITER: why not STRING-ITERATE?

{SUB}STRING-{PRE,SUF}FIX-COUNT{-CI}: why not -LENGTH- instead of -COUNT-?
In both Common Lisp and SRFI-1, COUNT is associated with selective counting,
not measuring the length of contiguous subsequences.
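The difference between the two notions can be shown with a short Python sketch; the function names are hypothetical and are not SRFI-13 procedures. Measuring the common prefix stops at the first mismatch, whereas a COUNT-style operation tallies matching positions wherever they occur.

```python
def string_prefix_length(s1: str, s2: str) -> int:
    """Length of the longest common prefix: a contiguous run from the start."""
    n = 0
    for a, b in zip(s1, s2):
        if a != b:
            break
        n += 1
    return n

def count_matching_positions(s1: str, s2: str) -> int:
    """Selective counting in the COUNT sense: aligned positions that match, anywhere."""
    return sum(1 for a, b in zip(s1, s2) if a == b)
```

For `"abcxa"` and `"abcya"` the common prefix has length 3, while 4 aligned positions match; a name containing COUNT suggests the second answer, not the first.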

SUBSTRING{-CI}? : why do these names end in a question mark? They
behave more like SUBSTRING-INDEX{-CI} and other MEMQ-like procedures.

;;; proposed additions (actually these were defined in R^2RS):

(substring-move-left! string1 start1 end1 string2 start2)
(substring-move-right! string1 start1 end1 string2 start2)

String1 and string2 must be strings, and start1, start2 and
end1 must be exact integers satisfying

 0 <= start1 <= end1 <= (string-length string1)
 0 <= start2 <= end1-start1+start2 <= (string-length string2).

Substring-move-left! and substring-move-right! store characters of
string1 beginning with index start1 (inclusive) and ending with
index end1 (exclusive) into string2 beginning with index start2
(inclusive).

Substring-move-left! stores characters in time order of increasing
indices.  Substring-move-right! stores characters in time order of
decreasing indices.
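The two directions matter when string1 and string2 are the same string and the ranges overlap. Below is a minimal Python sketch of the semantics described above, assuming a string is modeled as a Python list of one-character strings (Python strings are immutable); `substring_move_left` and `substring_move_right` are Python transliterations for illustration only. Copying in increasing index order is safe when moving characters toward lower indices, and decreasing order is safe when moving them toward higher indices.

```python
from typing import List

def _check_bounds(s1: List[str], start1: int, end1: int,
                  s2: List[str], start2: int) -> None:
    # Preconditions from the proposal above.
    if not (0 <= start1 <= end1 <= len(s1)):
        raise IndexError("need 0 <= start1 <= end1 <= len(string1)")
    if not (0 <= start2 <= end1 - start1 + start2 <= len(s2)):
        raise IndexError("need 0 <= start2 <= end1-start1+start2 <= len(string2)")

def substring_move_left(s1: List[str], start1: int, end1: int,
                        s2: List[str], start2: int) -> None:
    """Store s1[start1:end1] into s2 at start2, in order of increasing index."""
    _check_bounds(s1, start1, end1, s2, start2)
    for k in range(end1 - start1):
        s2[start2 + k] = s1[start1 + k]

def substring_move_right(s1: List[str], start1: int, end1: int,
                         s2: List[str], start2: int) -> None:
    """Store s1[start1:end1] into s2 at start2, in order of decreasing index."""
    _check_bounds(s1, start1, end1, s2, start2)
    for k in reversed(range(end1 - start1)):
        s2[start2 + k] = s1[start1 + k]
```

With `s = list("abcdef")`, shifting `"bcd"` one position left via `substring_move_left(s, 1, 4, s, 0)` gives `"bcddef"`, and shifting `"abc"` one position right via `substring_move_right(s, 0, 3, s, 1)` gives `"aabcef"`. Using the wrong direction for that right shift, `substring_move_left(s, 0, 3, s, 1)`, smears the first character and yields `"aaaaef"`.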

-- Sergei