Naming conventions, mismatch indices, CHAR/CHAR-SET/PRED,... Sergei Egorov 22 Oct 1999 07:09 UTC
This library is the best one I have seen so far (thanks, Olin!), but I still have some comments and suggestions.

;;; rationale:

First, I only partially agree with the rationale. Yes, RNRS's set of string operations is poor, but there's nothing wrong with the existing naming conventions (the choice between string= and string=? is mainly a matter of taste; the former is compatible with = (=? variants existed in R2RS to make everybody happy), but the latter is compatible with char=? and other predicates, and it is both familiar and standard).

Shared substrings are practically unknown to the Scheme community and, even if they are implemented somewhere, I see no reason to make incompatible changes to the semantics of SUBSTRING (adding SUBSTRING/SHARED is definitely a better idea, but why put normal and /SHARED versions in one SRFI?).

I also oppose the "dropping" of standard Scheme procedures. Does this mean that, to claim support for SRFI-13 (or SRFI-1), my implementation should make them unavailable? If not, then why mention it at all? I believe the whole idea belongs to a separate "Scheme Request For Non-implementation" process (SRFN-0: drop TRANSCRIPT-ON and TRANSCRIPT-OFF).

I believe that the <> convention for "not equal" is not the best choice: it doesn't look right when applied to domains with no natural order (e.g. MY-RECORD<>). Other possible choices are ~=, /=, ^=, and != (all of them are subpar, but if forced to make a choice, I would pick ~=).

;;; my votes on the Issues:

Add STRING-REDUCE & STRING-REDUCE-RIGHT: No
STRING-APPEND accepts chars: No
N-ary comparison functions: No
Add STRING-TOKENIZE: No

;;; CHAR/CHAR-SET/PRED parameters:

In my opinion, this is an example of ad-hoc genericity: the choice of variants is more or less arbitrary (why are STRING and CHAR-LIST missing? How can I specify a -ci search for a char?). The only reason the whole idea does not fail is that strings cannot contain char-sets or procedures (this trick doesn't work with lists or vectors).
I agree that something should be done to stop the namespace pollution, but there are other ways: regular higher-order procedures. Besides, the CHAR/CHAR-SET/PRED approach is another slippery slope: why don't we just define generic sequence procedures?

;;; Mismatch index in string-comparison procedures:

Why do we need this mismatch index in each and every comparison procedure? The fact that it can be returned doesn't necessarily mean that it should be. If somebody's procedure ends in (string= s1 s2) and I have to rewrite it, say, using lists instead of strings, I don't want to scan all the code looking for call sites to check whether the return value is used only as a boolean (imitating the mismatch index in my new code may be problematic). I will have all this imaginary trouble because the INTENTIONS of the original author were not clear. This, however, didn't stop the designers of MEM*, ASS* and dozens of other functions, but in many of those cases I can see some advantage in returning a non-#t value. Judging from my modest experience, though, I believe that in regular lexicographical string comparison the mismatch index is NEVER needed. It is definitely less useful in practice than n-ary string comparison predicates, and the unfortunate fact is that STRING{>...} returning a mismatch index cannot be generalized to the n-ary case!

;;; small stuff:

SUBSTRING-COMPARE{-CI}: is the mismatch index (!?) relative or absolute?

STRING-ITER: why not STRING-ITERATE?

{SUB}STRING-{PRE,SUF}FIX-COUNT{-CI}: why not -LENGTH- instead of -COUNT-? In both CommonLisp and SRFI-1, COUNT is associated with selective counting, not with measuring the length of contiguous subsequences.

SUBSTRING{-CI}?: why do these names end in a question mark? They behave more like SUBSTRING-INDEX{-CI} and other MEMQ-like procedures.

;;; proposed additions (actually these were defined in R^2RS):

(substring-move-left! string1 start1 end1 string2 start2)
(substring-move-right! string1 start1 end1 string2 start2)

String1 and string2 must be strings, and start1, start2 and end1 must be exact integers satisfying

    0 <= start1 <= end1 <= (string-length string1)
    0 <= start2 <= end1-start1+start2 <= (string-length string2).

Substring-move-left! and substring-move-right! store characters of string1 beginning with index start1 (inclusive) and ending with index end1 (exclusive) into string2 beginning with index start2 (inclusive). Substring-move-left! stores characters in time order of increasing indices. Substring-move-right! stores characters in time order of decreasing indices.

-- Sergei
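
P.S. To make the difference between the two proposed procedures concrete, here is a minimal sketch written with nothing but STRING-REF and STRING-SET!. It is only one reading of the description above, not the R^2RS text, and it assumes the arguments already satisfy the stated constraints (there is no error checking).

```scheme
;; Sketch only: assumes 0 <= start1 <= end1 <= (string-length string1)
;; and that the destination range fits inside string2.

(define (substring-move-left! string1 start1 end1 string2 start2)
  ;; Store characters in order of increasing indices.
  (do ((i start1 (+ i 1))
       (j start2 (+ j 1)))
      ((>= i end1))
    (string-set! string2 j (string-ref string1 i))))

(define (substring-move-right! string1 start1 end1 string2 start2)
  ;; Store characters in order of decreasing indices.
  (do ((i (- end1 1) (- i 1))
       (j (- (+ start2 (- end1 start1)) 1) (- j 1)))
      ((< i start1))
    (string-set! string2 j (string-ref string1 i))))

;; The order only matters when string1 and string2 are the same string
;; and the two ranges overlap.
(define s (string-copy "abcdef"))
(substring-move-left! s 2 6 s 0)    ; s is now "cdefef"

(define t (string-copy "abcdef"))
(substring-move-right! t 0 4 t 2)   ; t is now "ababcd"
```

With overlapping ranges in a single string, the "left" variant is the safe one when the destination starts before the source, and the "right" variant is the safe one when the destination starts after it; for distinct strings the two give the same result.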