I know that I'm late coming into this discussion, but I just had to get a few comments in on the status of SRFI-13. First and foremost, I pretty much like it. However...

Reading over it, I get the sense that there is a case of rampant featuritis happening here. String utilities are *so* pervasive that everyone has opinions and preferred interfaces. IMNSHO, the SRFI could lose:

- (Don't shoot me for this) all of the sub-stringiness. See below for a
  justification.

- char-set params; wrap 'em up in your own lambda, please. Is it really
  more efficient to do this in the library, when R5RS doesn't support the
  data type anyway?

- functions of marginal utility. I know that this is a matter of diverse
  opinion, but I suspect that there are more than a few functions in there
  that only *rarely* get used by anyone. My candidates are included below.

Re: losing substringiness

I agree that this is desirable functionality; I just really think it needs to be spec'ed elsewhere (SRFI-N). Shared substrings are a separate data type in other languages for good reasons (I can't speak to Guile's issues; I use Guile, but I didn't even know they were there, and the documentation *is* being rewritten), the simplest being that they have different lifecycle issues from 'full' strings. Besides, there are bunches of interfaces in SRFI-13 which would be a lot cleaner without [start end] and the indirect binding of substrings to params, e.g. (f s1 s2 start1 end1 start2 end2). Of course, because Olin is very clever, this will be mostly invisible to the casual user of the library. I just feel very strongly that this can be better treated in a shared-substring SRFI, which will probably never get written if most of the functionality is already available in this one.

Re: functions of marginal utility

In thinking about this list, I would have to say that it is prejudiced by virtue of the fact that I use strings as grey boxes for human-directed information.
If you're using strings as a poor man's byte-vector, then you may indeed want some of these rather more than I do. I would again ask the question: does this functionality fit better as part of a different package?

string-map - In 15 years of programming, I can't think of *once* where I have used something this general. string-[up/down]case, yes; generalized map, no. Not a big deal, but exemplary.

string-fold/unfold - These are actually cool functions. One constructs a data structure from a string, and the other builds a string from a data structure. I just can't get over the nagging feeling that these are more about string parsing than about string manipulation. And Olin suggested a parsing SRFI...

string-tabulate - ??? I grok it, I just can't see the utility.

string-for/do-each - No great complaint; I just think that since you're almost certainly doing stateful processing inside the loop, your code will probably be more readable with a coded loop than with a function call. OTOH, this does present nice possibilities when coupled with call/cc.

string-compare - How much does this add to string-{pre/suf}fix? I find it to be fairly cool (handling ordering relationships is one of my pet programming peeves), I just don't think I'd ever end up using it.

string-capitalize - I agree with Olin. Ditch it. It has too much to do with natural language rules, and not enough to do with string manipulation.

string-{filter/delete} - I feel much the same way about these as I do about string-map.

Re: string-tokenize/string-split

> From: xxxxxx@pobox.com
>
> Suppose we have a string 'str' consisting of tokens separated
> by a #\: character. We can extract the tokens using either
>
> (string-tokenize str
>   (char-set-difference char-set:full (char-set #\:)))
> or
> (string-split str (char-set #\:))
>
> the two procedure calls above are indeed roughly equivalent;
> therefore, a String library should define only one of them.
Well, that does not appear to be a guiding principle, from my read of the SRFI document.

> It indeed appears that some problems lend themselves to
> delimiter-based parsing while the others do to the inclusion
> semantics.

This is the telling argument. As you may have guessed, I am in favor of this proposal. In fact, I would go so far as to say that the delimiter-based approach should be retained and the inclusion approach abandoned; inclusion is both better and more readily addressed via regexps. The inclusion case frequently involves far more specific subsets than the standard available char-set:*s, and using them will require complex composition of char-sets, involving more overhead. OTOH, regexps are generally less 'dynamic', in that you generally don't recompute them on the fly, which you could more easily do with string-tokenize's char-sets.

> Unicode is important! But... one possible reply is that Gambit's
> char-set implementation needs to be improved.

This seems to be the tail wagging the dog.

> You're *still* right in your larger point that I could split at colons more
> easily with a string-split. But even after reading your comments, when I sit
> down to try and design a procedure or two that does the basics, I still go
> helplessly sliding down a slippery feature slope.

I believe it. I think that this might be simplified by eliminating the sub-string features of this SRFI (which I advocate for other reasons, anyway).

> You *have* to allow control of the delimiter grammar -- separator,
> terminator, prefix

No, you don't. This is not necessarily an inverse of string-join, although that would be nice. I'd estimate that 99% of the cases involve infix splitting. Terminator splitting is a trivial special case of infix. I'm not sure that prefix splitting would account for even 0.1% of all the cases.

> START/END indices?

Ditch 'em.

> If we are going to quit early (via MAXSPLIT), we need a way to tell
> the client how far into the string we got.

No.
It's already there in Oleg's proposal, because the last returned string in a MAXSPLIT case is all of the remaining text.

> On the other hand, I'm not happy with returning the rest of the
> string as a final element of the return list.

Why?

> One of these things is not like the other...

What thing is not like which other?

> Not to mention that it requires copying the data.

Only if you already had to copy it in the first place.

Well, OK, here's my best shot (which is not too good), plagiarized from Olin and twisted to my other prejudices as outlined above:

(string-split s char/predicate [grammar max-tokens]) -> string-list

- GRAMMAR is 'infix, 'suffix, 'prefix, or 'strict-infix.
  Defaults to 'infix.

- MAX-TOKENS is an integer saying "quit after this many tokens"; #f means
  infinity. Defaults to #f. Oleg's convention (the last element of the
  list has the remaining text) applies.

> - ELIDE-DELIMS is boolean, meaning runs of delimiters count as
>   single delimiter. Defaults to #t.

Overkill, and it violates inverse-ness with string-join. The only case we want it for (in general) is handling white space. I could take it or leave it, but it should be the last parameter (most specific case). Now that I'm thinking about it, I'd probably take it in favor of string-split-ws...

> This is powerful. It's good to have an inverse for STRING-JOIN.

Yep.

> It's a heck of a lot of parameters. Does anyone besides Oleg want to
> push for it?

Count me in.

david rush
-- 
A camel is a horse designed by committee...
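P.S. To make the proposal above concrete, here is a rough sketch of the interface in plain R5RS Scheme. It is not anyone's actual implementation: it handles only the default 'infix grammar (no GRAMMAR argument), takes a bare predicate rather than a char/char-set (in the wrap-it-in-a-lambda spirit argued for earlier), and follows Oleg's convention that under MAX-TOKENS the last element carries the remaining text.

```scheme
;; Sketch only: 'infix grammar, predicate delimiters, MAX-TOKENS with
;; Oleg's remaining-text convention.  Not the proposed full interface.
(define (string-split s delim? . opts)
  (let ((max-tokens (if (pair? opts) (car opts) #f))
        (len (string-length s)))
    (let loop ((start 0) (i 0) (n 1) (acc '()))
      (cond ((and max-tokens (= n max-tokens))
             ;; quit early: rest of the string is the last token
             (reverse (cons (substring s start len) acc)))
            ((= i len)
             (reverse (cons (substring s start len) acc)))
            ((delim? (string-ref s i))
             (loop (+ i 1) (+ i 1) (+ n 1)
                   (cons (substring s start i) acc)))
            (else
             (loop start (+ i 1) n acc))))))

;; (string-split "a:b:c" (lambda (c) (char=? c #\:)))   => ("a" "b" "c")
;; (string-split "a:b:c" (lambda (c) (char=? c #\:)) 2) => ("a" "b:c")
```

Note that because delimiter runs are not elided, (string-split "a::b" colon?) yields ("a" "" "b"), which is exactly what keeps it a proper inverse of string-join.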