extending the discussion Tom Lord (15 Dec 1999 19:57 UTC)
I propose that the discussion period for SRFI-13 be extended and that
the additional time be used to consider some substantial changes.

The primary issue is the treatment of shared substrings and string
indexes.

* The convention of permitting string indexes everywhere makes it
  more difficult to implement extensions of the proposed library, and
  complicates the interfaces of all conforming extensions. This is
  because extensions ought to obey the same convention.

* Permitting string indexes everywhere precludes conformance to
  another, more established convention: that comparison functions
  should permit more than two arguments.

* Permitting string indexes everywhere encourages programmers seeking
  to write portable code to use string indexes when shared substrings
  would be simpler and less error prone. This makes it awkward to
  incorporate the code such programmers write into systems which
  support shared substrings. Moreover, manipulating string indexes is
  notoriously error prone and so should not be a prominent feature of
  portable Scheme style.

* A reasonable alternative is to have two SRFIs: one, a string
  library without ubiquitous index parameters and without
  "substring/shared"; the other, a SRFI for "substring/shared" in
  which that procedure is guaranteed to return a value which shares
  state with its primary string argument.

  Those who use only the first SRFI are thereby discouraged from
  trying to use Scheme for hyper-efficient string processing. Those
  who want to use Scheme for hyper-efficient string processing are
  thereby encouraged to choose an implementation or environment which
  supports shared substrings. Those who want to write maximally
  portable code are discouraged from counting on hyper-efficient
  string processing. All of the standardized interfaces are kept
  simple and clean. Existing conventions (such as N-ary comparison
  operators) are preserved.
  Future SRFIs that define additional string manipulation functions
  for use in applications which expect hyper-efficient string
  processing should assume that "substring/shared" is present. SRFIs
  which define additional string manipulation procedures just for
  convenience, where efficiency is not a concern, can avoid shared
  substrings.

* In a SRFI which defines "substring/shared", it should be mandatory
  that the string returned from that procedure share state with the
  primary string argument. Two additional procedures are desirable:

    shared-substring? obj => boolean

  which tells whether a particular string is a shared substring, and

    containing-string string => string start end

  which converts a shared substring to its parent string and indexes,
  and an ordinary string to itself, 0, and its length.

  It is important that "substring/shared" return a truly shared
  substring so that side effects on its result are reflected in its
  argument. That propagation of side effects is an essential part of
  using shared substrings to write code which manipulates strings
  efficiently.

A secondary issue is the convention of accepting a parameter which
may be a character, character set, or predicate.

* The CHAR/CSET/PREDICATE convention complicates the implementation
  of every procedure which uses it. Future extensions to the library
  are similarly complicated.

* The addition of a single procedure to the character set library
  could simplify the convention:

    (char-set-membership cset) => predicate

  where

    (predicate c) => #t   ; if c is in cset
                     #f   ; otherwise

  With `char-set-membership', the convention should be simplified to
  CHAR/PREDICATE.

* I would not suggest further simplifying the convention by defining:

    (char-equality character) => predicate

  since passing a character for one of these parameters is
  presumably, by far, the common case. Also, passing a character for
  one of these parameters is supported by the traditional standards
  (R^nRS and every implementation).
* There is then a choice between two conventions represented by this
  example:

    string-index char/predicate string

  or

    string-index char string
    string-predicate-index predicate string

  I prefer the latter, though not for any particularly strong reason.

Another secondary issue is whether symbols should be acceptable as
arguments to procedures that expect strings but that do not modify
those strings. The print name of the symbol would be used as the
string value of a symbol. I have found this convention natural, easy
to implement, and useful.

A final secondary issue is whether procedures that construct strings
from individual elements permit the use of strings (and symbols) as
elements. SRFI-13 says they do not but I have found this feature to
be natural, easy to implement, and useful.

Tom Lord