Substring indices everywhere? Eating a cake without destroying it
oleg@xxxxxx 31 Dec 1999 21:26 UTC
As the recent discussion (in particular, Tom Lord's message)
indicated, the pervasive use of substring indices in SRFI-13 is
controversial. There appears, however, to be a way to have the best of
both approaches.
Consider procedures
    string= SMTH1 SMTH2
    string-pad SMTH k [char]
    string-prefix? SMTH1 SMTH2
    string-tokenize SMTH [token-set]
    string->number SMTH [base]
    etc.
SMTH may be a string value. In that case, string= is equivalent to
R5RS string=?, and the meaning of the other procedures is obvious. The
argument list is simple and concise.
However, SMTH may also be a form:
    (XS>< STR BEG-INDEX) or
    (XS>< STR BEG-INDEX END-INDEX)
where END-INDEX is assumed to be (string-length STR) if omitted.
Thus we can write:
    (let ((str "foobar") (foo "foo"))
      (display (string= str foo))
      (display (string= (XS>< str 0 3) foo))
      (display (string= (XS>< str 3) (XS>< foo 0))))
    (string->number (XS>< "$12345.99" 1))
etc.
What exactly is the XS>< form? It's up to an implementation. One
Scheme system may choose to implement (XS>< str ind1 ind2) as
(substring str ind1 ind2). This is the easiest (albeit not very
efficient) approach. In this case, (XS>< str ind1 ind2) is a real
string, so we can use R5RS string->number, string=?, etc. procedures
as they are.
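A minimal sketch of this easiest option, assuming XS>< is made an
ordinary procedure that copies the slice (the proposal itself does not
require XS>< to be a procedure; that is a choice made only for this
illustration, and the name maybe-end is hypothetical):

```scheme
;; Assumption: XS>< is an ordinary procedure returning a fresh string.
;; END-INDEX defaults to (string-length str) when omitted.
(define (XS>< str beg . maybe-end)
  (let ((end (if (null? maybe-end)
                 (string-length str)
                 (car maybe-end))))
    (substring str beg end)))

;; The result is a real string, so R5RS procedures work unchanged.
(string=? (XS>< "foobar" 0 3) "foo")     ; => #t
(string->number (XS>< "$12345.99" 1))    ; parses "12345.99"
```

The cost is one string copy per XS>< form, which is the inefficiency
mentioned above.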
(XS>< str ind1 ind2) may also be a shared substring, should a
particular Scheme system support such things. (XS>< str ind1 ind2) may
also be a lazy substring, implemented as
    (vector 'lazy-subst-tag str ind1 ind2)
or with records, or even as a distinct datatype. Note that I have NOT
said that (XS>< str ind1 ind2) is a procedure and that its result is a
first-class value. I don't want to promise that (XS>< str ind1 ind2)
can meaningfully be used outside of string functions. The only promise
I'd like to make is that (XS>< str ind1 ind2) _denotes_ a substring
when used within a string function. Such a limited promise appears to
make even a shared-substring implementation of the XS>< form
transparent to the user. The XS>< form seems to answer all the
concerns Tom Lord had about the pervasiveness of substring indices. At
the same time, it preserves the spirit of Olin's library.
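The lazy-substring option might be sketched as follows. This is only
an illustration under stated assumptions: XS>< is again written as a
procedure for simplicity, the tagged-vector layout follows the text,
and the helpers lazy-subst?, smth-string, smth-start and smth-end are
hypothetical names, not part of SRFI-13.

```scheme
;; Assumption: a lazy substring is the tagged vector from the text.
(define (XS>< str beg . maybe-end)
  (vector 'lazy-subst-tag str beg
          (if (null? maybe-end) (string-length str) (car maybe-end))))

(define (lazy-subst? x)
  (and (vector? x)
       (= (vector-length x) 4)
       (eq? (vector-ref x 0) 'lazy-subst-tag)))

;; View a plain string or a lazy substring as (string, start, end).
(define (smth-string x) (if (lazy-subst? x) (vector-ref x 1) x))
(define (smth-start x)  (if (lazy-subst? x) (vector-ref x 2) 0))
(define (smth-end x)
  (if (lazy-subst? x) (vector-ref x 3) (string-length x)))

;; string= compares the denoted character ranges without copying.
(define (string= a b)
  (let ((sa (smth-string a)) (sb (smth-string b))
        (ia (smth-start a))  (ib (smth-start b))
        (len (- (smth-end a) (smth-start a))))
    (and (= len (- (smth-end b) ib))
         (let loop ((k 0))
           (or (= k len)
               (and (char=? (string-ref sa (+ ia k))
                            (string-ref sb (+ ib k)))
                    (loop (+ k 1))))))))

(string= "foobar" "foo")                        ; => #f
(string= (XS>< "foobar" 0 3) "foo")             ; => #t
(string= (XS>< "foobar" 3) (XS>< "foo" 0))      ; => #f
```

Because only string functions ever look inside the tagged vector, the
caller cannot tell this representation apart from a copying one.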
Happy Y2K!
PS. It may make sense to allow indices in the XS>< form to take
negative values as well. If an index is negative, the length of the
string should be added to it implicitly. For example, (XS>< str -3)
would mean the last three characters of str. This convention is
supported in Perl and Python, and appears rather useful.