Re: Round 2 discussion

Dan Bornstein 11 Nov 1999 07:51 UTC
Apologies that I haven't had time to formulate a better reply before now
(see previous excuse re: startup company and impending release that will be
eating most of my time till the end of the year), but I'll at least try to
quickly chime in on the issues:

Olin writes:
>C'mon. Do you really think that people would use STRING-SET ?
>STRING-FILL is an easier case to make. Let's see, that would be

Actually, my suggestions come from real use. The Scheme variant that I'm
developing at work started out life as a functional-only system (that is,
no mutable data *at all*), and I ended up implementing string-set and using
it quite a bit. Do I have to rehash the issues of why working with
immutable data can be a big win?

Anyway, the straightforward implementation is simple:

    (define (string-set str k ch)
      (let ((copy (string-copy str))) ; or substring or whatever
        (string-set! copy k ch)
        copy))

and it (I know I harp on this) maintains the overall consistency of the
library. A more consistent library is easier to learn and easier to
understand. Big win.
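
To make the non-destructive behavior concrete, here is a small usage
sketch. The names "original" and "changed" are made up for illustration,
and the definition is repeated only so that the snippet stands alone:

```scheme
;; Functional string-set: copy first, then mutate only the copy.
(define (string-set str k ch)
  (let ((copy (string-copy str)))
    (string-set! copy k ch)
    copy))

(define original "cat")
(define changed (string-set original 1 #\u))

;; changed is a fresh string; original is left untouched.
(display changed)  (newline)
(display original) (newline)
```

The caller never sees a side effect, which is what a functional-only
system needs.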

I'd actually just as soon drop string-fill! as add string-fill (I don't
think I've ever had a compelling reason to use either), but I'm more in
favor of doing one or the other than leaving the asymmetry. For the
record:

    (define (string-fill str ch)
       (make-string (string-length str) ch))

>>[issue with string-copy and string-copy! not taking parallel args]
>Yeah, you're right. However, your non-side-effecting STRING-COPY is subsumed
>by the STRING-REPLACE Welsh proposes below. I think I'll leave things as-is.

If by "as-is" you mean dropping the proposal for string-copy! then I'm for
that. If you mean simply leaving your original proposal where the two
procedures take different sets of args, then I'm against that. Again, I'm
not against the particular functionality (which seems useful to me), just
against calling two essentially different procedures by essentially similar
names.

>I see your point... but I'm going to stick with <>. None of the other choices
>seem all that much better to me (and I don't have any better suggestions,
>myself).

I agree with using "<>"; the analogy with < and > works for me even though
some things are simply unordered (if it gets extended beyond strings, which
I'd expect).

>[mismatch index with the (in)equality procedures] It turns out to be a
>handy value to have around if you are comparing strings.

However, requiring it means that implementations are precluded from using
certain short-cut optimizations. In particular, = and <> can't return
quickly based on the lengths of the arguments alone (two strings of
different lengths are unequal without a single character being compared).
I'm against returning mismatch indices in the standard (in)equality
functions, but I do see their benefit and would be in favor of specifying
explicit mismatch-index-returning procedures, both because of the above
efficiency tweak and because they would signal programmer intent. I don't
have a strong opinion about what these procedures would be named;
"stringOP-mismatch-index" is an off-the-top-of-my-head suggestion:

    string=-mismatch-index
    string<-mismatch-index
    etc.
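
A rough sketch of how the split might look, in portable Scheme. The
return convention (index of the first differing position, or #f when
the strings are equal) and the helper name quick-string=? are
assumptions made for illustration, not anything from the proposal:

```scheme
;; Hypothetical: index of the first position where s1 and s2 differ,
;; or #f if they are equal. If one string is a proper prefix of the
;; other, the mismatch index is the length of the shorter string.
(define (string=-mismatch-index s1 s2)
  (let ((len1 (string-length s1))
        (len2 (string-length s2)))
    (let loop ((i 0))
      (cond ((and (= i len1) (= i len2)) #f)
            ((or (= i len1) (= i len2)) i)
            ((char=? (string-ref s1 i) (string-ref s2 i))
             (loop (+ i 1)))
            (else i)))))

;; A plain equality test is free to check the lengths first and skip
;; the character scan entirely when they differ.
(define (quick-string=? s1 s2)
  (and (= (string-length s1) (string-length s2))
       (not (string=-mismatch-index s1 s2))))
```

With the index split out this way, the plain predicate owes its caller
only a boolean, so the length check is a legal shortcut.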

>- STRING-APPEND accepts chars as well as strings?

I vote no, mainly because it makes typing (as in data types not fingers on
the keyboard) harder. (Waving hands slightly...) Keeping the args as
strings uniformly makes it easier to do sanity checking and some
optimizations. I would be in favor of a more generic display-append (bad
name, yeah) which string-appended the display forms for its arguments. This
would subsume a string-or-char variant. Assuming SRFI-6:

    (define (display-append . args)
      (define result-port (open-output-string))
      (do ((args args (cdr args)))
          ((null? args) (get-output-string result-port))
        (display (car args) result-port)))

    > (display-append "foo" #\b #\a #\r "baz" 42)
    => "foobarbaz42"

>- Comparison functions n-ary?

I've never had a need for that, but I'm willing to be convinced. A
tentative no.

>- Include STRING-TOKENIZE?

Yes on string-tokenize, and I do also like the concept (suggested by
xxxxxx@pobox.com) of string-split, although I find the proposed definition
rather arcane (should be clearer when to expect delimiters to be ignored;
using '() to mean "whitespace-but-with-special-rules" seems like a bad
idea; etc.)
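
As one illustration of what a less arcane rule could look like, here
is a sketch in which the caller passes an explicit delimiter predicate
and delimiters are never silently ignored. The name string-split-on
and its signature are hypothetical, not the proposed definition:

```scheme
;; Hypothetical: split str at every character satisfying delim?.
;; Adjacent delimiters yield empty strings; nothing is skipped.
;; Scans right to left so the result list comes out in order.
(define (string-split-on str delim?)
  (let loop ((i (string-length str))
             (end (string-length str))
             (acc '()))
    (cond ((= i 0) (cons (substring str 0 end) acc))
          ((delim? (string-ref str (- i 1)))
           (loop (- i 1) (- i 1) (cons (substring str i end) acc)))
          (else (loop (- i 1) end acc)))))

;; Example: split on commas.
(define fields
  (string-split-on "a,b,,c" (lambda (c) (char=? c #\,))))
;; fields is ("a" "b" "" "c")
```

Whitespace splitting would then be an ordinary call with
char-whitespace? as the predicate rather than a special meaning for '().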

>- SUBSTRING and copying/shared-text semantics:
>  Liberal: Olin
>  Conservative: Egorov?, Bornstein?

Yep, conservative.

>- STRING-ITER vs STRING-ITERATE
>  Iter: Olin
>  Iterate: Egorov

I'm slightly more in favor of "iterate"; not a strong conviction, but I
do believe in autocompletion and creative use of newlines (i.e., so I don't
butt up against column 80 even with long names).

>- -COUNT versus -LENGTH
>  -COUNT:
>  -LENGTH: Egorov

Length.

Oleg writes:
>        -- procedure+: string->integer STR START END
>
>Makes sure a substring of the STR from START (inclusive) till END
>(exclusive) is a representation of a non-negative integer in decimal
>notation. If so, this integer is returned. Otherwise -- when the
>substring contains non-decimal characters, or when the range from
>START till END is not within STR, the result is #f.

I don't like this particularly. I can think of a kabillion variants on
parsing strings into numbers that I might find useful. The one that's
built in (string->number) is the right one, since it follows Scheme's own
read syntax for numbers (which you gotta implement anyway). The moment you
step into the territory of other
number formats, you should be ready to define a full suite of procedures to
deal with the plethora of possibilities.
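
For reference, the behavior Oleg describes can be sketched in a few
lines of portable Scheme. This is one reading of the quoted text, not
his implementation; treating an empty range (START = END) as "not an
integer" is an assumption, since the description doesn't say:

```scheme
;; Sketch of the quoted string->integer: parse str[start, end) as a
;; non-negative decimal integer, or return #f if the range is invalid
;; or contains a non-digit. An empty range is assumed to yield #f.
(define (string->integer str start end)
  (and (<= 0 start)
       (< start end)
       (<= end (string-length str))
       (let loop ((i start) (n 0))
         (if (= i end)
             n
             (let ((c (string-ref str i)))
               (and (char>=? c #\0)
                    (char<=? c #\9)
                    (loop (+ i 1)
                          (+ (* n 10)
                             (- (char->integer c)
                                (char->integer #\0))))))))))
```

Every choice in it (signs, radix, surrounding whitespace, the empty
range) is a place where another caller would want a different variant.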

I think that about covers things for me now.

Take care, all.

-dan