Email list hosting service & mailing list manager

Round 3 discussion & open issues shivers@xxxxxx (22 Nov 1999 02:10 UTC)
Re: Round 3 discussion & open issues d96-mst@xxxxxx (24 Nov 1999 18:53 UTC)

Re: Round 3 discussion & open issues d96-mst@xxxxxx 23 Nov 1999 21:18 UTC

In article <199911220210.VAA04597@mongkok.ai.mit.edu>,
xxxxxx@ai.mit.edu wrote:

>On reflection, I can't think of cases where people are really going to
>be dying to use CAPITALIZE-STRING[!]. Should I bag it? Eric says yes.
>I am leaning that way. Further opinions?

I think that you should remove it.

>OK, backwards compatibility takes this round. STRING-FOR-EACH is
>left-to-right.  How about STRING-DO-EACH for the variant that doesn't specify
>order?

Do we really need such procedure?

>I could go with STRING-EMPTY? or STRING-NULL?. I'll stay with STRING-NULL?
>for the tiny tie-breaker reason that it's one column shorter. If I get more
>support for STRING-EMPTY?, I'll make the change. For now, I'll leave things
>as-is.

I think STRING-EMPTY? is better.

>-------------------------------------------------------------------------------
>    From: d96xxxxxx@d.kth.se (Mikael Ståldal)
>    I think that the KMP searching procedures should be generalized to
>    allow for other algorithms than KMP. But only to algorithms that have
>    the same properties as KMP, i.e. scanning a stream without
>    backtracking. This excludes Boyer-Moore, but allows for some recent
>    algorithms that are said to be faster than KMP[1].
>
>Here's the price of admission. Someone has to
>       - Send me working B-M Scheme code, under public domain, GPL,
>         Berkeley-style license, something along these lines.
>       - Explain in the comments or the email how B-M works, so I
>         can understand the code.
>Then I'll do it. I'll humbly admit: I originally considered what you propose,
>but really don't know beans about string-search algorithms, besides KMP. So I
>didn't do it.

I don't think it's nessesary to include anything else than KMP in the
reference implementation. But the interface should allow other
implementations to use other algorithms. BTW, Boyer-Moore cannot be use
here since it doesn't process the search text strictly sequencially.

>There are some problems with this spec. The search-step function is an
>inner-loop kind of utility. That's why I made people pass in the string
>explicitly with KMP-STEP -- you don't have to unpack it from some record
>structure,

But as I said, some algorithms doesn't need the pattern.

>and you can win better if you integrate/beta-substitute KMP-STEP.

What is beta-substitution?

>In spite of these tactical issues, your larger point is an excellent one.  So
>I'll tackle this generalisation, if I can -- but not until I see Boyer-Moore.

I really don't see how this issue is related to Boyer-Moore.

What about this:

make-search-object c= s [start end] -> opaque "search object"
    Returns an opaque "search object" that can be used to search
    for S in a stream/string. The search object includes S itself
	if nessesary (nessesary for KMP, but not for some other algorithms).

search-string search-object c= s [start end] -> bool or integer
	Search the string S using the search object, returning the index where
    the pattern is found or #f if it's not found.

search-stream search-object c= char-getter eos? -> bool
	Search a stream of characters using the search object,
    returning #t if the pattern could be found, #f otherwise.
    char-getter is a procedure that returns the next char from the stream,
    eos? is a predicate that indicate that the end of stream has been
    reached. If the pattern is found, no more character after the pattern
    is read.

Then all looping is encapsulated and can be properly optimized.

The Boyer-Moore algorithm is NOT suited for this, but can be used in
the STRING-CONTAINS procedure.