>You can't use Boyer-Moore with large character types -- it requires you
>to build a table with one entry for every possible character. Hence
>not really useable for anything past Latin-1.
>
>In fact, I have an implementation of B-M. I just don't *export* it into
>the library's API as such, because it isn't portable across character types,
>which is one of the design criteria of the lib.
But my opinion is that the SRFI should not mention ANY algorithm, it
should leave it up to implementors. A SRFI is a specification, not an
implementation, isn't it?
Good question. The specific features of an algorithm can certainly show up in
a specification. Tailoring a spec to a particular algorithm allows the user to
rely on specific performance guarantees of the algorithm. Let's take the case
of KMP. If you know it's KMP, for example, then you know that it won't
allocate temporary storage proportional to the size of the character type.
This is a good thing to know if you'd like your code to run in a Unicode
environment, eh?
SRFI-13 specs string search in two levels. One level is the more abstract
you seem to prefer -- STRING-CONTAINS? and STRING-CONTAINS-CI?. These
functions are free to use any algorithm deemed appropriate by the
library implementor.
The second level is the more algorithmic-specific KMP routines. These give you
extra features that are KMP-specific, and so have a more detailed API. Note
that the thing that lets me export the KMP API is that it doesn't conflict
with the design criteria of the library, e.g., it's portable across different
character types (unlike Boyer-Moore or Sunday's algorithm).
You use the interface that's appropriate to the task you have.
-Olin
P.S. Note that an *algorithm* is not an *implementation*.