Re: freshman-level Boyer-Moore fast string search Alan Watson 29 Jul 2005 15:20 UTC

William D Clinger wrote:
>  > Out of curiosity, what string representations does SRFI-75 penalize
>  > which you consider to be poor?
>
> Suppose each string s is represented by a vector of 2^21 elements,
> where element i consists of a list of numbers, in IEEE double
> precision, that represent the indexes within s at which the
> character c appears, where c is the Unicode scalar value f(i),
> where f is represented by a global association list that maps
> scalar values to indexes (i.e. f-inverse).  SRFI-75 allows that
> representation, yet penalizes it.  I also consider it to be a
> poor representation.

You consider this a poor representation because it does not allow
constant-time random access or because it does not allow linear-time
traversal? Or simply because of the space cost?

Consider an implementation that internally represents strings as if they
were doubly-linked lists and cached the position of the last index. This
allows linear-time traversal but not constant-time random access. Would
you consider this a poor implementation?

The thing is, SRFI-75 does not mention how one accesses the characters
in a string, so I presume that R5RS character indices will remain the
only portable way to do this.

Regards,

Alan
--
Dr Alan Watson
Centro de Radioastronomía y Astrofísica
Universidad Astronómico Nacional de México