Re: freshman-level Boyer-Moore fast string search
Alan Watson 29 Jul 2005 15:20 UTC
William D Clinger wrote:
> > Out of curiosity, what string representations does SRFI-75 penalize
> > which you consider to be poor?
>
> Suppose each string s is represented by a vector of 2^21 elements,
> where element i consists of a list of numbers, in IEEE double
> precision, that represent the indexes within s at which the
> character c appears, where c is the Unicode scalar value f(i),
> where f is represented by a global association list that maps
> scalar values to indexes (i.e. f-inverse). SRFI-75 allows that
> representation, yet penalizes it. I also consider it to be a
> poor representation.
You consider this a poor representation because it does not allow
constant-time random access or because it does not allow linear-time
traversal? Or simply because of the space cost?
Consider an implementation that internally represents strings as if they
were doubly-linked lists and cached the position of the last index. This
allows linear-time traversal but not constant-time random access. Would
you consider this a poor implementation?
The thing is, SRFI-75 does not mention how one accesses the characters
in a string, so I presume that R5RS character indices will remain the
only portable way to do this.
Regards,
Alan
--
Dr Alan Watson
Centro de Radioastronomía y Astrofísica
Universidad Astronómico Nacional de México