Email list hosting service & mailing list manager


Re: Surrogates and character representation Alex Shinn 28 Jul 2005 14:48 UTC

On 7/28/05, William D Clinger <xxxxxx@verizon.net> wrote:
>
> I acknowledged that the algorithm will still work.  My point
> is that its asymptotic complexity may be degraded.

I'm sorry, perhaps I'm just misunderstanding, but if the exact
same algorithm, in fact the exact same code, can be used,
how is the asymptotic complexity affected?

> > It is in fact UTF-32 that has additional overhead for Boyer-Moore
> > searches as mentioned in my previous mail.
>
> Your claim was incorrect.  Your claim appears to depend upon
> ignorance of techniques for constructing, representing, and
> manipulating sparse automata efficiently, and upon your mistaken
> belief that c1 > c2 implies O(m+c1) > O(m+c2), where c1 and c2
> are constants.  Finally, you ignored the difference between O(n)
> and O(n/m), which was the point of the paragraph you quoted above.

Please note that I said "additional overhead."  Nowhere did I say
or suggest that UTF-32 incurred an asymptotic penalty.  However,
the c1 and c2 are not just constants - they are typically larger than
m, and therefore should not be ignored.

--
Alex