Re: Surrogates and character representation
Tom Emerson 27 Jul 2005 15:54 UTC
William D Clinger writes:
Per Bothner wrote:
> Random accesses to a position in a string that has not
> been previously accessed is not in itself useful.
In computational linguistics it is common to utilize standoff markup,
where features in a text are tagged in a separate file via character
ranges into the original. For example, we may have a file indicating
that certain prepositional phrases appear at offsets [25,40) and
[125,160) in the original file. I regularly work with multi-megabyte
text files annotated with such standoff markup, and not having
random access is a real detriment in these applications.
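
To make the access pattern concrete, here is a minimal sketch in Python
(chosen only for illustration; it is not code from any system mentioned
here). The names Span, extract, and char_to_byte_offset are hypothetical.
It assumes annotations are half-open character ranges [start, end), and
it contrasts direct slicing of a character-indexed string with the linear
scan needed to resolve the same offset in UTF-8 encoded bytes.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    """One standoff annotation: a half-open character range plus a label."""
    start: int  # inclusive character offset into the original text
    end: int    # exclusive character offset
    label: str


def extract(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Resolve each annotation by slicing the text at its character offsets.

    With a character-indexed string, the annotations can be resolved in
    any order, with no need to walk the text from the beginning.
    """
    return [(s.label, text[s.start:s.end]) for s in spans]


def char_to_byte_offset(data: bytes, char_index: int) -> int:
    """Find the byte offset of the char_index-th code point in UTF-8 data.

    Without an index this is a linear scan from the start, which is the
    cost a variable-width representation adds to every standoff lookup.
    """
    count = 0
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # not a continuation byte: a code point starts here
            if count == char_index:
                return i
            count += 1
    if count == char_index:  # offset one past the last code point
        return len(data)
    raise IndexError("character offset out of range")


if __name__ == "__main__":
    text = "café in the park"
    spans = [Span(5, 16, "PP")]  # hypothetical annotation from a standoff file
    print(extract(text, spans))
    print(char_to_byte_offset(text.encode("utf-8"), 5))
```

With thousands of annotations over a multi-megabyte file, the scan in
char_to_byte_offset would be repeated per lookup unless a side index from
character offsets to byte offsets is built first.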
--
Tom Emerson Basis Technology Corp.
Software Architect http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"