Re: Surrogates and character representation Tom Emerson 27 Jul 2005 15:54 UTC
William D Clinger writes:
Per Bothner wrote:
> Random accesses to a position in a string that has not
> been previously accessed is not in itself useful.

In computational linguistics it is common to utilize standoff markup,
where features in a text are tagged in a separate file via character
ranges into the original. For example, we may have a file indicating
that certain prepositional phrases appear at offsets [25,40) and
[125,160) in the original file.

I'm regularly dealing with multimegabyte text files with such standoff
markup, and not having random access is a detriment in these
applications.

--
Tom Emerson
Basis Technology Corp.
Software Architect
http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"
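
To make the standoff-markup use case concrete, here is a minimal sketch in Python. It assumes offsets are half-open ranges counted in code points (as with [25,40) above). The `Span` type, the `extract` helper, and the sample sentence are hypothetical illustrations, not part of any particular annotation toolkit.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """One standoff annotation: a half-open range [start, end) plus a label."""
    start: int  # inclusive offset, assumed to count code points
    end: int    # exclusive offset
    label: str


def extract(text: str, spans: list[Span]) -> list[tuple[str, str]]:
    """Return (label, covered text) for each annotation.

    Each lookup indexes directly into the text without scanning from the
    beginning, which is the random access the annotations rely on.
    """
    results = []
    for span in spans:
        if not (0 <= span.start <= span.end <= len(text)):
            raise ValueError(f"span out of range: {span}")
        results.append((span.label, text[span.start:span.end]))
    return results


# Hypothetical sample data: two prepositional phrases (PP) in one sentence.
text = "The cat sat on the mat in the hall."
annotations = [Span(12, 22, "PP"), Span(23, 34, "PP")]

for label, phrase in extract(text, annotations):
    print(label, repr(phrase))
```

With a representation that indexes by code point in constant time, each lookup costs only the length of the extracted range. With a variable-width encoding and no auxiliary index, the same lookup would first have to scan from the start of the text (or from some known position) to find the offset.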