shared-text substrings, start/end indices, xs><, etc.
shivers@xxxxxx
(25 Jan 2000 01:02 UTC)
|
Re: shared-text substrings, start/end indices, xs><, etc.
sperber@xxxxxx
(25 Jan 2000 07:47 UTC)
|
shared-text substrings
shivers@xxxxxx
(25 Jan 2000 16:06 UTC)
|
Re: shared-text substrings
sperber@xxxxxx
(25 Jan 2000 16:54 UTC)
|
Re: shared-text substrings
shivers@xxxxxx
(25 Jan 2000 17:15 UTC)
|
Re: shared-text substrings
sperber@xxxxxx
(25 Jan 2000 17:54 UTC)
|
Re: shared-text substrings shivers@xxxxxx (25 Jan 2000 19:18 UTC)
|
Re: shared-text substrings
sperber@xxxxxx
(26 Jan 2000 07:30 UTC)
|
Re: shared-text substrings, start/end indices, xs><, etc.
d96-mst-ingen-reklam@xxxxxx
(28 Jan 2000 12:41 UTC)
|
I just doubt it's worth the tradeoff. Is there hard data that the optimizations you envision actually give significant performance gains? I've always found non-shared strings plenty fast. Let us suppose we read in a line of text, allocating a fresh string for it. We are working in a Scheme w/o shared-text substrings. We wish to trim whitespace from both ends of the string. Then we will throw away the original string, and further process the trimmed string. Let us further suppose that most of the time, strings do not have extra whitespace on the ends; we are trimming to ensure an invariant which is rarely broken by the input data. The string-trimming procedures are allowed simply to return the input string if they do no trimming. Requiring them to redundantly copy the data is bogus. It wastes space and it wastes time. It can waste a *lot* of space, if I'm processing a long, long file of data. You get into doing that, and suddenly you care that this extra copy happens. I've blown out my heap in S48 because of the *one* extra copy of a really big mother string (in cases where I read an entire file into a string, processed it with regexp stuff, and blatted output into a string port... which keeps lists of strings around, under the hood. Just one copy too many.) Copying data from one buffer to another is something that happens a lot in data processing. If you need a fresh copy, great. But if you are processing the data in a linear fashion, then you'd like to allow the underlying system to reuse storage. Which, you'll note from the above example, does *NOT* require shared-text substrings! In some cases, you can simply return the original string. This is not as generally applicable a win, but one that can be provided even in Schemes w/o shared-text substring support. My portable reference implementation of SRFI-13 has these optmisations built-in. Another example of this is STRING-APPEND/SHARED applied to some strings, all but one of which are empty. Or STRING-CONCATENATE/SHARED applied to a singleton list. When singleton lists are the common or important case, it's nice to be able to pick up the win, without having to lard your code up with explicit checks. I have code in scsh that does *exactly* this kind of thing. I have to throw in a lot of obfuscatory code, which special-cases the basic op. I'd love to be able to simplify that stuff, and be able to rely on the string system. I can't rely on having shared-text substrings -- S48 doesn't have them, and it would destroy any possibility of porting to other Schemes that lack them, in any event. You don't *have* to use these procedures. You can always, in a GC'd, functional language, play it safe and use pure functions. These procedures simply give you a portable way to write efficient code. I like to write efficient code. I have been burned using inefficient code. I didn't throw these procedures into the library from scratch, hoping they'd turn out useful. They arose from programming. -Olin