Re: extending the discussion Dan Bornstein 21 Dec 1999 03:40 UTC
I agree with most of what Tom wrote, and don't disagree enough with most of the rest to feel it worth commenting on. However, I do have a strong opinion about one part, which I mostly disagree with:

Tom Lord:
> * In a SRFI which defines "substring/shared", it should be
>   mandatory that the string returned from that procedure share
>   state with the primary string argument.

This part, I agree with. It seems silly to me to explicitly expose this sort of mechanism and expect people *not* to start relying on it.

>   Two additional procedures are desirable:
>
>       shared-substring? obj => boolean
>
>   which tells whether a particular string is a shared
>   substring

I don't understand why this procedure is important or useful. I also imagine it would be hard to implement in a meaningful way.

For example, if I call (substring/shared <string> 0 (string-length <string>))--or something less obvious with the same result--does the result have to return #t for shared-substring? If so, that seems to mean that either you can't return <string> itself for that case (seems like a bad idea to me) or that all strings need to have a marker to indicate that substring/shared was called on the string with the entire string as the result (also seems like a bad idea to me); if not, then how do you know that the "parent" string is, at the programmer's level of abstraction, shared (which is presumably what this call is about)?

Also, if I call substring/shared on a string and then that string later becomes garbage, does the result of shared-substring? change for the substring that is now no longer shared? If not, what does it *really* mean to be a shared substring? How about the case where two non-overlapping shared substrings get created from a common parent and then the parent gets reclaimed?

Another issue is how this might be extended, meaningfully, to the other cases where sharing is defined, such as in string-append/shared, which is, in a sense, a complementary function to substring/shared.
That is, you can end up with what would seem to be two functionally identical pairs of objects through either of these sets of calls:

    (define foobarbaz (string-copy "foobarbaz"))
    (define bar (substring/shared foobarbaz 3 6))

    (define bar "bar")
    (define foobarbaz (string-append/shared (string-copy "foo") bar (string-copy "baz")))

In the second situation, should (shared-substring? bar) return #t? If so, it seems to imply a lot of extra mechanism to figure out that fact (seems like a bad idea to me). If not, it seems like shared-substring? has to lie some of the time since, at that point, bar really is a shared substring of foobarbaz (also seems like a bad idea to me).

>       containing-string string => string start end
>
>   which converts a shared substring to its parent string and
>   indexes, and an ordinary string to itself, 0, and its
>   length.

I have a big problem with this, from a security standpoint. Having this functionality makes it easy to make the mistake of passing around more than you intended, which is especially a problem in situations where you don't necessarily trust the code that you're running (either because it might be malicious or because it was coded incompetently). Basically, it makes it too easy to inadvertently and non-obviously share data. The canonical example:

    assume s = "Userid: danfuzz\nPassword: fuzzball187\n"

    (define userid (substring/shared s 8 15))
    ... lots of random code ...
    (malicious-procedure-masquerading-as-something-innocent userid)

...and then malicious-procedure goes on to call containing-string on userid and extract the password.

Also, just from the efficiency perspective, making a backlink from a shared substring to its parent visible to the programmer means that you can't reclaim the storage for a (possibly much larger) parent until all of its shared-substring children become garbage.
This sort of pattern is common, e.g., parsing out and keeping references to only the "interesting" bits of a long string (e.g., read in from a file). In that situation, with containing-string, the parent string will be unreclaimable deadweight.

-dan
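
As a rough illustration of the containing-string concern described above, here is a minimal sketch in R7RS Scheme. It assumes shared substrings are modelled as a record carrying a backlink to the parent string; the names view, substring/view, view->string, and untrusted are hypothetical, and the containing-string defined here is only a stand-in for the proposed procedure, not an implementation from any SRFI.

```scheme
;; Hypothetical model: a "shared substring" is a record that points
;; back at its parent string. These names are illustrative only.
(import (scheme base) (scheme write))

(define-record-type view
  (make-view parent start end)
  view?
  (parent view-parent)
  (start view-start)
  (end view-end))

;; Stand-in for substring/shared: no characters are copied.
(define (substring/view s start end)
  (make-view s start end))

;; What the holder of a view is meant to be able to see.
(define (view->string v)
  (substring (view-parent v) (view-start v) (view-end v)))

;; Stand-in for the proposed containing-string:
;; returns the parent string plus the view's indexes.
(define (containing-string v)
  (values (view-parent v) (view-start v) (view-end v)))

(define s (string-copy "Userid: danfuzz\nPassword: fuzzball187\n"))
(define userid (substring/view s 8 15))

;; Code that was handed only the userid, yet recovers the next line
;; of the parent by going through containing-string.
(define (untrusted v)
  (call-with-values
    (lambda () (containing-string v))
    (lambda (parent start end)
      ;; Skip the newline after the userid; drop the trailing newline.
      (substring parent (+ end 1) (- (string-length parent) 1)))))

(display (view->string userid)) (newline) ; prints: danfuzz
(display (untrusted userid)) (newline)    ; prints: Password: fuzzball187
```

Because each view record keeps a reference to its parent, the same sketch also shows the storage issue: s stays reachable for as long as userid does, however small the view is.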