Re: Strings, one last detail. Tom Lord (31 Jan 2004 00:13 UTC)
> From: bear <xxxxxx@sonic.net>

> I can treat SCHEME_EXTRACT_STRING as a request to make a copy of
> a string in a format acceptable to C code in some buffer,
> register it as "floating garbage" with the GC, return a pointer
> to that buffer, and lock the garbage collector until the scheme
> runtime is reentered.  If the C code just wants to _read_ what's
> there, that's necessary and sufficient.

[....]

> What's missing is an explicit declaration that it is unspecified
> whether or not values written into the buffer pointed at by the
> result of SCHEME_EXTRACT_STRING mutate the scheme string that
> was originally referred to,

Interesting conclusion.  I conclude that EXTRACT must allocate string
data which the C code must explicitly free.  I arrived at this through
a fairly systematic exploration of the design space (described below).

To motivate my design space exploration, let's observe that the draft
currently says:

    char * SCHEME_EXTRACT_STRING(scheme_value)
    scheme_value SCHEME_ENTER_STRING(char *)      (may GC)

    SCHEME_EXTRACT_STRING returns a pointer to the actual storage
    used by the Scheme string.  If this is the case[sic], the
    pointer is valid only until the next garbage collection.  Note
    that this string may not be null-terminated;
    SCHEME_STRING_LENGTH returns the number of characters in the
    string.

Does SCHEME_STRING_SET! modify the data that C is seeing?  It would
seem so from "returns a pointer to the actual storage[...]" but in
Pika, STRING-SET! can relocate a string (as when promoting an 8-bit to
a 16-bit string representation).  UTF-8 implementations will sometimes
relocate strings when replacing a character with a character of a
different length encoding.

We have one design question with four possible answers:

  1) Is the extracted data shared with Scheme?

    1a) Yes, for reading and writing
    1b) Yes, for reading
    1c) Unspecified
    1d) No

and there is a dependent design question with six likely answers:

  2) Must C code "explicitly free" extracted string data?
    2a) Yes, using "free()"
    2b) Yes, using whatever function goes with the allocation
        function that C passes as a parameter to EXTRACT
    2c) Yes, using "SCHEME_STRING_DATA_FREE()" which is up to
        the FFI implementor to provide.
    2d) No, its lifetime is that of the Scheme string
    2e) No, its lifetime is up until the next GC point
    2f) No, its lifetime is up until the next execution of any
        FFI function which might mutate the string, including
        by GC.

So we start with 24 possible designs (4 * 6).

If you want to follow the arguments I give below carefully, I suggest
that you get out some graph paper and make a table:

        2a   2b   2c   2d   2e   2f
    1a     |    |    |    |    |
        ---|----|----|----|----|----
    1b     |    |    |    |    |
        ---|----|----|----|----|----
    1c     |    |    |    |    |
        ---|----|----|----|----|----
    1d     |    |    |    |    |

putting X's in boxes when you agree my arguments eliminate some
possible design and ?'s in the ones where you think my arguments are
bogus.  I'm going to try to argue you down to 23 boxes with X's and
one left blank -- a "first-principles" string FFI.  (I expect that
most people who actually do this will come up with some question marks
in their graph --- but those will be handy for organizing any
subsequent discussion.)

Not all 24 possibilities are coherent.  We can right away eliminate:

    1a + 2a
    1a + 2b      All would involve C code using a
    1b + 2a      non-FFI function to "free" string
    1b + 2b      data that does or might belong to
    1c + 2a      a Scheme object.
    1c + 2b

leaving 18.

We can cut out all 4 possibilities that involve (2d) because I think
everyone agrees that that unduly restricts GC and string
representations.  For example, GC would be forbidden from relocating
string data if that string data might ever have been EXTRACTed by C.

That leaves 14.

If strings are _not_ shared (1d), then surely string-data lifetime
must be explicitly managed (not 2e, 2f), leaving 12 designs.

In some quite plausible string representations (UTF-8, UTF-16, Pika's)
mutation to a string can change its length.
Changing a string's length means its location in memory can change.
There may be hairy work-arounds but I think that these are reasons
enough to eliminate

    (1a+2c)   because before the explicit free function is called
              the string may be mutated by Scheme.
    (1b+2c)   for the same reason
    (1a+2e)   because a string mutation between GC-points can
              change the string's length, hence location
    (1b+2e)   for the same reason

leaving 8.

(1b+2f) has absolutely no advantage over (1c+2f).  From the point of
view of the FFI user, they are operationally equivalent.  From the
point of view of the FFI implementor, (1c+2f) leaves more
implementation options.

That leaves 7.

Similarly, (1d+2c) has absolutely no advantage over (1d+2a) or
(1d+2b).  If strings are _not_ shared and C must free them, then use
either free() or let the C code specify how they are allocated in the
first place.  There's no reason why the FFI should define how
non-shared string data is freed.

That leaves 6.  These are:

    1a+2f) r/w sharing, Scheme-mutation-bound lifetime
    1c+2c) unspecified sharing, FFI free function
    1c+2e) unspecified sharing, GC-point lifetime
    1c+2f) unspecified sharing, Scheme-mutation-bound lifetime
    1d+2a) no sharing, use free()
    1d+2b) no sharing, C controls allocation and freeing

I would argue that (1d+2b) is preferable to (1d+2a) because nothing
else in the FFI already depends on malloc()/free().  (One could make
the opposite decision, that 1d+2a is preferable to 1d+2b, and the rest
of this would still apply.)

So that leaves 5:

    1a+2f) r/w sharing, Scheme-mutation-bound lifetime
    1c+2c) unspecified sharing, FFI free function
    1c+2e) unspecified sharing, GC-point lifetime
    1c+2f) unspecified sharing, Scheme-mutation-bound lifetime
    1d+2b) no sharing, C controls allocation and freeing

There is nearly no advantage to (1c+2e) compared to (1c+2f).
In (1c+2e), if a C program crosses a Scheme-mutation-point between
GC-points, while it can assume that the pointer to the string data
remains valid, it can make no assumptions about the contents of that
string data.  This isn't an absolute refutation of (1c+2e) but I would
argue that it is near enough to one as to make no difference.

An analogous argument applies to (1c+2c) compared to (1c+2f).  If C
passes a Scheme-mutation point before calling the FFI free function,
the data pointer may remain valid but its contents are unspecified.

Leaving:

    1a+2f) r/w sharing, Scheme-mutation-bound lifetime
    1c+2f) unspecified sharing, Scheme-mutation-bound lifetime
    1d+2b) no sharing, C controls allocation and freeing

(1a+2f), Scheme-mutation-bound r/w sharing, must be rejected as well.
This is because it constrains implementations to represent strings
internally in exactly the same format seen by C -- because between
Scheme mutation points, C may modify the string and that should be
apparent to Scheme code that _reads_ the string.

That leaves:

    1c+2f) unspecified sharing, Scheme-mutation-bound lifetime
    1d+2b) no sharing, C controls allocation and freeing

(1c+2f) requires us to make a distinction between functions in the FFI
similar to, but not necessarily identical to, the "may GC"
distinction.  It requires us to distinguish "may mutate a string"
functions.

Is there _any_ function in the FFI that we would not want to put in
the "may mutate a string" category?  I'm not so sure that there is.
In an Oaklisp-style implementation, for example, every FFI function
can result in the execution of arbitrary Scheme code.
Therefore, I think we can rephrase our remaining choices as:

    1c+2f) unspecified sharing, data lifetime bound by next FFI call
    1d+2b) no sharing, C controls allocation and freeing

yet that would make even this simple FFI code _incorrect_:

    /* Incorrect code: */

    s1 = SCHEME_EXTRACT_STRING (scheme_s1);
    s2 = SCHEME_EXTRACT_STRING (scheme_s2);

That every FFI function should be "may mutate a string" is, I think,
controversial but not dismissable.  So let's call this 33% of a reason
to reject (1c+2f).

The benefit of (1c+2f) compared to (1d+2b) is that it doesn't
_require_ allocation and copying of string data -- a potential
performance benefit that will be available to _some_ implementations.
But it is certain that _many_ (not all) uses of EXTRACT will be in a
context in which the potential performance benefit does not apply
because C will need the string data lifetime to cross string mutation
boundaries.  In those many cases, the C code will have to allocate
space and copy the string data anyway, eliminating the performance
advantage.  So let's call this another 33% of a reason to reject
(1c+2f).

Regardless of what _this_ SRFI does -- I think it certain that
sometime in the future we will want a portable FFI which permits (in
some form) r/w sharing of string data under constrained conditions.
The "internal interfaces for Pika" that I posted earlier are a good
example of what I think this should also look like in a portable FFI.
The future appearance of those functions is not guaranteed (but not
unlikely) -- and such appearance will eliminate nearly all remaining
benefits of (1c+2f).  Can we call this 34% of a reason?

So I think the choice is clear:

    1d+2b) no sharing, C controls allocation and freeing

That that answer is _also_ compatible with a GC-anytime and
async/concurrent-Scheme-code-permitted FFI is just a happy
non-coincidence.

-t
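
P.S.  For concreteness, here is a rough sketch of how (1d+2b) might
look from the C side.  It is only an illustration under stated
assumptions, not proposed wording for the draft: toy_string,
toy_alloc_fn, toy_extract_string, toy_string_replace and malloc_alloc
are all hypothetical names, and the toy "Scheme string" stores one
byte per character.  The point it tries to show is that the extracted
copy comes from an allocator chosen by the caller and stays valid
after the Scheme-side string is mutated and relocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Scheme string object (not from the draft). */
typedef struct {
    char  *data;    /* internal storage; may be relocated on mutation */
    size_t length;  /* number of bytes (one byte per character here)  */
} toy_string;

/* Allocator callback supplied by the C caller (assumed signature). */
typedef void *(*toy_alloc_fn)(size_t nbytes, void *closure);

/* (1d+2b): copy the contents into memory obtained from `alloc`.
   The copy is null-terminated and never shared with `s`.
   Returns NULL if the allocator fails. */
char *toy_extract_string(const toy_string *s, toy_alloc_fn alloc,
                         void *closure, size_t *length_out)
{
    assert(s != NULL && alloc != NULL);
    char *copy = alloc(s->length + 1, closure);
    if (copy == NULL)
        return NULL;
    if (s->length > 0)
        memcpy(copy, s->data, s->length);
    copy[s->length] = '\0';
    if (length_out != NULL)
        *length_out = s->length;
    return copy;
}

/* Simulates a mutation that relocates the internal storage, as
   STRING-SET! might when an encoded length changes.
   Returns 0 on success, -1 if allocation fails. */
int toy_string_replace(toy_string *s, const char *new_contents)
{
    size_t n = strlen(new_contents);
    char *fresh = malloc(n + 1);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, new_contents, n + 1);
    free(s->data);   /* a shared pointer to the old storage would dangle */
    s->data = fresh;
    s->length = n;
    return 0;
}

/* One possible caller-side allocator: plain malloc, closure ignored. */
void *malloc_alloc(size_t nbytes, void *closure)
{
    (void)closure;
    return malloc(nbytes);
}
```

A caller that passes malloc_alloc releases the copy with free(); a
caller that passes an arena or pool allocator releases it however
that allocator requires.  Either way the FFI itself never needs to
know how the copy is freed.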