me> Ok, I don't care about this srfi anymore. Somebody asked me offlist not to give up on this srfi and it got me thinking that that was a pretty bitchy comment for me to make. Sorry. Let me rephrase. The thing that set me off was this comment: > From: Richard Kelsey <xxxxxx@s48.org> > Within the context of this SRFI I don't care about asynchronous > execution (I do care about threads, but as long as C code is not > interrupted asychronously they don't matter). The GC protection > mechanism described in the SRFI is inadequate if GCs can occur > asynchronously. Even if variables 'a' and 'b' are protected, > there is no guarentee that a = b; will not temporarily store the > value of 'b' in some unprotected location such as a register. which I perhaps misinterpreted to mean that there is in effect a final decision by the authors not to change the calling conventions to either the JNI/minor-style or Pika-style conventions that jimb and I proposed. It would be a mistake, in my opinion, to finalize the FFI with the current calling conventions in place --- to the same extent that such a finalized FFI became popular I would feel discouraged from calling my implementation-in-progress, Pika, "Scheme". Instead of "Pika Scheme" I'll wind up with "Pika WhatSchemeShouldBeInsteadOfWhatItIs". Let me try to begin to restate the issues that lead me to the conclusion that the current calling conventions in the draft are not acceptable. If I can convince you of these issues, then I'll be happy to move on to reasons why Pika-style is better than JNI/minor-style. * Foreign Functions Extend the Root Set A useful way to look at the GC situation is that foreign functions extend the root set that the GC must honor. The Scheme valued parameters, locals, intermediate results and return values that arise during the evaluation of a foreign function must be GC protected for a dynamic extent equal to the "liveness" of those values in the C code. Similarly, Scheme values stored in C global or heap allocated C data structures are part of the root set. What does "part of the root set" mean? Let's explore: * Transient Root Values One possibility is that _some_ root set values are "transient" -- we try to design an FFI in which those values are only live during periods of time during which the GC is guaranteed not to run. The draft proposal takes this approach with respect to return values and parameters of FFI entry points. For example, the draft allows code like: x = SCHEME_CONS (a, b); The return value of SCHEME_CONS, before it is stored in the (presumably GCPROtected) variable x exists only as an intermediate result -- it's in some register or on the stack in some random location or is in some random static area. No non-conservative GC could find (and protecct) that intermediate result. No conservative GC could protect it in a portable way. No fully (rather than "mostly") copying GC could cope with that intermediate result _at_all_. Therefore, the draft proposal relies on GC being excluded between (roughly speaking) the execution of the `return' instruction of SCHEME_CONS and the completion of the `store' instruction of the assignment to x. In a single-threaded system that makes only inconsequential use of signal handlers, excluding GC for that period of time is guaranteed. There are simply no other instructions in that path that could trigger GC. The intermediate value, the return value of SCHEME_CONS, is safe not because we've pointed it out to the GC -- but because we're certain that the GC is quiescent during the life of that intermediate value. From the GC perspective, a "transient root set member" like this is a bit like Schroedinger's cat: the right answer isn't "live" and isn't "dead" but more like "what do you care? it has no causality relationship to you -- it's in another universe entirely. Assume "live" _or_ "dead" -- it will make no difference in your observable results." In a multi-threaded system, the situation is not so convenient. The schroedinger box leaks. There are two possibilities: (a) Another thread can be running GC at _any_ time -- in this case, our intermediate result is _not_ GC protected, _is_ in danger of being collected, and the variable x may wind up holding a bogus value; (b) Other threads can be running GC _only_ when _all_ threads are either not executing a foreign function or are at a point during the execution of a foreign function when GC is permissable. The draft standard clearly requires (b) (since (a) simply means that the FFI is not reliable). But (b) is really quite unacceptable because it means that GC activity is permitted only while foreign functions are not executing or are executing certain FFI calls. A thread running a foreign function that tries to make a network connection or calls ackerman's function or whatever can block GC _indefinately_ using only the most innocent seeming code. (Imagine your multi-threaded browser crashing or else flushing all other processes from RAM while one browser tab is waiting to connect to yahoo.com. The failure occurs because a foreign function is taking "too long" to complete.) So, for threaded systems and, equivalently, for systems that do interesting things from signal handlers: the current draft is no good. It can require FFI implementations to block GC unreasonably. It can result in FFI-using C libraries that simply don't work well with some implementations. It gets worse, though. Let's consider just _single_threaded_ systems which _do_not_ do anything from signal handlers. The current FFI invites people to write code like: foobar (SCHEME_CONS (a, b), SCHEME_CONS (c, d)); Note that in _this_ example, the return values of the two calls to SCHEME_CONS are transient root set members for the _entire_ dynamic extent of the call to `foobar' unless `foobar' copies them to GCPROtected locations. Should `foobar' do anything that might trigger GC, those parameters can be collected prematurely. Even worse, there is nothing the second-to-be-executed CONS call can do to protect the return value of the first-to-be-executed CONS call. Now I understand that, if we're focused only on single-threaded implementations that don't use signal handlers in interesting ways that we can just say, "Well, that code is an error. The programmer made a mistake. He should have written:" x = SCHEME_CONS (a, b); y = SCHEME_CONS (c, d); foobar (x, y); That's true but with a Pika or JNI/minor-style calling convention, such mistakes are _effectively_impossible_to_make_. With the draft proposal conventions, such mistakes are _easy_and_inviting_to_make_. Aside from the discussion of transient roots, what else can we say about the root values added by FFI-using functions: ** Root Set Values Must be Known to the GC The GC must be able to enumerate all root set values. That much should be pretty obvious. The C language presents a problem for enumeration of root values: it provides no reliable implicit mechanism for finding, for example, local variables and the intermediate results of expressions. It's therefore required that the FFI involve foreign function authors writing code to explicitly make these values enumerable by the GC. The draft proposal takes the approach of providing GCPRO and GCUNRO entry points and in that regard is fine. ** Root Set Values Must be Mutable by the GC Copying collection (and most likely some other techniques as well) require not only that the GC be able to enumerate all root-set values, but also that it be able to _replace_ such values. This is a thorny problem for C programs. The compiler is permitted in a wide and difficult to reason about set of circumstances to copy values to registers. Consider, for example, a function: scheme_value foo (scheme_value x) { ....; <something that can cause gc>; ....; return x; } With a copying GC, this function can return garbage. The value of x, though somehow presumably GC-protected by callers, has been copied to a register to be passed as a parameter. When the value referred to by x is moved the caller's protecting variable will be updated but the copy in the register will not. (This particular problem could, perhaps, be solved by adding a requirement to the FFI that callees GCPRO parameters but recall that we already have other reasons to change the calling conventions such that no such requirement is needed. Also, with such a requirement, the FFI would be asking people to write the somewhat perplexing: scheme_value foo (scheme_value x) { GCPRO (X); ....; <something that can cause gc>; ....; GCUNPRO (x); return x; } which aside from just the strangeness of it, precludes a debugging aid in which GCUNPRO overwrites the value of its argument variable with a value likely to cause a trap.) ** Root Set Values Need (Potential) Read and Write Barriers Generational, copying, or incremental GCs sometimes operate on the basis of read or write barriers. Such barriers should apply to the FFI-caused root-sets too. For that reason alone, an FFI interface that permits something like: scheme_value x = some_fn (....); SCHEME_FN (x, other_fn (...)); (where, even if we make the corrections suggested above, `other_fn' might be producing a value which is _not_ a scheme value.) Here, x has been read without a read barrier. Although x may refer to a heap-allocated value which has been relocated since x was assigned to, what we have produced here is (in effect) the _old_ address of the scheme value to which x is bound. When control reaches SCHEME_FN, this is not obviously a problem. For example, when SCHEME_FN reads the value of its first parameter, it can itself impose a read barrier. Although `x' itself is pointing to the old address -- nevertheless, SCHEME_FN can correctly trap that and substitute the new address. The problem concerns what happens _before_ control reaches SCHEME_FN but _after_ x has been read by the caller. In particular, the execution of `other_fn' may take place, and may take an aribtrarilly long time. For that reason, the _unupdated_ (old address) value of `x' may persist in a register, on the stack in an anonymous location, or really _anywhere_ the GC doesn't know about for an _indefinate_ amount of time. Should `other_fn' itself trigger GC activity in a single threaded system, or should GC activity occur in another thread or in a signal handler -- we're screwed: the GC can not actually reclaim the _old_ heap space used for the value referred to by x. Ever. Because the C compiler may still be hanging on to references constructed from that old heap address. ** Summing Up The draft proposal: ~ has an unacceptable approach to transient root set members in threaded systems and systems in which asynchronous activity takes place ~ has an error-encouraging approach to transient root set members in single threaded systems ~ has an unacceptable approach from the perspective of copying GCs ~ has an unacceptable approach from the perspective of GCs using read or write barriers * Fixing It All of the problems with the draft proposal have a single root cause: that the FFI contains features which permit and require scheme values to be produced as intermediate results of FFI-using C expressions. The solution is, therefore, to change the calling conventions in the FFI so that scheme values as intermediate results are not required and, indeed, are not even permitted. That solution implies that FFI parameters, return values, and expression values may not be scheme values -- it requires a change in the calling conventions. There may be other ways to implement that solution but I admit I only know of one reasonable way: to pass parameters and return values using not scheme values, but references to locations in which scheme values are stored. Let's call this "the handle based solution". And again, perhaps because of a failure of imagination, I see only two broad classes of ways to implement the handle based solution: One way is to let FFI-using functions manage the allocation of handles: the Pika approach. The other way is to have the FFI entry points allocate handles but require FFI-using functions to free them: the JNI/minor approach. I can argue for the Pika approach over the JNI/minor approach -- but let's sync up in recognizing the problem first. -t