> From: Michael Sperber <xxxxxx@informatik.uni-tuebingen.de>
[re a prototype of the draft FFI being used in rscheme]
> Tom> Not all incremental collectors are incompatible with the FFI (a mostly
> Tom> copying semi-conservative incremental GC would be one example).
> This was actually a precise non-copying incremental GC.
I didn't mean to imply otherwise -- only to ack that it's obvious the
FFI isn't incompatible with _all_ possible implementations with
incremental collectors.
Anyway, as I said in the "Pika from first principles" message, perhaps
the better way to motivate changes to the draft is not to build a big
catalog of every GC technique we can think of, but simply to observe
that it fails to fully abstract over the representations of Scheme
values, locations and the implementations of primitive operations
on the Scheme store.
[earlier in the same message]
> Tom> If the root set is large, certainly it should be traced in several
> Tom> steps, using barriers to preserve its invariants.
>>> Is there a practical example of a system that does this? It seems
>>> very difficult to do, even absent an FFI to C, as your typical root
>>> set---the current continuation---changes *all the time*. (I'm really
>>> curious. I could never wrap my mind around this.)
Tom> You can treat the "big-three abstract registers" (continuation, code,
Tom> and environment) specially. They have usefully limited usage
Tom> patterns. It's the other roots, if your implementation has them,
Tom> that are of greater interest. (The draft FFI creates "other roots".)
> That isn't the question I asked. All hard questions are buried behind
> "specially."
You're asking how I plan to treat the big-three registers in Pika?
Similarly to SCM or the "phantom stacks" approach.
$code presents no special problem: the values it points to aren't
subject to mutation and it is, indeed, an example of the kind root you
were thinking of (one that the collector must scan atomically with the
last step of a trace cycle).
$environment and $continuation very often point to locations which are
reachable _only_ from those registers. It's possible to use
basically a separate allocator and collector for these.
Ignoring optimizations, the basic idea is: Stack allocate environments
and continuations. When the stack is exhausted, copy-collect them (or
otherwise move them) to the general Scheme heap (most often, few will
live during this collection). If they are captured (by closure
formation or continuation capture), collect them to the general
Scheme heap at that time.
While they live on the stack, the only references to these objects are
rooted in $continuation and $environment and follow their respective
chains. The general collector can regard references from the stack to
the heap as GC roots -- but there's no reason it can't trace those
incrementally. (The hypothesis being that environments and
continuations tend to die very quickly on average, many will be
created, be used, and die without the tracer seeing them at all.)
Meanwhile, because the general purpose tracer is unconcerned with
collecting the continuation and environment objects still living on
the stack, mutations to environments on the stack don't need to
preserve a tri-color invariant. (The hypothesis being that a
significant fraction of reads and writes are to and from environments
that will be on the stack rather than the general heap, there will be
a significant fraction of reads and writes that mostly skip the
barriers.)
There's a small catch: if you have some single sequence of code
(compiled code or part of the guts of your interpreter) and you don't
know a priori whether, when that code is executed, $environment and
$continuation will point to the stack or to the general heap, then
that code _does_ have to be prepared to preserve the tri-color
invariant during environment mutation. However, it can be a pretty
cheap test to decide when you're reading/writing the stack vs.
the general heap.
-t