Re: no constants please Tom Lord 04 Jan 2004 01:16 UTC

  me> Ok, I don't care about this srfi anymore.

Somebody asked me offlist not to give up on this srfi and it
got me thinking that that was a pretty bitchy comment for me
to make.  Sorry.  Let me rephrase.

The thing that set me off was this comment:

    > From: Richard Kelsey <>

    > Within the context of this SRFI I don't care about asynchronous
    > execution (I do care about threads, but as long as C code is not
    > interrupted asychronously they don't matter).  The GC protection
    > mechanism described in the SRFI is inadequate if GCs can occur
    > asynchronously.  Even if variables 'a' and 'b' are protected,
    > there is no guarentee that a = b; will not temporarily store the
    > value of 'b' in some unprotected location such as a register.

which I perhaps misinterpreted to mean that there is in effect a final
decision by the authors not to change the calling conventions to
either the JNI/minor-style or Pika-style conventions that jimb and I
proposed.  It would be a mistake, in my opinion, to finalize the FFI
with the current calling conventions in place --- to the same extent
that such a finalized FFI became popular I would feel discouraged from
calling my implementation-in-progress, Pika, "Scheme".  Instead of
"Pika Scheme" I'll wind up with "Pika WhatSchemeShouldBeInsteadOfWhatItIs".

Let me try to begin to restate the issues that lead me to the
conclusion that the current calling conventions in the draft are not
acceptable.  If I can convince you of these issues, then I'll be happy
to move on to reasons why Pika-style is better than JNI/minor-style.

* Foreign Functions Extend the Root Set

  A useful way to look at the GC situation is that foreign functions
  extend the root set that the GC must honor.

  The Scheme valued parameters, locals, intermediate results and
  return values that arise during the evaluation of a foreign function
  must be GC protected for a dynamic extent equal to the "liveness" of
  those values in the C code.

  Similarly, Scheme values stored in C global or heap allocated C data
  structures are part of the root set.

  What does "part of the root set" mean?   Let's explore:

* Transient Root Values

  One possibility is that _some_ root set values are "transient" -- we
  try to design an FFI in which those values are only live during
  periods of time during which the GC is guaranteed not to run.

  The draft proposal takes this approach with respect to return values
  and parameters of FFI entry points.  For example, the draft allows
  code like:

	x = SCHEME_CONS (a, b);

  The return value of SCHEME_CONS, before it is stored in the
  (presumably GCPROtected) variable x exists only as an intermediate
  result -- it's in some register or on the stack in some random
  location or is in some random static area.  No non-conservative GC
  could find (and protecct) that intermediate result.  No conservative
  GC could protect it in a portable way.  No fully (rather than
  "mostly") copying GC could cope with that intermediate result

  Therefore, the draft proposal relies on GC being excluded between
  (roughly speaking) the execution of the `return' instruction of
  SCHEME_CONS and the completion of the `store' instruction of the
  assignment to x.

  In a single-threaded system that makes only inconsequential use of
  signal handlers, excluding GC for that period of time is guaranteed.
  There are simply no other instructions in that path that could
  trigger GC.  The intermediate value, the return value of
  SCHEME_CONS, is safe not because we've pointed it out to the GC --
  but because we're certain that the GC is quiescent during the life
  of that intermediate value.  From the GC perspective, a "transient
  root set member" like this is a bit like Schroedinger's cat: the
  right answer isn't "live" and isn't "dead" but more like "what do
  you care?  it has no causality relationship to you -- it's in
  another universe entirely.  Assume "live" _or_ "dead" -- it will
  make no difference in your observable results."

  In a multi-threaded system, the situation is not so convenient.  The
  schroedinger box leaks.  There are two possibilities: (a) Another
  thread can be running GC at _any_ time -- in this case, our
  intermediate result is _not_ GC protected, _is_ in danger of being
  collected, and the variable x may wind up holding a bogus value; (b)
  Other threads can be running GC _only_ when _all_ threads are either
  not executing a foreign function or are at a point during the
  execution of a foreign function when GC is permissable.

  The draft standard clearly requires (b) (since (a) simply means that
  the FFI is not reliable).  But (b) is really quite unacceptable
  because it means that GC activity is permitted only while foreign
  functions are not executing or are executing certain FFI calls.  A
  thread running a foreign function that tries to make a network
  connection or calls ackerman's function or whatever can block GC
  _indefinately_ using only the most innocent seeming code.  (Imagine
  your multi-threaded browser crashing or else flushing all other
  processes from RAM while one browser tab is waiting to connect to  The failure occurs because a foreign function is taking
  "too long" to complete.)

  So, for threaded systems and, equivalently, for systems that do
  interesting things from signal handlers: the current draft is no
  good.  It can require FFI implementations to block GC unreasonably.
  It can result in FFI-using C libraries that simply don't work
  well with some implementations.

  It gets worse, though.

  Let's consider just _single_threaded_ systems which _do_not_ do
  anything from signal handlers.   The current FFI invites people
  to write code like:

	foobar (SCHEME_CONS (a, b), SCHEME_CONS (c, d));

  Note that in _this_ example, the return values of the two calls to
  SCHEME_CONS are transient root set members for the _entire_ dynamic
  extent of the call to `foobar' unless `foobar' copies them to
  GCPROtected locations.   Should `foobar' do anything that might
  trigger GC, those parameters can be collected prematurely.

  Even worse, there is nothing the second-to-be-executed CONS call can
  do to protect the return value of the first-to-be-executed CONS

  Now I understand that, if we're focused only on single-threaded
  implementations that don't use signal handlers in interesting ways
  that we can just say, "Well, that code is an error.   The programmer
  made a mistake.  He should have written:"

	x = SCHEME_CONS (a, b);
	y = SCHEME_CONS (c, d);
        foobar (x, y);

  That's true but with a Pika or JNI/minor-style calling convention,
  such mistakes are _effectively_impossible_to_make_.  With the draft
  proposal conventions, such mistakes are _easy_and_inviting_to_make_.

  Aside from the discussion of transient roots, what else can we say
  about the root values added by FFI-using functions:

** Root Set Values Must be Known to the GC

  The GC must be able to enumerate all root set values.   That much
  should be pretty obvious.

  The C language presents a problem for enumeration of root values:
  it provides no reliable implicit mechanism for finding, for example,
  local variables and the intermediate results of expressions.   It's
  therefore required that the FFI involve foreign function authors
  writing code to explicitly make these values enumerable by the GC.

  The draft proposal takes the approach of providing GCPRO and GCUNRO
  entry points and in that regard is fine.

** Root Set Values Must be Mutable by the GC

  Copying collection (and most likely some other techniques as well)
  require not only that the GC be able to enumerate all root-set
  values, but also that it be able to _replace_ such values.

  This is a thorny problem for C programs.   The compiler is permitted
  in a wide and difficult to reason about set of circumstances to copy
  values to registers.   Consider, for example, a function:

        foo (scheme_value x)
          <something that can cause gc>;
          return x;

  With a copying GC, this function can return garbage.  The value of
  x, though somehow presumably GC-protected by callers, has been
  copied to a register to be passed as a parameter.  When the value
  referred to by x is moved the caller's protecting variable will be
  updated but the copy in the register will not.

  (This particular problem could, perhaps, be solved by adding a
  requirement to the FFI that callees GCPRO parameters but recall
  that we already have other reasons to change the calling conventions
  such that no such requirement is needed.

  Also, with such a requirement, the FFI would be asking people to
  write the somewhat perplexing:

        foo (scheme_value x)
          GCPRO (X);
          <something that can cause gc>;
          GCUNPRO (x);
          return x;

  which aside from just the strangeness of it, precludes a debugging
  aid in which GCUNPRO overwrites the value of its argument variable
  with a value likely to cause a trap.)

** Root Set Values Need (Potential) Read and Write Barriers

  Generational, copying, or incremental GCs sometimes operate on the
  basis of read or write barriers.  Such barriers should apply to the
  FFI-caused root-sets too.

  For that reason alone, an FFI interface that permits something like:

	scheme_value x = some_fn (....);

        SCHEME_FN (x, other_fn (...));

  (where, even if we make the corrections suggested above, `other_fn'
  might be producing a value which is _not_ a scheme value.)

  Here, x has been read without a read barrier.  Although x may refer
  to a heap-allocated value which has been relocated since x was
  assigned to, what we have produced here is (in effect) the _old_
  address of the scheme value to which x is bound.

  When control reaches SCHEME_FN, this is not obviously a problem.
  For example, when SCHEME_FN reads the value of its first parameter,
  it can itself impose a read barrier.   Although `x' itself is
  pointing to the old address -- nevertheless, SCHEME_FN can correctly
  trap that and substitute the new address.

  The problem concerns what happens _before_ control reaches SCHEME_FN
  but _after_ x has been read by the caller.   In particular, the
  execution of `other_fn' may take place, and may take an aribtrarilly
  long time.

  For that reason, the _unupdated_ (old address) value of `x' may
  persist in a register, on the stack in an anonymous location, or
  really _anywhere_ the GC doesn't know about for an _indefinate_
  amount of time.   Should `other_fn' itself trigger GC activity in a
  single threaded system, or should GC activity occur in another
  thread or in a signal handler -- we're screwed:  the GC can not
  actually reclaim the _old_ heap space used for the value referred to
  by x.  Ever.  Because the C compiler may still be hanging on to
  references constructed from that old heap address.

** Summing Up

  The draft proposal:

    ~ has an unacceptable approach to transient root set members
      in threaded systems and systems in which asynchronous activity
      takes place

    ~ has an error-encouraging approach to transient root set members
      in single threaded systems

    ~ has an unacceptable approach from the perspective of copying GCs

    ~ has an unacceptable approach from the perspective of GCs using
      read or write barriers

* Fixing It

  All of the problems with the draft proposal have a single root
  cause:  that the FFI contains features which permit and require
  scheme values to be produced as intermediate results of FFI-using
  C expressions.

  The solution is, therefore, to change the calling conventions in
  the FFI so that scheme values as intermediate results are not
  required and, indeed, are not even permitted.

  That solution implies that FFI parameters, return values, and
  expression values may not be scheme values -- it requires a
  change in the calling conventions.

  There may be other ways to implement that solution but I admit I
  only know of one reasonable way: to pass parameters and return
  values using not scheme values, but references to locations in which
  scheme values are stored.  Let's call this "the handle based

  And again, perhaps because of a failure of imagination, I see only
  two broad classes of ways to implement the handle based solution:
  One way is to let FFI-using functions manage the allocation of
  handles: the Pika approach.   The other way is to have the FFI entry
  points allocate handles but require FFI-using functions to free
  them: the JNI/minor approach.

  I can argue for the Pika approach over the JNI/minor approach -- but
  let's sync up in recognizing the problem first.