Re: when GC is permitted

Tom Lord, 08 Jan 2004 21:58 UTC

    Jim> Here's the implementation of mn_car, with run-time
    Jim> typechecking [.... example that uses mn__make_local_ref ....]
    Jim> mn__make_local_ref is an allocation.

    Tom> Pika-style doesn't have that problem.

    Richard> Is this because the reference cell would be passed in by
    Richard> the caller?  Would it be fair to say that Pika-style is
    Richard> (in part) JNI-style with (some) stack-allocated reference
    Richard> cells?

That's a pretty good way to look at it.

    Richard> I am still confused about it.  Perhaps you could give the
    Richard> mn_car() example in Pika.  Or mn_cadr() might be more
    Richard> elucidating.

Let me try to explain it a bit and give the example you've asked for.

I won't (for now) try to address _global_, C-allocated locations for
holding Scheme values here -- just locals, parameters, and return
values.   We can take up globals later.

I'll also explain this in terms of a specific use of CPP macros and C
constructs -- though it should be obvious that there are many possible
variations that differ essentially only in syntax.

In Pika-style, the C type of a C-allocated location which can hold a
Scheme value is defined as:

	typedef <unspecified> t_scm_word;

(The use of the string ``word'' there refers to a conceptual, virtual
Scheme machine -- it has no necessary relationship to machine words.)

Strictly speaking, if C declares a variable of type `t_scm_word' then
there is nothing in the Pika conventions that requires that,
literally, the storage for that C variable is identical with the
storage corresponding to the Scheme location.  The lifetime of the
t_scm_word is explicitly managed by code which implements the FFI.
Therefore, the t_scm_word lvalue _may_ contain not the Scheme value
directly, but instead some form of reference to the actual Scheme
location.  For example, if one had a Scheme implementation that used a
JNI-style FFI, and wanted to implement a Pika-style FFI on top of
that, then the t_scm_word values might just contain mn_ref values --
pointers to some _other_ location in the C store where the scheme
value is stored.  (In practice, I would normally expect the t_scm_word
lvalue to contain the Scheme value directly.  I'm only noting that it
_could_ be held indirectly.)
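
To make the direct/indirect distinction concrete, here is a minimal
sketch of the two possibilities.  None of these names come from Pika:
`toy_value', the two struct types, and the accessor functions are all
hypothetical, chosen only to show that code which works through the
address of the C variable can reach the real Scheme location under
either representation.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical encoding of a Scheme value; not Pika's. */
typedef uintptr_t toy_value;

/* Direct: the C variable is itself the Scheme location. */
typedef struct { toy_value v; } t_scm_word_direct;

/* Indirect: the C variable holds a reference to a location that
   lives somewhere else (e.g. a JNI-style reference cell). */
typedef struct { toy_value *ref; } t_scm_word_indirect;

/* Given the address of the C variable, find the Scheme location. */
toy_value *
direct_location (t_scm_word_direct *w)
{
  return &w->v;
}

toy_value *
indirect_location (t_scm_word_indirect *w)
{
  assert (w->ref != NULL);
  return w->ref;
}

/* Stores 42 through both representations; returns 1 if both stores
   landed in the expected storage. */
int
demo_representations (void)
{
  t_scm_word_direct d;
  t_scm_word_indirect i;
  toy_value cell = 0;          /* the "other" location in the C store */

  i.ref = &cell;
  *direct_location (&d) = 42;
  *indirect_location (&i) = 42;
  return d.v == 42 && cell == 42;
}
```

In both cases the caller only ever hands over the address of its
variable, which is why the choice of representation can stay hidden
behind the typedef.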

If a C block needs local (aka automatic) variables to hold scheme
values, it has the form:

	{
          struct this_blocks_frame
          {
            SCM_FRAME;

            t_scm_word x;
            t_scm_word y;
            [...];
          } l;

          [....];

          SCM_PROTECT_FRAME (l);

          [....];

          SCM_UNPROTECT_FRAME (l);
        }

The declared structure type must have the illustrated form: a use of
the SCM_FRAME macro where the first field would be declared, followed
by t_scm_word declarations for each local variable.

Conceptually, you can think of that structure declaration and the
local variable `l' as allocating, on the C stack, some N locations to
hold Scheme values.  N is the number of local variables and a useful
upper bound on N can be inferred at compile-time from `sizeof (l)'.
This concept is, I think, the same thing you were thinking of when you
wrote: "Would it be fair to say that Pika-style is (in part) JNI-style
with (some) stack-allocated reference cells?"  Yes: this
struct-declaring convention is essentially allocating some Scheme
locations on the C stack.

SCM_PROTECT_FRAME has the job of informing the Scheme implementation
that these new locations have been added to the Scheme heap.
SCM_UNPROTECT_FRAME has the complementary duty.  Between those two
calls, the Scheme locations allocated for these local variables are
just like the CAR and CDR slots of a pair -- semantically, they are
just additional, perfectly ordinary, Scheme locations.
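
For illustration only, here is one way the three macros _might_ be
realized.  This is a sketch under assumptions, not Pika's actual
definitions: the header layout, the global `scm_frame_chain', the
helper macro SCM_FRAME_WORDS, and the function
`scm_count_protected_words' are all hypothetical.  It shows frames
being linked into a chain that a collector could walk, with the slot
count bounded from `sizeof'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; the real t_scm_word is unspecified. */
typedef void *t_scm_word;

/* Assumed frame header: link to the enclosing frame plus a slot count. */
struct scm_frame_header
{
  struct scm_frame_header *next;
  size_t n_words;
};

#define SCM_FRAME struct scm_frame_header scm_frame_hdr

/* Assumed chain of protected frames (a real system would likely
   keep one per thread or per instance). */
struct scm_frame_header *scm_frame_chain = NULL;

/* Upper bound on the number of slots, derived from sizeof. */
#define SCM_FRAME_WORDS(f) \
  ((sizeof (f) - sizeof (struct scm_frame_header)) / sizeof (t_scm_word))

#define SCM_PROTECT_FRAME(f)                            \
  do {                                                  \
    (f).scm_frame_hdr.n_words = SCM_FRAME_WORDS (f);    \
    (f).scm_frame_hdr.next = scm_frame_chain;           \
    scm_frame_chain = &(f).scm_frame_hdr;               \
  } while (0)

#define SCM_UNPROTECT_FRAME(f)                          \
  do {                                                  \
    /* Frames must be released in LIFO order. */        \
    assert (scm_frame_chain == &(f).scm_frame_hdr);     \
    scm_frame_chain = (f).scm_frame_hdr.next;           \
  } while (0)

/* What a collector would see: the total number of protected slots. */
size_t
scm_count_protected_words (void)
{
  size_t total = 0;
  struct scm_frame_header *h;

  for (h = scm_frame_chain; h != NULL; h = h->next)
    total += h->n_words;
  return total;
}

/* Returns the number of slots visible while the frame is protected. */
size_t
demo_frame (void)
{
  struct this_blocks_frame
  {
    SCM_FRAME;
    t_scm_word x;
    t_scm_word y;
  } l;
  size_t seen;

  l.x = NULL;                  /* give the slots defined contents */
  l.y = NULL;                  /* before the collector can see them */
  SCM_PROTECT_FRAME (l);
  seen = scm_count_protected_words ();
  SCM_UNPROTECT_FRAME (l);
  return seen;
}
```

The point of the sketch is only that protecting a frame costs a couple
of pointer stores and no heap allocation; a real implementation is
free to do something quite different.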

Primitive operations available within that C block are semantically
defined (in part) in terms of their side-effects upon such locations.
Consequently, their parameters are not Scheme values -- but
indications of which locations to take values from.   Their return
mechanism is not the C return mechanism -- but instead an indication
of in which Scheme locations to store results.   Thus, instead of:

	x = SCHEME_CAR (y);

one writes:

	scm_car (&l.x, instance, &l.y);

(It turns out to be useful to use the C return value of scm_car for
other purposes, such as error indications.)

Really there is a double indirection there.  `l.y' is the C name of a
Scheme location.  `&l.y' is the name of a C location where a reference
to a Scheme location is stored.  But the particular Scheme location
named by l.y is unique (no other C variable names it) and coextensive
with the C lifetime of l.y itself.  `l.y' will only ever refer to
exactly one Scheme location.  Indeed, the expectation (but not a
requirement) is that l.y is _itself_ the location named by l.y.  By
passing `&l.y' instead of `l.y' we avoid overspecifying what the
typedef of t_scm_word is, and permit the (expected case) possibility
that l.y itself is the Scheme location.
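
A toy model may help show the calling convention in action.  Only the
shape of the `scm_car' call is taken from the text above; the value
representation, `struct toy_pair', the placeholder SCM_FRAME, and the
error convention (0 for success) are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct toy_pair;

/* Hypothetical direct representation: pair == NULL means fixnum. */
typedef struct
{
  struct toy_pair *pair;
  long fixnum;
} t_scm_word;

struct toy_pair { t_scm_word car; t_scm_word cdr; };

typedef void *t_scm_instance;  /* unused in this toy model */
typedef int t_scm_error;       /* assumed: 0 means success */

/* Placeholder for the real frame header. */
#define SCM_FRAME int scm_frame_placeholder

/* Takes the value from the location `pair' and stores its car in
   the location `result'.  Nothing is returned as a C value except
   the error indication. */
t_scm_error
scm_car (t_scm_word *result, t_scm_instance instance, t_scm_word *pair)
{
  (void) instance;
  assert (result != NULL && pair != NULL);
  if (pair->pair == NULL)
    return 1;                  /* not a pair */
  *result = pair->pair->car;   /* fine even if result == pair */
  return 0;
}

/* Returns the fixnum found in the car, or -1 on error. */
long
demo_car (void)
{
  static struct toy_pair cell = { { NULL, 7 }, { NULL, 0 } };
  struct
  {
    SCM_FRAME;
    t_scm_word x;
    t_scm_word y;
  } l;
  t_scm_instance instance = NULL;

  l.scm_frame_placeholder = 0;
  l.x.pair = NULL;
  l.x.fixnum = 0;
  l.y.pair = &cell;
  l.y.fixnum = 0;

  if (scm_car (&l.x, instance, &l.y) != 0)
    return -1;
  return l.x.fixnum;
}
```

Because the primitive sees only addresses, the same call would work
unchanged if t_scm_word held a reference instead of the value itself.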

What would CADR look like in Pika?  I'm not sure it is really an
elucidating example but here you go.

Blowing off error handling from C (presuming that errors, if handled
at all, are handled by non-local exits) -- you'd get:

        void
	my_cadr (t_scm_word * answer,
                 t_scm_instance instance,
                 t_scm_word * pair)
        {
          scm_cdr (answer, instance, pair);
          scm_car (answer, instance, answer);
        }

I would rather write it like this:

        t_scm_error
	my_cadr (t_scm_word * answer,
                 t_scm_instance instance,
                 t_scm_word * pair)
        {
          t_scm_error err = 0;

          err = scm_cdr (answer, instance, pair);
          if (!err)
            {
              err = scm_car (answer, instance, answer);
            }
          if (err)
            {
              scm_make_wrong (answer, instance);
            }
          return err;
        }

The `scm_make_wrong' call is a convention I use in Pika to set a
return value to a value which is always an error to pass to some other
primitive.  It is essentially a debugging aid and a portable FFI would
not suffer terribly from omitting it, leaving only:

        t_scm_error
	my_cadr (t_scm_word * answer,
                 t_scm_instance instance,
                 t_scm_word * pair)
        {
          t_scm_error err = 0;

          err = scm_cdr (answer, instance, pair);
          if (!err)
            {
              err = scm_car (answer, instance, answer);
            }
          return err;
        }
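
To see the error-returning version run end to end, here is a
self-contained toy.  `my_cadr' is the version with `scm_make_wrong'
from above; everything else (the tagged representation, the toy
primitives, the two demo functions) is hypothetical scaffolding for
this sketch and not Pika's implementation.  Frame protection is left
out because the toy has no collector.

```c
#include <assert.h>
#include <stddef.h>

enum toy_tag { TOY_WRONG, TOY_NIL, TOY_FIXNUM, TOY_PAIR };

struct toy_pair;

typedef struct
{
  enum toy_tag tag;
  long fixnum;                 /* valid when tag == TOY_FIXNUM */
  struct toy_pair *pair;       /* valid when tag == TOY_PAIR */
} t_scm_word;

struct toy_pair { t_scm_word car; t_scm_word cdr; };

typedef void *t_scm_instance;  /* unused by the toy primitives */
typedef int t_scm_error;       /* assumed: 0 means success */

t_scm_error
scm_car (t_scm_word *answer, t_scm_instance instance, t_scm_word *pair)
{
  (void) instance;
  if (pair->tag != TOY_PAIR)
    return 1;
  *answer = pair->pair->car;   /* safe even when answer == pair */
  return 0;
}

t_scm_error
scm_cdr (t_scm_word *answer, t_scm_instance instance, t_scm_word *pair)
{
  (void) instance;
  if (pair->tag != TOY_PAIR)
    return 1;
  *answer = pair->pair->cdr;
  return 0;
}

/* Store a value that every toy primitive rejects. */
void
scm_make_wrong (t_scm_word *answer, t_scm_instance instance)
{
  (void) instance;
  assert (answer != NULL);
  answer->tag = TOY_WRONG;
  answer->fixnum = 0;
  answer->pair = NULL;
}

t_scm_error
my_cadr (t_scm_word *answer, t_scm_instance instance, t_scm_word *pair)
{
  t_scm_error err = 0;

  err = scm_cdr (answer, instance, pair);
  if (!err)
    err = scm_car (answer, instance, answer);
  if (err)
    scm_make_wrong (answer, instance);
  return err;
}

/* cadr of the list (1 2); returns the fixnum, or -1 on error. */
long
demo_cadr (void)
{
  static struct toy_pair second =
    { { TOY_FIXNUM, 2, NULL }, { TOY_NIL, 0, NULL } };
  static struct toy_pair first =
    { { TOY_FIXNUM, 1, NULL }, { TOY_PAIR, 0, &second } };
  t_scm_word list;
  t_scm_word answer;

  list.tag = TOY_PAIR;
  list.fixnum = 0;
  list.pair = &first;
  scm_make_wrong (&answer, NULL);
  if (my_cadr (&answer, NULL, &list) != 0)
    return -1;
  return answer.fixnum;
}

/* cadr of a non-list; returns 1 if the call failed and left a
   "wrong" value in the answer slot. */
int
demo_cadr_error (void)
{
  t_scm_word not_a_list;
  t_scm_word answer;
  t_scm_error err;

  not_a_list.tag = TOY_FIXNUM;
  not_a_list.fixnum = 5;
  not_a_list.pair = NULL;
  scm_make_wrong (&answer, NULL);
  err = my_cadr (&answer, NULL, &not_a_list);
  return err != 0 && answer.tag == TOY_WRONG;
}
```

Note that `answer' doubles as the temporary for the intermediate cdr,
so my_cadr needs no frame of its own.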

    > Also, when running in a system with a precise collector and
    > interrupt-anytime threads, Pika-style would require the elided
    > mn__[begin|end]_incoherent() critical section calls?

No.

There is an independent and wholly unrelated reason to want those for
different examples, but they are not required for any of what we are
discussing.

-t