Comparing Pika-style and JNI-style
Jim Blandy, 14 Jan 2004 09:06 UTC

Jim Blandy <xxxxxx@redhat.com> writes:
> Well, if SRFI-50 turned out not to be what I was hoping, and I didn't
> come to my senses quickly enough, I was going to turn <minor/minor.h>
> into a .texi file, start a SRFI from that, and see what people said.

In light of that, I'm curious to know how people generally feel about
the Pika vs. JNI issue.  If there's a near consensus on one or the
other, then that could save a lot of trouble.

Here are links to the specs:
Pika:  http://arch.quackerhead.com/~lord/releases/pika
       http://regexps.srparish.net/src/pika/
Minor: http://svn.red-bean.com/repos/minor/trunk/include/minor/minor.h

Here's how I see it:

Commonalities:
- Both work by having C code manipulate only references to Scheme
  values, not Scheme values themselves.
- Both impose few restrictions on the representation of Scheme objects.
- Both allow GC to occur at any time.
- Both can be implemented in a way that interacts nicely with threads.
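To make the first and third points concrete, here is a toy sketch. Every name in it is hypothetical and not taken from either spec. C code holds only a pointer to a slot the collector knows about, so a copying collector can move the object at any moment and just rewrite the slot.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CELLS 8

/* Hypothetical toy heap object. */
typedef struct { long payload; } cell;

/* Two semispaces for a toy copying collector. */
static cell space_a[HEAP_CELLS], space_b[HEAP_CELLS];
static cell *from_space = space_a, *to_space = space_b;
static size_t heap_used = 0;

/* Reference slots: the collector's roots, and the only thing C code may
   hold on to.  This sketch assumes one slot per object. */
static cell *slots[HEAP_CELLS];
static size_t nslots = 0;

/* Allocate an object and return a reference, i.e. a pointer to its slot. */
static cell **new_ref (long payload)
{
  assert (heap_used < HEAP_CELLS && nslots < HEAP_CELLS);
  cell *obj = &from_space[heap_used++];
  obj->payload = payload;
  slots[nslots] = obj;
  return &slots[nslots++];
}

/* A collection that may run at any time: it moves each rooted object
   into the other semispace and rewrites its slot, so a reference held
   by C code still finds the object at its new address. */
static void collect (void)
{
  size_t next = 0;
  for (size_t i = 0; i < nslots; i++)
    {
      to_space[next] = *slots[i];
      slots[i] = &to_space[next];
      next++;
    }
  cell *tmp = from_space;
  from_space = to_space;
  to_space = tmp;
  heap_used = next;
}
```

In Pika the slot would be a word inside a GCPRO'd stack frame; in JNI-style it would be an entry in the current call's local-reference table. Either way the C code never holds a raw object address across a collection.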

In Pika:
- Leaks are impossible, since references are stack-allocated.
- References are freed upon exit from the lexical block that owns
  them --- finer-grained than JNI-style.
- Probably less overhead than JNI-style.

But:
- Forgetting an UNGCPRO corrupts the GC's data structures, and the
  resulting failures may appear only intermittently.  Irregular exits
  (return, goto, break, continue) require attention.  longjmp is even
  harder.
- Functions may only return Scheme values by reference; they may not
  provide them as their (syntactic) return values.  Instead of writing
  "f (g (x))", you must write:

    g (&frame.x, &frame.temp);
    f (&frame.temp, &frame.temp2);

  In other words, you must write your code as a linear series of
  operations that work by side effects.
- Since the API functions all expect pointers to t_scm_word values,
  people are discouraged from passing the values around directly.  It
  can still be done, e.g. "frame.x = frame.y;", and doing so will
  usually work, but it is a bug.
- Variable declarations are cluttered with enclosing structs and GCPRO
  / UNGCPRO calls.
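A rough sketch of the mechanism behind these points, under loud assumptions: gcpro, ungcpro, t_word, toy_g, toy_f and f_of_g are hypothetical stand-ins, not Pika's actual API. Each protected frame is linked onto a root list that the collector walks, which is why a skipped UNGCPRO leaves the list pointing into a dead C stack frame, and why "f (g (x))" has to be linearized through frame slots.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Pika's t_scm_word; a real one would be a
   tagged Scheme value. */
typedef long t_word;

/* A protected frame: a run of words on the C stack, linked to the
   frame that was innermost before it. */
struct gc_frame {
  struct gc_frame *prev;
  t_word *words;
  size_t count;
};

/* Hypothetical root list that the collector would walk. */
static struct gc_frame *gc_roots = NULL;

/* GCPRO: make WORDS visible to the collector. */
static void gcpro (struct gc_frame *frame, t_word *words, size_t count)
{
  frame->prev = gc_roots;
  frame->words = words;
  frame->count = count;
  gc_roots = frame;
}

/* UNGCPRO: frames must be popped in LIFO order.  If a return, goto,
   break, or longjmp skips this call, gc_roots is left pointing into a
   C stack frame that no longer exists. */
static void ungcpro (struct gc_frame *frame)
{
  assert (gc_roots == frame);
  gc_roots = frame->prev;
}

/* What the collector would do: visit every protected word. */
static size_t gc_count_roots (void)
{
  size_t n = 0;
  for (const struct gc_frame *f = gc_roots; f != NULL; f = f->prev)
    n += f->count;
  return n;
}

/* Toy g and f: results come back through OUT, not as return values. */
static void toy_g (const t_word *in, t_word *out) { *out = *in + 1; }
static void toy_f (const t_word *in, t_word *out) { *out = *in * 2; }

/* "f (g (x))" written as a linear series of side-effecting steps.
   ROOTS reports how many words the collector could see mid-way. */
static void f_of_g (const t_word *x, t_word *out, size_t *roots)
{
  t_word temp[1] = { 0 };
  struct gc_frame pro;

  gcpro (&pro, temp, 1);
  toy_g (x, &temp[0]);
  *roots = gc_count_roots ();
  toy_f (&temp[0], out);
  ungcpro (&pro);   /* must run on every path out of this block */
}
```

The assert in ungcpro catches a forgotten UNGCPRO only when an enclosing frame is popped later, and not at all if there is none, which fits the intermittent failures described above.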

In JNI-style:
- Functions can return references directly, so code need not be
  linearized.  You can write "f (call, g (call, x))" --- if you know
  that "call" will return and free g's return value soon enough.
- Local references are freed automatically when the Scheme->C call to
  which they belong returns.  Leaks due to unfreed local references
  (which will probably be the most common sort of error) have a
  bounded and often (though not always) short lifetime.
- No GC data structures live on the C stack, so careless control flow
  and longjmps will not corrupt the GC's data structures.
- The "explicit free" model is familiar to C programmers.
- Variables are declared normally, and their values used directly.
- Since mn_ref is an incomplete type, it can't be dereferenced, so
  people can't be sloppy and operate on the heap values directly.

But:
- The "explicit free" model is still error-prone.  The fact that leaks
  are bounded by their owning call's lifetime may not always help, for
  example when a single call loops over a long list.
- Probably more overhead than Pika-style.
- Code will be cluttered with explicit-free crap.
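A matching sketch of the JNI-style bookkeeping, again with hypothetical names (struct call, new_local_ref, free_local_ref, call_leave, add1, twice) rather than anything from minor.h: each Scheme->C call owns a table of reference slots, explicit frees release slots early, and whatever is left over is released in one sweep when the call returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Scheme heap object. */
typedef long value;

/* A local reference: a slot in the owning call's table.  C code holds a
   pointer to the slot, never the value itself. */
struct local_ref {
  value v;
  int in_use;
};

#define MAX_LOCAL_REFS 16

/* One Scheme->C call.  A collector would treat every in-use slot as a
   root; nothing GC-related lives on the C stack. */
struct call {
  struct local_ref slots[MAX_LOCAL_REFS];
  size_t live;
};

static void call_enter (struct call *c)
{
  for (size_t i = 0; i < MAX_LOCAL_REFS; i++)
    c->slots[i].in_use = 0;
  c->live = 0;
}

/* Allocate a local reference; NULL if the table is full. */
static struct local_ref *new_local_ref (struct call *c, value v)
{
  for (size_t i = 0; i < MAX_LOCAL_REFS; i++)
    if (!c->slots[i].in_use)
      {
        c->slots[i].in_use = 1;
        c->slots[i].v = v;
        c->live++;
        return &c->slots[i];
      }
  return NULL;
}

/* Explicit free: the slot stops being a root and can be reused. */
static void free_local_ref (struct call *c, struct local_ref *r)
{
  assert (r->in_use);
  r->in_use = 0;
  c->live--;
}

/* Returning from the call drops every remaining reference at once, so
   a forgotten free leaks only until here.  Returns the number of
   references that were never freed explicitly. */
static size_t call_leave (struct call *c)
{
  size_t leaked = c->live;
  call_enter (c);
  return leaked;
}

/* Hypothetical primitives; each returns a fresh local reference.
   Error checking is omitted. */
static struct local_ref *add1 (struct call *c, struct local_ref *x)
{
  return new_local_ref (c, x->v + 1);
}

static struct local_ref *twice (struct call *c, struct local_ref *x)
{
  return new_local_ref (c, x->v * 2);
}
```

In the nested expression twice (c, add1 (c, x)), add1's result is never freed explicitly; it stays live, and keeps its object reachable, until call_leave. That is the bounded leak traded for being able to nest calls.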

Is this fair?  What have I missed?  What do people think?

It would be nice to see sample code in each style; C implementations
of "cadr" and "assq" would make good examples.  As far as I know,
error checking is similar under both interfaces, so it can be left
out.  Here they are in Minor's JNI style:

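    /* The reference returned by mn_cdr is never freed explicitly; it
       is released when the call c returns.  */
    mn_ref *
    cadr (mn_call *c, mn_ref *obj)
    {
      return mn_car (c, mn_cdr (c, obj));
    }

    /* Return the first pair in ALIST whose car is KEY, or #f.  */
    mn_ref *
    assq (mn_call *c, mn_ref *key, mn_ref *alist)
    {
      while (mn_pair_p (c, alist))
        {
          mn_ref *pair = mn_car (c, alist);
          mn_ref *pair_key = mn_car (c, pair);

          /* On a match, pair_key is left for c's cleanup on return.  */
          if (mn_ref_eq (c, key, pair_key))
            return pair;

          /* Free this iteration's references so that a long alist
             does not pile up local references within a single call.  */
          mn_free_local_ref (c, pair);
          mn_free_local_ref (c, pair_key);
          alist = mn_to_cdr (c, alist);
        }

      return mn_false (c);
    }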
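For comparison, here is a Pika-style counterpart of these two functions, written against a hypothetical toy API (t_word, gcpro, ungcpro, car, cdr, cons, copy_word, eq_p, make_false), not Pika's real one. It follows the rules listed for Pika: temporaries live in a protected frame, results come back through out-pointers, values are copied through a function rather than by direct assignment, and the early exit in assq needs its own UNGCPRO.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical word: pairs are non-negative heap indices, atoms are
   negative numbers. */
typedef long t_word;
#define FALSE_W (-1L)

struct pair { t_word car, cdr; };
static struct pair heap[64];
static size_t heap_used = 0;

/* Hypothetical GC root list of protected stack frames. */
struct gc_frame { struct gc_frame *prev; t_word *words; size_t count; };
static struct gc_frame *gc_roots = NULL;

static void gcpro (struct gc_frame *f, t_word *words, size_t count)
{
  f->prev = gc_roots;
  f->words = words;
  f->count = count;
  gc_roots = f;
}

static void ungcpro (struct gc_frame *f)
{
  assert (gc_roots == f);   /* frames must be popped in LIFO order */
  gc_roots = f->prev;
}

/* Toy primitives: every result is stored through an out-pointer. */
static int pair_p (const t_word *x) { return *x >= 0; }
static int eq_p (const t_word *a, const t_word *b) { return *a == *b; }
static void copy_word (t_word *out, const t_word *in) { *out = *in; }
static void make_false (t_word *out) { *out = FALSE_W; }
static void car (t_word *out, const t_word *p) { *out = heap[*p].car; }
static void cdr (t_word *out, const t_word *p) { *out = heap[*p].cdr; }

static void cons (t_word *out, const t_word *a, const t_word *d)
{
  assert (heap_used < sizeof heap / sizeof heap[0]);
  heap[heap_used].car = *a;
  heap[heap_used].cdr = *d;
  *out = (t_word) heap_used++;
}

static void cadr (t_word *out, const t_word *obj)
{
  t_word frame[1] = { 0 };   /* one protected temporary */
  struct gc_frame pro;

  gcpro (&pro, frame, 1);
  cdr (&frame[0], obj);
  car (out, &frame[0]);
  ungcpro (&pro);
}

enum { PAIR, KEY, REST, NTEMPS };

/* Store in OUT the first pair in ALIST whose car is KEY, or #f. */
static void assq (t_word *out, const t_word *key, const t_word *alist)
{
  t_word frame[NTEMPS] = { 0, 0, 0 };
  struct gc_frame pro;

  gcpro (&pro, frame, NTEMPS);
  copy_word (&frame[REST], alist);
  while (pair_p (&frame[REST]))
    {
      car (&frame[PAIR], &frame[REST]);
      car (&frame[KEY], &frame[PAIR]);
      if (eq_p (&frame[KEY], key))
        {
          copy_word (out, &frame[PAIR]);
          ungcpro (&pro);   /* the early exit needs its own UNGCPRO */
          return;
        }
      cdr (&frame[REST], &frame[REST]);
    }
  make_false (out);
  ungcpro (&pro);
}
```

Compared with the JNI-style version, nothing needs freeing, but cadr can no longer be a one-line nested call, and every return path out of assq must pop the frame.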