
More JNI vs. Pika comparison Jim Blandy (17 Feb 2004 23:22 UTC)
Re: More JNI vs. Pika comparison Matthew Dempsky (18 Feb 2004 08:09 UTC)
Re: More JNI vs. Pika comparison Jim Blandy (18 Feb 2004 08:39 UTC)
Re: More JNI vs. Pika comparison Matthew Dempsky (18 Feb 2004 15:46 UTC)
Re: More JNI vs. Pika comparison Tom Lord (20 Feb 2004 17:32 UTC)
Re: More JNI vs. Pika comparison Jim Blandy (24 Feb 2004 22:13 UTC)
Re: More JNI vs. Pika comparison Tom Lord (24 Feb 2004 22:32 UTC)
Re: More JNI vs. Pika comparison Jim Blandy (24 Feb 2004 23:23 UTC)
Re: More JNI vs. Pika comparison Tom Lord (25 Feb 2004 00:16 UTC)

Re: More JNI vs. Pika comparison Tom Lord 20 Feb 2004 17:51 UTC


    > From: Jim Blandy <xxxxxx@redhat.com>

    > I've come across a situation which is reasonably straightforward to
    > handle in a JNI-style interface, but which I think requires machinery
    > I haven't seen yet in Pika-style.  I'd like to see the Pika folks'
    > solution here.

    > When I write actions for a Bison grammar, it's pretty straightforward
    > to use Minor references in the semantic actions.  [....]
    > [...code...]

    > The underlying issue here is that we want the generated parser's stack
    > of semantic values to hold references to Scheme objects.  Since a
    > JNI-style interface works on pointers to dynamically allocated
    > references, one can simply declare semantic values to be such
    > pointers, as we do with the #definition of YYSTYPE above.  We use
    > "linear" functions like mn_to_cons to free intermediate references.
    > If the parser exits with an error, any local references on the stack
    > are freed properly when the mn_call returns.

    > It seems to me that handling this in Pika requires one to use a
    > separate facility that hasn't been described on this list before,
    > which allows Pika references to appear inside other data
    > structures,

That's right.  This is a specific instance of the general need for a
facility to create locations in C whose lifetime exceeds the dynamic
extent of the block of code that creates them.

I mentioned that omission back when I first described the Pika
conventions on this list.

    > It seems to me similar problems will occur working with any
    > third-party tool that presumes it is sufficient to let people pass
    > around pointers to data of their own definition.

    > So, in the end, it looks to me as if Pika will need to provide a
    > JNI-style interface anyway, in addition to the C compound-statement-
    > bound interface, which would still be the preferred interface for C
    > code written against Pika interfaces.

I think that has to be read as "JNI-style" in only the broadest sense
of the term -- a need for an interface to create locations whose
lifetime is explicitly managed.  Narrower "JNI-style" features that
are _not_ necessary include:

~ reference counting for locations
~ "linear" functions
~ attachment of locations to a "call" structure whose lifetime
  trumps the reference count of attached locations

My personal opinion is that simple reference counting of explicitly
managed locations may, in fact, be desirable -- but it isn't strictly
necessary.  Separate linear functions aren't needed at all, nor is the
"call" structure approach to managing location lifetimes.

A strawman of what such an interface in Pika might look like would be:

	t_scm_word * scm_allocate_location (t_scm_arena arena);
	    /* Allocate a new location with a reference count of 1. */

	void scm_location_ref (t_scm_arena arena, t_scm_word * loc);
	void scm_location_unref (t_scm_arena arena, t_scm_word * loc);
	    /* Increment or decrement the reference count of LOC. */
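To make the strawman a little more concrete, here is a toy, single-file sketch of how those three calls might behave.  Everything beyond the three signatures is an assumption for illustration only: t_scm_word is taken to be a uintptr_t, t_scm_arena a pointer to a fixed table of slots, and scm_live_locations is a hypothetical helper.  A real implementation would register live slots as GC roots; here a nonzero count simply marks a slot as a root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation: a Scheme value is one machine word. */
typedef uintptr_t t_scm_word;

#define SCM_MAX_LOCATIONS 64

/* Hypothetical arena: a fixed table of explicitly managed locations.
   A slot whose count is nonzero would be treated as a GC root. */
struct scm_arena
{
  t_scm_word slots[SCM_MAX_LOCATIONS];
  unsigned int refcount[SCM_MAX_LOCATIONS]; /* 0 means free */
};

typedef struct scm_arena * t_scm_arena;

/* Allocate a new location with a reference count of 1.
   Returns NULL if the table is full. */
t_scm_word *
scm_allocate_location (t_scm_arena arena)
{
  for (size_t i = 0; i < SCM_MAX_LOCATIONS; ++i)
    if (arena->refcount[i] == 0)
      {
        arena->refcount[i] = 1;
        arena->slots[i] = 0;      /* start out holding a dummy value */
        return &arena->slots[i];
      }
  return NULL;
}

static size_t
scm_location_index (t_scm_arena arena, t_scm_word * loc)
{
  assert (loc >= arena->slots && loc < arena->slots + SCM_MAX_LOCATIONS);
  return (size_t) (loc - arena->slots);
}

void
scm_location_ref (t_scm_arena arena, t_scm_word * loc)
{
  ++arena->refcount[scm_location_index (arena, loc)];
}

/* When the count reaches zero the slot stops being a root and may be
   handed out again by scm_allocate_location. */
void
scm_location_unref (t_scm_arena arena, t_scm_word * loc)
{
  size_t i = scm_location_index (arena, loc);
  assert (arena->refcount[i] > 0);
  --arena->refcount[i];
}

/* Hypothetical helper: how many locations are currently roots. */
size_t
scm_live_locations (t_scm_arena arena)
{
  size_t n = 0;
  for (size_t i = 0; i < SCM_MAX_LOCATIONS; ++i)
    if (arena->refcount[i] > 0)
      ++n;
  return n;
}
```

A location obtained this way can be stored in a parser's value stack or any other C data structure; it stays protected until the matching scm_location_unref, regardless of which block of code allocated it.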

What about your parser example?  You exhibit code like this:

  /* The type of Bison semantic values.  */
  #define YYSTYPE mn_ref *

  [....]

  list: '(' list_data ')' { $$ = $2; };

  list_data:
      datum list_data { $$ = mn_to_cons (c, $1, $2); }
    | datum '.' datum { $$ = mn_to_cons (c, $1, $3); }
    |                 { $$ = mn_null (c); }
    ;

It's worth noting first that that's pretty fragile code in two ways:

First, actions such as the one in:

      datum list_data { $$ = mn_to_cons (c, $1, $2); }

are destructive of $1: after mn_to_cons returns, $1 no longer refers to
the datum.  A simple modification to:

      datum list_data {
                        $$ = mn_to_cons (c, $1, $2);
                        log_obj_added_to_list (c, $1);
                      }

with the intention of logging the list element, not the new list spine
pair, is incorrect.  The use of "linear" operations here is a by-hand
optimization -- but one that has a price in terms of code simplicity.
I think that this is a general weakness of explicit linear updates.
When the linear update is textually further from the errant subsequent
use, or when whether a linear update has taken place at all depends on
the control path through the code, it can become quite an exercise to
figure out which value a C variable actually refers to.  It seems to me
to be a programming
practice that raises the question of why we bother having compilers
that work hard at register assignment.
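The hazard can be reproduced outside Minor with a toy model.  In this sketch the names toy_ref, toy_to_cons, toy_log and the two action functions are hypothetical, and toy_to_cons only mimics the behavior described above (the reference passed for the datum ends up naming the new pair); it is not Minor's implementation.  Logging after the linear call sees the pair; logging before it sees the datum.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy heap object: a fixnum or a pair. */
struct obj
{
  int is_pair;
  int fixnum;
  struct obj * car;
  struct obj * cdr;
};

/* Toy reference: a heap cell that names one object. */
typedef struct obj ** toy_ref;

static struct obj *
toy_fixnum (int n)
{
  struct obj * o = calloc (1, sizeof *o);
  if (!o)
    abort ();
  o->fixnum = n;
  return o;
}

static toy_ref
toy_make_ref (struct obj * o)
{
  toy_ref r = malloc (sizeof *r);
  if (!r)
    abort ();
  *r = o;
  return r;
}

/* Linear cons, mimicking the behavior described in the post: the
   reference passed for the car is overwritten with the new pair and
   returned, so the caller's $1 no longer names the datum. */
static toy_ref
toy_to_cons (toy_ref car, toy_ref cdr)
{
  struct obj * p = calloc (1, sizeof *p);
  if (!p)
    abort ();
  p->is_pair = 1;
  p->car = *car;
  p->cdr = *cdr;
  *car = p;
  return car;
}

/* Demo-only logger: remembers the object it was shown.  (Keeping a
   raw pointer is safe here only because the toy never collects.) */
static struct obj * last_logged;

static void
toy_log (toy_ref r)
{
  last_logged = *r;
}

/* The modified action from the post: logs after the linear cons. */
static toy_ref
action_log_after (toy_ref d1, toy_ref d2)
{
  toy_ref result = toy_to_cons (d1, d2);
  toy_log (d1);                /* sees the new pair, not the datum */
  return result;
}

/* Logging before the linear call sees the datum as intended. */
static toy_ref
action_log_before (toy_ref d1, toy_ref d2)
{
  toy_log (d1);
  return toy_to_cons (d1, d2);
}
```

Reordering is enough in this two-line action; the trouble described above is that the same fix gets easy to miss once the linear call and the later use drift apart.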

Second, all intermediate values constructed in this parse but _not_
stored in a reference that will be destructively updated (such as $2
in the action above) are GC protected for the lifetime of the parse.
To have the parse protect a number of locations bounded by the depth
of the value stack, one would need to write something like:

      datum list_data {
                        $$ = mn_to_cons (c, $1, $2);
                        mn_unref (c, $2);
                      }
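The difference in how many locations stay protected can be counted directly.  This hypothetical model replays the right-recursive reductions of list_data for a list of n datums and tallies references that have not been released.  It assumes the linear mn_to_cons reuses $1's reference for $$ (so it neither creates nor releases one) and ignores the parenthesis tokens, which carry no semantic value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the value stack while `list_data` is reduced
   for a list of N datums.  Assumes the linear mn_to_cons reuses $1's
   reference for $$, so it neither creates nor releases a reference. */
struct parse_counts
{
  size_t live;          /* references created and not yet released */
  size_t stack_depth;   /* semantic values currently on the stack */
};

static struct parse_counts
reduce_list (size_t n, int unref_cdr)
{
  struct parse_counts c = { 0, 0 };

  /* Shift N datums; each arrives with a fresh reference. */
  for (size_t i = 0; i < n; ++i)
    {
      ++c.live;
      ++c.stack_depth;
    }

  /* Reduce the empty alternative: mn_null creates one reference. */
  ++c.live;
  ++c.stack_depth;

  /* N reductions of `datum list_data`: pop $1 and $2, push $$ (which
     is $1's reference, reused). */
  for (size_t i = 0; i < n; ++i)
    {
      c.stack_depth -= 1;
      if (unref_cdr)
        c.live -= 1;      /* the added mn_unref (c, $2) */
    }
  return c;
}
```

With n = 10, the original action leaves 11 references protected while one value remains on the stack; with the added mn_unref, only 1 remains.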

In other words, on at least two counts, the enticing simplicity of
the exhibited code is a little bit misleading.

I suppose that the brute force Pika solution would look something
like:

      datum list_data {
                        $$ = scm_allocate_location (instance);
                        scm_cons ($$, instance, $1, $2);
                        scm_location_unref (instance, $1);
                        scm_location_unref (instance, $2);
                      }

which, although four times as verbose as your original code (twice as
verbose as the more robust form of your code), is not fragile with
respect to "linear" operations and is accurate with respect to GC.

If you really wanted the linear optimization, you'd instead get:

      datum list_data {
                        $$ = $1;
                        scm_cons ($$, instance, $1, $2);
                        scm_location_unref (instance, $2);
                      }
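Both Pika-style actions can be checked against a toy version of the strawman interface.  In this sketch the table-based arena, the pair representation, the argument order of scm_cons (destination, arena, car, cdr, read off the calls above) and the helpers build_list and live_locations are all assumptions.  The check is that, once the whole list is reduced, only the location holding the result is still live in either style.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy representation: a value is a word; a pair is a
   malloc'd struct whose address is stored in that word. */
typedef uintptr_t t_scm_word;
struct toy_pair { t_scm_word car, cdr; };

#define NLOC 64
struct scm_arena
{
  t_scm_word slot[NLOC];
  unsigned int rc[NLOC];          /* reference counts; 0 = free */
};
typedef struct scm_arena * t_scm_arena;

static t_scm_word *
scm_allocate_location (t_scm_arena a)
{
  for (size_t i = 0; i < NLOC; ++i)
    if (a->rc[i] == 0)
      {
        a->rc[i] = 1;
        a->slot[i] = 0;           /* 0 doubles as '() in this toy */
        return &a->slot[i];
      }
  abort ();                       /* table full */
}

static void
scm_location_unref (t_scm_arena a, t_scm_word * loc)
{
  size_t i = (size_t) (loc - a->slot);
  assert (a->rc[i] > 0);
  --a->rc[i];
}

/* *dest = (cons *car *cdr).  Both operands are read before *dest is
   written, so DEST may be the same location as CAR. */
static void
scm_cons (t_scm_word * dest, t_scm_arena a,
          t_scm_word * car, t_scm_word * cdr)
{
  struct toy_pair * p = malloc (sizeof *p);
  (void) a;                       /* a real version would allocate in A */
  if (!p)
    abort ();
  p->car = *car;
  p->cdr = *cdr;
  *dest = (t_scm_word) p;
}

/* Brute-force action: fresh location for $$, then drop $1 and $2. */
static t_scm_word *
action_fresh (t_scm_arena a, t_scm_word * d1, t_scm_word * d2)
{
  t_scm_word * result = scm_allocate_location (a);
  scm_cons (result, a, d1, d2);
  scm_location_unref (a, d1);
  scm_location_unref (a, d2);
  return result;
}

/* "Linear" action: reuse $1's location for $$, drop only $2. */
static t_scm_word *
action_reuse (t_scm_arena a, t_scm_word * d1, t_scm_word * d2)
{
  t_scm_word * result = d1;
  scm_cons (result, a, d1, d2);
  scm_location_unref (a, d2);
  return result;
}

/* Shift datums 1..n, then reduce `datum list_data` right to left, as
   the generated parser would for a right-recursive list. */
static t_scm_word *
build_list (t_scm_arena a, size_t n,
            t_scm_word * (*action) (t_scm_arena, t_scm_word *, t_scm_word *))
{
  t_scm_word * datum[NLOC];
  assert (n + 2 <= NLOC);
  for (size_t i = 0; i < n; ++i)
    {
      datum[i] = scm_allocate_location (a);
      *datum[i] = (t_scm_word) (i + 1);   /* toy fixnum */
    }
  t_scm_word * tail = scm_allocate_location (a);   /* the empty list */
  for (size_t i = n; i-- > 0;)
    tail = action (a, datum[i], tail);
  return tail;
}

static size_t
live_locations (t_scm_arena a)
{
  size_t n = 0;
  for (size_t i = 0; i < NLOC; ++i)
    n += a->rc[i] > 0;
  return n;
}
```

The brute-force action allocates one location and releases two per reduction; the linear one allocates none and releases one.  Either way the count shrinks by one per reduction, which is why both end with a single live location.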

Either way, you wind up with code that looks about the same as what
you would get writing a (non-Scheme) traditional parser producing
reference-counted tree structures.

(There may also be more elegant solutions to the specific problem of
writing a parser using Bison -- I haven't thought about it.)

-t