Re: Comparing Pika-style and JNI-style

Re: Comparing Pika-style and JNI-style Jim Blandy 14 Jan 2004 21:22 UTC
Tom Lord <xxxxxx@emf.net> writes:

>     > From: Jim Blandy <xxxxxx@redhat.com>
>
>     > Tom Lord <xxxxxx@emf.net> writes:
>     > >     > [cadr isn't very interesting, imho -- cadr example snipped]
>
>     > Could you humor me, and post the code anyway?  Of course, feel free to
>     > pose other problems.
>
> http://srfi.schemers.org/srfi-50/mail-archive/msg00241.html

See, that is interesting, though --- it shows that you don't always
have to set up frames in Pika, if you can do all your computation in
references owned by the calling frame.

> Again, it comes down to the three classes of functions, (a), (b), and
> (c).  My proposal allows people to write FFI-using code in any of
> those three classes.  If the Pika FFI includes a standard interface to
> that auxiliary stack, then people writing FFI-using code in class (c)
> get full interoperability with one another.  But in the far more
> common case (how many libraries do you know that permit longjmping
> past them?), users writing FFI-using code in class (b) don't have to
> pay for auxiliary stacks.

But I don't even want to have to think about whether I'm going to be
longjmped past.  It's a non-local property, involving code I probably
haven't even read, and certainly can't afford to audit each time it's
revised.  I'm willing to tolerate less-than-optimal behavior when a
longjmp occurs, as long as it's still correct, in exchange for not
having to think about it.  And I've suggested a way to fix the
problem: have SCM_PROTECT malloc frames, and SCM_UNPROTECT free them.
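
To make the malloc'd-frame suggestion concrete, here is a minimal
sketch, assuming the collector finds its roots by walking a linked
list of frames.  Every name in it (toy_protect, toy_unprotect,
toy_unwind_to, struct toy_frame) is hypothetical; none of this is the
SRFI-50 or Pika definition of SCM_PROTECT, only an illustration of
frames that live on the heap instead of the C stack.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy types; not part of Pika or SRFI-50.  */
typedef void *toy_value;

struct toy_frame
{
  struct toy_frame *next;       /* next older frame */
  size_t n_slots;
  toy_value slots[];            /* roots a collector would scan */
};

/* Newest live frame.  A real system would keep one list per thread.  */
static struct toy_frame *toy_frames = NULL;

/* Allocate a frame on the heap and register it with the collector.
   Returns NULL if the allocation fails.  */
struct toy_frame *
toy_protect (size_t n_slots)
{
  struct toy_frame *f = malloc (sizeof *f + n_slots * sizeof (toy_value));
  if (f == NULL)
    return NULL;
  f->n_slots = n_slots;
  for (size_t i = 0; i < n_slots; i++)
    f->slots[i] = NULL;
  f->next = toy_frames;
  toy_frames = f;
  return f;
}

/* Unregister and free F, which must be the newest frame.  */
void
toy_unprotect (struct toy_frame *f)
{
  assert (f != NULL && f == toy_frames);
  toy_frames = f->next;
  free (f);
}

/* Whoever catches a longjmp frees every frame newer than MARK.  The
   skipped functions never ran their own toy_unprotect, but their
   frames are still valid heap memory, so nothing dangles.  */
void
toy_unwind_to (struct toy_frame *mark)
{
  while (toy_frames != NULL && toy_frames != mark)
    toy_unprotect (toy_frames);
}

/* Number of frames currently registered.  */
size_t
toy_frame_count (void)
{
  size_t n = 0;
  for (struct toy_frame *f = toy_frames; f != NULL; f = f->next)
    n++;
  return n;
}
```

The code that calls setjmp would remember the value of toy_frames
beforehand and pass it to toy_unwind_to when the longjmp arrives.  The
functions in between don't need to know whether they can be jumped
past; the price is one malloc and one free per frame.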

(People do get yelled at around here for not accepting workarounds
that have minor performance impacts, you know.  :)  Not that you did
that yourself.)

>     > That's right.  References must be explicitly freed; JNI can help you
>     > out in some cases, but you have to think about it.
>
>     > I think that JNI code will often be "linear", in the SRFI-1 sense,
>     > with functions like 'f' that accept references being documented to
>     > free them.  The "mn_to_car" and "mn_to_cdr" functions are linear
>     > variants of "mn_car" and "mn_cdr"; we can add more of these as we find
>     > them useful.
>
> Oh dear.   That's the thing: you're winding up not having a common
> case of linear functions but instead, having a common case of wanting
> two entry points for every function (one linear, one not).   And with
> multiple parameters:  should it be linear in all of them?  or just
> some?   Sounds like quite a mess.

To be fully general, you'd need 2^n variants of each function that
accepts n references.  This is the explicit-free hair.  At the moment,
I only have a few "to" functions, where I could think of important
idioms to support.

>     > The nice thing about functions that handle references in a linear way
>     > is that they are actually faster than ordinary functions: since you're
>     > about to free the reference, you know it's not shared amongst any
>     > threads, so you can reuse it without memory synchronization.
>
>     > Thus the
>     > implementation of mn_to_car:
>
>     >     /* Officially, the following functions deallocate one of the
>     >        references they're passed (call it REF), and return a new
>     >        reference.  But in fact, they just set REF->obj, and return REF
>     >        as the new reference.
>
>     >        This can be done without synchronization, even if REF is a
>     >        global reference, because:
>     >        - if anyone ever refers to REF assuming the old value, there
>     >          must be a race condition, because it's about to be freed, and
>     >        - nobody should refer to REF expecting the new value, unless
>     >          they received it in some properly-synchronized way, because
>     >          it's supposed to be an entirely new reference.  */
>
>     >     mn_ref *
>     >     mn_to_car (mn_call *call, mn_ref *ref)
>     >     {
>     >       mn__begin_incoherent (call);
>     >       {
>     >         ref->obj = check_pair (ref)->car;
>     >       }
>     >       mn__end_incoherent (call);
>     >
>     >       return ref;
>     >     }
>
> Isn't that code incorrect in a threaded system?   While `ref' is,
> indeed, about to be freed, the pair that it refers to is live.
> Assuming that the `incoherent' calls exclude only GC but not other
> mutators (which is the benefit you seem to be claiming), then the
> `->car' risks producing garbage.

This is what that comment is going on about.  References are
immutable: there is no operation that changes a reference's referent.
mn_to_car looks like a counter-example, but it isn't: officially, it
frees REF, so it would be incorrect to call it if any other thread
were referring to it.  But since it's freeing a reference and then
immediately allocating a new one, it might as well just reuse the
reference.
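
The equivalence can be sketched with a toy model.  This is an
illustration only, under loud assumptions: the types and the toy_
names are hypothetical, there are no threads and no collector, and
the type check (check_pair) and the mn__begin_incoherent /
mn__end_incoherent bracketing from the real function are left out.
It shows only that "free REF, then allocate a new reference to the
car" and "overwrite REF->obj and return REF" look the same to a
caller that obeys the linear contract.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy heap object: a pair.  */
struct toy_obj
{
  struct toy_obj *car, *cdr;
};

/* Hypothetical toy reference: a heap cell naming one object.  */
struct toy_ref
{
  struct toy_obj *obj;
};

/* Allocate a new reference to OBJ; aborts if out of memory.  */
struct toy_ref *
toy_make_ref (struct toy_obj *obj)
{
  struct toy_ref *ref = malloc (sizeof *ref);
  if (ref == NULL)
    abort ();
  ref->obj = obj;
  return ref;
}

void
toy_free_ref (struct toy_ref *ref)
{
  free (ref);
}

/* The "official" linear behavior: REF is freed, and a brand new
   reference to the car is returned.  */
struct toy_ref *
toy_to_car_official (struct toy_ref *ref)
{
  struct toy_obj *car = ref->obj->car;
  toy_free_ref (ref);
  return toy_make_ref (car);
}

/* The shortcut: reuse REF's storage as the new reference.  The caller
   promised not to use the old REF again, so it cannot tell.  */
struct toy_ref *
toy_to_car (struct toy_ref *ref)
{
  ref->obj = ref->obj->car;
  return ref;
}
```

In both versions the caller ends up owning exactly one reference,
naming the car, and must treat the argument as gone.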

>     >    NOTE: Many of the functions in this interface will typically be
>     >    used in contexts where the caller "knows" that no error will occur.
>     >    Having to check each call to these functions for an exception
>     >    return value is a burden; people probably wouldn't do it, and
>     >    people's experiences with this interface would be unpleasant.
>
> That's what "(void)" is for? :-)

No, that doesn't address the issue at all.  I guess the comment isn't
as clear as I had thought it was.  I tried to rephrase it, but ended
up writing the same thing...  Hmm.

>     > >         err = g (&frame.answer, instance, &frame.x);
>     > >         if (err)
>     > >           {
>     > >             ....;
>     > >           }
>     > >         err = f (&frame.answer, instance, &frame.answer);
>     > >         if (err)
>     > >           {
>     > >             ...;
>     > >           }
>
>     > Right: now imagine that g and f are 'car' and 'cdr'.  What should be
>     > 'mn_to_cdr (c, mn_car (c, x))' has become eight lines of code.
>
> Apples and oranges.
>
> The Pika-style equivalent to your code fragment is 2 lines, not 8.
>
> 	scm_safely (g (&frame.answer, instance, &frame.x));
> 	scm_safely (f (&frame.answer, instance, &frame.answer));

Oh, I agree --- I didn't mean to compare Pika vs. JNI there.  I meant
to compare check-for-error-codes vs. handle-error-internally.
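
A minimal sketch of that contrast follows.  It assumes one particular
way of handling errors internally: a sticky failure flag in the call
object, similar in spirit to JNI's pending exceptions.  That choice is
an assumption for the sake of the example, not a description of any
of the interfaces under discussion, and all the toy_ names are
hypothetical.  NULL stands in for "not a pair".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy pair.  */
struct toy_pair
{
  struct toy_pair *car, *cdr;
};

/* Style 1: every primitive returns an error code, and the result
   comes back through an output parameter.  */
int
toy_car_e (struct toy_pair **result, struct toy_pair *p)
{
  if (p == NULL)
    return 1;
  *result = p->car;
  return 0;
}

int
toy_cdr_e (struct toy_pair **result, struct toy_pair *p)
{
  if (p == NULL)
    return 1;
  *result = p->cdr;
  return 0;
}

/* (cdr (car x)) with a check after each step.  */
int
toy_cdar_e (struct toy_pair **result, struct toy_pair *x)
{
  struct toy_pair *tmp;
  int err = toy_car_e (&tmp, x);
  if (err)
    return err;
  err = toy_cdr_e (result, tmp);
  if (err)
    return err;
  return 0;
}

/* Style 2: the call object remembers the first failure.  Once it has
   failed, later primitives do nothing and return NULL, so calls nest
   and the caller checks once at the end.  */
struct toy_call
{
  int failed;
};

struct toy_pair *
toy_car (struct toy_call *c, struct toy_pair *p)
{
  if (c->failed || p == NULL)
    {
      c->failed = 1;
      return NULL;
    }
  return p->car;
}

struct toy_pair *
toy_cdr (struct toy_call *c, struct toy_pair *p)
{
  if (c->failed || p == NULL)
    {
      c->failed = 1;
      return NULL;
    }
  return p->cdr;
}
```

With the second style, (cdr (car x)) is the single expression
toy_cdr (c, toy_car (c, x)), followed by one test of c->failed.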

>     > >     > - Variables are declared normally, and their values used directly.
>
>     > > Variables are declared normally in Pika, too.  I think you mean that
>     > > JNI-style attempts to disguise handles as Scheme values.  It is
>     > > because it can't pull off that illusion perfectly that I think it is a
>     > > questionable choice.
>
>     > There's an illusion at work in Pika, JNI, and SRFI-50, and it oozes
>     > out and reveals itself in all three systems.  (SRFI-50's ooze is that
>     > it limits when GC can happen.)  What I'm asking is which people
>     > consider the least of three oozes.
>
> And I'm not criticizing you for asking.   I admit: Pika-style code is
> the ugliest of the lot;  sometimes the most verbose.  I'm just making
> the case that that's for very good reasons.
>
> By the way: what is the "illusion" oozing out of Pika-style?  I don't
> see it but perhaps I'm just too close to it.

Well, if you'll grant that this is fuzzy talk:

Pika's ooze is that you appear to be operating on local variables, but
you can only use them as lvalues, never rvalues.  And they're actually
data structures owned by the GC; rather than being managed by the
compiler, as local variables are, they have to be explicitly
registered and unregistered.  That's the source of the longjmp
problems, too.
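
Here is a small sketch of why the lvalue-only rule matters, assuming a
collector that is allowed to move objects.  Everything in it is a
hypothetical toy (the toy_ names, a "value" that is just a pointer to
an int cell, a "collector" that moves one cell on request); it is not
Pika's API.  A registered variable that is only ever passed by address
gets fixed up when its object moves, while a plain C copy of the value
silently goes stale.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy value: a pointer to a heap cell holding an int.  */
typedef int *toy_value;

#define TOY_MAX_ROOTS 8

/* Addresses of the variables the toy collector knows about.  */
static toy_value *toy_roots[TOY_MAX_ROOTS];
static size_t toy_n_roots = 0;

/* Explicitly register a variable with the collector.  */
void
toy_register (toy_value *slot)
{
  assert (toy_n_roots < TOY_MAX_ROOTS);
  toy_roots[toy_n_roots++] = slot;
}

/* Explicitly unregister the most recently registered variable.  */
void
toy_unregister (toy_value *slot)
{
  assert (toy_n_roots > 0 && toy_roots[toy_n_roots - 1] == slot);
  toy_n_roots--;
}

/* Toy moving collector: relocate the cell at FROM to TO and update
   every registered variable that pointed at it.  */
void
toy_gc_move (int *from, int *to)
{
  *to = *from;
  for (size_t i = 0; i < toy_n_roots; i++)
    if (*toy_roots[i] == from)
      *toy_roots[i] = to;
}

/* A primitive in the by-address style: the argument is the address of
   a registered variable, and the result is stored through RESULT.  */
void
toy_fetch (int *result, toy_value *v)
{
  *result = **v;
}
```

If a longjmp skips the toy_unregister call, toy_roots is left holding
the address of a dead stack variable, which is the longjmp problem in
miniature.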

If you malloc your frames, as I suggested, then your ooze is that you
can't use local variables.

A "lump in the carpet" metaphor would be better than ooze, I guess.

But anyway, all this finagling is how I ended up at the explicit-free
stuff.  The only way to really have a smooth carpet is to change C to
be precise-GC-friendly.  But we can't do that.  So the best you can do
is to put the lump in the carpet somewhere where people expect it, and
are used to stepping over it.