Re: Comparing Pika-style and JNI-style

Re: Comparing Pika-style and JNI-style Tom Lord 15 Jan 2004 01:59 UTC

    > From: Per Bothner <xxxxxx@bothner.com>

    > The problem Tom is referring to is (I assume) misidentifying a pointer
    > as a non-pointer.  That can happen if:
    > (a) You didn't tell the collector to scan the area containing the
    > pointer (most common problem).
    > (b) the pointer is "mangled", either through "clever" coding (such as
    > the xor-trick for double-linked lists) or an optimizing compiler
    > being too clever.  The former is a "don't do that".  The latter is
    > very rare, but can conceivably happen if the compiler generates an
    > offset pointer without leaving any reference to the actual
    > object start.  Boehm GC can be configured to also check "interior
    > pointers"; this reduces the problem but hurts performance.

    > See http://www.hpl.hp.com/personal/Hans_Boehm/gc/issues.html
    > especially the "Safety" section.

It was not quite either of those.

In this case, the optimizer wasn't even being very clever -- it was
just reusing a register.

Picture a Scheme string representation like this:

    scheme_value
        |
        V
        ---------------------------------------------
        | length, tag bits, gc bits  |  (char *) o  |
        -----------------------------------------|---
                                                 |
                                                 V
                                          malloced string

That the string is separately allocated makes little difference here.
Were it inline with the GC-controlled object, the same kind of bug
would be just as likely to occur.  You could just as well picture:

    scheme_value
        |
        V
        --------------------------------------------------------
        | length, tag bits, gc bits  |  the string itself .... |
        --------------------------------------------------------

Given the scheme_value, I read the address of the malloced (or inline)
string and operate on that.  At this point the scheme_value, if I'm
not otherwise using it, is a dead variable as far as C is concerned.
Should the scheme_value be collected while I'm working on the string,
the string data will be freed out from under me.  I must take
additional steps to keep the scheme_value live.

Picture (buggy) code like:

        {
           SCM scheme_string = some_init ();
           char * data = SCM_STRING_DATA (scheme_string);

           /* ... do stuff that can cause GC but doesn't directly
                  use scheme_string ... */

           return SCM_BOOL_F;
        }

It needs to be corrected at least to something like:

        {
           SCM scheme_string = some_init ();
           char * data = SCM_STRING_DATA (scheme_string);

           /* ... do stuff that can cause GC but doesn't directly
                  use scheme_string ... */

           scm_remember (scheme_string);

           return SCM_BOOL_F;
        }

or even (depending on the details of the object representations and
the situation with async execution or threads):

        {
           SCM scheme_string = some_init ();
           char * data;

           scm_remember_pointer (&scheme_string);

           data = SCM_STRING_DATA (scheme_string);

           /* ... do stuff that can cause GC but doesn't directly
                  use scheme_string ... */

           scm_remember_stuff ();

           return SCM_BOOL_F;
        }

at which point I stop and ask myself "Why is it, again, that I'm not
just using precise GC instead?"
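
For comparison, here is a minimal sketch of what explicit (precise-style)
root registration looks like in a toy collector.  Everything in it is an
assumption made for illustration: toy_alloc, toy_protect, toy_unprotect
and toy_collect are hypothetical names, not the API of any real collector
and not the code the bug occurred in.  The heap is a static array and
"freeing" only overwrites the data, so the stale read in the buggy case
stays well defined and can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: every name here is hypothetical, not a real GC API. */

#define TOY_HEAP_SIZE 4
#define TOY_MAX_ROOTS 4

typedef struct {
    int  in_use;      /* slot currently holds an object */
    int  marked;      /* set during the mark phase */
    char data[16];    /* string stored inline, as in the second diagram */
} toy_string;

static toy_string   toy_heap[TOY_HEAP_SIZE];
static toy_string **toy_roots[TOY_MAX_ROOTS];  /* addresses of registered locals */
static int          toy_n_roots = 0;

/* Allocate a string object; returns NULL when the toy heap is full. */
static toy_string *toy_alloc(const char *s)
{
    for (int i = 0; i < TOY_HEAP_SIZE; i++) {
        if (!toy_heap[i].in_use) {
            toy_heap[i].in_use = 1;
            toy_heap[i].marked = 0;
            strncpy(toy_heap[i].data, s, sizeof toy_heap[i].data - 1);
            toy_heap[i].data[sizeof toy_heap[i].data - 1] = '\0';
            return &toy_heap[i];
        }
    }
    return NULL;
}

/* Precise style: tell the collector exactly where a root lives. */
static void toy_protect(toy_string **slot)
{
    assert(toy_n_roots < TOY_MAX_ROOTS);
    toy_roots[toy_n_roots++] = slot;
}

static void toy_unprotect(void)
{
    assert(toy_n_roots > 0);
    toy_n_roots--;
}

/* Mark objects referenced by registered roots, then sweep the rest.
   Swept objects get their data overwritten so stale use is visible.
   Returns the number of objects reclaimed. */
static int toy_collect(void)
{
    int reclaimed = 0;
    for (int i = 0; i < toy_n_roots; i++)
        if (*toy_roots[i] != NULL)
            (*toy_roots[i])->marked = 1;
    for (int i = 0; i < TOY_HEAP_SIZE; i++) {
        if (toy_heap[i].in_use && !toy_heap[i].marked) {
            memset(toy_heap[i].data, '#', sizeof toy_heap[i].data - 1);
            toy_heap[i].in_use = 0;
            reclaimed++;
        }
        toy_heap[i].marked = 0;
    }
    return reclaimed;
}

/* Buggy shape: only the interior pointer survives, nothing roots the object.
   Returns 1 if the data is still intact after a collection, else 0. */
static int toy_demo_unprotected(void)
{
    toy_string *s = toy_alloc("hello");
    const char *data = s->data;   /* like SCM_STRING_DATA (scheme_string) */
    s = NULL;                     /* models the compiler reusing the register */
    (void) s;
    toy_collect();
    return strcmp(data, "hello") == 0;
}

/* Fixed shape: the local is registered as a root across the collection.
   Returns 1 if the data is still intact after a collection, else 0. */
static int toy_demo_protected(void)
{
    toy_string *s = toy_alloc("hello");
    const char *data;
    int ok;
    toy_protect(&s);
    data = s->data;
    toy_collect();
    ok = strcmp(data, "hello") == 0;
    toy_unprotect();
    return ok;
}
```

Called from a main function, toy_demo_unprotected () reports that the
data was clobbered while toy_demo_protected () reports that it survived.
The bookkeeping is about the same amount of work as the scm_remember
calls above, which is the point of the question.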

    > Tom Lord wrote:
    >
    >  > On a hunch, you review some of
    >  > the functions that you think your program is exercising to an unusual
    >  > degree and, sure enough -- find a conservative GC bug.
    >
    > What kind of "conservative GC bug"?  Is this with the Boehm GC?  Are
    > these C functions, Scheme functions, or what?  Is it an optimizer bug?
    > --

It was not with Boehm, but this and similar problems apply to Boehm as
far as I know.  The even worse problem, as far as I'm concerned, is
that conservative collectors (including Boehm) admit subtle malicious
attacks that programmers simply cannot protect themselves from (more
in the direction of failing to free values than of freeing them
early).
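
The "failing to free" direction can be sketched with a toy conservative
scan.  This is an illustration under stated assumptions, not a model of
Boehm's collector: cons_find, cons_collect and cons_demo are hypothetical
names, the "stack" is just an array of words, and the hostile integer is
computed directly rather than guessed by an attacker.  It also assumes a
flat address space in which uintptr_t arithmetic mirrors pointer
arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, not a real GC API. */

#define CONS_HEAP_SIZE 4

typedef struct {
    int  in_use;
    int  marked;
    char payload[32];
} cons_obj;

static cons_obj cons_heap[CONS_HEAP_SIZE];

/* Conservative test: does this word, read as an address, fall inside an
   allocated object?  The collector cannot know whether the word was ever
   meant to be a pointer, so interior addresses count too. */
static cons_obj *cons_find(uintptr_t word)
{
    uintptr_t base = (uintptr_t) &cons_heap[0];
    uintptr_t end  = (uintptr_t) &cons_heap[CONS_HEAP_SIZE];
    if (word < base || word >= end)
        return NULL;
    cons_obj *o = &cons_heap[(word - base) / sizeof(cons_obj)];
    return o->in_use ? o : NULL;
}

/* Scan raw words as if they were a stack, mark whatever they appear to
   reference, then sweep.  Returns the number of objects reclaimed. */
static int cons_collect(const uintptr_t *words, size_t n_words)
{
    int reclaimed = 0;
    assert(words != NULL || n_words == 0);
    for (size_t i = 0; i < n_words; i++) {
        cons_obj *o = cons_find(words[i]);
        if (o != NULL)
            o->marked = 1;
    }
    for (int i = 0; i < CONS_HEAP_SIZE; i++) {
        if (cons_heap[i].in_use && !cons_heap[i].marked) {
            cons_heap[i].in_use = 0;
            reclaimed++;
        }
        cons_heap[i].marked = 0;
    }
    return reclaimed;
}

/* Fill the heap with unreferenced objects, then collect with a single
   integer on the "stack".  Returns how many objects survive. */
static int cons_demo(int hostile)
{
    uintptr_t words[1];
    for (int i = 0; i < CONS_HEAP_SIZE; i++) {
        cons_heap[i].in_use = 1;
        cons_heap[i].marked = 0;
    }
    /* The hostile value is plain integer data that happens to equal an
       address inside object 2; the benign value is zero. */
    words[0] = hostile ? (uintptr_t) &cons_heap[2].payload[5] : (uintptr_t) 0;
    return CONS_HEAP_SIZE - cons_collect(words, 1);
}
```

With the benign word every object is reclaimed; with the hostile word one
garbage object survives the collection even though no pointer to it
exists anywhere in the program.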

I don't think that Boehm himself disagrees with any of my factual
claims -- we differ only in our subjective assessments of how serious
those problems are and how one should promote conservative GC as a
result.  And, to be sure, he's got the empirical edge on me if you
measure conservative GC by its economic value minus its economic
costs -- so far (and so far as we know).

My bet is that it's just a matter of time (within our lifetime) before
the balance shifts in my favor due to a malicious exploit (unless
conservative techniques simply fall out of favor).  Since conservative
GC is ultimately no easier to use than precise GC, why take that bet?
Why accept that risk?  Why not just eliminate the issue by barring
conservative GC from all critical systems?  That conservative GC is
vulnerable to malicious attack greatly skews any attempt you might
make to estimate the probability of a critical failure: really, it's a
function of the value to the attacker of that failure.

-t