Re: when GC is permitted Tom Lord 15 Jan 2004 17:59 UTC


    > From: Eric Knauel <xxxxxx@informatik.uni-tuebingen.de>

    > On Thu 15 Jan 2004 00:16, Tom Lord <xxxxxx@emf.net> writes:

    >> I sampled some of the C code in a version of SCSH that I have on hand
    >> (0.5.2 -- sorry, a download for a more recent version was taking _way_
    >> too long so I'll risk being embarrassed that everything has changed
    >> since then).

    > Actually, that changed completely in the 0.6-series, it's almost
    > exactly the FFI Scheme48 is using.  That's why migration of the
    > existing bindings for scsh *0.6-series* and Scheme48 is easy.

Ok, then -- the motivation of the authors is now somewhat clearer to
me.  I have 0.6.5 now.  I'm looking at the revamped posix_regexp_match
(in regex1.c) and notice that:

~ it doesn't GC-protect its parameters, as required by the SRFI

  (BTW: this appears to _not_ be a bug in the context of s48 because
  of assumptions the code makes about what can and can't cause
  collection.  However, a simplistic conversion of this code to the
  analogous draft-FFI functions would, indeed, have a bug in this
  regard.)

~ it assumes that STRING_LENGTH returns an int (the SRFI says long)

~ it uses s48_raise_range_error, which the SRFI doesn't provide

~ it contains the code:
	s48_raise_range_error (sch_start,
                               s48_enter_fixnum (0),
                               s48_enter_fixnum (len));

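  There is no _enter_fixnum in the draft and, properly, every
  number-constructing function in the draft is in the "(may GC)"
  category.  Yet that code is not GC-safe once s48_enter_fixnum is
  replaced by a possibly GC-causing function: C does not specify the
  order in which arguments are evaluated, so any argument already
  evaluated (sch_start, or the first constructed number) sits in an
  unprotected temporary while a later, possibly collecting, call runs.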
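Both GC problems above (parameters left unprotected, and a value held in a temporary while a possibly collecting call runs) can be reproduced with a toy moving collector. The sketch below is a hypothetical model, not the SRFI 50 or Scheme 48 API: `protect`/`unprotect` stand in for GC_PROTECT-style root registration, and `cons` stands in for any may-GC constructor (for instance whatever replaces s48_enter_fixnum under the draft). As assumptions of the model, every `cons` collects before allocating and protects its own arguments. `list2_unsafe` spells out one evaluation order C permits for a nested call: its first argument is read before the inner `cons` runs.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy model, NOT the SRFI 50 or Scheme 48 API.  A value is a fixnum
   (odd), the empty list (NIL), or an even number 2*i naming the pair
   stored in heap[i]. */
typedef long value;
#define NIL ((value)-2)
#define HEAP_CELLS 1024

typedef struct { value car, cdr; } cell;
static cell heap[HEAP_CELLS];
static value forward_to[HEAP_CELLS];  /* new address of a moved pair, or -1 */
static long live_base = 0;            /* live pairs occupy [live_base, alloc_ptr) */
static long alloc_ptr = 0;

static value *roots[16];              /* registered C variables (the root set) */
static int n_roots = 0;

static value make_fixnum(long n) { return n * 2 + 1; }
static long fixnum_value(value v) { return (v - 1) / 2; }
static int is_pair(value v) { return v >= 0 && v % 2 == 0; }
static value car(value p) { return heap[p / 2].car; }
static value cdr(value p) { return heap[p / 2].cdr; }

static void protect(value *p) { roots[n_roots++] = p; }
static void unprotect(int n) { n_roots -= n; }

static long bump(void) {
  if (alloc_ptr >= HEAP_CELLS) { fputs("toy heap exhausted\n", stderr); exit(1); }
  return alloc_ptr++;
}

/* Copy one pair to fresh cells (once) and return its new address. */
static value copy(value v) {
  if (!is_pair(v)) return v;
  long i = v / 2;
  if (forward_to[i] < 0) {
    long j = bump();
    heap[j] = heap[i];
    forward_to[i] = j * 2;
  }
  return forward_to[i];
}

/* Moving collection: copy everything reachable from roots[] past the
   current allocation point, then poison the old cells so that a stale
   reference reads the fixnum -1 instead of its former contents. */
static void collect(void) {
  long to_base = alloc_ptr;
  for (long i = 0; i < HEAP_CELLS; i++) forward_to[i] = -1;
  for (int r = 0; r < n_roots; r++) *roots[r] = copy(*roots[r]);
  for (long scan = to_base; scan < alloc_ptr; scan++) {
    heap[scan].car = copy(heap[scan].car);
    heap[scan].cdr = copy(heap[scan].cdr);
  }
  for (long i = live_base; i < to_base; i++)
    heap[i].car = heap[i].cdr = make_fixnum(-1);
  live_base = to_base;
}

/* Model assumptions: cons protects its own arguments, and it collects on
   every call, the worst case that "may GC" allows. */
static value cons(value a, value d) {
  protect(&a); protect(&d);
  collect();
  unprotect(2);
  long j = bump();
  heap[j].car = a;
  heap[j].cdr = d;
  return j * 2;
}

/* cons (x, cons (y, NIL)) evaluated in an order C permits:
   x is read into a temporary before the inner cons runs. */
static value list2_unsafe(value x, value y) {
  protect(&x); protect(&y);
  value early_x = x;                   /* invisible to the collector */
  value rest = cons(y, NIL);           /* x is updated; early_x is not */
  value result = cons(early_x, rest);  /* early_x names poisoned cells */
  unprotect(2);
  return result;
}

/* Parameters registered as roots, one allocating call per statement. */
static value list2_safe(value x, value y) {
  protect(&x); protect(&y);
  value rest = cons(y, NIL);           /* collector updates x if its pair moves */
  value result = cons(x, rest);        /* x is read after the inner call returns */
  unprotect(2);
  return result;
}
```

Given a pair `x = (10)`, `list2_safe(x, make_fixnum(20))` returns `((10) 20)`. `list2_unsafe` instead returns a list whose first element is a copy of a poisoned cell, because its early copy of `x` still names the pre-collection address. Removing the `protect` calls from `list2_safe` breaks it the same way, since nothing then tells the collector that `x` is live.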

In syscalls1.c:

~ an instance of comparing to S48_FALSE using !=, an instance
  of comparing to S48_TRUE using !=, and two instances of comparing
  to S48_TRUE using ==

~ general assumption that s48_extract_string is not in
  the "may GC" class

  Of course the draft agrees with that, but I point it out here
  to emphasize that the draft is fragile in this sense.  If the
  primary motivation is to be able to publish a few tens of thousands
  of lines of SCSH code under a SRFI FFI, then either the draft
  _cannot_ change extract_string to "may GC" or all of that code must
  be reviewed and fixed.

~ more use of error signalling functions not provided by the draft

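~ this code, which is incorrect under the current interpretation
  of the draft because it is incompatible with copying collection
  (C may evaluate sch_result_cutime before the inner s48_cons runs;
  if that call moves objects, the outer call receives a stale value):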

	s48_cons (sch_result_cutime,
                  s48_cons (sch_result_cstime, S48_NULL));
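Why a C `==` or `!=` against S48_FALSE or S48_TRUE is fragile can be sketched with a hypothetical implementation that hands C code handles (pointers to slots a collector keeps current) rather than raw tagged words. Nothing below is the draft's API: `value`, `new_handle`, `eq_p`, and `false_p` are invented for illustration. Two handles can denote the same `#f` while comparing unequal with `==`; only a predicate that compares the referenced objects gives the right answer.

```c
#include <stdlib.h>

/* Hypothetical handle-based representation, not any real FFI: C code
   holds pointers to slots, and a collector would keep the slots current. */
typedef long object;   /* the tagged word the Scheme heap actually stores */
typedef object *value; /* what C code is handed */

enum { OBJ_FALSE = 1, OBJ_TRUE = 2 };

static object slots[64];
static int n_slots = 0;

/* Each result passed back to C gets a fresh handle. */
static value new_handle(object o) {
  if (n_slots >= 64) abort();  /* the toy handle table is full */
  slots[n_slots] = o;
  return &slots[n_slots++];
}

/* Compare the referenced objects, never the handles themselves. */
static int eq_p(value a, value b) { return *a == *b; }
static int false_p(value v) { return *v == OBJ_FALSE; }
```

In such an implementation, two handles `f1` and `f2` to `#f` satisfy `f1 != f2` even though `eq_p(f1, f2)` is true, so a test written as `x != S48_FALSE` would misfire.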

    > >     > - most of scsh
    > >     > - bindings for ODBC (also for scsh)
    > >     > - bindings for NIS and LDAP (also for scsh)

    > > I'd appreciate it if you could say more about this: quantities of
    > > code, filenames and distributions containing them, and what you think
    > > the effort of migration from native-scsh to draft-ffi would involve.

Thank you for replying to that, btw.

    > The scsh CVS repository at sourceforge.net contains ODBC and LDAP
    > bindings in the modules scsh-ldap[1] and the directory
    > scsh/scsh/odbc[2].  The LDAP bindings are almost complete and about
    > 1200 LOC C-code and 1100 LOC Scheme-code (about 300 LOC automatically
    > generated).  The ODBC bindings consist of about 3000 LOC C-code
    > (partially tricky) and about 2000 LOC Scheme-code.

    > Currently, I'm busy cleaning up the ODBC bindings and changing them to
    > use the SRFI 34/35 exception system.  Building the c-stub as a shared
    > module that can be dlopen()'ed by scsh and Scheme48 is also on my
    > list.

    > I'm very confident that migrating those bindings to the SRFI-FFI is
    > not much work.  Checking whether the GC annotations are (still)
    > correct and a few search/replace-operations should be enough.

(1200 - 300) + 3000 * trickiness_bonus ~= 7000

I'm confident too that migrating s48 bindings to the draft is not, in
some sense, much work.  That isn't my point.

I have two points, actually:

1) The kinds of bugs I found in syscalls1.c and regexp1.c are
   a big deal in at least three respects:

  a) They suggest that, to the degree that rapidly releasing this
     code under the draft FFI is a priority for the authors, the draft
     is constrained _by_this_code_ not to change in what would otherwise
     be some fairly minor ways (for example, letting _extract_string
     GC).  In other words, the more value the authors place on getting
     this particular code out easily, the greater their conflict of
     interest when it comes to modifying the draft.

  b) These bugs include some that _will_ be bugs under the draft FFI,
     such as the pervasive assumption that enter_fixnum cannot GC
     and the occasional vestige of C == and != comparisons to
     certain "constants".  The nested calls to s48_cons are another
     example.

  c) The style of the code in posix_regexp_match -- in particular that
     it is written with very strong assumptions (stronger than the
     draft's in fact) about when GC can occur -- suggests to me that
     (i) the proposed FFI is fairly hard to use and (ii) it's very
     fragile and constraining of implementors.   The trickiness that
     (in s48, not in the draft) permits parameters to go unprotected
     in posix_regexp_match is an example of why the proposed interface
     is hard to use well.   That this same code becomes wrong under
     the fairly minor differences between the s48 ffi and the draft
     illustrates how fragile the draft is.

2) I don't mean to diminish the work that has gone into this stuff but
   we seem to be talking about, what, 20K LOC all told?

   That's 20K LOC that, to be correct under the draft, will have to be
   reviewed for the kinds of errors I found in syscalls1.c and
   regexp1.c.

   Meanwhile -- what happens if (a) the draft is finalized, (b) a
   bunch of implementors provide it, and (c) by hook or by crook a
   certain amount of the SCSH code winds up being widely used?

   Then we have a superficially credible Scheme FFI contradicted only
   by the discussions on this list.   Will it then be considered a
   success if a few months later instead of 20K LOC depending on it
   we have, scattered in various projects, 200K LOC depending on it?

-t