Re: fundamental design issues (long message) bear 03 Dec 2005 02:21 UTC

I don't feel qualified to tackle this problem in full
detail; My thanks to those who are.

But I want to point out some simple things that work
in other languages, and point out that this model
does not support them.

In static languages such as C and Pascal, a forward
declaration is often used to state particular properties
about a function or procedure, while leaving the body
of the definition for later (presumably after the
definitions of some of the things it depends on).

In both languages, a procedure or function may have any
number of forward declarations (or the same forward
declaration may be imported multiple times) but only
one definition.

In C in particular, this turned out to facilitate a
very useful pattern.  A "header file" with nothing but
forward declarations allows most interdependency problems
to be solved statically, because the forward declarations
facilitate the separated compilation of many different
units. The effect is that with the forward declarations
handy, the compiler knows how to compile calls to particular
functions, even when the definitions of those functions are
not visible.  This is very important.

I want to be sure that a module system for scheme should
be *at least* that useful in resolving dependencies and
supporting separated compilation.

So I look at the "imports" syntax, and I am thinking,
"Here is a forward declaration.  Does it contain enough
information to allow a compiler to create an optimized
call to the associated function without being able to
see the function definition?"  And the answer is a resounding
no.  It contains only namespace control information.

So I look at the "exports" syntax, and I am thinking,
"okay, here is more namespace control information, but it
is also a specification that, together with the file of
source code, may be used to automatically generate
(implicit) forward declarations usable in the compilation
of other files."

And that will work, and it's less error-prone, but it means
that there can be no truly separate compilation; you can't
get the needed forward declarations without looking at the
source of modules other than the one you're trying to compile.

I realize that you're concerned more about namespace
control and name reuse, but I'm more concerned with an
efficient compilation model where a program running into
millions of lines can be distributed over a network to
a "compile farm" of machines that don't have to *all* be
looking at *all* the code in order to compile their
particular modules, and I don't feel that this need is
addressed (or even considered) by the current proposal.

As an implementor, I could "correct" it by explicitly
creating a forward-declarations resource for each library
and then having a build process that explicitly copies
and distributes those resources along with each source
text that needs to be able to see them in order to
compile successfully - but this makes compilation into
a three-stage process, where you must first distribute
the libraries and have your compile farm generate the
forward-declaration resources, *then* get them distributed
everywhere they're needed for the compilation of other
modules, and *then* go ahead with actual compilation.

Now, for the luxury of not having to actually write the
forward declarations and manually keep them in sync with
the source code, that might be worth it.  But there's a
deeper issue that messes up the compile-farm model; how
do I detect when the autogenerated forward-declaration
resources have become stale?  Since they're autogenerated
from source code, a simple dependency model indicates that
they become stale when the associated source code changes.
But that means that *EVERY* change in the source for a
module results in a new forward-declarations resource for
that module, and that triggers a recompile of *EVERY*
module that calls anything in the changed module. The
development of large projects thrashes to a halt as all
the developers run screaming.

So, let's say we go to a more complex dependency model,
where *EVERY* definition in every module has its own
forward-declaration resource, and new ones are generated
only when a particular definition has changed, rather than
a single resource being for everything in a particular
module.  And the problem with that is, it's functionally
equivalent to having every line of source distributed to
every machine in the compile farm, so we're back to square
one but with worse complexity.

It is my hope and desire that, with a module system, Scheme
should become useful for projects running into millions of
lines (such as, for example, an OS kernel).  But with the
proposed module system, utility for such projects would
require significant further advances in the state of the art.

					Bear