single vs. multi-sexp modules

single vs. multi-sexp modules Alex Shinn 13 Jan 2006 08:25 UTC

There seems to be a lot of debate between modules wrapped in a single
form (as in the current proposal) and modules with a top-level module
declaration followed by multiple sexps considered part of the module
(as in Per Bothner's earlier suggestion).  I'll call these single and
multi-sexp modules, respectively.  This is a very important decision
(one on which I have not yet made up my mind), but the discussion has
died off on the list, so I thought I'd summarize and clarify the issues
in a hopefully objective manner.
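
To make the two styles concrete, here is a rough sketch of the same
module written both ways.  The form names and clause layout (library,
module, export, import, base) and the module name stack are
illustrative assumptions only, not the exact syntax of the current
proposal or of Per Bothner's suggestion.

```scheme
;; Single-sexp style: one form encloses the whole module.
(library stack
  (export make-stack push!)
  (import base)
  (define (make-stack) (list '()))
  (define (push! s x) (set-car! s (cons x (car s)))))

;; Multi-sexp style (an alternative to the above, not a continuation):
;; a declaration, then ordinary top-level forms that belong to the
;; module until the stream ends.
(module stack
  (export make-stack push!)
  (import base))
(define (make-stack) (list '()))
(define (push! s x) (set-car! s (cons x (car s))))
```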

[Disclaimer: I regularly use Schemes with both styles of module system.]

Files.  This is a big sticking point with people, and there are many
in the community who would like to do away with files altogether.  It
also has nothing to do with the issue at hand.  At some point source
code gets serialized to a stream of bytes, delimited from other
streams of bytes, and this stream can just as easily contain a single
sexp as many.  And that's what we need to decide: do we require one
sexp or allow many?

Programmers have certain preferences as to whether a module should be
in one such stream, split across multiple streams, or even whether we
should allow multiple modules in a single stream.  All of these are
solved problems for both cases, and there are implementations which
allow all of these for both cases.  I had a really long diatribe
explaining that this is so but have deleted it in the interest of
brevity.  Feel free to flame^H^H^H^H^Hcontact me off-list if not
convinced.

Ease of transition.  Implementations may want to essentially translate
R6RS library forms into their internal module system.  A macro could
in many cases convert a single top-level sexp into a native module
declaration, but in general converting multiple top-level forms into a
single native module form would require core changes.  In other words,
with the multi-sexp style you can't hack this up yourself; you have to
wait for the implementers to update the language.
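
As a minimal sketch of the single-sexp case, a syntax-rules macro along
these lines could do the rewriting.  Everything here is hypothetical:
native-module, exports and requires stand in for whatever an
implementation's own module form looks like, and the clause order of
library is assumed rather than taken from the proposal.

```scheme
;; Hypothetical: `native-module', `exports' and `requires' are
;; placeholders for an implementation's own module form.
(define-syntax library
  (syntax-rules (export import)
    ((_ name (export id ...) (import lib ...) body ...)
     (native-module name
       (exports id ...)
       (requires lib ...)
       body ...))))
```

The macro works only because it receives the whole module as one form.
With the multi-sexp style there is no single form to match against, so
the following top-level sexps are out of a macro's reach.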

Ease of implementation.  Module systems can easily and naturally be
implemented with macros and lexical scope.  This is in the spirit of
Chez's module system, and this principle has enabled Andre van Tonder
to already implement the current proposal portably (thanks!).  A
multi-sexp system cannot be prototyped this way, and requires modifying the
core language.

Reader extensions.  Given multiple sexps parsed in succession, any one
of them could alter or replace the parser for the following sexps.
This would allow you to have a declaration that switches to a C-like
syntax, for instance.  You may not want this, but some people will, and
the single-sexp style precludes this possibility.  You could allow a syntax
declaration _before_ the library form, but that gets us back into
multi-sexp land.

REPL interaction.  This may not be a concern to you in particular (you
may not even have a REPL), but it's very important to some people.
Given the multi-sexp style it's trivial for many interpreters to
support declaring, switching between, and evaluating inside of
modules.  This can also be a big help to new users trying to
understand module semantics as well as importing/shadowing rules.
It's much more difficult for the single-sexp style to support this
kind of exploration, at least requiring the user to learn a different way
to import modules at "top-level", and requiring a separate command to
change the current interaction module if you want to support that.

Indentation.  Horizontal space is limited, and indenting something
they consider at the "top" of the module just bugs some people.  I
personally advocate *not* indenting the first level inside a module
form.  Basically, write the file as if it were a single module
declaration followed by multiple top-level forms, remove the closing )
from the initial declaration, and move it to the very end of the file
on a line by itself.  If the first form inside the module is at column
zero, Emacs will indent all those after it to column zero.  Problem
solved.
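
A sketch of that layout, using the same kind of illustrative module
(the library, export and import forms and the name stack are
assumptions for the example, not the proposal's exact syntax):

```scheme
(library stack
  (export make-stack push!)
  (import base)        ; the paren that would close the declaration...

(define (make-stack) (list '()))

(define (push! s x)
  (set-car! s (cons x (car s))))

)                      ; ...has been moved down to here
```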

Tool support.  Some tools may run into problems with huge sexps.
Others may be much faster with a single declaration at the top of a
file (for example if you want to quickly scan library declarations
from many files).  How important is this, and how quickly will the
tools get fixed?

... and many more.  Additions and rebuttals welcome.

--
Alex