Re: More comments, and the ANTLR code is too complex

Show/hide message thread

More comments, and the ANTLR code is too complex Mark H Weaver (29 May 2013 07:04 UTC)

Re: More comments, and the ANTLR code is too complex David A. Wheeler (29 May 2013 17:39 UTC)

Re: More comments, and the ANTLR code is too complex David A. Wheeler (31 May 2013 17:03 UTC)

Re: More comments, and the ANTLR code is too complex David A. Wheeler (01 Jun 2013 02:27 UTC)

Re: More comments, and the ANTLR code is too complex David A. Wheeler (10 Jun 2013 00:21 UTC)

Re: More comments, and the ANTLR code is too complex Alan Manuel Gloria (10 Jun 2013 02:01 UTC)

Re: More comments, and the ANTLR code is too complex David A. Wheeler (12 Jun 2013 00:25 UTC)

Re: More comments, and the ANTLR code is too complex Mark H Weaver (12 Jun 2013 20:13 UTC)

Re: More comments, and the ANTLR code is too complex Mark H Weaver 12 Jun 2013 20:13 UTC

Hi David,

"David A. Wheeler" <xxxxxx@dwheeler.com> writes:
> Below is a first shot at breaking up it_expr, currently 1 long rule, into 2 rules.
> This could obviously be repeated to make more rules, each one simpler.
> Not saying it's done, but would it help to break the current longer rules
> into more but smaller rules?

No.  This doesn't help at all, because it doesn't reduce the total
complexity of the specification.  My concern is the amount of mental
effort required to understand the precise specification.

Part of the problem is that your specification is actually an
_implementation_, which is made more complex by efficiency concerns.
For example, constraining yourself to an LL(1) grammar probably rules
out a more elegant presentation.

Another big problem is the amount of redundancy in this grammar.  For
example, the pattern "scomment hspace*" is repeated in many places.
Sometimes it's a prefix wrapped in (...)*, and other times it's iterated
over by tail recursion.  The pattern "COLLECTING hspace* collecting_tail
hspace*" is also repeated in several places.  These redundancies make
more work for the reader, and make me wonder "are all these actually the
same, or are there slight differences?"

I suspect that the key to simplifying this grammar (apart from moving
away from ANTLR for purposes of the specification) is to choose a
different set of non-terminals.

Please take a look at section 7.1 of the R5RS (or the R7RS draft).
Understanding that grammar is almost effortless, and there's almost no
redundancy.  Now take a look at the specifications of SRFI-10, SRFI-30,
and SRFI-38.  All of them are expressed as a list of modifications to
the R5RS grammar.  That's the kind of thing I'd like to see in the
SRFI-110 specification.

One more nit while I'm on this subject: In the BNF conventions section,
you write "a sweet-expression reader MUST act as if it preprocessed its
input as follows", but as far as I can tell it's not actually possible
to implement this as a preprocessor.  This "preprocessing" must be
interleaved with parser, because several syntactic elements affect the
preprocessing.  For example, the <* and *> markers manipulate the
preprocessor's stack, and yet you need a full parser to recognize those
markers.  Also, if I understand correctly, indentation is only processed
outside of n-expressions.

I also think that there needs to be a much simpler sample
implementation: one which does not attempt to be fully featured
(e.g. omit support for source location tracking), and which is not a
fully self-contained reader, but is instead expressed in terms of
existing procedures which are likely already present in an SRFI-105
reader (or which could be easily created from existing code).

In other words, you should help implementors understand how to add
SRFI-110 to their existing readers with a minimal amount of code
changes.  The resulting code needs to be as simple as reasonably
possible.

Here's one possible strategy: Assume the existence of an n-expression
reader.  Now write a t-expression reader in terms of it, in the most
elegant Scheme code possible.  It turns out this is not quite possible,
but hopefully the problems can be patched up by assuming the existence
of some other helpers, and/or by adding some functionality to the
n-expression reader.

After our last email exchange, I spent some time thinking about this,
and identified a few additional things you might need:

* In order to recognize the special markers, you'll need either (1) a
  way to "unread" characters, or (2) a way for the n-expression reader
  to tell you that e.g. the symbol '<*' was "by itself" for purposes of
  SRFI-110.

* You might need a helper to read special comments without consuming the
  following datum.

I'm sorry that I cannot be more complete in my analysis of what needs to
be done, but my time (and motivation) is limited.  Reformulating this
code will be a lot of work, but I suspect that adoption will be very low
unless you can show implementors how to add SRFI-110 easily and with a
small amount of code.

     Regards,
       Mark