Re: More comments, and the ANTLR code is too complex

Re: More comments, and the ANTLR code is too complex Mark H Weaver 13 Jun 2013 18:32 UTC
"David A. Wheeler" <xxxxxx@dwheeler.com> writes:

> Mark H Weaver <xxxxxx@netris.org> wrote:
>> For example, constraining yourself to an LL(1) grammar probably rules
>> out a more elegant presentation.
>
> Hmm, I've always heard complaints about grammars that are NOT LL(1).
> The obvious way to make sure it's LL(1)... is to specify it as LL(1).

I agree that the language should be _expressible_ as an LL(1) grammar.
However, often a grammar must be obfuscated in order to make it LL(1).

A classic example is:

   E -> E + T
   E -> T

which is not LL(1), but is equivalent to the following LL(1) grammar:

   E -> T Z
   Z -> + T Z
   Z -> <epsilon>

In the specification, I suggest that you optimize ease of understanding
over efficiency.  I hope we can all agree which of the above grammars is
easier to understand.  IMO, the LL(1) grammar belongs somewhere else,
perhaps in the implementation section.

Other examples of transformations needed to make a grammar LL(1) can be
found here:

  https://en.wikipedia.org/wiki/LL_parser#Solutions_to_LL.281.29_Conflicts

>> Another big problem is the amount of redundancy in this grammar.
>
> We can look for that, at least!
>
>> I suspect that the key to simplifying this grammar (apart from moving
>> away from ANTLR for purposes of the specification) is to choose a
>> different set of non-terminals.
>
> That may be true, but I'm not sure what they would be.
> I'll have to think about it, and if you have any other specific examples,
> please post!

In order to make a grammar LL(1), it is often necessary to introduce
auxilliary non-terminals (such as 'Z' in the example above) that don't
correspond to the most natural intuitive notions.  On the other hand,
you are probably missing some useful non-terminals that could eliminate
much of redundancy in your grammar.

For example, the R7RS draft grammar has a useful non-terminal called
<atmosphere> that includes whitespace, comments, and directives.  In
your grammar, whitespace and comments are stripped in several redundant
places.

>> One more nit while I'm on this subject: In the BNF conventions section,
>> you write "a sweet-expression reader MUST act as if it preprocessed its
>> input as follows", but as far as I can tell it's not actually possible
>> to implement this as a preprocessor.  This "preprocessing" must be
>> interleaved with parser, because several syntactic elements affect the
>> preprocessing.  For example, the <* and *> markers manipulate the
>> preprocessor's stack, and yet you need a full parser to recognize those
>> markers.
>
> You don't need a full parser.  ANTLR, for example, really does do
> manipulations of the parsing stack in its *lexer* when it sees <* and *>.
> In fact, there's no way in ANTLR for the parser to send messages back
> to the lexer; ANTLR lexes the entire stream before it calls the parser.

Indeed, I stand corrected.  The preprocessing must be interleaved with
the lexing, not the parsing.  The reason for my mistake is that I don't
tend to think of the lexer as being distinct from the parser in a
typical Scheme reader.

     Regards,
       Mark