Re: More comments, and the ANTLR code is too complex

Re: More comments, and the ANTLR code is too complex David A. Wheeler 27 Jul 2013 04:27 UTC
On Fri, 14 Jun 2013 17:26:53 -0400, Mark H Weaver <xxxxxx@netris.org> wrote:
> However, _if_ turns out that a non-LL(1) grammar would be easier to
> understand, then I think that's what should be used in the actual specification.

A reasonable premise!  Unfortunately - and this may be a limitation on *my* part -
I've so far been unable to find a non-LL(1) grammar that's much easier to
understand than its corresponding LL(1) grammar for this language.

I really *do* want to make you happy, or at least happier, if I can.
Below is a little bit about how I'm trying to do so.

> If you disagree, then consider this: if you were reading the
> specification of a traditional infix language, which of the following
> grammars would you prefer to see when you were _learning_ about the language: ...

I completely agree with you that the LALR(1) form
for infix operators is simpler than LL(1); I've written both.
But sweet-expressions have no infix operators, at least in the traditional sense.

Usually an LL(1) grammar is complicated because of the need for
left factoring and left recursion removal (the Wikipedia article has a nice summary:
http://en.wikipedia.org/wiki/LL_parser ).  The giveaway of such problems
is that many productions will match epsilon ("empty").
However, I've checked, and the current sweet-expression LL(1) grammar
does not have that property; only one key BNF production (post-period)
can match an empty sequence at all.

Since there's no obvious set of transformations to create LL(1)
that can be reversed, I have not found a way to
transform the LL(1) productions directly into a simpler LALR(1) form.

That said... I agree with you that the starting LL(1) grammar was
overly complex.  So, I'm trying to simplify the grammar in LL(1) form,
while continuing to look for ways to transform it into a
non-LL(1) form if that would make it much easier to understand.

I've already taken the following simplifying steps:
- Split up the it_expr rule into a much larger set of smaller rules.
- Pulled the initial indent rule into its own production,
  which greatly simplifies the overall t_expr rule.
- I've replaced "hspace*" with "hs", where "hs : hspace*;".
  Since this happens all over, it has the effect of making rules shorter.

My hope is that by implementing a number of simplifying steps
to make the current LL(1) clearer, we'll meet the goal of having
an easier-to-understand specification.

If anyone has any other specific suggestions, please speak up!

--- David A. Wheeler