Re: More comments, and the ANTLR code is too complex

Re: More comments, and the ANTLR code is too complex David A. Wheeler 13 Jun 2013 23:13 UTC
I said:
> > Hmm, I've always heard complaints about grammars that are NOT LL(1).
> > The obvious way to make sure it's LL(1)... is to specify it as LL(1).

Mark H Weaver <xxxxxx@netris.org> replied:
> I agree that the language should be _expressible_ as an LL(1) grammar.
> However, often a grammar must be obfuscated in order to make it LL(1).

I wouldn't use the word "obfuscated", but I understand your point.

> Other examples of transformations needed to make a grammar LL(1) can be
> found here:
>   https://en.wikipedia.org/wiki/LL_parser#Solutions_to_LL.281.29_Conflicts

Oh, I know those well :-).

One reason to use an LL(1) spec directly is to prove it's LL(1), as noted above.
Another reason is that typical Scheme reader implementations will
be recursive descent parsers, which are basically LL(1) also.
I found it *easy* to implement the spec in Scheme, specifically because it was LL(1).

I don't know if a non-LL(1) grammar would be easier to read, but
such a grammar *would* require a typical implementer to hand-perform those
transformations.  I fear that would inhibit, not encourage, implementations.
But maybe that's not so.

> In the specification, I suggest that you optimize ease of understanding
> over efficiency.  I hope we can all agree which of the above grammars is
> easier to understand.  IMO, the LL(1) grammar belongs somewhere else,
> perhaps in the implementation section.

Hmm. It's not clear to me how much "easier to read" a non-LL(1)
grammar would be in this case.

> On the other hand,
> you are probably missing some useful non-terminals that could eliminate
> much of redundancy in your grammar.

Possibly.  It's hard to know without trying.

> For example, the R7RS draft grammar has a useful non-terminal called
> <atmosphere> that includes whitespace, comments, and directives.  In
> your grammar, whitespace and comments are stripped in several redundant places.

Ah, but redundancies can be eliminated in any BNF notation, including LL(1).

Perhaps that's real issue; we can easily search for redundancies & create new
rules that combine them together.

>  I don't
> tend to think of the lexer as being distinct from the parser in a
> typical Scheme reader.

Fair enough!  And I just went back to look at the ANTLR code; the
"preprocessor" *is* interleaved with the lexer, and it's actually distributed
in a way more complex than is described in the spec.

I don't think the BNF spec has to be *efficient*, but I do think it's important
that it *run*.  Otherwise, it's likely to have an error or 12.

--- David A. Wheeler