More comments, and the ANTLR code is too complex
Mark H Weaver
(29 May 2013 07:04 UTC)
|
Re: More comments, and the ANTLR code is too complex
David A. Wheeler
(29 May 2013 17:39 UTC)
|
Re: More comments, and the ANTLR code is too complex
David A. Wheeler
(31 May 2013 17:03 UTC)
|
Re: More comments, and the ANTLR code is too complex
David A. Wheeler
(01 Jun 2013 02:27 UTC)
|
Re: More comments, and the ANTLR code is too complex
David A. Wheeler
(10 Jun 2013 00:21 UTC)
|
Re: More comments, and the ANTLR code is too complex
Alan Manuel Gloria
(10 Jun 2013 02:01 UTC)
|
Re: More comments, and the ANTLR code is too complex
David A. Wheeler
(12 Jun 2013 00:25 UTC)
|
Re: More comments, and the ANTLR code is too complex Mark H Weaver (12 Jun 2013 20:13 UTC)
|
Hi David, "David A. Wheeler" <xxxxxx@dwheeler.com> writes: > Below is a first shot at breaking up it_expr, currently 1 long rule, into 2 rules. > This could obviously be repeated to make more rules, each one simpler. > Not saying it's done, but would it help to break the current longer rules > into more but smaller rules? No. This doesn't help at all, because it doesn't reduce the total complexity of the specification. My concern is the amount of mental effort required to understand the precise specification. Part of the problem is that your specification is actually an _implementation_, which is made more complex by efficiency concerns. For example, constraining yourself to an LL(1) grammar probably rules out a more elegant presentation. Another big problem is the amount of redundancy in this grammar. For example, the pattern "scomment hspace*" is repeated in many places. Sometimes it's a prefix wrapped in (...)*, and other times it's iterated over by tail recursion. The pattern "COLLECTING hspace* collecting_tail hspace*" is also repeated in several places. These redundancies make more work for the reader, and make me wonder "are all these actually the same, or are there slight differences?" I suspect that the key to simplifying this grammar (apart from moving away from ANTLR for purposes of the specification) is to choose a different set of non-terminals. Please take a look at section 7.1 of the R5RS (or the R7RS draft). Understanding that grammar is almost effortless, and there's almost no redundancy. Now take a look at the specifications of SRFI-10, SRFI-30, and SRFI-38. All of them are expressed as a list of modifications to the R5RS grammar. That's the kind of thing I'd like to see in the SRFI-110 specification. One more nit while I'm on this subject: In the BNF conventions section, you write "a sweet-expression reader MUST act as if it preprocessed its input as follows", but as far as I can tell it's not actually possible to implement this as a preprocessor. This "preprocessing" must be interleaved with parser, because several syntactic elements affect the preprocessing. For example, the <* and *> markers manipulate the preprocessor's stack, and yet you need a full parser to recognize those markers. Also, if I understand correctly, indentation is only processed outside of n-expressions. I also think that there needs to be a much simpler sample implementation: one which does not attempt to be fully featured (e.g. omit support for source location tracking), and which is not a fully self-contained reader, but is instead expressed in terms of existing procedures which are likely already present in an SRFI-105 reader (or which could be easily created from existing code). In other words, you should help implementors understand how to add SRFI-110 to their existing readers with a minimal amount of code changes. The resulting code needs to be as simple as reasonably possible. Here's one possible strategy: Assume the existence of an n-expression reader. Now write a t-expression reader in terms of it, in the most elegant Scheme code possible. It turns out this is not quite possible, but hopefully the problems can be patched up by assuming the existence of some other helpers, and/or by adding some functionality to the n-expression reader. After our last email exchange, I spent some time thinking about this, and identified a few additional things you might need: * In order to recognize the special markers, you'll need either (1) a way to "unread" characters, or (2) a way for the n-expression reader to tell you that e.g. the symbol '<*' was "by itself" for purposes of SRFI-110. * You might need a helper to read special comments without consuming the following datum. I'm sorry that I cannot be more complete in my analysis of what needs to be done, but my time (and motivation) is limited. Reformulating this code will be a lot of work, but I suspect that adoption will be very low unless you can show implementors how to add SRFI-110 easily and with a small amount of code. Regards, Mark