More comments, and the ANTLR code is too complex

More comments, and the ANTLR code is too complex Mark H Weaver 29 May 2013 06:31 UTC

I've made another attempt to understand SRFI-110 clearly enough to
implement it, and once again I've failed to do so before losing
patience.

This is the first time I've attempted to read and understand the ANTLR
grammar, and I'm sorry to say that I'm very unhappy with it.  If it
cannot be made simpler and more easily comprehensible than it is now,
then I'm unlikely to implement SRFI-110 in Guile.  I suspect other
implementors would feel similarly.

In the interest of encouraging implementors, I'd recommend making a
serious effort to rewrite the grammar to be as conceptually simple and
clear as possible.

Here are some specific comments about the ANTLR code:

* "BLOCK_COMMENT : '#|' // This is #| ... #|"
   That should be "#| ... |#"

* EOL_SEQUENCE is never used.  EOL is used instead, even though it is
  not defined.

* APOSW, QUASIQUOTEW, UNQUOTEW, and UNQUOTE_SPLICEW are not defined.

* Inconsistent syntax is used within {} in the ANTLR.  In most places
  standard Scheme syntax is used, but in 'collecting_tail', the syntax
  is more like C.

* Why are the action rules in 'n_expr' simply expressions that refer to
  values such as '$n1', but the action rules of 'collecting_tail' are
  instead assignment statements that refer to values such as '$more.v'?

* Why is there special handling of (FF | VT)+ EOL ?

* What does 'isperiodp' do exactly?  What if the datum really is "." or
  the symbol whose name is a single period (written #{.}# in Guile)?

* The non-terminals 'body' and 'it_expr' use the symbol 'same' even
  though the text implies that no extra symbol is generated by the
  preprocessing step in that case.  Where does 'same' come from?

And here are some comments about the tutorial:

* "Scheme’s datum comments (#;datum) comment out the next neoteric
  expression, not the next sweet expression (and please don’t follow the
  semicolon with whitespace)."

   I often put "#;" on the preceding line, which you're now asking me
   not to do.  What is the purpose of this request?  Also, "#;" becomes
   much less useful if it cannot comment out an entire sweet expression.
   Perhaps "#;" should follow a rule similar to the one for the
   traditional abbreviations: if it is followed by whitespace, then the
   following /sweet expression/ is ignored; otherwise the following
   /neoteric expression/ is ignored.  What do you think?
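
   The proposed rule could be sketched roughly as follows, in Python
   purely for illustration.  The function name and the treatment of
   end-of-input as whitespace are assumptions of this sketch; nothing
   here is taken from the SRFI-110 grammar or reference implementation.

```python
# Hypothetical sketch of the proposed "#;" rule (not from SRFI-110).

WHITESPACE = " \t\n\r\f\v"


def datum_comment_scope(text, pos):
    """Return which kind of expression a '#;' at text[pos] would skip."""
    if text[pos:pos + 2] != "#;":
        raise ValueError("no datum comment at this position")
    following = text[pos + 2:pos + 3]
    # Assumption: reaching end of input counts the same as whitespace.
    if following == "" or following in WHITESPACE:
        return "sweet"      # skip the whole following sweet-expression
    return "neoteric"       # skip only the following neoteric-expression
```

   With this, "#;" on a line of its own would comment out the entire
   sweet expression that follows, while "#;foo(x)" would behave as it
   does today.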

* I'd like to see a few more examples for improper lists, such as:

     f
       a .
       b

  and:

     f
       a b
       . c

* In the tutorial, I found the examples of $ (SUBLIST) a bit confusing:

    a b $ c d          ==>   (a b (c d))

    a b $ c d e f $ g  ==>   (a b (c d e f g))
                             ; Not (a b (c d e f (g)))

   This leaves me uncertain of whether the second case is somehow
   caused by two $'s on one line, or because there's only one item
   after the $.  I'd like to see an example like "a b $ c" or
   "a b $ c d e $ f g" to clarify.

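   One reading that would account for both results is sketched below,
   in Python purely for illustration.  It models a single line of plain
   tokens only; the name 'read_line' is hypothetical, and the key
   assumption (a lone item denotes itself, not a one-element list) is my
   interpretation rather than a quotation from the spec.

```python
# Hypothetical single-line model of the $ (SUBLIST) marker.

def read_line(tokens):
    """Model one line of atom tokens containing zero or more '$' markers."""
    if not tokens:
        raise ValueError("empty line or nothing after '$' is not modelled")
    if "$" in tokens:
        split = tokens.index("$")
        head, rest = tokens[:split], tokens[split + 1:]
        # The right-hand side is read as one expression and becomes the
        # last element of the list formed by the left-hand side.
        return head + [read_line(rest)]
    # Assumption: a lone item denotes itself, not a one-element list.
    return tokens[0] if len(tokens) == 1 else tokens


print(read_line("a b $ c d".split()))          # models (a b (c d))
print(read_line("a b $ c d e f $ g".split()))  # models (a b (c d e f g))
print(read_line("a b $ c".split()))            # models (a b c)
print(read_line("a b $ c d e $ f g".split()))  # models (a b (c d e (f g)))
```

   Under that reading, the second tutorial example is explained by the
   single item after the last $, not by the presence of two $'s.
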
* "A sweet-expression reader MUST support three modes: indentation
  processing, enclosed, and initial indent."
  [...]
  "A marker MUST only have its special meaning when indentation
  processing is enabled,"

   This sounds as if "*>" MUST NOT be recognized, because the reader
   will be in "enclosed" mode at that point, no?

* "2. If top is the empty string and the indentation length is nonzero,
   symbol INITIAL_INDENT is generated and the reader changes to initial
   indent mode. When an end-of-line sequence is reached the mode changes
   back to indentation processing."

   If the reader was in "enclosed" mode, then presumably the mode
   should not change back to indentation processing, right?

* "1. If an end-of-line sequence immediately follows the indentation and
      the indentation length is nonzero:
       a. If the indentation contains “!”, it is ignored; an
          implementation MUST consume the end-of-line sequence and start
          applying these rules again, from the beginning, with the next
          line.
       b. If the indentation does not contain “!”, it is considered a
          line with no characters (thus indentation has zero length) and
          the rest of these rules are applied."

   I vaguely recall that the distinction here was going to be removed
   as a simplification of the rules.  Was that idea scrapped?

* "A marker MUST only have its special meaning when indentation
  processing is enabled, it is preceded by indentation or hspace, it is
  followed by an hspace or end-of-line, and when it starts with the
  character shown (e.g., neither |$| nor '$ contains a marker)."

   The last clause here, "when it starts with the character shown", is
   poorly worded IMO, and redundant with the requirement that "it is
   preceded by indentation or hspace".
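
   To illustrate the redundancy, here is a rough Python sketch of the
   positional conditions alone.  The name 'is_marker', the set of
   indentation characters, and the treatment of column 0 as "preceded by
   indentation" are assumptions of this sketch, not definitions from
   the spec.

```python
# Hypothetical check of the positional conditions on a marker.

HSPACE = " \t"


def is_marker(line, start, marker, indentation_processing=True):
    """Check the positional conditions for `marker` at line[start:]."""
    if not indentation_processing:
        return False
    end = start + len(marker)
    if line[start:end] != marker:
        return False
    before = line[:start]
    # Preceded by indentation (assumed here to be spaces, tabs, and '!',
    # possibly of zero length) or by a horizontal space.
    preceded = before.strip(" \t!") == "" or before[-1] in HSPACE
    # Followed by a horizontal space or the end of the line.
    followed = end == len(line) or line[end] in HSPACE + "\n\r"
    return preceded and followed


print(is_marker("a b $ c d", 4, "$"))   # True
print(is_marker("|$|", 1, "$"))         # False: preceded by '|'
print(is_marker("'$ x", 1, "$"))        # False: preceded by a quote
```

   Both of the quoted counterexamples are already rejected by the
   "preceded by" condition, without any extra clause.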

     Regards,
       Mark