datum comments of sweet-expressions

David A. Wheeler 11 Jul 2013 02:05 UTC

On 29 May 2013 02:31:25 -0400, Mark H Weaver posted a long set of comments.
One recommendation was to support datum comments of sweet-expressions
(#; + whitespace).  The idea makes sense, and I did anticipate this.
However, the obvious ways imply some additional trickiness in grammar
and implementation.  Here's how I'm thinking about tackling this, but
if anyone has a better idea, *please* speak up!!

The current SRFI-110 says:
"Scheme’s datum comments (#;datum) comment out the next neoteric
expression, not the next sweet expression (and please don’t follow the
semicolon with whitespace)."

Mark H Weaver recommends:
"I often put "#;" on the preceeding line, which you're now asking me
not to do. What is the purpose of this request? Also, "#;" becomes
much less useful if it cannot comment out an entire sweet expression.
Perhaps "#;" should have a similar rule as the traditional
abbreviations: if it is followed by whitespace, then the following
/sweet expression/ is ignored, otherwise the following /neoteric
expression/ is ignored. What do you think?"

I have *definitely* thought about this.  Indeed, I wrote the text
"don't follow the semicolon with whitespace" so that supporting
datum comments of sweet-expressions could be added as a future addition.
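
The rule Mark proposes can be sketched in a few lines.  This is only an
illustration under stated assumptions: datum-comment-scope and scope-of are
hypothetical names, the helper is assumed to be called just after the reader
has consumed "#;", and treating space, tab, and newline as the relevant
whitespace is an assumption rather than something SRFI-110 specifies.

```scheme
;; Hypothetical helper, not part of SRFI-110.  Call it right after "#;"
;; has been consumed from PORT; it only peeks, so nothing more is read.
(define (datum-comment-scope port)
  (let ((c (peek-char port)))
    (cond
      ((eof-object? c) 'neoteric)                     ; nothing follows "#;"
      ((memv c '(#\space #\tab #\newline)) 'sweet)    ; assumed whitespace set
      (else 'neoteric))))

;; Convenience for trying the helper on the text that follows "#;".
(define (scope-of text)
  (datum-comment-scope (open-input-string text)))
```

With these definitions, (scope-of " e") evaluates to sweet and (scope-of "e")
evaluates to neoteric.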

But if we add this as a *requirement*
to SRFI-110, then the grammar rules and sample implementation
have to be modified to handle it.  For example:
a b
  c
  #; e
     f
  g
=> (a b c g)

THE CHALLENGE: Supporting this properly means handling a datum comment
of a sweet-expression even when it is the *last* item, e.g.:
fee fie
  foe
  fum
  #; blood
    Englishman
=> (fee fie foe fum)

Handling *last* items turns out to be trickier to do, and I think
that trickiness has nothing to do with whether or not the grammar is LL(1).
Currently there isn't a good way to handle lines that produce no value.
In particular, the "it_expr" rule *must* return a datum.
In the case of lines that begin with "#!sweet", the grammar rules
recurse so they can have something to return.  This recursion
is why the GROUP_SPLIT rule is so complicated.  That approach
won't work here, because the datum comment might be the last group
at that indent level.

So for the moment, let's say that we'll try to fix up the existing
LL(1) rules instead of rewriting the grammar rules in a completely
different notation.  Even if we eventually do such a rewrite, I want to do it
as a separate stage, and I think we should explore simplification further first.
So...  how could we do this?

One approach would be to fiddle with all the grammar rules that
invoke it_expr.  However, I think that would be really ugly and involve
a lot of repetition in the rules.  The problem is that the calling
rules each have to handle identification of the situation AND
invoke a different action rule for that case.  Ugh.

I think a better approach would be to modify the
key production "it_expr" so that it can return an "EMPTY" value,
distinct from a valid datum like (), that indicates
"no value at all".  This would require some of the action rules
to handle "EMPTY" values.  I think that could be handled by
a few tweaked procedures, e.g., some "cons" can be replaced with "econs"
(aka "empty-handling cons"):
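; Like cons, but an EMPTY argument contributes nothing to the result.
(define (econs x y)
  (cond
    ((eq? y EMPTY) x)    ; nothing follows: the result is just x
    ((eq? x EMPTY) y)    ; x produced no value: keep only what follows
    (#t (cons x y))))    ; both are real values: ordinary cons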
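
To make the idea concrete, here is a minimal sketch of how econs would treat
the two examples earlier in this message.  The definition of EMPTY (a freshly
allocated list, so that it is eq? only to itself) and the body-items helper
are assumptions made for illustration, not text from SRFI-110 or its sample
implementation; econs is repeated so that the block runs on its own.

```scheme
;; Assumption: EMPTY is a freshly allocated object, so it is eq? only to
;; itself and can never be confused with a real datum such as ().
(define EMPTY (list 'empty))

;; econs as given above, repeated so this sketch is self-contained.
(define (econs x y)
  (cond
    ((eq? y EMPTY) x)
    ((eq? x EMPTY) y)
    (#t (cons x y))))

;; Hypothetical helper: combine the results of the child lines of a body,
;; building from the right and dropping any child that produced EMPTY.
(define (body-items results)
  (if (null? results)
      '()
      (econs (car results) (body-items (cdr results)))))

;; First example: the children of "a b" are c, a datum comment, and g.
(define example-1 (append '(a b) (body-items (list 'c EMPTY 'g))))

;; Second example: the datum comment is the *last* child of "fee fie".
(define example-2 (append '(fee fie) (body-items (list 'foe 'fum EMPTY))))
```

Here example-1 is (a b c g) and example-2 is (fee fie foe fum), matching the
results shown for the two examples.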

If we do this, one side-effect is that the GROUP_SPLIT rules could
probably become much simpler.  We'd no longer need to recurse deeply,
because there'd be a way to signal that we saw an empty result.
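
A rough sketch of that kind of signalling, under assumptions: if the procedure
that parses an it_expr may return EMPTY, a caller can discard such results in
one place and simply ask for the next one, rather than each rule recursing
until it has a datum to return.  The names EMPTY, make-fake-parser, and
read-next-datum are hypothetical, and the symbol done stands in for end of
input; none of this is taken from the SRFI-110 grammar.

```scheme
;; Assumption: a unique marker meaning "no value at all".
(define EMPTY (list 'empty))

;; Hypothetical stand-in for the real parser: each call returns the next
;; queued result, and the symbol done once the queue is exhausted.
(define (make-fake-parser results)
  (lambda ()
    (if (null? results)
        'done
        (let ((r (car results)))
          (set! results (cdr results))
          r))))

;; Keep asking for results until one is not EMPTY.
(define (read-next-datum parse-it-expr)
  (let loop ((result (parse-it-expr)))
    (if (eq? result EMPTY)
        (loop (parse-it-expr))
        result)))

(define parse (make-fake-parser (list EMPTY '(a b c g) EMPTY EMPTY)))
(define first-datum (read-next-datum parse))    ; skips one EMPTY
(define second-datum (read-next-datum parse))   ; skips two, then hits the end
```

In this run first-datum is (a b c g) and second-datum is done.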

Thoughts?  Comments?  Is there a better way I'm not seeing?

--- David A. Wheeler