Gory details of parsing

Gory details of parsing Lassi Kortela 07 Oct 2019 21:04 UTC

> 5) Since there is no limit on the length of lines (and modern *ix tools
> impose none), you might as well remove support for multi-line
> S-expressions.  This will encourage people to use the "Hacks for lists"
> style, which should be called "Simplifying complex lists".

Indeed, parsers work fine with long lines.

There are two mechanisms here:

1) One S-expression that spans more than one line:

(foo alpha bravo
      charlie delta)

2) Many S-expressions merged into one:

(foo alpha bravo)
(foo charlie delta)

becomes the alist ((foo alpha bravo charlie delta))

Option 1 is easy to read from Scheme since the Scheme reader doesn't
distinguish newlines from other whitespace. So you can do (read-all) and
then (assoc 'foo) on the result to get the properties. If 'foo appears
more than once, assoc returns only the first entry and misses the rest.

Option 2 is easy for Unix tools to read. By default, they grab duplicate
values of "foo". Something like `grep '^(foo .*)$'` always gets all
"foo" lines, and it's actually harder to block some of them out than to
include them. E.g. what is the complete value of "foo" in this case:

(foo will the real value)
(foo please stand up)

A naive Scheme implementation would say it's (will the real value).

A naive Unix grep|sed or awk implementation would say it's (will the
real value please stand up).
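
To make the Unix side concrete, here is a minimal sketch of such a naive
grep|sed reader. The helper name `get_all` is made up for illustration, and
it assumes one S-expression per line and a property name that contains no
regex metacharacters.

```shell
#!/bin/sh
# Hypothetical helper: print every value of property $1 found on stdin,
# joined into a single space-separated line.
get_all() {
    grep "^($1 .*)\$" |                       # keep only "(prop ...)" lines
        sed -e "s/^($1 //" -e 's/)$//' |      # strip "(prop " and final ")"
        paste -s -d ' ' -                     # join all matches into one line
}

printf '%s\n' '(foo will the real value)' '(foo please stand up)' '(bar other)' |
    get_all foo
# prints: will the real value please stand up
```

Nothing in the pipeline has to ask for the duplicates; they come along
unless something is added to drop them.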

Since it's generally easier to do extra work in Scheme than in Unix, I
added the auto-merging of the properties. Scheme reads the info once and
then does any number of `assoc` calls on it so it's a one-time cost to
get Scheme behavior that matches the default shell script behavior.

By contrast, every time a shell script wants to get another property it
has to do a separate grep|sed run on the output. All of those would have
to have `| head -n 1` added. I don't even know how to do it in awk; use
a counter variable or call exit()? Are those portable?
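
For reference, a rough sketch of what "first value only" could look like in
both styles. The helper names are hypothetical, and the same one-line,
plain-name assumptions apply. As far as I know `exit` is a standard
statement in POSIX awk (written without parentheses), so the second variant
should be portable, but treat that as something to verify on the awks you
care about.

```shell
#!/bin/sh
# Hypothetical helpers: print only the first value of property $1 on stdin.

# Variant 1: grep | sed, truncated to the first match with head.
get_first_head() {
    grep "^($1 .*)\$" | head -n 1 | sed -e "s/^($1 //" -e 's/)$//'
}

# Variant 2: awk, stopping at the first match with the exit statement.
get_first_awk() {
    awk -v prop="$1" '
        index($0, "(" prop " ") == 1 && substr($0, length($0)) == ")" {
            # Strip the leading "(prop " and the trailing ")".
            print substr($0, length(prop) + 3, length($0) - length(prop) - 3)
            exit
        }'
}

input='(foo will the real value)
(foo please stand up)'

printf '%s\n' "$input" | get_first_head foo   # prints: will the real value
printf '%s\n' "$input" | get_first_awk foo    # prints: will the real value
```

Either way, every property lookup in a script carries this extra step,
which is the cost the merging rule avoids.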

Since the merging doesn't depend on any particular S-expression being
multi-line, we could remove the multi-line support. But that would
complicate things for Scheme, where `read` happily reads multi-line
sexprs by default and you have to work harder to block those.

I guess the Scheme-side reader could replace plain `read` with
`(call-with-port (open-input-string (read-line)) read)`. But I'm not sure
whether that would stop people from writing multi-line S-expressions by
accident or on purpose. In general, line-oriented formats are dodgy
because newlines are just whitespace to the Scheme reader, so it's easy
to assume that you can break lines. Unix tools can't parse multi-line
S-exprs, so they'll just miss those properties.
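
A small sketch of that failure mode, using a hypothetical `get_lines`
helper with the same one-S-expression-per-line assumption as a typical
grep|sed reader:

```shell
#!/bin/sh
# Hypothetical helper: print the value of each "(prop ...)" line on stdin.
get_lines() {
    grep "^($1 .*)\$" | sed -e "s/^($1 //" -e 's/)$//'
}

# "foo" spans two lines; "bar" fits on one.
input='(foo alpha bravo
      charlie delta)
(bar echo)'

printf '%s\n' "$input" | get_lines bar   # prints: echo
printf '%s\n' "$input" | get_lines foo   # prints nothing: no single line matches
```

The multi-line property is dropped silently rather than reported as an
error, so a script has no easy way to notice that it missed something.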

I don't know what to do about this; I think the current solution of
allowing multi-line sexprs, but discouraging them, is decent. I'm not
sure it's the best solution. Ideally people wouldn't write those things
in the first place, but there may be some scenario where it's the lesser
evil.