Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing Ciprian Dorin Craciun 06 Mar 2019 12:32 UTC

[Replying to multiple posts in the same email.]

On Wed, Mar 6, 2019 at 5:03 AM Arthur A. Gleckler wrote:
> | I would suggest roughly the following design approach:
> |
> | * Start with some version of XHTML
> | * Use only the XHTML tags we need in a rigid structure
> | * Add class attributes to signify everything else
> | * Specify a standard CSS stylesheet using those tags and classes
> That sounds good.  Concretely, what I would like to see is an
> example SRFI converted in this way.

This is exactly what I had in mind.
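
To make the "rigid structure" point concrete, here is a minimal Python sketch of a checker that rejects any tag or class outside a fixed set. The whitelists (`ALLOWED_TAGS`, `ALLOWED_CLASSES`) and the sample markup are made up for illustration; nothing in this thread has settled on actual names yet.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelists: the real tag and class sets are still to be decided.
ALLOWED_TAGS = {"section", "h1", "p", "code", "span"}
ALLOWED_CLASSES = {"procedure", "syntax", "argument"}


def check_structure(xhtml):
    """Return a list of problems found in a well-formed XHTML fragment."""
    problems = []
    root = ET.fromstring(xhtml)  # raises ParseError if the input is not well-formed XML
    for element in root.iter():  # document order, root included
        if element.tag not in ALLOWED_TAGS:
            problems.append(f"unexpected tag: {element.tag}")
        for name in element.get("class", "").split():
            if name not in ALLOWED_CLASSES:
                problems.append(f"unexpected class: {name}")
    return problems


# Made-up fragment: <b> is not in the whitelist, so it is reported.
SAMPLE = (
    '<section><h1>Spec</h1>'
    '<p><code class="procedure">vector-ref</code> <b>x</b></p></section>'
)
print(check_structure(SAMPLE))
```

Because the input has to be well-formed XML, a stock XML parser is enough here; no HTML-specific error recovery is needed.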

On Wed, Mar 6, 2019 at 5:07 AM Arthur A. Gleckler wrote:
> Lassi Kortela writes:
> | IMHO it'd be most desirable to have a solution shaped like a
> | funnel:
> |
> | * Stage 1: Lenient SRFI markup from author
> | * Stage 2: Lenient SRFI markup with standard classes
> | * Stage 3: Standard SRFI markup and standard classes
> That sounds fine, but I would like to see stage 1 and stage 3 be
> as close as possible.  In other words, the less work is required
> to go from stage 1 to stage 3, the better.  That way, the author
> is encouraged to do the work from the beginning because it's easy,
> and we don't have any line that we cross that makes it hard for
> the author to edit the document, e.g. if errata are reported.

For me, stages 2 and 3 are almost the same and should be merged into one.

On Wed, Mar 6, 2019 at 8:19 AM Arthur A. Gleckler wrote:
> I continue to be impressed with pup.  Per Bothner's SRFI 164 already has a bunch of appropriate markup in it, so I've been experimenting with it and pup.  For example, here I extract the names of all the procedures defined in that SRFI:
> [...]
> This is just a proof of concept.  I'm not arguing that we should use pup or jq, or that Per's specific markup is the correct one — only that, even with a simple convention like Per is using in his SRFI, it's already possible to extract useful information.  We really shouldn't have to do much to encode useful information even in basic HTML.

This is exactly why I've advocated XHTML: we could easily replace
`pup` with an XML-to-JSON transformer and then use `jq` or an
alternative to do the rest of the processing.
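
As a rough sketch of that pipeline in Python (standard library only): parse the XHTML as XML, dump it as JSON, then query the JSON. The function names, the `procedure` class, and the sample markup are placeholders of mine, not taken from SRFI 164 or from Arthur's experiment.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an element into plain dicts and lists that json.dumps accepts.

    Tail text (text that follows a child element) is dropped for brevity.
    """
    return {
        "tag": element.tag,
        "attributes": dict(element.attrib),
        "text": (element.text or "").strip(),
        "children": [element_to_dict(child) for child in element],
    }


def xhtml_to_json(xhtml):
    """Parse well-formed XHTML and return it as a JSON string."""
    return json.dumps(element_to_dict(ET.fromstring(xhtml)), indent=2)


def texts_with_class(node, class_name):
    """Collect the text of every node whose class attribute contains class_name."""
    found = []
    if class_name in node["attributes"].get("class", "").split():
        found.append(node["text"])
    for child in node["children"]:
        found.extend(texts_with_class(child, class_name))
    return found


# Made-up fragment with a hypothetical "procedure" class.
SAMPLE = (
    '<div><p><span class="procedure">proc-one</span> and '
    '<span class="procedure">proc-two</span></p></div>'
)
document = json.loads(xhtml_to_json(SAMPLE))
print(texts_with_class(document, "procedure"))
```

The last step is where `jq` or a similar tool would take over; `texts_with_class` only stands in for it so that the sketch is self-contained.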

(I don't know how mature `pup` is, and thus don't know how well it
handles all the HTML quirks...  I guess it's OK for extracting a few
`<span>` elements, but beyond that I think it will run into

On Wed, Mar 6, 2019 at 12:13 PM Marc Nieper-Wißkirchen wrote:
> As an author of a number of SRFIs, I agree that having a more
> standardized format, in which SRFIs are written, is a great idea. This
> would also make it easier to incorporate SRFIs into future RNRS
> standard documents.

This is the other important aspect I am interested in, namely
incorporation of the actual SRFI (and RnRS documents) into other
interpreter / compiler / platform documentation.

To me this is the most valuable outcome of such an endeavor, with
indexing coming only second.

> However, XML/HTML/XHTML are not the best formats to write by hand. If
> we are going to have a change, I would like to propose a subset of
> TeX, which is much more convenient for us authors. Contrary to HTML,
> TeX can be extended with macros. Someone would have to write a couple
> of macros that are the basis for SRFI documents. Authors would have to
> use these macros so that software can easily convert the TeX source
> into other formats and is able to index them.

I understand that Scheme is used mostly in academia, where LaTeX is
the "lingua franca" for authoring papers and documentation.

However, it is extremely horrible to work with from other tools...  And
I say this with full knowledge, as I've burned quite a few days trying
to extract the R7RS documentation from LaTeX so that it could be
included in my own Scheme implementation's documentation:

I've seen a couple of "automatic" attempts by others (the mailing
list has a few threads about this), and even tried one myself.
However, in the end I gave up and just started to copy-paste-replace
with regular expressions.  It was a nightmare, but once I was done I
got some nice CommonMark and HTML files out of it.  (However, I
wouldn't want to repeat the whole process again...)
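
To give a flavour of that regular-expression approach, here is a minimal Python sketch. The rules cover three standard LaTeX commands chosen purely for illustration; they are not the macros used in the R7RS sources, and this is not the script I actually ran.

```python
import re

# Illustrative rules only: standard LaTeX commands, not the R7RS macro set.
# [^{}]* deliberately refuses nested braces, so such cases are left untouched.
RULES = [
    (re.compile(r"\\texttt\{([^{}]*)\}"), r"`\1`"),
    (re.compile(r"\\emph\{([^{}]*)\}"), r"*\1*"),
    (re.compile(r"\\section\{([^{}]*)\}"), r"## \1"),
]


def latex_to_commonmark(text):
    """Apply each rewrite rule in turn and return the converted text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


print(latex_to_commonmark(r"\section{Pairs} Use \texttt{car} on a \emph{pair}."))
```

Anything with nested braces or macro arguments that span several lines falls through these patterns unchanged, which is a large part of why this route needs so much manual cleanup.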

Below is the S-expression-based format I've chosen, followed by
the CommonMark generated from it, and above is the HTML