
Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing Lassi Kortela 11 Mar 2019 14:09 UTC

> I've continued my "restructuring" experiment and the following is the
> current outcome (still work in progress):

This looks terrific! Great job :)

I did a quick experiment to see if I could modify my Python tool to
extract the argument lists from your cleaned-up XHTML markup. It was
trivial even without class attributes. I just took the text inside all
<dt> elements that contain an ASCII arrow ("->"). This won't catch
syntax and variable definitions but 1) most definitions are procedure
definitions and 2) I'm sure we could think of a simple approach anyway.
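A rough sketch of that idea, for illustration only. This is not the actual tool; the names `DtArrowCollector` and `extract_signatures` are made up here, and it assumes well-formed markup in which every <dt> is explicitly closed:

```python
from html.parser import HTMLParser


class DtArrowCollector(HTMLParser):
    """Collect the text of every <dt> element that contains "->"."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._in_dt = False    # True while inside a <dt> element
        self._parts = []       # text chunks of the current <dt>
        self.signatures = []   # collected definition lines

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self._in_dt = True
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "dt" and self._in_dt:
            self._in_dt = False
            # Join the chunks and collapse runs of whitespace.
            text = " ".join("".join(self._parts).split())
            if "->" in text:
                self.signatures.append(text)

    def handle_data(self, data):
        # Entities such as &gt; arrive here already decoded.
        if self._in_dt:
            self._parts.append(data)


def extract_signatures(html):
    """Return the text of all <dt> elements containing an ASCII arrow."""
    parser = DtArrowCollector()
    parser.feed(html)
    parser.close()
    return parser.signatures
```

Nested tags such as <code> and <var> are simply flattened to their text, which is why the class attributes are not needed for this step.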

So no matter what we do on the HTML side, it doesn't seem to affect arg
list parsing. We can either add class attributes or do the full
conversion to strict markup.

> (4)  As hinted above, I think that just "massaging" the original HTML
> (in either variant) is not enough.  I think there should be an extra
> step (with an automated tool), that takes the "massaged" HTML and
> outputs another one used for "publishing" purposes.

I definitely support the full conversion for publishing purposes if we
can find enough people/tools to do it. Adding classes for definitions is
just a fix if we don't.

> I propose this based on the observation that mandating the editor to
> always set `id` and `<a href>` markup would drive one mad...

Definitely agree that we should rely on an automatic tool for things
like this. We could distribute the tool on e.g. so
authors could also run it themselves if they want to.

It might also be easy to automate setting the right <code> and <var>
tags inside the argument lists in procedure and syntax definitions.
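As a sketch of what such automation might look like: the rule below is only an assumption about the convention (parentheses, the arrow, "..." and the symbol right after an opening parenthesis are literal <code>; every other token is a <var>), and `mark_up_signature` is a hypothetical name:

```python
import re
from html import escape

# A token is a single parenthesis or a run of non-space, non-paren characters.
_TOKEN = re.compile(r"[()]|[^\s()]+")


def mark_up_signature(signature):
    """Wrap each token of a plain-text signature in <code> or <var>."""
    after_paren = False  # True when the previous token was "("

    def wrap(match):
        nonlocal after_paren
        token = match.group(0)
        # Assumed rule: punctuation and the operator name are literal code.
        literal = token in ("(", ")", "->", "...") or after_paren
        after_paren = token == "("
        tag = "code" if literal else "var"
        return "<{0}>{1}</{0}>".format(tag, escape(token))

    # Whitespace between tokens is left untouched.
    return _TOKEN.sub(wrap, signature)
```

This does not handle optional-argument brackets or syntax keywords, so a human would still need to review the output.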

> Therefore my workflow proposal is as follows:
> * once the author / editor moves a SRFI in the final status, the
> workflow begins;
> * the original HTML is taken and a few automatic "changes" are done
> that cleanup the HTML (mainly `tidy`, but perhaps we can automate
> something more;)
> * the "volunteer" takes that HTML and restructures the elements into a
> proper hierarchy (as I've done);
> * the "volunteer" executes the automatic "generator" which augments
> the XHTML with `id`, `href` tags and so on;

This seems excellent, provided we can find volunteers.

The decisive question would be how much time it takes per SRFI.

> * replaced tables with lists;  (they were in fact used only for
> appearance purposes;)

This is fine, but some SRFIs have some creative uses of tables (one of
them had made a table using <pre> and Unicode line-drawing characters).
Those few could probably be converted to use <table> or <ul>/<ol> tags
manually without disrupting the look too much.