Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing Lassi Kortela 07 Mar 2019 20:20 UTC

 > This is excellent work!

Thanks! It was a happy accident: it worked far better than I expected.

 > It would be great to include HTML IDs in the template, too.  That would
 > make it possible for other tools to link directly to the reference
 > material in the SRFI document.  Of course, the IDs would need be
 > extracted by your tool, too.

I agree. Direct links would be really nice. I wonder whether the URL
or HTML specs place any limitations on the text that can appear in
the IDs. Scheme identifiers use many unusual characters like
? ! < > = + * /
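
For what it's worth, here is a rough sketch of how a fragment link
could be built for a Scheme identifier by percent-encoding it. The id
scheme and the srfi-41.html anchor are just assumptions on my part,
not anything the tool does today:

    import urllib.parse

    def fragment_for(name):
        # Percent-encode everything that isn't unreserved, so the
        # identifier is safe to use in a URL fragment. (HTML ids
        # themselves are permissive: non-empty, no whitespace.)
        return urllib.parse.quote(name, safe="")

    # "stream-cons"  -> srfi-41.html#stream-cons
    # "stream-null?" -> srfi-41.html#stream-null%3F
    print("srfi-41.html#" + fragment_for("stream-cons"))
    print("srfi-41.html#" + fragment_for("stream-null?"))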

 > This seems like a great way to bootstrap the whole process.

I just timed editing two SRFIs from scratch using the tool. SRFI 41
took 12 minutes and SRFI 72 (a large one) took 20 minutes. If 3 people
each edited 2 SRFIs per day on average, we'd have the entire back
catalogue done in a month!

 > But if the author doesn't encode metadata, we can create the same
 > external file manually.

When you get into the groove, you can edit and check one ordinary
procedure definition in about 10 seconds. So it might be both faster
and less error-prone to mark up the source HTML and use automatic
conversion than to write the S-expressions by hand (copy/pasting from
a web browser).

While editing, I ran the tool after each finished definition to check
its conversion (and some earlier ones for context). This way, errors
are easy to catch. The command was:

     ./tool.py srfi-41.html && tail -n 30 srfi-41.lisp
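
To make the markup-then-extract idea concrete, here is a minimal
sketch of that kind of extraction using only the standard library.
The class name "def" and the assumption that each marked element
contains just the signature text are placeholders of mine, not the
convention tool.py actually uses:

    from html.parser import HTMLParser

    class DefExtractor(HTMLParser):
        """Collect the text of every element carrying class="def"."""
        def __init__(self):
            super().__init__()
            self.open_tag = None   # tag name of the current def element
            self.defs = []

        def handle_starttag(self, tag, attrs):
            classes = (dict(attrs).get("class") or "").split()
            if self.open_tag is None and "def" in classes:
                self.open_tag = tag
                self.defs.append("")

        def handle_endtag(self, tag):
            if tag == self.open_tag:
                self.open_tag = None

        def handle_data(self, data):
            if self.open_tag is not None:
                self.defs[-1] += data

    parser = DefExtractor()
    parser.feed(open("srfi-41.html", encoding="utf-8").read())
    for signature in parser.defs:
        # If the marked-up text is already written as an S-expression,
        # e.g. "(stream-cons object stream)", it can be printed as-is.
        print(signature.strip())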

 > If we put examples of this new approach in the SRFI template, then
 > authors who wish to can, with little effort, encode metadata in
 > their documents.

 > Editors and volunteers can then use your tool to extract signatures
 > into a standard format like Ciprian's.

 > As far as I can tell, we've moved from the prospect of doing more
 > work to the prospect of doing less work.

I would still like to have a little more work done ;) by either the
authors or the editors/volunteers. In particular:

* It would be nice if somebody added the HTML metadata to new SRFIs.
   As demonstrated, this shouldn't take more than half an hour even
   for the most complex SRFIs.

* Though the requirements on the HTML are lenient, what would really
   help is to mandate that every tag has a matching closing tag.
   Unbalanced or missing tags have been the only real obstacle to
   machine-parsing that I've encountered. There are "lenient linters"
   that check only this, e.g. https://www.jwz.org/hacks/validate.pl
   (see the sketch after this list).

* I still think it would be best to generate all of the S-expression
   data from the HTML, now that probably the hardest part (argument
   lists) has proven this easy. So we could add classes for the
   abstract, author, date, status, license, and other general
   information like that.
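
On the closing-tags point, that kind of lenient check is easy to
script too. A rough sketch, which only skips the usual void elements
and otherwise knows nothing about HTML's optional-end-tag rules:

    from html.parser import HTMLParser

    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "param", "source", "track", "wbr"}

    class BalanceChecker(HTMLParser):
        """Report start tags that never get a matching end tag."""
        def __init__(self):
            super().__init__()
            self.stack = []   # (tag, line number) of open elements

        def handle_starttag(self, tag, attrs):
            if tag not in VOID:
                self.stack.append((tag, self.getpos()[0]))

        def handle_endtag(self, tag):
            if tag in VOID:
                return
            if self.stack and self.stack[-1][0] == tag:
                self.stack.pop()
            else:
                print("unexpected </%s> at line %d" % (tag, self.getpos()[0]))

    checker = BalanceChecker()
    checker.feed(open("srfi-41.html", encoding="utf-8").read())
    for tag, line in checker.stack:
        print("<%s> opened at line %d is never closed" % (tag, line))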

I guess it would be best to store the S-expression files in the same
git repo as the SRFI itself. If/when changes are made to the HTML, we
can re-run the tool to generate a new S-expression file. The tool can
then compare the old and new files to see whether there are
suspicious-looking differences between them (e.g. some definitions
have changed or gone missing in the new version of the SRFI).
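
The comparison could be as simple as diffing the sets of top-level
forms in the two files. A rough sketch (the file names are made up,
and it ignores parentheses inside strings and comments):

    def top_level_forms(path):
        # Split a file into its top-level S-expressions, with
        # whitespace normalized so re-wrapped text still compares equal.
        text = open(path, encoding="utf-8").read()
        forms, depth, start = [], 0, 0
        for i, ch in enumerate(text):
            if ch == "(":
                if depth == 0:
                    start = i
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    forms.append(" ".join(text[start:i + 1].split()))
        return set(forms)

    old = top_level_forms("srfi-41.lisp")
    new = top_level_forms("srfi-41.new.lisp")
    for form in sorted(old - new):
        print("missing or changed:", form)
    for form in sorted(new - old):
        print("added or changed:  ", form)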

This would ensure that we always have the metadata and that the HTML
stays in a parseable state even after edits. If an author re-generates
the HTML using their own personal tool and it loses our metadata
classes, an editor could use the saved S-expression file to put the
classes back into the new HTML, using the tool's comparison feature to
check what's missing.