

Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing elf 11 Mar 2019 03:05 UTC

This may be a stupid suggestion, but perhaps it would be easier to write the SRFIs in a specialised markup format, and then generate both the docs and HTML/XHTML from that?

From the discussion, it looks like one of the biggest problems is that HTML isn't really natively suited to this without adding some kind of extra set of structure/tags on top of it... in which case, why not reverse the problem and generate HTML from something suitable? (I imagine that it will also be much less work for the current and future authors, and significantly more flexible for future modification.)

2c, not even a cup of tea these days.


On March 10, 2019 8:48:25 PM UTC, Ciprian Dorin Craciun <> wrote:
>On Thu, Mar 7, 2019 at 10:17 PM Ciprian Dorin Craciun
><> wrote:
>> As it stands I have tried to eliminate as much HTML as possible (which
>> was used mainly for formatting purposes).  In the next days I'll take
>> a look at how I can change the annotations to make it more malleable for
>> exports and indexing.
>> Just a note on my approach:  my focus is mainly on how we can use
>> XHTML to obtain the following:
>> * easy indexing and back-referencing of the elements;
>> * easy export into other formats (like Markdown), thus the used
>> (X)HTML elements should be kept to a minimum;
>> * easy splitting of the text into sections and definitions so that one
>> can programmatically extract only that section;
>I've continued my "restructuring" experiment and the following is the
>current outcome (still work in progress):
>Basically what I've done so far is clean up the HTML into XHTML Basic 1.1
>and structure it heavily (by just adding markup elements, without
>rewording or changing the actual text).  (I.e. instead of HTML classes
>I've actually just used `<var>`, `<dfn>` and correctly nested list
>items, definition terms, etc.)
>Based on this experience (which took quite some while), I've made the
>following observations:
>(1)  HTML is a very unpleasant language to work with.  (Alex is right!)
>The only way I was able to work with it was to open the HTML in a
>browser side-by-side with the text editor.  Also my "structured" CSS
>helped a lot in detecting incorrect nesting, `<var>` elements
>containing more than one "variable", etc.
>However it can be done.
>(2)  As SRFI-1 stands, the HTML markup was mainly used for formatting
>and not "semantic"...  For example in case of procedure definition
>arguments, a single `<var>` was used to contain all arguments, instead
>of one `<var>` per argument.  (The same for many other elements.)
>Therefore the "quality" of the HTML code is (as I expected) far from
>the quality we expect for example from our Scheme code...  (I'm not
>criticizing the authors and publishers, as the main purpose of HTML
>was for WYSIWYG purposes, thus nobody focused on the actual HTML code.)
>(3)  There are two approaches in "augmenting" the current SRFI HTML's:
>* either we take the HTML code as is, and use HTML classes to markup
>various elements, hoping that afterwards we just run a tool that
>"massages" the whole mess and outputs something more than an index of
>elements;  (this is the approach Lassi took;)
>* or we take the HTML code and restructure it into a more "strict"
>hierarchy, with some clear "patterns", and afterwards run a tool to
>extract and augment that HTML into the final product;  (this is the
>approach I'm proposing;)
>(4)  As hinted above, I think that just "massaging" the original HTML
>(in either variant) is not enough.  I think there should be an extra
>step (with an automated tool), that takes the "massaged" HTML and
>outputs another one used for "publishing" purposes.
>What do I mean by this?  Well in my approach I wouldn't ask the author
>/ editor to set the `id="cons"` attribute where the `cons` function is
>defined, but instead the automated tool, based on finding
>`<dfn><code>cons</code></dfn>` would create that id on the `<dd>`
>element containing the definition.  At the same time, whenever it
>finds a `<code>cons</code>` it would automatically wrap that in a `<a
>href="#cons"><code>cons</code></a>` for back-referencing.  (The same
>would happen for bibliographical items, sections, etc.)
>I propose this based on the observation that mandating the editor to
>always set `id` and `<a href>` markup would drive one mad...
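
The generator step described above could be sketched roughly as follows. This is only an illustration, not Ciprian's actual tool: the function name `add_ids_and_links` and the sample fragment are made up, the input is assumed to be namespace-free XHTML, and Scheme names containing characters such as `!` would still need escaping to be valid ids, which the sketch ignores.

```python
import xml.etree.ElementTree as ET

# Made-up fragment that follows the <dt><dfn><code>NAME</code></dfn> pattern.
SAMPLE = """<div>
<dl>
<dt><dfn><code>cons</code></dfn> <code><var>a</var> <var>d</var> -&gt; <var>pair</var></code></dt>
<dd><p>Placeholder text for the definition.</p></dd>
</dl>
<p>Placeholder text that mentions <code>cons</code> again.</p>
</div>"""


def add_ids_and_links(root):
    """Set ids on defining <dd> elements and link plain <code> references to them."""
    parents = {child: parent for parent in root.iter() for child in parent}
    defined = set()

    # Pass 1: a <dt> holding <dfn><code>NAME</code></dfn> names the next <dd>.
    for dl in root.iter("dl"):
        name = None
        for child in dl:
            if child.tag == "dt":
                code = child.find("./dfn/code")
                if code is not None and code.text:
                    name = code.text
            elif child.tag == "dd" and name is not None:
                child.set("id", name)
                defined.add(name)
                name = None

    # Pass 2: wrap every other <code>NAME</code> in <a href="#NAME">.
    for code in list(root.iter("code")):
        parent = parents.get(code)
        if parent is None or parent.tag in ("dfn", "a"):
            continue  # the definition itself, or already a link
        if len(code) or code.text not in defined:
            continue  # nested markup (e.g. a signature) or an unknown name
        index = list(parent).index(code)
        link = ET.Element("a", {"href": "#" + code.text})
        link.tail, code.tail = code.tail, None  # keep trailing text after the link
        parent.remove(code)
        link.append(code)
        parent.insert(index, link)
    return root


if __name__ == "__main__":
    tree = add_ids_and_links(ET.fromstring(SAMPLE))
    print(ET.tostring(tree, encoding="unicode"))
```

The point of the two passes is that the editor only ever writes `<dfn>` and `<code>`; both the `id` and the `<a href>` are derived, so they cannot drift out of sync with the text.
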
>Therefore my workflow proposal is as follows:
>* once the author / editor moves a SRFI in the final status, the
>workflow begins;
>* the original HTML is taken and a few automatic "changes" are done
>that cleanup the HTML (mainly `tidy`, but perhaps we can automate
>something more;)
>* the "volunteer" takes that HTML and restructures the elements into a
>proper hierarchy (as I've done);
>* the "volunteer" executes the automatic "generator" which augments
>the XHTML with `id`, `href` attributes and so on;
>My next step (perhaps in the next weekend) is to continue the
>structuring and introduce the concept of "sections" (basically based
>on `<div>`'s), and see how I can re-format the definitions so that
>automatic "splitting" of the document is easy.
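
To illustrate why a strict hierarchy makes the "splitting" easy, here is a small sketch that pulls one definition out of a structured document. It assumes the `<dt><dfn><code>NAME</code></dfn> ...</dt><dd>...</dd>` pattern and namespace-free XHTML; the function name `extract_definition` and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up fragment with two definitions in one <dl>.
SAMPLE = """<dl>
<dt><dfn><code>cons</code></dfn> <code><var>a</var> <var>d</var> -&gt; <var>pair</var></code></dt>
<dd><p>Placeholder text for the first definition.</p></dd>
<dt><dfn><code>other</code></dfn></dt>
<dd><p>Placeholder text for the second definition.</p></dd>
</dl>"""


def extract_definition(root, name):
    """Return the (<dt>, <dd>) pair that defines `name`, or None if absent."""
    for dl in root.iter("dl"):
        current_dt = None
        for child in dl:
            if child.tag == "dt":
                current_dt = child
            elif child.tag == "dd" and current_dt is not None:
                code = current_dt.find("./dfn/code")
                if code is not None and code.text == name:
                    return current_dt, child
                current_dt = None
    return None


if __name__ == "__main__":
    found = extract_definition(ET.fromstring(SAMPLE), "cons")
    if found is not None:
        dt, dd = found
        print("".join(dt.itertext()).strip())
        print("".join(dd.itertext()).strip())
```

The same walk could feed a Markdown or plain-text exporter, since `itertext()` already yields the content without the markup.
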
>Also I'll try to come up with a second CSS, that can be applied to the
>same structured XHTML, but for the purpose of display.
>> I've made a fork of SRFI-1 and started chopping and transforming the
>> HTML into XHTML, at the same time eliminating some "boilerplate"
>> elements (to focus only on the actual text), and changing some HTML.
>> The changes can be seen at:
>I've applied the following (not necessarily in order, see the diff
>above for the actual order):
>* converted everything to XHTML Basic 1.1 (it's a very lightweight and
>constrained XHTML variant, that should be implemented even by the most
>basic hand-helds...  and surprisingly the conversion required just a
>few minor edits to be compliant...)
>* used `tidy` to re-format everything;
>* some minor non-semantic changes;
>* replaced tables with lists;  (they were in fact used only for
>appearance purposes;)
>* removed all `<div>` elements as they were used only for display;
>* used `dfn` elements to mark where the procedure is defined;  (see
>below the outcome;)
>* split `<var>x y</var>` arguments into `<code><var>x</var>
><var>y</var></code>` so that the signature is structured;
>    <dt><dfn><code>cons</code></dfn>
>    <code><var>a</var> <var>d</var> -&gt; <var>pair</var></code></dt>
>* replaced all `<a href="...">zzz</a>` with plain `<a>zzz</a>`, based
>on the observation that these `<a>` elements are used for bibliography
>purposes, thus they can be "transformed" afterwards;  (see below
>about proposed workflow);
>* removed all `id="..."` attributes, based on the observation that
>given a "good" structured document, these can be added afterwards;
>* removed all "proc-defn" (and similar), based on the observation that
>they aren't used anywhere in the document, and if they were used
>for signatures it would be superfluous;
>* removed all `<var>` (and other markup) from within `<pre>` based on
>the observation that given the context where the `<pre>` appears (i.e.
>which `<var>` are present in neighboring elements) we could re-add
>this markup;
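
Of the edits listed above, the `<var>` split is probably the easiest one to automate. The following is a minimal sketch under stated assumptions, not the procedure that was actually used: it splits on whitespace only, expects namespace-free XHTML, does not special-case a `<var>` that already sits inside a `<code>`, and the name `split_vars` is made up.

```python
import xml.etree.ElementTree as ET


def split_vars(root):
    """Turn each <var>a b</var> into <code><var>a</var> <var>b</var></code>.

    Returns the number of <var> elements that were rewritten.
    """
    parents = {child: parent for parent in root.iter() for child in parent}
    changed = 0
    for var in list(root.iter("var")):
        names = (var.text or "").split()
        parent = parents.get(var)
        if parent is None or len(var) or len(names) < 2:
            continue  # root element, nested markup, or already a single variable
        code = ET.Element("code")
        for position, name in enumerate(names):
            item = ET.SubElement(code, "var")
            item.text = name
            if position < len(names) - 1:
                item.tail = " "  # single space between arguments
        code.tail = var.tail  # preserve whatever followed the original <var>
        index = list(parent).index(var)
        parent.remove(var)
        parent.insert(index, code)
        changed += 1
    return changed


if __name__ == "__main__":
    dt = ET.fromstring("<dt><dfn><code>cons</code></dfn> <var>a d</var></dt>")
    print(split_vars(dt), ET.tostring(dt, encoding="unicode"))
```

Running it a second time changes nothing, so the return value doubles as a check for `<var>` elements that still contain more than one "variable".
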