Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing Ciprian Dorin Craciun 06 Mar 2019 12:53 UTC

On Wed, Mar 6, 2019 at 1:53 PM Lassi Kortela wrote:
> What does everyone think? I would prefer to try this first because
> it's an established standard with readily available tools. All the
> while being simple and familiar (they just use HTML with light
> additions, much like we've been doing so far, and specify a standard
> conversion formula to JSON). This would also settle our debate about
> how to use class attributes, since they specify that :-P
> For example, here's how one might mark up a procedure definition:
>      <p class="h-proc-def">
>        <b>Procedure: </b>
>        <code>
>          <a class="p-name" name="make-array">make-array</a>
>          <var class="h-arg">interval</var>
>          <var class="h-arg">getter</var>
>          [ <var class="h-arg"><span class="p-type">optional</span>setter</var> ]
>        </code>
>      </p>

(Without meaning to upset anyone) I really think that this is the
best example of how to fail at this endeavor.

It is so complex:

* in order to identify that `make-array` is actually a procedure
definition, we have to look for an element that has the class
`h-proc-def`, which should contain (somewhere, not necessarily among
its direct children) an element with the class `p-name` which carries
the actual "name" of the procedure;  just thinking about expressing
this in code, especially with XML libraries or XSLT, scares me...
(as a second exercise, try to imagine what the code to extract the
arguments would look like;)

* it adds too much overhead:  there is too much duplication, for
example the `make-array` token appears twice;

* it fails to capture all signature elements:  what is the output of
the procedure?  what are the types of various arguments?
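To make the first point concrete, here is a minimal sketch in Python
(standard library only) of what extracting the name and arguments from
the quoted markup involves.  The names `ProcDefParser` and
`extract_proc_defs` are hypothetical, and the sample is the quoted
example with its wrapped line rejoined.  It assumes every element has
an explicit closing tag (no void elements such as `<br>`), so it is an
illustration of the bookkeeping needed, not a robust parser.

```python
from html.parser import HTMLParser

# The markup from the quoted proposal (wrapped line rejoined).
SAMPLE = """
<p class="h-proc-def">
  <b>Procedure: </b>
  <code>
    <a class="p-name" name="make-array">make-array</a>
    <var class="h-arg">interval</var>
    <var class="h-arg">getter</var>
    [ <var class="h-arg"><span class="p-type">optional</span>setter</var> ]
  </code>
</p>
"""


class ProcDefParser(HTMLParser):
    """Collects procedure definitions from h-proc-def blocks."""

    def __init__(self):
        super().__init__()
        self.procs = []      # finished definitions
        self.stack = []      # class sets of the currently open elements
        self.current = None  # definition being built, if any

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append(classes)
        if "h-proc-def" in classes:
            self.current = {"name": "", "args": []}
        elif self.current is not None and "h-arg" in classes:
            self.current["args"].append({"name": "", "type": None})

    def handle_endtag(self, tag):
        # Assumption: tags are balanced, so the top of the stack
        # belongs to the element being closed.
        classes = self.stack.pop() if self.stack else set()
        if "h-proc-def" in classes and self.current is not None:
            self.procs.append(self.current)
            self.current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.current is None:
            return
        # Classes of all ancestors of this text node.
        open_classes = set().union(*self.stack)
        if "p-type" in open_classes and self.current["args"]:
            self.current["args"][-1]["type"] = text
        elif "h-arg" in open_classes and self.current["args"]:
            self.current["args"][-1]["name"] += text
        elif "p-name" in open_classes:
            self.current["name"] += text


def extract_proc_defs(html):
    parser = ProcDefParser()
    parser.feed(html)
    parser.close()
    return parser.procs


procs = extract_proc_defs(SAMPLE)
print(procs[0]["name"], [arg["name"] for arg in procs[0]["args"]])
```

Even in this small form the parser has to track which classes are
open at every text node, and it still says nothing about the return
value or about argument types beyond "optional".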

When designing this format think about how one could use `pup` / `jq`
to extract the data.

My proposal is to keep things simple:

* for indexing just using `<a class="proc-def">make-array</a>` is enough;

* for actual signatures I think an S-expression based description is
better (however, see the paragraph below where I note that perhaps
this is too much for SRFIs);  for example:

        (type constructor)
        (export scheme:base)
            ((range-length-zero) -> vector-empty)
            ((range-length-zero any) -> vector-empty)
            ((range-length-not-zero) -> vector-not-empty)
            ((range-length-not-zero any) -> vector-not-empty))
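For the indexing half of the proposal (the single
`<a class="proc-def">` marker), the extraction code stays small.  The
following is only a sketch in Python using the standard library; the
names `ProcIndexParser` and `index_procedures` are hypothetical, and
`array-ref` in the usage line is just an illustrative second entry.

```python
from html.parser import HTMLParser


class ProcIndexParser(HTMLParser):
    """Collects the text of every <a class="proc-def"> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_def = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "proc-def" in classes:
            self._in_def = True
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_def = False

    def handle_data(self, data):
        if self._in_def:
            self.names[-1] += data


def index_procedures(html):
    parser = ProcIndexParser()
    parser.feed(html)
    parser.close()
    return [name.strip() for name in parser.names]


html = ('<p><a class="proc-def">make-array</a>, see also '
        '<a href="#x">this link</a> and '
        '<a class="proc-def">array-ref</a></p>')
print(index_procedures(html))
```

There is one class to look for and the name is the element's own
text, so no state about enclosing elements is needed.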

I've tried hard to think about this problem (when I did my R7RS
documentation conversion) and came to the conclusion that one can't
expect to extract accurate information from "text" documents without
making a mess out of them.  Then I came to the conclusion that just
"back-referencing" things in the actual text, and providing "external"
structured syntax / signatures is the best approach.

Take the example of `cond`:

I start with the structured syntax signature, and then in the
description, formatted as CommonMark, I could use CommonMark
references to "special" tags to link to other elements.  (I haven't
included one in `cond`'s description, but I have technical support
for it in the parser / formatter.)

However, this is perhaps too much for the SRFI use-case.  Instead I
think just having a few "markers" to allow indexing /
back-referencing, then a simplified / standard structure (sections,
paragraphs, lists, code snippets, etc.) is enough.  Based on this one
can take the (X)HTML and "render" it as CommonMark / other formats to
be included in one's own documentation.
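As a rough illustration of that last step, here is a sketch in Python
that renders a small, well-formed XHTML fragment as CommonMark with
the standard `xml.etree.ElementTree` module.  Everything in it is an
assumption made for the example: the function names, the handful of
supported elements (`h1`-`h3`, `p`, `ul`, `pre`, inline `code`), the
sample input, and the absence of an XML namespace on the elements.

```python
import xml.etree.ElementTree as ET


def inline(elem):
    """Render inline content, mapping <code> to backticks."""
    parts = [elem.text or ""]
    for child in elem:
        text = inline(child)
        parts.append(f"`{text}`" if child.tag == "code" else text)
        parts.append(child.tail or "")
    # Collapse runs of whitespace, as HTML rendering would.
    return " ".join("".join(parts).split())


def to_commonmark(xhtml):
    """Render the direct children of the root element as blocks."""
    root = ET.fromstring(xhtml)
    blocks = []
    for elem in root:
        if elem.tag in ("h1", "h2", "h3"):
            blocks.append("#" * int(elem.tag[1]) + " " + inline(elem))
        elif elem.tag == "p":
            blocks.append(inline(elem))
        elif elem.tag == "ul":
            items = ["* " + inline(li) for li in elem.findall("li")]
            blocks.append("\n".join(items))
        elif elem.tag == "pre":
            # Code snippets become indented code blocks.
            code = "".join(elem.itertext()).strip("\n")
            blocks.append("\n".join("    " + line
                                    for line in code.splitlines()))
    return "\n\n".join(blocks)


sample = ("<body><h2>make-array</h2>"
          "<p>Returns an <code>array</code> object.</p>"
          "<ul><li>one</li><li>two</li></ul>"
          "<pre>(make-array i g)</pre></body>")
print(to_commonmark(sample))
```

This only works because the input is well-formed XML; with
free-form HTML a more forgiving parser would be needed first.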

>  > I'm still not convinced of the need to go all the way to XHTML.
>  > If there's a reliable way to convert from HTML to XHTML, then
>  > there's no need for XHTML to be the on-disk format.
> XHTML is definitely not necessary for indexing, but it may be
> necessary/much easier for the full-text conversions Ciprian would like
> to do. I guess the decision rests on how easy it is to automate?

I agree that XHTML is not strictly necessary, but as highlighted above
it would help us from a technical point of view, especially when
converting it to other formats for inclusion in other documentation.