Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing Lassi Kortela (05 Mar 2019 16:36 UTC)
Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing Marc Nieper-Wißkirchen (06 Mar 2019 10:12 UTC)

Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing Lassi Kortela 05 Mar 2019 16:36 UTC

Thanks for commenting :) Would be great to progress one way or another.

> But most importantly one is able to take these documents and integrate
> them into one's Scheme implementation documentation as "reference"
> material.


Your R7RS conversion looks great! It would indeed be awesome to have
this for the entire (open source) Scheme ecosystem. I would love to have
an easy API that could serve the documentation for all standards, SRFIs
and implementations. That would make it easy to write comprehensive web
front-ends and editor integrations. And the API could run on a server
that periodically updates itself with the latest documentation from
around the web.

> However, I would envisage something a little bit more:

I like everything you suggest here. The problem is the large body of
existing SRFIs with hand-crafted HTML. I propose to start by adding the
classes because that requires only a few changes so it's easier to get
people on board and get it done sooner.

What's nice is that classes could be combined seamlessly with strict
XHTML, standard stylesheets and standard tag usage. So we could start
with classes and work out the more rigid standardization afterwards.

I empathize with your use case because while classes are sufficient for
indexing, transforming the full text of an SRFI into another format is
still tricky without standardized use of tags.

> I would try to keep these classes as mutually exclusive as possible...
> Especially if we want to be able to extract anything useful out of
> these documents.

> As highlighted above I would use `def-proc` (i.e. one CSS class)
> instead of `def proc` (i.e. two CSS classes).  (Because what does
> just `proc` mean by itself?  Or just `def`?)

> (I have a feeling that you've got "caught" in the CSS classes
> extravaganza.)  :)

Actually I've been inspired by Go interfaces which enable exactly this
kind of simple and serendipitous composition :)

I'm not sure I understand what you mean. I thought about this and came
to the opposite conclusion, that having separate classes makes
extracting information *easier*, not more difficult :) Because it's
easier to specify precisely what you want by composing from a small set
of classes as needed.

For example, to find all definitions, you could just look for the "def"
class. To find only procedure definitions, look for elements that have
both "proc" and "def" (the order within the class attribute does not
matter). To find all definitions without using class composition you'd
have to do a set union: find both "def-proc" and "def-var" and whatever
other defs we have. Similarly, an optional argument can be just
"opt arg" instead of having separate "required-arg" and "opt-arg"
classes. And if we have other optional things, they could use "opt" as
well, instead of making yet another class. So the total number of
classes can be small and they can be combined in flexible and intuitive
ways.
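
A rough sketch of what such a query could look like, using only Python's
standard html.parser module. The sample markup, the ClassCollector class
and the select() helper are made up for illustration; nothing here is an
agreed SRFI convention or an existing tool.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Record (classes, text) for every element that has a class attribute."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements: (tag, classes or None, text parts)
        self.found = []   # finished (frozenset of classes, text) pairs

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("class")
        classes = frozenset(value.split()) if value else None
        self.stack.append((tag, classes, []))

    def handle_data(self, data):
        # Text belongs to every open element that carries classes.
        for _, classes, parts in self.stack:
            if classes is not None:
                parts.append(data)

    def handle_endtag(self, tag):
        if not any(name == tag for name, _, _ in self.stack):
            return  # stray end tag, ignore it
        # Pop up to the matching start tag; this also closes unclosed
        # elements such as <br> that were opened in between.
        while self.stack:
            name, classes, parts = self.stack.pop()
            if classes is not None:
                self.found.append((classes, "".join(parts).strip()))
            if name == tag:
                break


# Hypothetical SRFI fragment using the composable classes discussed above.
SAMPLE = """
<p><code class="proc def">vector-map</code> <var class="arg">f</var>
<var class="opt arg">start</var></p>
<p><code class="var def">default-port</code></p>
"""


def select(html, *wanted):
    """Return the text of every element that has all the wanted classes."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    need = set(wanted)
    return [text for classes, text in collector.found if need <= classes]
```

With this sample, select(SAMPLE, "def") returns both definitions,
select(SAMPLE, "proc", "def") returns only "vector-map", and
select(SAMPLE, "opt", "arg") returns only "start". The same subset test
covers every combination, so no per-combination class names are needed.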

I realize that it becomes possible to produce pairs of classes that
don't make any sense (like "opt proc" for optional procedure), but I
don't find that a significant problem. Those combinations are likely to
be harmless and the rules for correct usage will be simple. Plus they
might be useful later: maybe optional procedures will mean something in
the future, in which case the SRFI editor could bless that usage. The
small number of classes, the many examples in the large body of SRFIs,
and possibly machine checking by parsers would make it easy enough to
use them correctly.
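
As an illustration of the machine checking mentioned above, here is a
minimal sketch of a checker for one class attribute value. The class
names and the two rules are assumptions picked for the example, not a
settled vocabulary, and the results are warnings rather than errors since
odd combinations are meant to be harmless.

```python
# Hypothetical vocabulary and rules; none of this is agreed upon.
KNOWN = {"def", "proc", "var", "arg", "opt"}
KINDS = {"proc", "var", "arg"}  # assumed: an element is at most one of these


def check(class_value):
    """Return a list of warnings for one class attribute value."""
    names = set(class_value.split())
    warnings = []
    for name in sorted(names - KNOWN):
        warnings.append(f"unknown class: {name}")
    kinds = sorted(names & KINDS)
    if len(kinds) > 1:
        warnings.append("more than one kind: " + ", ".join(kinds))
    if "opt" in names and "arg" not in names:
        warnings.append('"opt" used without "arg"')
    return warnings
```

Here check("proc def") and check("opt arg") return no warnings, while
check("opt proc") reports that "opt" is used without "arg". If optional
procedures were blessed later, only the last rule would need to change.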

> Regarding the `display: none` I think it is a very bad idea...  If it
> is not visible, it will not be reviewed, and thus it will bit-rot, and
> errors will creep into that element. I am certain that one can come up
> with a sensible HTML structure, that allows such "meta-data" items to
> be included besides the actual text.

I agree it would be ideal to have everything visible. But changing the
entire HTML structure of all existing SRFIs would be a huge effort. The
invisible elements would be a workaround until that can be done. If a
common structure can be agreed upon, then we can simply change them into
visible ones, so no harm is done. I think it's a good interim solution
(by which I mean: all other interim solutions would be worse :D)

The invisible tags would get reviewed because tools would use them to
index the SRFIs. So you would review an SRFI by looking at its index
entries, then fix any incorrect tags until the index is accurate.
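
A sketch of the kind of index listing a reviewer could look at, again
with Python's standard html.parser. The markup, the IndexBuilder class
and the "(hidden)" marker are hypothetical; the sketch also assumes that
a "def" element does not contain another element with the same tag name.

```python
from html.parser import HTMLParser


class IndexBuilder(HTMLParser):
    """Build one index line per element whose class list contains "def"."""

    def __init__(self):
        super().__init__()
        self.current = None  # (tag, kind, hidden, text parts) inside a def
        self.lines = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.current is None and "def" in classes:
            kind = " ".join(c for c in classes if c != "def") or "?"
            style = (attrs.get("style") or "").replace(" ", "").lower()
            self.current = (tag, kind, "display:none" in style, [])

    def handle_data(self, data):
        if self.current is not None:
            self.current[3].append(data)

    def handle_endtag(self, tag):
        if self.current is not None and tag == self.current[0]:
            _, kind, hidden, parts = self.current
            note = " (hidden)" if hidden else ""
            self.lines.append(f"{kind} {''.join(parts).strip()}{note}")
            self.current = None


# Hypothetical fragment: one visible definition and one invisible one.
SAMPLE = """
<h2><code class="proc def">string-index</code></h2>
<span class="var def" style="display: none">char-set:digit</span>
"""


def build_index(html):
    builder = IndexBuilder()
    builder.feed(html)
    builder.close()
    return builder.lines
```

For this sample build_index(SAMPLE) gives the lines "proc string-index"
and "var char-set:digit (hidden)", so the invisible entry shows up in the
listing even though it is not rendered on the page.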