Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing Marc Nieper-Wißkirchen (06 Mar 2019 10:12 UTC)
Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing Lassi Kortela (06 Mar 2019 11:53 UTC)

Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing Lassi Kortela 06 Mar 2019 11:53 UTC

 > This looks like html microformats (e.g.
 > http://microformats.org/wiki/h-card)
 >
 > I am not sure why they overload the class attribute instead of using
 > more sensible html attribute names given any attribute can be used
 > in css.

This may get my vote! In particular, the current version 2 of
microformats is even simpler than the "classic" stuff.

They have a proper spec: <http://microformats.org/wiki/microformats2>.
There are clear yet flexible rules on how to put metadata in HTML
markup and transform it to JSON:
<http://microformats.org/wiki/microformats2-parsing> (it looks complex
but the common cases are simple). And many parsers already exist:
<http://microformats.org/wiki/parsers>.

What does everyone think? I would prefer to try this first because
it's an established standard with readily available tools, while still
being simple and familiar (they just use HTML with light
additions, much like we've been doing so far, and specify a standard
conversion formula to JSON). This would also settle our debate about
how to use class attributes, since they specify that :-P

For example, here's how one might mark up a procedure definition:

     <p class="h-proc-def">
       <b>Procedure: </b>
       <code>
         <a class="p-name" name="make-array">make-array</a>
         <var class="h-arg">interval</var>
         <var class="h-arg">getter</var>
         [ <var class="h-arg"><span class="p-type">optional</span>setter</var> ]
       </code>
     </p>

I tried the Python mf2py library, one of those ready-made parsers. All
I have to do is feed the above HTML code as a string to mf2py.parse().
It gives me a ready-to-use JSON structure with the information needed
for indexing. I didn't have to write a schema or a tree walker or use
any regexps or clean up whitespace. It takes care of everything. And
we can use jq or any other JSON tool to post-process the structure.
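
To give a feel for the kind of structure that is useful for indexing, here is a rough stdlib-only Python sketch. It is not mf2py and not a conforming microformats2 parser: it only understands the h-proc-def, p-name, h-arg and p-type class names from the example above, and the output shape (a list of dicts with "name" and "args" keys) is made up for illustration. It also assumes the markup is well nested. A real parser such as mf2py returns its own, richer JSON layout.

```python
import json
from html.parser import HTMLParser

# The markup from the example above, as a string.
SAMPLE = """
<p class="h-proc-def">
  <b>Procedure: </b>
  <code>
    <a class="p-name" name="make-array">make-array</a>
    <var class="h-arg">interval</var>
    <var class="h-arg">getter</var>
    [ <var class="h-arg"><span class="p-type">optional</span>setter</var> ]
  </code>
</p>
"""

# Elements that never get an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ProcDefIndexer(HTMLParser):
    """Collects one dict per h-proc-def element (hypothetical output shape)."""

    def __init__(self):
        super().__init__()
        self.items = []  # one entry per procedure definition seen so far
        self.stack = []  # role of each open element: "proc", "name", "arg", "type" or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        role = None
        if "h-proc-def" in classes:
            role = "proc"
            self.items.append({"name": "", "args": []})
        elif "proc" in self.stack:
            if "p-name" in classes:
                role = "name"
            elif "h-arg" in classes:
                role = "arg"
                self.items[-1]["args"].append({"name": "", "type": None})
            elif "p-type" in classes and "arg" in self.stack:
                role = "type"
        self.stack.append(role)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # The innermost enclosing element that has a role decides where text goes.
        role = next((r for r in reversed(self.stack) if r), None)
        if role == "name":
            self.items[-1]["name"] += text
        elif role == "arg":
            self.items[-1]["args"][-1]["name"] += text
        elif role == "type":
            self.items[-1]["args"][-1]["type"] = text


def index_procedures(html):
    """Return the list of procedure entries found in an HTML string."""
    parser = ProcDefIndexer()
    parser.feed(html)
    parser.close()
    return parser.items


index = index_procedures(SAMPLE)
print(json.dumps(index, indent=2))

# Post-processing is then ordinary data wrangling, e.g. listing all names:
names = [item["name"] for item in index]
```

The point is only that once the class names are in the markup, getting from HTML to an index is a small, mechanical step. With a real microformats2 parser even this much code should be unnecessary.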

 > I rely on skribe syntax to document my project

Your Skribe source and HTML conversion look great! No wonder people
prefer this thing to LaTeX. Almost a pity we are stuck with HTML, but
what can you do :)

 > I might have skipped something but what would be the goal of those
 > classes?
 > Is it to allow to generate indices? or more complicated things
 > language servers
 > or even reference doc for things like https://devdocs.io?

Ideally all of those eventually (it will probably take years and
that's ok). Indexes would be the easiest and most immediately useful
thing to do, and would be doable with simple markup. Ciprian would
like to standardize the entire HTML structure of (finalized) SRFIs so
that it's easier to generate consistent documentation for doc browsers
like devdocs.io. That's a laudable goal if it can be done without
disturbing the normal course of the SRFI process. We're trying to
figure out what the final form and possible intermediate form of the
SRFI markup ought to look like, and how to get the best markup without
bothering SRFI authors (we'll probably write tools to minimize human
effort).

 > At the end of the day, my take on this is that the SRFI process
 > should keep a simple approach.

Fully agreed. We've been looking at different formats and tools in
this thread and it seems the consensus has arrived at some form
of HTML with light metadata additions. That's reassuring because there
are tons of tools for HTML/XML and SRFIs are already HTML.

 > I'm still not convinced of the need to go all the way to XHTML.

 > If there's a reliable way to convert from HTML to XHTML, then
 > there's no need for XHTML to be the on-disk format.

XHTML is definitely not necessary for indexing, but it may be
necessary/much easier for the full-text conversions Ciprian would like
to do. I guess the decision rests on how easy it is to automate?

 > Even if there are volunteers, coordinating volunteers requires work,
 > and volunteers tend not to be as fast as one person at turning
 > things around.

Strongly agree from experience. The coordination (keeping track of and
worrying about undone tasks, communication and schedules) is often
the most thankless and invisible part of the work. I definitely
wouldn't want to increase the editor's workload there.

 > Part of the reason I believe that the SRFI process has been
 > successful recently, if I may say so, is that I've been turning new
 > drafts and updates to SRFIs around so quickly, usually within hours
 > and almost never taking more than a day or two.

Probably true. I've contributed code to some projects and a
low-friction, fast-turnaround environment is very motivating.

 > On the other hand, if we can still go through the existing SRFI
 > process without any extra work, then do the work of converting
 > formats and adding annotations afterwards, then only the indexes
 > will fall behind. That's probably a reasonable compromise.

 > I have recently sent a private message to each of the most recent
 > SRFI authors, asking them to participate in this discussion on
 > <srfi-discuss>. That includes Per.

Awesome :) You're doing a great job getting people to work together.

 > Concretely, what I would like to see is an example SRFI converted in
 > this way.

Definitely the way to go.

 > I continue to be impressed with pup. Per Bothner's SRFI 164 already
 > has a bunch of appropriate markup in it, so I've been experimenting
 > with it and pup. For example, here I extract the names of all the
 > procedures defined in that SRFI:

Ok, what an awesome tool! That pup and jq combination is gold.

Lassi