Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing Marc Nieper-Wißkirchen (06 Mar 2019 10:12 UTC)
Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing Ciprian Dorin Craciun (07 Mar 2019 21:06 UTC)

Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing Ciprian Dorin Craciun 07 Mar 2019 21:05 UTC

On Thu, Mar 7, 2019 at 10:46 PM Lassi Kortela <xxxxxx@lassi.io> wrote:
> This is great, but how would you integrate this into the current SRFI
> process? The change in HTML structure would be very large. Would you
> mandate this for new SRFIs, have editors convert submitted SRFIs to this
> format by hand, or try to write an automated conversion tool?

As hinted in a previous email, I think we should approach these SRFI
documents as an actual "magazine", in which the authors submit their
"papers", then when a final version is reached, the "publisher" takes
over and re-formats the actual text.  Moreover once this process is
over, no re-doing is needed, as the actual SRFI documents are
read-only.

Unfortunately (after looking at SRFI-1) I don't think this
restructuring can be done automatically...  For example I've found
`<code>` inside `<code>` or `<pre>`, multiple identifiers contained in
the same `<code>` element, improperly closed elements, etc.  Moreover the
current HTML structure is mainly focused on presentation, thus some
`<code>` elements contain `&nbsp;` for tabulation, etc.
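
To make the kinds of problems listed above concrete, here is a minimal
sketch of a checker that flags them.  It is only an illustration built
on Python's standard `html.parser` module; the names `CodeAudit` and
`audit` are hypothetical and it is not part of any existing SRFI
tooling.  It does not try to detect multiple identifiers inside one
`<code>` element, since that needs knowledge of Scheme syntax.

```python
from html.parser import HTMLParser


class CodeAudit(HTMLParser):
    """Collects presentation-oriented markup problems around <code>/<pre>."""

    def __init__(self):
        # Keep entity references unconverted so that &nbsp; can be seen.
        super().__init__(convert_charrefs=False)
        self.stack = []     # currently open <code> / <pre> elements
        self.problems = []  # (line number or None, description) pairs

    def handle_starttag(self, tag, attrs):
        if tag in ("code", "pre"):
            if tag == "code" and self.stack:
                line = self.getpos()[0]
                self.problems.append((line, f"<code> inside <{self.stack[-1]}>"))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in ("code", "pre"):
            if self.stack and self.stack[-1] == tag:
                self.stack.pop()
            else:
                self.problems.append((self.getpos()[0], f"unmatched </{tag}>"))

    def handle_entityref(self, name):
        if name == "nbsp" and "code" in self.stack:
            self.problems.append((self.getpos()[0], "&nbsp; inside <code>"))


def audit(html):
    """Return the list of problems found in an HTML string."""
    parser = CodeAudit()
    parser.feed(html)
    parser.close()
    # Anything still on the stack was opened but never closed.
    for tag in parser.stack:
        parser.problems.append((None, f"unclosed <{tag}>"))
    return parser.problems
```

Running `audit(open("srfi-1.html").read())` (the file name is an
assumption) would give a rough list of places needing manual
attention; it reports problems but does not attempt to repair them.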

(I don't want to criticize the authors / editors; unfortunately,
HTML is hard to edit correctly.  I'm just stating the current status.)

Therefore, given that there aren't many "final" SRFIs (only 128), I
think such a process, although lengthy, would yield the best quality
documents.

> > Because I still maintain my position that trying to extract more than
> > basic metadata about the procedures described within, I'll simplify
> > and remove extra classes or elements (if they exist).
>
> I'm not sure which metadata you consider basic.
>
> The tool I wrote relies only on plain text (in S-expression syntax) to
> extract the names of arguments and return values, including optional /
> rest arguments. It doesn't rely on HTML tags or classes for anything at
> all. So whatever HTML structure you end up with, as long as you can pull
> the plain text of a definition, its arguments can be extracted.

As stated in previous emails, I agree that "something" is better than
"nothing", and the fact that you've managed to pull this off is
extraordinary; but in the end the extracted information is neither
"complete" nor "reliable"...

My "end-goal" (but long-term) is to have for Scheme something similar
to Erlang's `dialyzer`:  https://learnyousomeerlang.com/dialyzer

(And for such a goal, "unreliable" signatures are almost useless...)

Ciprian.