Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing Arthur A. Gleckler (06 Mar 2019 03:03 UTC)
Re: Proposal to add HTML class attributes to SRFIs to aid machine-parsing Marc Nieper-Wißkirchen (06 Mar 2019 10:12 UTC)

Lassi Kortela <xxxxxx@lassi.io> writes:

| What if we had a script that uses one of those popular tagsoup
| libraries (lenient HTML -> strict HTML) and then does
| post-processing to generate something close enough to the
| standardized (X)HTML structure we want? If the script worked well
| enough, that could be low-friction enough for SRFI authors.

I checked with John Cowan, and he at least abandons his Markdown
source once the conversion to HTML has been done.

| Always nice to have more tools at our disposal!

I just tried PUP on several SRFIs and was impressed.  It should be
possible to use that on the documents directly, perhaps with a few
tweaks to PUP.  I'm still not convinced of the need to go all the
way to XHTML.

| Like Ciprian, I would be in favor of XHTML eventually because
| it's easier to parse (XML libraries are absolutely everywhere and
| are more reliable than tagsoup). We may be able to use a tagsoup
| library to automatically convert HTML to XHTML.

If there's a reliable way to convert from HTML to XHTML, then
there's no need for XHTML to be the on-disk format.
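
To make the idea concrete, here is a minimal sketch of such a
conversion using only Python's standard html.parser module.  It is
not a real tagsoup library: it closes void elements, closes a few
commonly unclosed elements, drops stray end tags, and silently
discards comments and the doctype.  The names XhtmlWriter,
to_xhtml, and is_well_formed are made up for illustration.

```python
from html import escape
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Elements whose end tag authors often omit; a new sibling closes the old one.
IMPLIED_END = {"p", "li", "dt", "dd", "tr", "td", "th"}


class XhtmlWriter(HTMLParser):
    """Re-emit lenient HTML as well-formed XML (a sketch, not a full tagsoup)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the output document
        self.stack = []  # currently open, non-void elements

    def _open(self, tag, attrs, self_close):
        # Valueless attributes such as `disabled` become disabled="disabled".
        text = "".join(
            f' {name}="{escape(value if value is not None else name)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{text}{' /' if self_close else ''}>")

    def handle_starttag(self, tag, attrs):
        if tag in IMPLIED_END and self.stack and self.stack[-1] == tag:
            self.handle_endtag(tag)
        if tag in VOID:
            self._open(tag, attrs, self_close=True)
        else:
            self._open(tag, attrs, self_close=False)
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, self_close=True)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: drop it
        while self.stack:
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def to_xhtml(html_text):
    """Convert a lenient HTML fragment to a well-formed XML fragment."""
    writer = XhtmlWriter()
    writer.feed(html_text)
    writer.close()
    while writer.stack:  # close whatever the author left open
        writer.out.append(f"</{writer.stack.pop()}>")
    return "".join(writer.out)


def is_well_formed(fragment):
    """Check a fragment with a strict XML parser, wrapping it in a dummy root."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False
```

With something like this in place, the author's HTML could stay
as the on-disk format and be passed through to_xhtml whenever a
tool wants to use a strict XML parser.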

| In my mind, the big issue re: volunteers would be the huge
| backlog of existing SRFIs. But again there's no time pressure. If
| it takes 3 years to convert the whole backlog the result will
| still be useful.

I agree.

| Perhaps it would be best to first settle on the right format for
| new SRFIs. Then once we convert the old ones, we'd stand a chance
| to get the conversion right once and not have to go back and do a
| second round of fixups to them.

I agree.  I would love to see a few example SRFIs converted to the
proposed format so that we can have a concrete discussion.

| I would prefer to have the SRFI source documents in a standard
| format, and then have an API server built upon that
| foundation. The API could serve, index and query them in a number
| of different ways, whatever people need. The API server would be
| open source so hardcore users can also use it locally as a
| library. But I would strongly favor a "single point of truth" in
| the documents themselves instead of a separately maintained index
| (indexes auto-generated from the source documents are fine, in any
| format). This is based on observing a kind of Murphy's Law that
| "Any information that can go out of sync, will go out of sync" :)

I agree.  The more I think about it, the more I agree that the
ultimate source of truth should be the documents themselves.
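
As a rough illustration of an index generated from the documents
rather than maintained by hand, the sketch below assumes the
documents are already well-formed XHTML and that definitions are
marked with a class attribute.  The class name "proc-def" and the
function names build_index and index_as_json are hypothetical;
nothing here reflects an agreed format.

```python
import json
import xml.etree.ElementTree as ET


def build_index(documents, marker_class="proc-def"):
    """Map each SRFI number to the names marked with `marker_class`.

    `documents` maps an SRFI number to its XHTML source text.
    """
    index = {}
    for number, source in sorted(documents.items()):
        root = ET.fromstring(source)
        names = []
        # Attributes are not namespaced, so this works with or without xmlns.
        for element in root.iter():
            if marker_class in element.get("class", "").split():
                names.append("".join(element.itertext()).strip())
        index[number] = names
    return index


def index_as_json(documents):
    """Serialize the generated index; the file is derived, never edited."""
    return json.dumps(build_index(documents), indent=2, sort_keys=True)
```

Because the index is recomputed from the documents every time,
there is nothing separate to fall out of sync.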

| I fully agree with these points. In particular, it'd be very
| nice if the final format doesn't require a tagsoup parser library,
| but can use a strict XML parser instead. IMHO it's fine if we use
| tagsoup to do the initial conversion from the author's HTML to the
| final format, but once an SRFI is in the final format, it would be
| nice to let tool writers use simple and reliable parsers.

I will defer to those who do the work.

| Arthur would be in the best position to estimate whether the
| workload is reasonable. Having a very involved SRFI editor would
| obviously produce the best documents, but volunteer work can also
| be pretty draining and thankless at times, and most people need
| breaks and lighter periods now and then to keep motivated.

Being SRFI editor is already a lot of work, so I'm inclined not to
increase the amount of work required.  Even if there are
volunteers, coordinating volunteers requires work, and volunteers
tend not to be as fast as one person at turning things around.
Part of the reason I believe that the SRFI process has been
successful recently, if I may say so, is that I've been turning
new drafts and updates to SRFIs around so quickly, usually within
hours and almost never taking more than a day or two.  It would be
hard to do that with a coordinated effort.

On the other hand, if we can still go through the existing SRFI
process without any extra work, then do the work of converting
formats and adding annotations afterwards, then only the indexes
will fall behind.  That's probably a reasonable compromise.

| Should we have more beta testers though? At least Per Bothner
| expressed some measure of interest in one of last year's

I have recently sent a private message to each of the most recent
SRFI authors, asking them to participate in this discussion on
<srfi-discuss>.  That includes Per.

| In my experience, quality of tools is strongly correlated with
| the simplicity of the format. People love to write lots of tools
| for simple formats because it's immediately rewarding. On the
| other hand, complex things tend to have poor tooling even with
| lots of industry backing. So IMHO the priority would be to make
| the format simple and familiar (hence similar to HTML). Not saying
| the IETF RFC XML is too complex but it seems quite verbose and
| divergent from HTML.

I agree.

| I would suggest roughly the following design approach:
| * Start with some version of XHTML
| * Use only the XHTML tags we need in a rigid structure
| * Add class attributes to signify everything else
| * Specify a standard CSS stylesheet using those tags and classes

That sounds good.  Concretely, what I would like to see is an
example SRFI converted in this way.
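
To give the discussion something to poke at, here is a small
sketch of what "only the tags we need, plus class attributes"
could mean in practice: a checker that parses a document with a
strict XML parser and reports any tag or class outside an agreed
set.  The tag list, the class names, and the function names are
all placeholders, not a proposal for the actual vocabulary.

```python
import xml.etree.ElementTree as ET

# Placeholder vocabulary; the real sets would come out of the discussion.
ALLOWED_TAGS = {"html", "head", "title", "body", "h1", "h2",
                "p", "ul", "li", "code", "pre", "a"}
KNOWN_CLASSES = {"abstract", "rationale", "proc-def"}


def local_name(tag):
    """Strip an XML namespace such as {http://www.w3.org/1999/xhtml}."""
    return tag.rsplit("}", 1)[-1]


def check_document(source):
    """Return a list of problems; an empty list means the document conforms."""
    problems = []
    root = ET.fromstring(source)  # raises ParseError if not well-formed
    for element in root.iter():
        name = local_name(element.tag)
        if name not in ALLOWED_TAGS:
            problems.append(f"unexpected tag <{name}>")
        for cls in element.get("class", "").split():
            if cls not in KNOWN_CLASSES:
                problems.append(f"unknown class {cls!r} on <{name}>")
    return problems
```

A check like this could run after the editor's usual process, so
it would not slow down publishing drafts.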