Lassi Kortela <xxxxxx@lassi.io> writes:

| What if we had a script that uses one of those popular tagsoup libraries (lenient HTML -> strict HTML) and then does post-processing to generate something close enough to the standardized (X)HTML structure we want? If the script worked well enough, that could be low-friction enough for SRFI authors.

I checked with John Cowan, and he at least abandons his Markdown source once the conversion to HTML has been done.

| Always nice to have more tools at our disposal!

I just tried PUP on several SRFIs and was impressed. It should be possible to use that on the documents directly, perhaps with a few tweaks to PUP. I'm still not convinced of the need to go all the way to XHTML.

| Like Ciprian, I would be in favor of XHTML eventually because it's easier to parse (XML libraries are absolutely everywhere and are more reliable than tagsoup). We may be able to use a tagsoup library to automatically convert HTML to XHTML.

If there's a reliable way to convert from HTML to XHTML, then there's no need for XHTML to be the on-disk format.

| In my mind, the big issue re: volunteers would be the huge backlog of existing SRFIs. But again there's no time pressure. If it takes 3 years to convert the whole backlog the result will still be useful.

I agree.

| Perhaps it would be best to first settle on the right format for new SRFIs. Then once we convert the old ones, we'd stand a chance to get the conversion right once and not have to go back and do a second round of fixups to them.

I agree. I would love to see a few example SRFIs converted to the proposed format so that we can have a concrete discussion.

| I would prefer to have the SRFI source documents in a standard format, and then have an API server built upon that foundation. The API could serve, index and query them in a number of different ways, whatever people need. The API server would be open source so hardcore users can also use it locally as a library.
| But I would strongly favor a "single point of truth" in the documents themselves instead of a separately maintained index (indexes auto-generated from the source documents are fine, in any format). This is based on observing a kind of Murphy's Law that "Any information that can go out of sync, will go out of sync" :)

I agree. The more I think about it, the more I agree that the ultimate source of truth should be the documents themselves.

| I fully agree with these points. In particular, it'd be very nice if the final format doesn't require a tagsoup parser library, but can use a strict XML parser instead. IMHO it's fine if we use tagsoup to do the initial conversion from the author's HTML to the final format, but once an SRFI is in the final format, it would be nice to let tool writers use simple and reliable parsers.

I will defer to those who do the work.

| Arthur would be in the best position to estimate whether the workload is reasonable. Having a very involved SRFI editor would obviously produce the best documents, but volunteer work can also be pretty draining and thankless at times, and most people need breaks and lighter periods now and then to keep motivated.

Being SRFI editor is already a lot of work, so I'm inclined not to increase the amount of work required. Even if there are volunteers, coordinating volunteers requires work, and volunteers tend not to be as fast as one person at turning things around. Part of the reason I believe that the SRFI process has been successful recently, if I may say so, is that I've been turning new drafts and updates to SRFIs around so quickly, usually within hours and almost never taking more than a day or two. It would be hard to do that with a coordinated effort.

On the other hand, if we can still go through the existing SRFI process without any extra work, then do the work of converting formats and adding annotations afterwards, then only the indexes will fall behind. That's probably a reasonable compromise.

| Should we have more beta testers though? At least Per Bothner expressed some measure of interest in one of last year's threads.

I have recently sent a private message to each of the most recent SRFI authors, asking them to participate in this discussion on <srfi-discuss>. That includes Per.

| In my experience, quality of tools is strongly correlated with the simplicity of the format. People love to write lots of tools for simple formats because it's immediately rewarding. On the other hand, complex things tend to have poor tooling even with lots of industry backing. So IMHO the priority would be to make the format simple and familiar (hence similar to HTML). Not saying the IETF RFC XML is too complex but it seems quite verbose and divergent from HTML.

I agree.

| I would suggest roughly the following design approach:
|
| * Start with some version of XHTML
| * Use only the XHTML tags we need in a rigid structure
| * Add class attributes to signify everything else
| * Specify a standard CSS stylesheet using those tags and classes

That sounds good. Concretely, what I would like to see is an example SRFI converted in this way.
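
To make the strict-parsing argument concrete, here is a minimal Python sketch, not an agreed format: it assumes a hypothetical SRFI fragment written in XHTML, with made-up class names (`proc`, `name`) and made-up procedure names standing in for whatever conventions might eventually be chosen. Because the fragment is well-formed XML, the standard library's `xml.etree.ElementTree` can pull out the annotated procedure entries with no tagsoup library involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical SRFI fragment: plain XHTML tags, with class attributes
# carrying the extra meaning. Class names and procedures are illustrative only.
SRFI_FRAGMENT = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h1>SRFI NNN: Example (hypothetical)</h1>
    <dl>
      <dt class="proc"><code class="name">example-proc</code> <var>obj</var></dt>
      <dd>Do something with <var>obj</var>.</dd>
      <dt class="proc"><code class="name">example-pred</code> <var>obj</var></dt>
      <dd>Return true if <var>obj</var> is an example.</dd>
    </dl>
  </body>
</html>
"""

# Element names in a default-namespaced XHTML document carry this prefix.
XHTML = "{http://www.w3.org/1999/xhtml}"


def procedure_names(xhtml_text):
    """Return the names of all <dt class="proc"> entries, in document order."""
    # Strict XML parsing: malformed markup raises ET.ParseError
    # instead of being silently repaired.
    root = ET.fromstring(xhtml_text)
    names = []
    for dt in root.iter(XHTML + "dt"):
        # The class attribute may hold several space-separated classes.
        if "proc" in dt.get("class", "").split():
            name = dt.find(XHTML + "code[@class='name']")
            if name is not None:
                names.append(name.text)
    return names


if __name__ == "__main__":
    print(procedure_names(SRFI_FRAGMENT))
```

Run directly, it prints the two hypothetical procedure names. The point is that the class attributes carry the meaning, so a tool needs only a strict XML parser and a few attribute checks, while the same markup still renders as ordinary HTML that a shared CSS stylesheet could style through those classes.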