Scraping could be made perfectly reliable by adding HTML class
attributes to the SRFI source HTML. I emailed Arthur Gleckler about
this, and his initial response was enthusiastic, but he suggested that
we have a wider discussion on this mailing list. He also pointed me to
earlier threads started by Ciprian Craciun on this list in 2018:
Yes, thank you for publishing your ideas on srfi-discuss.
I'm interested in the idea of adding HTML classes as annotations, too. It has been proposed before, and the only big drawback is that it will be a lot of work to add the annotations to our 166 prior SRFIs, or even the finalized subset of them. But the work will yield a way to produce a comprehensive index automatically, and that's worth a lot.
* Alternatively we could embed HTML comments in the SRFIs and parse
them, but I would argue that parsing comments is a classic pitfall
best avoided where possible. We could also use a separate XML
namespace for metadata but since we're currently using HTML instead
of XHTML it will be easier to go with attributes on the HTML tags
themselves. Even if we somehow managed to mandate XHTML, the
multi-namespace solution would still end up more complex than using
HTML classes.
I'm not a fan of the HTML comment approach. That's more intrusive, and requires software that can parse HTML and preserve comments in the process.
One requirement I have for any proposal is that the effort required by an author to comply is minimal. Part of the reason that the SRFI process has been successful for twenty years is that the editors have kept "friction" low. Adding a few classes and <span>s shouldn't be much effort.
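To make the size of that effort concrete, here is a rough sketch of what an annotated definition and a matching extractor might look like. The class names "proc-def" and "name", the sample markup, and the helper extract_names are all invented for illustration; no convention has been agreed on. The sketch uses only Python's standard html.parser module and assumes that annotated spans contain no void elements such as <br>.

```python
from html.parser import HTMLParser

# Hypothetical annotated SRFI fragment. The class names "proc-def" and
# "name" are assumptions made for this sketch, not an agreed convention.
SAMPLE = """
<p>Some prose that mentions <code>vector-map</code> in passing.</p>
<dl>
  <dt class="proc-def"><code><span class="name">vector-map</span> f vec</code></dt>
  <dt class="proc-def"><code><span class="name">vector-for-each</span> f vec</code></dt>
</dl>
"""


class NameCollector(HTMLParser):
    """Collect the text of every element whose class list contains "name"."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._depth = 0  # greater than 0 while inside a class="name" element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may be absent or valueless, hence the "or".
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            # Nested tag inside an annotated element (assumed non-void).
            self._depth += 1
        elif "name" in classes:
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.names.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def extract_names(html):
    """Return the annotated names in document order."""
    parser = NameCollector()
    parser.feed(html)
    parser.close()
    return parser.names


print(extract_names(SAMPLE))
```

Only the two annotated names are collected; the unannotated mention of vector-map in the prose paragraph is ignored. That is the property that makes class-based scraping more dependable than guessing from document structure.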
Note that some authors write their documents in another markup language, then convert them to HTML. This proposal would require that they update their software to produce the new classes, or edit the generated HTML afterwards. That shouldn't stop us from trying this proposal, but it's something to keep in mind.
Thanks to you and to Ciprian and others for proposing and working on these ideas. Having programmatically accessible metadata could be a boon, not just for indexing, but also for automatic cross-linking between standards documents and other purposes as well.