> This is excellent work! Thanks!

It was a happy accident; it worked far better than I expected.

> It would be great to include HTML IDs in the template, too. That would
> make it possible for other tools to link directly to the reference
> material in the SRFI document. Of course, the IDs would need be
> extracted by your tool, too.

I agree. Direct links would be really nice. I wonder whether the URL or HTML specs place any limitations on the text that can appear in the IDs; Scheme symbols use many unusual characters such as ? ! < > = + * /.

> This seems like a great way to bootstrap the whole process.

I just timed editing two SRFIs from scratch using the tool: SRFI 41 took 12 minutes and SRFI 72 (a large one) took 20 minutes. If three people each edited an average of two SRFIs per day, we'd have the entire back catalogue done in a month!

> But if the author doesn't encode metadata, we can create the same
> external file manually.

Once you get into the groove, you can edit and check one ordinary procedure definition in about 10 seconds. So it might be both faster and less error-prone to mark up the source HTML and convert it automatically than to do it all by hand (copy/pasting from a web browser).

While editing, I ran the tool after each finished definition to see its conversion (and some earlier ones for context). This way, errors are easy to catch. The command was:

    ./tool.py srfi-41.html && tail -n 30 srfi-41.lisp

> If we put examples of this new approach in the SRFI template, then
> authors who wish to can, with little effort, encode metadata in
> their documents.
>
> Editors and volunteers can then use your tool to extract signatures
> into a standard format like Ciprian's.
>
> As far as I can tell, we've moved from the prospect of doing more
> work to the prospect of doing less work.

I would still like to have a little more work done ;) By either the authors or the editors/volunteers. In particular:

* It would be nice if somebody added that HTML metadata to new SRFIs.
  As demonstrated, this shouldn't take more than half an hour even for the most complex SRFIs.

* Though the requirements on the HTML are lenient, what would really help is to mandate that every tag has a matching closing tag. Unbalanced or missing tags have been the only real obstacle to machine parsing that I've encountered. There are "lenient linters" that check only this, e.g. https://www.jwz.org/hacks/validate.pl

* I still think it would be best to generate all of the S-expression material from the HTML, now that probably the hardest part (argument lists) has proven this easy. We could add classes for the abstract, author, date, status, license, and other general information like that.

I guess it would be best to store the S-expression files in the same git repo as the SRFI itself. If/when changes are made to the HTML, we can re-run the tool to generate a new S-expression file and then compare the old and new files for suspicious-looking differences (e.g. definitions that have changed or gone missing in the new version of the SRFI). This would ensure that we always have metadata and that the HTML stays in a parseable state even after edits.

If an author re-generates the HTML with a personal tool that loses our metadata classes, an editor could use the saved S-expression file to put the classes back into the new HTML, using the tool's comparison feature to check what's missing.
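On the earlier question about which characters can appear in IDs: HTML5 permits any non-whitespace characters in an id attribute, but URL fragments (RFC 3986) are stricter, so characters like < and > would need percent-encoding when linking. A minimal sketch of how the tool might derive link-safe IDs (the helper name and encoding choice are my assumptions, not anything the tool actually does):

```python
# Sketch: map a Scheme identifier to a link-safe URL fragment.
# (Hypothetical helper; not part of the actual tool.)
from urllib.parse import quote

# Characters RFC 3986 allows verbatim in a fragment beyond the
# unreserved set: ? ! + * / = : @ are all legal there.
FRAGMENT_SAFE = "?!+*/=:@"

def scheme_symbol_id(name):
    # HTML5 ids may contain any non-space character, but only the
    # percent-encoded form is safe to use in a URL fragment, so
    # characters like < and > get escaped while ? ! etc. pass through.
    return quote(name, safe=FRAGMENT_SAFE)
```

With this, `char-ready?` stays as-is while `list->stream` becomes `list-%3Estream`; using the already-encoded form as the id itself would keep the id and the fragment identical.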
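The old-vs-new comparison could start out as simple as diffing the sets of defined names. A rough sketch, assuming the generated files contain forms shaped like `(define (name ...) ...)`; the tool's actual output format may well differ:

```python
# Sketch: flag definitions present in the old generated S-expression
# file but missing from the new one. The (define (name ...)) shape
# is an assumption about the generated format, not the real output.
import re

DEFINE = re.compile(r"\(define\s+\((\S+)")

def defined_names(text):
    """Collect the names of all top-level procedure definitions."""
    return set(DEFINE.findall(text))

def missing_definitions(old_text, new_text):
    """Names defined in the old file but absent from the new one."""
    return sorted(defined_names(old_text) - defined_names(new_text))
```

An editor could run this after regenerating and treat a non-empty result as "suspicious: inspect before committing".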