> A fourth possibility would be for the auto-generation process to merge
> with the existing file rather than replace it. That needn't be
> complicated, and allows for both manually curated and automatically
> generated information. I'm sure that there will be other information
> that can't be extracted automatically.
>
> Should the categories be visible in the SRFI document itself?
>
> I'm trying not to change the content of the SRFI documents themselves.

In that case we would have to amend the SRFIs with some manual
S-expression metadata anyway. (Or add those dreaded invisible HTML tags
:p But I believe the current consensus is to avoid that.)

I would recommend keeping all metadata pertinent to a SRFI in that
document's own repo. It will be easier to write tools if they only have
to work with a single repo. Coupling between repos will also be reduced,
which is usually a good goal. (It's still on the table that the advanced
type checking metadata would live in its own repo, but that's kind of a
different-but-related project.)

So we could have either:

1) One S-expression file that is first auto-generated with a tool and
   then edited by hand to add more stuff. After editing, the tool could
   be run to check that the auto-generated parts of the edited version
   still match what's in the HTML.

2) Two S-expression files (e.g. "manual.scm" and "final.scm" for lack of
   better names). The tool would take the HTML file and "manual.scm" and
   merge them into "final.scm". It would touch neither the HTML file nor
   "manual.scm".

Approach 2 might be cleaner, because the tool could be developed to
extract more and more metadata (if Ciprian's standardized XHTML form is
adopted, it could even be made to extract the full text paragraph by
paragraph if somebody finds that useful). Under approach 1, the
S-expression file would then change drastically with each tool update,
making manual editing somewhat burdensome and confusing, even if we have
the verification tool to help check our work.
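
To make the difference between the two approaches concrete, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the function names, the metadata keys, and the use of plain dicts in place of parsed S-expressions (reading the HTML and reading or writing the .scm files are left out). It also assumes that values extracted from the HTML win over hand-written ones, which is only one possible policy.

```python
def merge_metadata(extracted, manual):
    """Approach 2: build the contents of "final.scm".

    `extracted` stands for what the tool pulled out of the HTML and
    `manual` for the hand-written "manual.scm". Neither input is
    modified. Returns the merged metadata plus the keys on which the
    two sources disagreed.
    """
    final = dict(manual)  # copy, so "manual.scm" data is left untouched
    conflicts = []
    for key, value in extracted.items():
        if key in manual and manual[key] != value:
            conflicts.append(key)
        final[key] = value  # assumed policy: the HTML is authoritative
    return final, sorted(conflicts)


def check_edited(extracted, edited):
    """Approach 1: verify a hand-edited single file.

    Returns the auto-generated keys whose value in the edited file is
    missing or no longer matches what is in the HTML.
    """
    return sorted(k for k, v in extracted.items() if edited.get(k) != v)


# Hypothetical sample data, not taken from any real metadata file.
extracted = {"title": "List Library", "authors": ["Olin Shivers"]}
manual = {"categories": ["data-structures"], "title": "List library"}

final, conflicts = merge_metadata(extracted, manual)
stale = check_edited(extracted, {"title": "List library"})
```

With approach 2 a tool update only changes `extracted`, so rerunning the merge regenerates "final.scm" without anyone touching "manual.scm". With approach 1 the same update would make `check_edited` report every newly extracted key until the single file is edited by hand to match.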