Finally, the solution could be split into two web servers that share the
same database -- one server would listen for the GitHub webhook and write
to the database, and the other would have read-only access to the
database and serve its contents over the web. But in practice I think
this is just more complex to code and deploy, and more resource-intensive
on the server, with no real-world gain in security.
The more I think about the webhook, the more I think it is too complex. The SRFI repos change rarely, and I always have to run manual commands to change them anyway. Running one more command doesn't add meaningfully to my workload, and then we don't have to maintain GitHub webhooks, etc. Eliminating the webhook also removes any new dependency on GitHub. I'd prefer to drop that part of the system and just have a simple command that extracts data from the SRFI documents, combines it with some manually edited data, and produces the combined files. Then we can concentrate on the core value that we're providing to our users.
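To make the idea concrete, here is a rough sketch in Python of the kind of command I mean. The directory layout, file names, and metadata fields are placeholders for illustration, not a description of any existing tool: it derives what it can from each local SRFI checkout, merges in a hand-edited file, and writes one combined JSON file.

    #!/usr/bin/env python3
    # Hypothetical sketch: derive a little metadata from local checkouts of
    # the SRFI repos, merge it with a hand-edited file, and write a single
    # combined JSON file.  Paths and field names are assumptions.

    import json
    import re
    from pathlib import Path

    REPOS_DIR = Path("~/srfi").expanduser()      # assumed location of checkouts
    MANUAL_FILE = Path("manual-metadata.json")   # assumed hand-edited data
    OUTPUT_FILE = Path("srfi-data.json")         # combined output

    # Crude title extraction; a real tool would parse the HTML properly.
    TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

    def extract(repo):
        """Derive what we can automatically from one SRFI document."""
        number = int(repo.name.split("-")[1])    # e.g. "srfi-123" -> 123
        html = (repo / f"srfi-{number}.html").read_text(encoding="utf-8")
        match = TITLE_RE.search(html)
        return {"number": number,
                "title": match.group(1).strip() if match else None}

    def main():
        manual = json.loads(MANUAL_FILE.read_text(encoding="utf-8"))
        combined = []
        for repo in sorted(REPOS_DIR.glob("srfi-[0-9]*")):
            record = extract(repo)
            # Hand-edited fields (keywords, see-also, etc.) win over derived ones.
            record.update(manual.get(str(record["number"]), {}))
            combined.append(record)
        OUTPUT_FILE.write_text(json.dumps(combined, indent=2), encoding="utf-8")
        print(f"Wrote {len(combined)} records to {OUTPUT_FILE}")

    if __name__ == "__main__":
        main()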
The nice thing is that everything in the database is a cached version of
something that can be derived automatically from the SRFI origin repos.
So if some security breach or software/hardware fault erases the
database, it can be rebuilt from scratch by traversing all the GitHub repos.
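Here is a rough sketch of such a rebuild: clone (or update) every SRFI origin repo and then re-run the extraction command above. The GitHub organization URL below is the real one, as far as I know, but the number range and local directory are assumptions.

    # Rebuild sketch: clone or update every SRFI origin repo, then re-run the
    # extraction command.  Number range and local layout are assumptions.

    import subprocess
    from pathlib import Path

    ORG = "https://github.com/scheme-requests-for-implementation"
    REPOS_DIR = Path("~/srfi").expanduser()

    def refresh(number):
        """Clone one SRFI repo, or pull if it is already checked out."""
        repo = REPOS_DIR / f"srfi-{number}"
        if repo.exists():
            command = ["git", "-C", str(repo), "pull", "--ff-only"]
        else:
            command = ["git", "clone", f"{ORG}/srfi-{number}.git", str(repo)]
        if subprocess.run(command).returncode != 0:
            print(f"warning: could not refresh srfi-{number}")

    if __name__ == "__main__":
        REPOS_DIR.mkdir(parents=True, exist_ok=True)
        for n in range(1, 250):                  # assumed range of SRFI numbers
            refresh(n)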
Continuing with the theme of simplicity, the metadata for all SRFIs combined should only require a few hundred kilobytes, especially when compressed. Given that, I argue that clients should fetch the whole thing once and search it locally, perhaps fetching new copies occasionally by checking HTTP headers. Even adding metadata for all of R7RS and
Snow would not make such an approach impractical. This has the benefit of eliminating the latency of fetching results from a server. It also makes clients less dependent on the network to get the data, and it eliminates our need to run a server at all beyond serving static files. As far as I can tell, this would eliminate our need for a SQL server, too.
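Here is a rough sketch of what a client could do under that scheme: download the combined file once, cache it, revalidate it with a conditional GET (so an unchanged copy costs only a 304 response), and search it locally. The URL, cache location, and field names are all assumptions for illustration.

    # Client sketch: fetch the combined metadata once, revalidate with a
    # conditional GET, and search it locally.  URL and fields are assumptions.

    import json
    import urllib.request
    from pathlib import Path
    from urllib.error import HTTPError

    URL = "https://srfi.schemers.org/srfi-data.json"    # assumed location
    CACHE = Path("~/.cache/srfi-data.json").expanduser()
    ETAG = CACHE.with_suffix(".etag")

    def fetch():
        request = urllib.request.Request(URL)
        if CACHE.exists() and ETAG.exists():
            request.add_header("If-None-Match", ETAG.read_text())
        try:
            with urllib.request.urlopen(request) as response:
                body = response.read()
                CACHE.parent.mkdir(parents=True, exist_ok=True)
                CACHE.write_bytes(body)
                etag = response.headers.get("ETag")
                if etag:
                    ETAG.write_text(etag)
                return json.loads(body)
        except HTTPError as error:
            if error.code == 304:                # not modified; use the cache
                return json.loads(CACHE.read_text(encoding="utf-8"))
            raise

    def search(records, term):
        term = term.lower()
        return [r for r in records if term in (r.get("title") or "").lower()]

    if __name__ == "__main__":
        for record in search(fetch(), "hash"):
            print(record["number"], record["title"])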
In this age of giant services like those of Amazon, Google, and Facebook, it's easy to forget that our machines and networks are incredibly powerful and fast, and that many of our data sets are microscopic in comparison. Brute-force solutions are not only practical and easy to implement; they are often the most useful ones, too.
I'm happy to keep using Linode to serve the generated files. I already have automatic backups, TLS/SSL certificates (through Let's Encrypt), Nginx, and a web-based control panel. Running the SRFI site costs me nothing beyond what I already spend to host several other sites.
If you help me produce the best possible metadata for the SRFIs, I will take care of hosting it in a way that is low overhead for everyone.