> In this age of giant services like those of Amazon, Google, and
> Facebook, it's easy to forget that our machines and networks are
> incredibly powerful and fast, and that many of our data sets are
> microscopic in comparison. Brute force solutions are not only easy
> to implement and practical, but they are often the most useful ones,
> too.

Fully agreed, but I would argue that those platform providers are the
ultimate triumph of brute force. Unfathomably powerful machinery is
invoked so we can do our little job in the easiest way possible: push
some code or data. From the consumer's point of view, this is the
least sophisticated way to run servers -- in a sense the McDonald's of
servers -- and that's what's liberating about it. The platforms are
not so much about speed as about convenience; the allure is skipping
system administration.

> I'm happy to keep using Linode to serve the generated files. I
> already have automatic backups, TLS/SSL certificates (through Let's
> Encrypt), Nginx, and a web-based control panel. Running the SRFI
> site costs me nothing in addition to running the several other sites
> I host as well.

That's fine. I'm sure we can come up with a solution that will be easy
enough to host on Linode too. The main issue is security. (I wouldn't
be comfortable running a Scheme/Racket server with access to the full
Linux file system, so in that respect I sympathize with your approach
of preferring static files. And setting up Docker on your own server
to containerize Racket is only fun for people who love sysadmining.)

> Continuing with the theme of simplicity, the metadata for all SRFI
> combined should only require a few hundred kilobytes, especially
> when compressed. Given that, I argue that clients should fetch the
> whole thing once and search it locally, perhaps fetching new copies
> occasionally by checking HTTP headers. [...] This has the benefit of
> eliminating the latency of fetching results from a server. It also
> makes clients less dependent on the network to get the data, and it
> eliminates our need to run a server at all beyond serving static
> files. As far as I can tell, this would eliminate our need to
> run a SQL server, too.

That's a great idea! The "official SRFI API" could just be a single
tar file that contains all the HTML and S-expression files. You're
already generating one with all the HTML, right? Compressed, it takes:

* 1.3 MiB - gzip --best
* 1.0 MiB - bzip2 --best
* 0.9 MiB - xz --best

This is a very manageable size for a download. It looks like fancier
compression than bzip2 doesn't bring any real savings to the table.
I'd vote for gzip since everything under the sun can decompress it.
(A small curl sketch in the P.S. at the end of this message shows how
a client could refresh its copy by checking HTTP headers.)

> Even adding metadata for all of R7RS and Snow
> <http://snow-fort.org/> would not make such an approach impractical.

I'd love to have metadata for those as well, but I think things are
simplest to understand and manage if the SRFI process is
self-contained and the metadata for RnRS and libraries is curated
separately (even if some of the same people work on all of those
collections). For the historical continuity of the SRFI process over
the years, it's probably best to keep its footprint of
responsibilities and infrastructure very small. The publishing rhythm
and requirements of SRFI are also quite different from those of RnRS
and libraries.

FWIW and off topic, I began extracting the RnRS argument lists here:
<https://github.com/lassik/scheme-rnrs-metadata>. It's incomplete, but
I think the lists can be auto-extracted to the same standard as the
SRFI metadata.
All the RnRS documents used TeX with roughly the same semantic markup,
so you RnRS editors have laid a good foundation. Eventually it'd be
cool to have one API aggregating all of these collections, but that's
yet another project :) If we can establish social/technical processes
in all the relevant communities to ensure good source data, then
aggregation should be easy.

> The more I think about the webhook, the more I think it is too
> complex. The SRFI repos change rarely, and I always have to run
> manual commands to change them, anyway. Running one more command
> doesn't add meaningfully to my workload, and then we don't have to
> maintain Github webhooks, etc. Eliminating the webhook eliminates
> any new dependency on Github, too. I'd prefer to drop that part of
> the system and just have a simple command that will extract data
> from the SRFI documents, combine that with some data that is
> manually edited, and produce the combined files. Then we can
> concentrate on the core value that we're providing to our users.

In their final form, the HTML and metadata can absolutely be hosted
from anywhere. The tricky part is the editing phase, especially if
volunteers send pull requests to amend the HTML of finalized SRFIs.
This is a difficult problem for sure.

Essentially, we have now almost solved the problem of metadata
extraction, and the key question has shifted to data consistency. We
know how to retrieve data from file systems, databases, HTTP, GitHub
origin repos, Git clones on our computers -- basically anywhere -- and
post-process it to make releases and serve them to the public in
various ways. Those things will work out one way or another. The
question is: how do we decide which data to retrieve?

Every time we generate a release, say "srfi-all.tar.gz", we'd like to
be sure that the release contains the latest valid versions of all
SRFIs and metadata. Tools can check that the data is valid, but how do
we know that it's the latest? First we have to decide which place is
blessed as the "point of truth" where the latest sources are
collected. Is it the GitHub origin repos, or the Git clones on the
SRFI editor's personal computer? The release-making tool will poll
only this place.

If it's GitHub, then:

1) We need to install a webhook or a CI job (Travis CI, etc.).

2) PRs can be checked by the same webhook/CI as the master branch.
   This is great for volunteers who send PRs.

3) The webhook/CI might as well push a new release after every merge
   to the master branch, so the release is always up to date with no
   human effort.

4) Installing CI jobs into 160+ repos is difficult, so with this
   approach we'd have to use 'git subtree' to make a mega-repo
   containing all SRFIs as subdirectories. The CI job would then run
   in the mega-repo, and volunteers would probably also send their PRs
   to this repo.

5) A CI job could generate the "srfi-all.tar.gz" file and the
   dashboard page as static files, then push them to a static file
   server via SFTP, the Amazon S3 API, etc. Deployment would be
   simple. A webhook server could also do this, but it could serve
   that content itself as well, since it's already a web server. To
   me, it doesn't matter all that much which of these is chosen; both
   are fine.

If it's the editor's personal computer, then:

1) The editor should check that they have a clean working tree (have
   committed everything and pulled all the latest changes from GitHub)
   before making a release.

2) The editor has to make releases manually by running a script.
   To me, this raises the question of why we wouldn't run that same
   script automatically via webhook/CI.

3) The editor has local Git clones of every SRFI, so it's easy to get
   at all their files via the file system. This is a big plus for this
   approach.

4) On the other hand, it's still not much easier to check that we have
   all the latest commits before a release (the script would have to
   poll all the GitHub origin repos).

Of course, there's the alternative of not having an automatic tool to
ensure we have the latest commits from all the SRFIs before a release.
But since the automated approach doesn't seem substantially difficult
to me, I would favor it.

No matter which of the above approaches we choose, a major hurdle is
that the SRFIs are split into 160+ repos. All of the above would be
quite simple if they were all in one mega-repo, because then it's
simple to check for consistency and to confirm we have the latest
commits (on GitHub, just set up an ordinary Travis CI job; on a
personal computer, just do one "git pull" instead of a hundred). That
being said, I see the benefits of having a separate repo for each
SRFI, particularly in the draft phase, so the author can clone only
their own SRFI and not be bothered by updates to the others.

It would seem that draft SRFIs and finalized SRFIs have strikingly
different workflow requirements, because draft SRFIs are worked on
individually whereas finalized SRFIs are worked on in batches. I
didn't realize this at all until now! I think it is the root cause of
all the complexity in this hosting/release problem.

I personally think a GitHub organization webhook is the only effective
approach for ensuring consistency across a massive number of repos
(160+). Even that is not foolproof, because the server may fail to
respond to the webhook, which bugs me a little, so it's not ideal.

Would it be impossible, this far into the process, to change the Git
conventions so that only draft SRFIs have their own repos under
<https://github.com/scheme-requests-for-implementation/>, and
finalized/withdrawn SRFIs are collected into one big repo? The
metadata and markup work could then happen in the big repo.

The key enabler here would be 'git subtree'. It allows each SRFI to be
a subdirectory in the big repo. Each subdirectory then mirrors 1:1 the
contents of that SRFI's individual repo (from which the draft came).
If there's ever a need to update the individual repo with changes from
the big repo, or vice versa, 'git subtree' allows copying commits in
both directions surprisingly easily. (If you copy commits from the big
repo to the small one, it simply leaves out all mention of files
outside the subtree and discards commits that didn't touch the subtree
at all.) So 'git subtree' means we never have to make an inescapable
commitment about how we lay out the repos; if we change our minds, we
can copy commits between repos. This is a lot to think about, but we
could run experiments...

The really nice thing about the big repo is that we could run the
release tool in a bog-standard free Travis CI job, and Travis would
give us pull request checks with no effort (if we use a web server
with a webhook, implementing those PR checks is where most of the
effort goes). And tons of developers are familiar with that
Travis/Jenkins-style CI workflow, so there's nothing exotic about it.
We also wouldn't have to deal with Git or any GitHub API stuff --
Travis makes a shallow Git clone of the commit it needs, and then our
tool just reads the local file system without worrying about Git or
databases.
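To make that concrete, here's a rough sketch of what such a CI job (or
the same script run by hand) might do once it has a checkout of the
mega-repo. The "check-metadata" and "make-dashboard" names are just
placeholders for whatever extraction tooling we end up writing, and
the srfi-*/ layout is only an assumption about how the mega-repo might
be organized:

    # Hypothetical release steps, run from the root of the mega-repo
    # checkout; no Git or database access needed, only the file system.
    ./check-metadata srfi-*/                    # validate the S-expression metadata
    ./make-dashboard srfi-*/ > dashboard.html   # regenerate the dashboard page
    tar -czf srfi-all.tar.gz srfi-*/            # bundle HTML + metadata for every SRFI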
This also means that the tool can run completely unchanged on a
personal computer if we ever stop using a CI system. There would be no
need to bundle web server or API/database stuff with our tool. I'm
beginning to warm up to the idea as well.

The only open question with the Travis CI approach is how to upload
the "srfi-all.tar.gz" file and the dashboard web page to some static
web server. Apparently Travis can upload files to any server via SFTP
(<https://docs.travis-ci.com/user/deployment/custom/>) and to Amazon
S3 (<https://docs.travis-ci.com/user/deployment/s3/>).

Thoughts?
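P.S. To make the last mile concrete, here are two small shell
sketches. The host name, target path, and URL in them are made up, so
treat these as illustrations rather than a finished deployment plan.
If we go the SFTP/SSH route, the upload step run by the CI job (or by
the editor) could be as simple as:

    # Hypothetical: copy the generated files to the static file server.
    scp srfi-all.tar.gz dashboard.html editor@static-host:/var/www/srfi/

And, going back to the earlier point about clients fetching the whole
tarball and refreshing it by checking HTTP headers, curl can handle
the conditional download:

    # -z sends If-Modified-Since based on the local file's timestamp, so
    # the tarball is only re-downloaded when the server has a newer copy;
    # -R keeps the server's Last-Modified time on the local file.
    curl -R -z srfi-all.tar.gz -o srfi-all.tar.gz \
         https://srfi.schemers.org/srfi-all.tar.gz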