On Tue, Mar 26, 2019 at 7:27 AM Lassi Kortela <xxxxxx@lassi.io> wrote:
 
That's a great idea! The "official SRFI API" could just be a single
tar file that contains all the HTML and S-expression files. You're
already generating one with all the HTML, right? Compressed, it takes:

Yes, and we could provide an even smaller variant without the HTML, depending on the client.  (For example, a client doing only code completion might only need signatures.)
   
Eventually it'd be cool to have that API aggregate all of these
collections, but that's yet another project :) If we can establish
social/technical processes in all the relevant communities to ensure
good source data, then aggregation should be easy.

Agreed.  We should just be careful not to do anything that precludes later aggregation.  For example, if a common standard for indexing becomes available, we should use that.
  
Every time we generate a release, say "srfi-all.tar.gz", we'd like to
be sure that that release contains the latest valid versions of all
SRFIs and metadata. Tools can check that it's valid, but how do we
know that it's the latest stuff?

By construction.  We make sure that the tool runs over all the Git repos, which constitute the "database of record."  I haven't seen any plan to use any other source even though, as you say, we have access to HTTP, databases, etc.
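
As a rough sketch of what that run could look like (the "extract-metadata" command and the $ss directory layout are placeholders, not the real tool name or paths):

for SRFI in {0..165}; do
    extract-metadata "$ss/srfi-$SRFI"    # placeholder for the real extraction tool
done
# Bundle every repo's files into one release tarball.
tar --exclude=.git -czf srfi-all.tar.gz -C "$ss" srfi-{0..165}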

First we have to decide which place is blessed as the "point of truth"
where the latest sources are collected. Is it the GitHub origin repos
or the Git clones on the SRFI editor's personal computer? The
release-making tool will poll this place only.

Is this really a distinction worth making?  Keeping Git repos in sync is trivial — that's one of Git's major benefits.  If I run the metadata-extraction tool locally, then commit the results and push them, we can be confident that the GitHub copies are identical.
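
In shell terms that's just another loop (a sketch only; the commit message is illustrative, and it assumes the generated files are already tracked):

for SRFI in {0..165}; do
    git -C "$ss/srfi-$SRFI" commit -am "Regenerate metadata." || true    # nothing to commit is fine
    git -C "$ss/srfi-$SRFI" push
done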
 
If it's the editor's personal computer, then

1) The editor should check that they have a clean working tree (have
    committed everything and pulled all the latest changes from GitHub)
    before making a release.

Easy:

for SRFI in {0..165}; do git -C "$ss/srfi-$SRFI" pull; done
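
The clean-working-tree half of the check is just as short (a sketch; it only reports repos with uncommitted changes):

for SRFI in {0..165}; do
    test -z "$(git -C "$ss/srfi-$SRFI" status --porcelain)" || echo "srfi-$SRFI has uncommitted changes"
done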
 
2) The editor has to make releases manually by running a script. To
    me, this raises the question of why not run that same script
    automatically via webhook/CI.

Because running a command is trivial and doesn't introduce dependencies on other tools.
 
3) The editor has local Git clones of every SRFI so it's easy to get
    at all their files via the file system. This is a big plus for this
    approach.
4) On the other hand, it's still not much easier to check we have all
    the latest stuff before release (the script would have to poll all
    the GitHub origin repos).

See #1.  I do this kind of thing all the time.
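
Concretely, a pre-release check can fetch and compare each clone's HEAD with its upstream (a sketch, assuming each clone is on a branch that tracks its GitHub origin):

for SRFI in {0..165}; do
    git -C "$ss/srfi-$SRFI" fetch -q origin
    if [ "$(git -C "$ss/srfi-$SRFI" rev-parse HEAD)" != "$(git -C "$ss/srfi-$SRFI" rev-parse '@{u}')" ]; then
        echo "srfi-$SRFI is out of sync with GitHub"
    fi
done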
  
No matter which of the above approaches we choose, a major hurdle is
that the SRFIs are split into 160+ repos. All of the above would be
quite simple if they were all in one mega-repo, because it would be
easy to check for consistency and make sure we have the latest
versions (on GitHub, just set up an ordinary Travis CI job -- on a
personal computer, just do one "git pull" instead of a hundred).

It's still simple with multiple repos.

That being said, I see the benefits of having a separate repo for each
SRFI. Particularly in the draft phase, so the author can clone only
their own SRFI and not be bothered by updates to the other ones.

That's still a benefit after finalization.  We still publish errata, fixes to sample implementations, etc.
 
It would seem that draft SRFIs and finalized SRFIs have strikingly
different requirements for effective workflow, because draft SRFIs are
worked on individually, whereas finalized SRFIs are worked on in
batches. I didn't realize this at all until now! I think this is the
root cause of all the complexity in this hosting/release problem.

I don't see the difference in workflow at all.  Even working in batches, when I've done it, hasn't been impeded by having separate repos.
 
I personally think the GitHub organization webhook is the only
effective approach for ensuring consistency across a massive number of
repos (160+). It's still not foolproof, because the server may fail to
respond to the webhook, which bugs me a little, so it's not ideal.

Would it be impossible this far into the process to change the Git
conventions so that only draft SRFIs have their own repos under
<https://github.com/scheme-requests-for-implementation/> and
finalized/withdrawn SRFIs are collected into one big repo? The
metadata and markup work could then happen in the big repo.

No, I just don't see the advantages.  Since there's manual work involved, we're still extracting metadata one SRFI at a time.  And keeping a consistent version control log across the entire history of each SRFI, without a break at finalization, is important.  There's no need to make things more complex.

I don't see anything about Travis that would prevent us from using it with separate repos, either.  For loops are our friends.  But I would really like us to concentrate on extraction of metadata rather than building and setting up infrastructure to solve more elaborate problems that we're not likely to have, anyway.  There just aren't that many SRFIs.

Can you tell me what, specifically, Travis could automatically check for us?  Why couldn't our metadata-extraction tool run that same check?

Let's keep it as simple as possible.  If we encounter persistent problems, then maybe we can make a tradeoff in favor of slightly more complexity.