Re: Grand unified schema for the metadata and API Lassi Kortela 12 Jul 2019 13:23 UTC
I got inspired and will make a (schemedoc ...) namespace and send some packages to Snow-Fort fairly soon. Initial plan:

- (schemedoc trivial-http-client) -- Minimum viable HTTP client that works on many Scheme implementations. Just a procedure to do an HTTP GET that follows redirects and returns a bytevector or byte port of the response body. This is enough for most scrapers. If Schemeweb standardizes an HTTP client, we can switch to it later.

- (schemedoc framework stub) -- Stub framework for local development.

- (schemedoc framework full) -- Full framework for server use.

- (schemedoc source implementation-metadata) -- Scrape our own implementation-metadata repo.

- (schemedoc source srfi-data) -- Scrape admin/srfi-data.scm from Arthur's srfi-common repo.

- (schemedoc source srfi-abstracts) -- Scrape SRFI abstracts (IIRC Ciprian had already extracted these; I'll make this library if I can find his data).

If these are viable to get working and plugged into the GraphQL server, they set a template for more metadata sources to follow. This also adds extra motivation to advance the archive SRFI quickly, since a lot of useful scraping is best done by downloading tar/zip archives.

The current "API" for data sources is very simple:

    (provide-schemedoc-source (lambda (download) ...))

'download' is a procedure: (download url cache-filename [hours]). It downloads a file from the URL into a cache directory. If the file is already in the cache, it is only re-downloaded if it is more than [hours] old (if hours is not given, use some default number of hours -- maybe one week?).

The lambda calls 'download' (possibly more than once) to fetch the files it needs. It returns a Scheme value that represents (part of) the giant S-expression. We get the final giant S-expression by merging the S-expressions from all the sources together.
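To make the contract concrete, here is a rough sketch of what one data source could look like under this API. This is not a working library: the URL and cache filename are placeholders, and it assumes 'download' returns the pathname of the cached file, which the description above does not actually specify.

```scheme
;; Hypothetical (schemedoc source ...) library using the API sketched
;; above.  The URL, cache filename, and parsing step are placeholders.
(provide-schemedoc-source
 (lambda (download)
   ;; Fetch the file into the cache directory.  Re-download only if the
   ;; cached copy is more than 24 hours old; omitting the third argument
   ;; would fall back on the default (maybe one week).
   ;; ASSUMPTION: download returns the pathname of the cached file.
   (let ((file (download "https://example.org/srfi-data.scm"  ; placeholder URL
                         "srfi-data.scm"
                         24)))
     ;; Read the cached file and return a Scheme value representing
     ;; (part of) the giant S-expression.  The framework then merges
     ;; the values returned by all sources into the final S-expression.
     (with-input-from-file file read))))
```

The nice property of this shape is that a source never touches the network directly: the framework owns caching policy, so the stub framework for local development can hand sources a 'download' that always hits the cache.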