Re: Grand unified schema for the metadata and API
Lassi Kortela 12 Jul 2019 13:23 UTC
I got inspired and will make a (schemedoc ...) namespace and send some
packages to Snow-Fort fairly soon. Initial plan:
- (schemedoc trivial-http-client) -- Minimum viable HTTP client that
works on many Scheme implementations. Just a procedure to do an HTTP
GET that follows redirects and returns a bytevector or byte port of
the response body; see the sketch after this list. This is enough for
most scrapers. If Schemeweb standardizes an HTTP client, we can switch
to it later.
- (schemedoc framework stub) -- Stub framework for local development.
- (schemedoc framework full) -- Full framework for server use.
- (schemedoc source implementation-metadata) -- Scrape our own
implementation-metadata repo.
- (schemedoc source srfi-data) -- Scrape admin/srfi-data.scm from
Arthur's srfi-common repo.
- (schemedoc source srfi-abstracts) -- Scrape SRFI abstracts (IIRC
Ciprian had extracted these already; I'll make this library if I can
find his data).
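To make the first item concrete, here's a sketch of what the trivial
client's surface could look like. The procedure name 'http-get' and
the exact return type are placeholders, not settled API:

(import (scheme base)
        (schemedoc trivial-http-client))

;; Hypothetical: (http-get url) follows redirects and returns the
;; response body as a bytevector.
(define body (http-get "https://example.org/index.html"))
(display (utf8->string body))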
If these can be made to work and plugged into the GraphQL server, they
will set a template for more metadata sources to follow.
This also adds extra motivation to advance the archive SRFI quickly,
since lots of useful scraping is best done by downloading tar/zip archives.
The current "API" for data sources is very simple:
(provide-schemedoc-source (lambda (download) ...))
'download' is a procedure: (download url cache-filename [hours]). It
downloads a file from the URL into a cache directory. If the file is
already in the cache, it is only re-downloaded if it's more than
[hours] old (if hours is not given, some default number of hours is
used - maybe one week?).
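For instance, a call might look like this. The URL and cache filename
are illustrative, and I'm assuming 'download' returns the pathname of
the cached copy:

;; 'download' here is the procedure handed to the source lambda.
;; Fetch srfi-data.scm into the cache, re-downloading at most once a
;; week (168 hours). URL and filename are illustrative only.
(download "https://example.org/admin/srfi-data.scm"
          "srfi-data.scm"
          168)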
The lambda calls 'download' (possibly more than once) to fetch the files
it needs. It returns a Scheme value that represents (part of) the giant
S-expression. We get the final giant S-expression by merging the
S-expressions from all the sources together.
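Putting it together, a source library might look roughly like this.
Only a sketch: the URL, filename and shape of the returned data are
assumptions, as is the idea that merging is a plain append:

(import (scheme base) (scheme file) (scheme read))

;; Sketch of a complete source, assuming 'download' returns the
;; pathname of the cached copy.
(provide-schemedoc-source
 (lambda (download)
   (let ((file (download "https://example.org/admin/srfi-data.scm"
                         "srfi-data.scm")))
     ;; Return (part of) the giant S-expression.
     (with-input-from-file file read))))

;; If each source returns a list of entries, building the final giant
;; S-expression could be as simple as appending the per-source lists.
(define (merge-sources results)
  (apply append results))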