Re: Grand unified schema for the metadata and API Lassi Kortela 12 Jul 2019 13:23 UTC
I got inspired and will make a (schemedoc ...) namespace and send some packages to Snow-Fort fairly soon. Initial plan:

- (schemedoc trivial-http-client) -- Minimum viable HTTP client that works on many Scheme implementations. Just a procedure to do an HTTP GET that follows redirects and returns a bytevector or byte port of the response body. This is enough for most scrapers. If Schemeweb standardizes an HTTP client, we can switch to it later.

- (schemedoc framework stub) -- Stub framework for local development.

- (schemedoc framework full) -- Full framework for server use.

- (schemedoc source implementation-metadata) -- Scrape our own implementation-metadata repo.

- (schemedoc source srfi-data) -- Scrape admin/srfi-data.scm from Arthur's srfi-common repo.

- (schemedoc source srfi-abstracts) -- Scrape SRFI abstracts (IIRC Ciprian had already extracted these; I'll make this library if I can find his data).

If these are viable to get working and plugged into the GraphQL server, they set a template for more metadata sources to follow. This also adds extra motivation to advance the archive SRFI quickly, since a lot of useful scraping is best done by downloading tar/zip archives.

The current "API" for data sources is very simple:

    (provide-schemedoc-source (lambda (download) ...))

'download' is a procedure: (download url cache-filename [hours]). It downloads a file from the URL into a cache directory. If the file is already in the cache, it is only re-downloaded if it is more than [hours] old (if hours is not given, use some default number of hours -- maybe one week?).

The lambda calls 'download' (possibly more than once) to fetch the files it needs. It returns a Scheme value that represents (part of) the giant S-expression. We get the final giant S-expression by merging the S-expressions from all the sources together.
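To make the contract concrete, here is a rough sketch of what one data source could look like under this API. This is not a working library: the URL and cache filename are placeholders, and it assumes 'download' returns the pathname of the cached file, which the description above does not actually specify.

```scheme
;; Hypothetical (schemedoc source ...) library using the API sketched
;; above.  The URL, cache filename, and parsing step are placeholders.
(provide-schemedoc-source
 (lambda (download)
   ;; Fetch the file into the cache directory.  Re-download only if the
   ;; cached copy is more than 24 hours old; omitting the third argument
   ;; would fall back on the default (maybe one week).
   ;; ASSUMPTION: download returns the pathname of the cached file.
   (let ((file (download "https://example.org/srfi-data.scm"  ; placeholder URL
                         "srfi-data.scm"
                         24)))
     ;; Read the cached file and return a Scheme value representing
     ;; (part of) the giant S-expression.  The framework then merges
     ;; the values returned by all sources into the final S-expression.
     (with-input-from-file file read))))
```

The nice property of this shape is that a source never touches the network directly: the framework owns caching policy, so the stub framework for local development can hand sources a 'download' that always hits the cache.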