Re: SRFI-metadata-syncing SRFI?

Re: SRFI-metadata-syncing SRFI? noosphere@xxxxxx 09 Nov 2020 20:49 UTC

On Mon 09 Nov 2020 06:36:08 PM +02, Lassi Kortela wrote:
>
>> While workable, this seems to me to be less than ideal because any time
>> one scrapes something the process is fragile, needing manual intervention
>> to fix the scraper whenever some unforeseen change happens to the
>> structure of what's being scraped.
>
> The formats don't change all that much, and the big benefit of scrapers
> is that they are executable documentation.

Metadata consumers and processors would be similar executable documentation.

Also, there are benefits to spelling out the metadata format in
a standard, as opposed to relying on a particular scraper implementation
to be the standard.

> Before we had the current scrapers, there were various hand-written
> listings of SRFI support around the net. It was impossible to tell where
> they came from, how they had been assembled, and which parts were up to
> date. There was no way for a newcomer to replicate the results.
>
> Any large-scale data aggregation effort should absolutely use scrapers,
> if only for documentation purposes. But it's also a good way to avoid
> human error.

Human error would still be possible in coding a scraper that scrapes
unstructured data.  I don't know the details of how they're coded, but
it's conceivable they could miss the part of a Scheme implementation
that says which SRFIs it supports, especially as things change over
the years.

Such a mistake is much harder to make when one is consuming
well-documented, standardized metadata that was explicitly designed
to give you that information.

It is still possible for humans to make a mistake in populating the
metadata in the first place, however.  So I see your point.

I guess this is a tradeoff one has to decide on: errors by humans
in populating and maintaining their Scheme's metadata versus
errors in scrapers as the contents of tar files change.

>> Because there is no standard, the data you get from an arbitrary
>> Scheme's tar file is going to be unstructured, requiring more custom
>> rules to extract it.
>>
>> Wouldn't it be so much simpler if every Scheme published the desired
>> data in the desired format that could be directly, reliably consumed
>> without having to write any custom code to deal with unstructured data
>> in random locations?
>
> It would, and this is most easily accomplished by adding an S-expression
> file to each Scheme's git repo.

That makes sense to me.  As long as it's structured data, explicitly
intended to furnish the desired metadata.
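
To illustrate what consuming such a file could look like, here is a
minimal Python sketch.  The form name `implementation-metadata` and the
`name` and `srfis` fields are made up for this example; no actual format
has been agreed on.  The reader is deliberately tiny and only handles
lists, strings, integers and bare symbols.

```python
import re

# Tokens: "(" or ")", a double-quoted string, or a bare atom.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def parse(text):
    """Parse a single S-expression into nested Python lists.

    Simplifications: strings are returned without unescaping, and
    symbols and strings both become Python str values.
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing ")"
            return items
        if tok == ")":
            raise ValueError("unexpected )")
        if tok.startswith('"'):
            return tok[1:-1]
        return int(tok) if tok.isdigit() else tok

    return read()


def supported_srfis(form):
    """Return the sorted SRFI numbers from a (hypothetical) srfis field."""
    for field in form[1:]:
        if isinstance(field, list) and field and field[0] == "srfis":
            return sorted(field[1:])
    return []


# Hypothetical metadata file contents, not a real implementation's data.
EXAMPLE = """
(implementation-metadata
  (name "example-scheme")
  (srfis 1 9 2))
"""

print(supported_srfis(parse(EXAMPLE)))  # prints [1, 2, 9]
```

The consumer never has to guess where the SRFI list lives; it only
looks for one named field.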

> GitHub and GitLab can make you a link to each raw file stored in any Git
> repo. E.g.
> <https://raw.githubusercontent.com/schemedoc/implementation-metadata/master/schemes/chicken.scm>.
> You can also change "master" in the URL to a different branch or tag. If
> the aggregator could look for a standard file in the repo, it wouldn't
> have to download the whole repo, and would take less than 1 second.

Great.
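
A rough Python sketch of that lookup, assuming the GitHub raw URL
layout shown in the link above.  The function names are mine, and
GitLab uses a different URL layout that is not covered here.

```python
from urllib.parse import quote
from urllib.request import urlopen


def github_raw_url(owner, repo, ref, path):
    """Build a raw.githubusercontent.com URL for one file.

    `ref` is a branch or tag name, such as "master".
    """
    return "https://raw.githubusercontent.com/" + quote(
        f"{owner}/{repo}/{ref}/{path}"
    )


def fetch_text(url, timeout=10):
    """Download a single file as UTF-8 text, without cloning the repo."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = github_raw_url(
    "schemedoc", "implementation-metadata", "master", "schemes/chicken.scm"
)
print(url)
# To actually download the file (needs network access):
#     text = fetch_text(url)
```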

> The source metadata would be in each package. Each package manager would
> scan all of its own packages and compile an index file. The aggregator
> that compiles the full SRFI support table for all implementations would
> then aggregate _that_ data :) We should serve the full table as HTML,
> JSON, and S-expressions so people can save time and easily
> machine-extract things directly from the full aggregated data.

Perfect.
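
To make the last step concrete, here is a small Python sketch of an
aggregator that merges per-implementation SRFI lists and renders the
result as JSON, S-expressions and HTML.  The input shape, the output
layouts and the names "scheme-a" and "scheme-b" are placeholders, not
a proposed format or real support data.

```python
import json
from html import escape


def build_table(indexes):
    """Merge {implementation: [srfi numbers]} into {srfi: [implementations]}."""
    table = {}
    for impl, srfis in indexes.items():
        for number in srfis:
            table.setdefault(number, []).append(impl)
    return {n: sorted(impls) for n, impls in sorted(table.items())}


def to_json(table):
    # JSON object keys must be strings, so convert the SRFI numbers.
    return json.dumps({str(n): impls for n, impls in table.items()}, indent=2)


def quote_string(s):
    """Write a Python string as a double-quoted S-expression string."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_sexp(table):
    rows = [
        f"  (srfi {n} {' '.join(quote_string(i) for i in impls)})"
        for n, impls in table.items()
    ]
    return "(srfi-support\n" + "\n".join(rows) + ")"


def to_html(table):
    rows = "\n".join(
        f"<tr><td>{n}</td><td>{escape(', '.join(impls))}</td></tr>"
        for n, impls in table.items()
    )
    header = "<tr><th>SRFI</th><th>Implementations</th></tr>"
    return f"<table>\n{header}\n{rows}\n</table>"


# Placeholder input: what each package manager's index might boil down to.
indexes = {"scheme-a": [1, 9], "scheme-b": [9]}
table = build_table(indexes)
print(to_sexp(table))
```

All three renderings come from the same merged table, so they cannot
drift apart from each other.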

> Don't worry about commitments; we can make a repo under
> <https://github.com/pre-srfi> and gradually work on it.

Sounds good.

  --Sergey