Re: make json-stream-read a fold-like operation? Duy Nguyen 23 Jan 2020 09:07 UTC

On Tue, Jan 21, 2020 at 10:38 PM Amirouche Boubekki
<> wrote:
> Another thing we could poke at is something like JSONSlicer [1]. It
> looks like the following:
>   (json-read-slice selector port) -> generator
> Where SELECTOR is some kind of json selector (somewhat like CSS
> selectors). The generator would contain full Scheme objects like
> json-read does, but for the subset described by SELECTOR.  It will
> help in cases where you want only some parts of a big JSON text.
> That is the case of the Wikidata JSON dumps, which are valid JSON with
> a top-level JSON array; every array item is written on a single line
> ending with a comma and newline.  So, the way I used to parse it was
> to ignore the first line, a single open bracket (and ignore the last
> line!!), then repeatedly call read-line, ignore the comma and newline
> and parse what remains of the line as JSON text.  It is only a problem
> because the file is JSON text instead of JSON lines and because the
> file is very big, several gigabytes.
> In the case of the Wikidata JSON dump, the procedure call would look
> something like:
>   (json-read-slice '(*) port)
> Where * means every item of an array.  If one wants only the English
> labels of all the concepts, it would look something like:
>   (json-read-slice '(* labels english) port)
> Otherwise, if one wants the label of the item indexed 42 in all languages:
>   (json-read-slice '(42 labels) port)
> What do you think about this slicer thing?
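
To pin down what those selectors would mean, here is a minimal sketch
that applies one to an already-parsed object. It is only an
illustration of the selector semantics, not a streaming implementation:
the name `select` is hypothetical, and it assumes arrays are vectors
and objects are association lists with symbol keys.

```scheme
(import (scheme base))

;; Return the list of sub-objects of OBJ matched by SELECTOR.
;; SELECTOR is a list of steps: the symbol * (every array item),
;; an exact integer (one array index), or a symbol (one object key).
;; Assumption: arrays are vectors, objects are alists with symbol keys.
(define (select selector obj)
  (if (null? selector)
      (list obj)
      (let ((step (car selector))
            (rest (cdr selector)))
        (cond
         ;; * fans out over every item of an array
         ((eq? step '*)
          (if (vector? obj)
              (apply append
                     (map (lambda (item) (select rest item))
                          (vector->list obj)))
              '()))
         ;; an integer picks one array item, if it exists
         ((exact-integer? step)
          (if (and (vector? obj) (< -1 step (vector-length obj)))
              (select rest (vector-ref obj step))
              '()))
         ;; a symbol picks one object member, if present
         ((and (symbol? step) (list? obj))
          (let ((entry (assq step obj)))
            (if entry (select rest (cdr entry)) '())))
         (else '())))))

;; Two made-up items shaped loosely like the dump described above.
(define data
  '#(((labels . ((english . "Universe") (french . "Univers"))))
     ((labels . ((english . "Earth"))))))

(select '(* labels english) data) ; => ("Universe" "Earth")
```

A selector that matches nothing, such as an out-of-range index, gives
the empty list rather than an error.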

Yeah, it looks a lot like sxpath, which should be its own thing. And
personally I'd rather have that separate from JSON. We have a JSON
stream reader, and we have a slicer that can handle any stream parser.
Then we plug them together. People can replace the JSON stream reader
with something else (even sxml) and it should still work.
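
A rough sketch of that split, under assumptions of my own: the parser
is any thunk that returns events and then an eof-object, and the slicer
only tracks the current path through those events. The event format
(the symbols array-start, array-end, object-start, object-end, plus
`(key . symbol)` and `(value . datum)` pairs) and the names
`make-slicer` and `list->generator` are hypothetical. To stay short it
returns only scalar leaves, not whole sub-objects.

```scheme
(import (scheme base))

;; Does PATH (outermost step first) match SELECTOR step by step?
;; The selector step * matches any array index.
(define (path-matches? selector path)
  (cond ((and (null? selector) (null? path)) #t)
        ((or (null? selector) (null? path)) #f)
        ((or (equal? (car selector) (car path))
             (and (eq? (car selector) '*) (exact-integer? (car path))))
         (path-matches? (cdr selector) (cdr path)))
        (else #f)))

;; After a value completes inside an array, move to the next index.
(define (advance path)
  (if (and (pair? path) (exact-integer? (car path)))
      (cons (+ 1 (car path)) (cdr path))
      path))

;; Wrap NEXT-EVENT in a generator that returns only the scalar values
;; whose path matches SELECTOR, then an eof-object.
(define (make-slicer selector next-event)
  (let ((path '()))                     ; innermost step first
    (define (loop)
      (let ((event (next-event)))
        (cond
         ((eof-object? event) event)
         ((eq? event 'array-start) (set! path (cons 0 path)) (loop))
         ((eq? event 'object-start) (set! path (cons #f path)) (loop))
         ((memq event '(array-end object-end))
          (set! path (advance (cdr path)))
          (loop))
         ((eq? (car event) 'key)        ; (key . symbol)
          (set! path (cons (cdr event) (cdr path)))
          (loop))
         (else                          ; (value . datum)
          (let ((here (reverse path)))
            (set! path (advance path))
            (if (path-matches? selector here)
                (cdr event)
                (loop)))))))
    loop))

;; Stand-in for a real stream parser: replay a list of events.
(define (list->generator items)
  (lambda ()
    (if (null? items)
        (eof-object)
        (let ((item (car items)))
          (set! items (cdr items))
          item))))

;; Events for [{"labels":{"english":"Universe"}},{"labels":{"english":"Earth"}}]
(define events
  '(array-start
    object-start (key . labels)
    object-start (key . english) (value . "Universe") object-end
    object-end
    object-start (key . labels)
    object-start (key . english) (value . "Earth") object-end
    object-end
    array-end))

(define next-label
  (make-slicer '(* labels english) (list->generator events)))
```

Calling `(next-label)` repeatedly gives "Universe", then "Earth", then
an eof-object. Nothing in `make-slicer` knows about JSON, so any parser
emitting the same events could be plugged in.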

Actually maybe we just rip the sxpath out of sxml. The underlying
engine (with sexp pattern, not xpath syntax) is really cool.

> I think it helps in the
> cases of bigger-than-memory JSON text, but I only know about the
> Wikidata use case. jsonslicer is not very popular on GitHub.  I did
> not mention sxpath, but the above SELECTOR argument looks like sxpath
> queries.
> [1]