make json-stream-read a fold-like operation?
Duy Nguyen
(20 Jan 2020 09:47 UTC)
|
Re: make json-stream-read a fold-like operation?
John Cowan
(20 Jan 2020 14:18 UTC)
|
Re: make json-stream-read a fold-like operation?
Amirouche Boubekki
(21 Jan 2020 10:41 UTC)
|
Re: make json-stream-read a fold-like operation?
Duy Nguyen
(21 Jan 2020 12:52 UTC)
|
Re: make json-stream-read a fold-like operation? Amirouche Boubekki (21 Jan 2020 15:39 UTC)
|
Re: make json-stream-read a fold-like operation?
Duy Nguyen
(23 Jan 2020 09:08 UTC)
|
json-fold, json-slice and json-transformer (Re: make json-stream-read a fold-like operation?)
Amirouche Boubekki
(23 Jan 2020 10:05 UTC)
|
wip json-fold (Re: json-fold, json-slice and json-transformer (Re: make json-stream-read a fold-like operation?))
Amirouche Boubekki
(23 Jan 2020 19:08 UTC)
|
Re: wip json-fold (Re: json-fold, json-slice and json-transformer (Re: make json-stream-read a fold-like operation?))
Duy Nguyen
(24 Jan 2020 01:40 UTC)
|
On Tue, 21 Jan 2020 at 13:52, Duy Nguyen <xxxxxx@gmail.com> wrote:
>
> On Tue, Jan 21, 2020 at 5:40 PM Amirouche Boubekki
> <xxxxxx@gmail.com> wrote:
> >
> > On Mon, 20 Jan 2020 at 15:18, John Cowan <xxxxxx@ccil.org> wrote:
> > >
> > > LGTM
> > >
> > > On Mon, Jan 20, 2020 at 4:48 AM Duy Nguyen <xxxxxx@gmail.com> wrote:
> > >>
> > >> Is it possible to make 'proc' in json-stream-read take a third, opaque
> > >> object, and pass proc's result to the next 'proc' call?
> > >> json-stream-read returns the result of the last 'proc' call.
> > >>
> > >> I think by chaining these proc calls together, json-stream-read user
> > >> can pass parsing state along and can even avoid mutable states if they
> > >> want to. I haven't looked at the implementation though so I don't know
> > >> how hard to do it.
> >
> > That occurred to me but I am wondering whether it would not be better
> > to make json-stream-read (possibly with a new name) return a
> > generator.
>
> I haven't used generators a lot (at least not in Scheme) so I can't
> really contribute anything here. With John's suggestion to go with
> generators in other parts of the srfi, I guess we might as well do
> generators here :)

With json-stream-read returning a generator, what you are asking can be written as:

  (generator-fold PROC SEED (json-stream-read PORT))

ref: https://srfi.schemers.org/srfi-158/srfi-158.html

I do not mean that a fold-like procedure has no place in the specification. I just do not know how to create a good one. GNU Guile has something in this spirit for XML, but I never figured out how it works [0].

[0] https://www.gnu.org/software/guile/manual/html_node/SSAX.html

Another thing we could poke at is something like JSONSlicer [1]. It would look like the following:

  (json-read-slice selector port) -> generator

Where SELECTOR is some kind of JSON selector (somewhat like CSS selectors). The generator would yield full Scheme objects like json-read does, but only for the subset described by SELECTOR.
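Going back to the generator-fold expression earlier in this message, here is a minimal sketch of how state can be threaded through the fold without mutation in user code. It is only an illustration under assumptions: the event stream is faked with a list (the event names are hypothetical, not taken from the draft), and generator-fold* is a local stand-in that follows the argument order of SRFI 158's generator-fold, so the code runs without any SRFI installed.

```scheme
(import (scheme base) (scheme write))

;; Stand-in for (json-stream-read PORT): a generator is a thunk that
;; returns the next item on each call, then an end-of-file object.
;; The event names used below are hypothetical.
(define (list->events-generator events)
  (lambda ()
    (if (null? events)
        (eof-object)
        (let ((event (car events)))
          (set! events (cdr events))
          event))))

;; Local stand-in with the argument order of SRFI 158's generator-fold:
;; PROC receives the item first, then the accumulated state.
(define (generator-fold* proc seed gen)
  (let loop ((state seed))
    (let ((item (gen)))
      (if (eof-object? item)
          state
          (loop (proc item state))))))

;; Count the string values in the stream. The count is passed along
;; as PROC's second argument instead of living in a mutable variable.
(define (count-strings gen)
  (generator-fold* (lambda (event count)
                     (if (string? event) (+ count 1) count))
                   0
                   gen))

(display (count-strings
          (list->events-generator
           '(array-open "a" 1 "b" array-close))))   ; prints 2
(newline)
```

The state here is a plain number, but it could just as well be a parser stack or a partially built object, which is what the original request was about.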
json-read-slice would help in the cases where you want only some parts of a big JSON document. That is the case for the Wikidata JSON dumps: each dump is valid JSON with a top-level array, and every array item is written on a single line ending with a comma and a newline. So, the way I used to parse it was to ignore the first line, a single open bracket (and ignore the last line!!), then repeatedly call read-line, drop the comma and newline, and parse what remains of the line as JSON text. It is only a problem because the file is JSON text instead of JSON lines and because the file is very big, several gigabytes.

In the case of the Wikidata JSON dump, the procedure call would look something like:

  (json-read-slice '(*) port)

Where * means every item of an array. If one wants only the English labels of all the concepts, it would look something like:

  (json-read-slice '(* labels english) port)

Otherwise, if one wants the labels of the item indexed 42 in all languages:

  (json-read-slice '(42 labels) port)

What do you think about this slicer thing? I think it helps in the case of bigger-than-memory JSON text, but the Wikidata dump is the only use case I know of. jsonslicer is not very popular on GitHub.

I did not mention sxpath earlier: the SELECTOR argument above looks like sxpath queries.

[1] https://github.com/AMDmi3/jsonslicer
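To make the selector examples above concrete, here is a minimal sketch of how such a SELECTOR could be interpreted. It is not a streaming implementation and not a proposed API: json-select is a hypothetical helper that works on an already parsed value, and it assumes arrays are represented as vectors and objects as association lists keyed by symbols. A real json-read-slice would apply the same matching while reading from the port.

```scheme
(import (scheme base) (scheme write))

;; Return the list of sub-values of VALUE matched by SELECTOR.
;; Assumed representation: arrays are vectors, objects are association
;; lists keyed by symbols. Selector steps: the symbol * matches every
;; array item, an integer matches one array index, any other symbol
;; matches an object key. A step that does not apply matches nothing.
(define (json-select selector value)
  (if (null? selector)
      (list value)
      (let ((step (car selector))
            (rest (cdr selector)))
        (cond
         ((and (eq? step '*) (vector? value))
          (apply append
                 (map (lambda (item) (json-select rest item))
                      (vector->list value))))
         ((and (exact-integer? step) (vector? value)
               (< -1 step (vector-length value)))
          (json-select rest (vector-ref value step)))
         ((and (symbol? step) (pair? value) (assq step value))
          => (lambda (entry) (json-select rest (cdr entry))))
         (else '())))))

;; A tiny made-up stand-in for a dump: a top-level array of objects.
(define dump
  (vector
   '((id . "Q1") (labels . ((english . "universe") (french . "univers"))))
   '((id . "Q2") (labels . ((english . "Earth") (french . "Terre"))))))

(write (json-select '(* labels english) dump))  ; ("universe" "Earth")
(newline)
(write (json-select '(1 labels) dump))
;; (((english . "Earth") (french . "Terre")))
(newline)
```

Returning a list keeps the sketch short. The proposal above returns a generator instead, which is what makes it usable on input that does not fit in memory.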