Re: json-stream-read should validate json too Amirouche Boubekki 21 Jan 2020 13:46 UTC

On Tue, 21 Jan 2020 at 13:44, Duy Nguyen <xxxxxx@gmail.com> wrote:
>
> On Tue, Jan 21, 2020 at 5:47 PM Amirouche Boubekki
> <xxxxxx@gmail.com> wrote:
> > > Alternatively maybe we can wrap user-provided 'proc' in our own proc
> > > that does validation on top, something like a stripped down version of
> > > %json-read that does nothing but validate? For example,
> > > make-json-validator takes a proc and returns a new proc also performs
> > > validation.
> >
> > I will look at it, it seems to me if one can validate inside
> > json-stream-read, it will be more useful.
>
> Yes it's definitely more useful inside json-stream-read to me. I was
> just worried some people value performance and may be ok with no
> validation (e.g. you have verified it at some point before). I don't
> know if such a use case exist though.

My goal is to have a reader and writer implementation that is as
conformant as possible, and a specification that describes a library
covering most uses. This includes:

- parsing JSON text: covered by the json-read procedure

- parsing JSON text that is bigger than memory: hence the streaming
parser, json-stream-read

- making it possible to adapt JSON types to custom Scheme types, e.g.
using records: the streaming parser is one solution, but it is not the
easiest to use; we could imagine another procedure that makes it
easier to customize the output?

- not crashing in case of bad input, such as deeply nested JSON.

And possibly:

- printing bigger than memory JSON text

- parsing json lines
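The adaptation point in the third item can be sketched as follows. This
is a Python illustration, not SRFI code: the event names and the
`object_hook` parameter are my own assumptions about what an
event-based streaming reader hands to the consumer, and how a
convenience procedure could rebuild values while letting the caller
swap in custom types.

```python
# Illustrative event vocabulary; a real streaming reader would use
# distinct token objects rather than bare strings, so that string
# values cannot collide with structural events.  JSON null is not
# modeled in this sketch.

def fold_events(events, object_hook=dict):
    """Rebuild one value from a flat event stream.

    object_hook lets the caller adapt finished JSON objects to a
    custom type: the kind of hook a convenience procedure could
    offer on top of a raw streaming parser.
    """
    stack = []

    def emit(value):
        # A finished value either completes the whole parse (empty
        # stack) or is appended to the enclosing container.
        if not stack:
            return value
        stack[-1].append(value)
        return None

    for event in events:
        if event in ('array-start', 'object-start'):
            stack.append([])      # objects collect a flat key/value list
        elif event == 'array-end':
            result = emit(stack.pop())
            if result is not None:
                return result
        elif event == 'object-end':
            flat = stack.pop()
            pairs = dict(zip(flat[0::2], flat[1::2]))
            result = emit(object_hook(pairs))
            if result is not None:
                return result
        else:
            result = emit(event)
            if result is not None:
                return result
    raise ValueError("truncated event stream")
```

Passing, say, `object_hook=lambda pairs: tuple(pairs.items())` would
yield tuples instead of dicts, which is the record-adaptation use case
in miniature.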

If one drops the "conformant" requirement from the reader, it is
possible to make it faster: e.g. one can assume that once `t` is read,
the following letters are `rue`, spelling `true`. At the moment, by
contrast, there is an explicit test that checks that the parser raises
an error on input such as `txyz` [0]. That is a small performance
cost. Another small cost is the use of a Scheme regexp to validate
numbers instead of passing them directly to string->number [1]. So
there is room to improve performance, if one wants to read JSON very
fast.

Note: the specification does not mandate a conformant reader; it uses
the term "should".

[0] https://github.com/scheme-requests-for-implementation/srfi-180/blob/master/srfi/json-checks.sld#L467-L474
[1] https://github.com/scheme-requests-for-implementation/srfi-180/blob/master/srfi/json.scm#L18
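To make those two costs concrete, here is a Python sketch (illustrative
only, not the SRFI code; the function names are my own) of a validating
path versus a trusting fast path for literals and numbers:

```python
import re

# JSON number grammar per RFC 8259: optional minus, no leading zeros,
# optional fraction and exponent.
NUMBER_RE = re.compile(r'-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?$')

def read_literal_validating(text, pos):
    """Conformant path: after seeing 't', check that 'rue' really follows."""
    for spelling, value in (('true', True), ('false', False), ('null', None)):
        if text.startswith(spelling, pos):
            return value, pos + len(spelling)
    raise ValueError(f"invalid literal at {pos}")

def read_literal_fast(text, pos):
    """Fast path: trust the first letter and skip the rest unchecked."""
    first = text[pos]
    if first == 't':
        return True, pos + 4    # assumes 'rue' follows; 'txyz' slips through
    if first == 'f':
        return False, pos + 5
    if first == 'n':
        return None, pos + 4
    raise ValueError(f"unexpected character at {pos}")

def parse_number_validating(token):
    """Conformant path: reject tokens such as '01' or '1.' that the
    host number parser would happily accept."""
    if NUMBER_RE.match(token) is None:
        raise ValueError(f"not a JSON number: {token!r}")
    return float(token)

def parse_number_fast(token):
    """Fast path: hand the token straight to the host converter, much
    as passing it directly to string->number would in Scheme."""
    return float(token)
```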

>
> > Also, I was thinking about adding a parameters like
> > `json-maximum-nesting-level` that would be 501 by default.  And that
> > will control the reader, in case there is 501 or more nested JSON
> > array or object, json-stream-reader will raise a json-error?  What do
> > you think?
>
> Do we really have any problem with nesting level though? I think the
> streaming code itself does not, and the way 'proc' is currently
> implement, we don't call it recursively either. This reminds me of a
> hacker news thread [1]. Anyway, because it's quite easy to count depth
> from user code (and if 'proc' composes well), and (I assume) we don't
> have any limits regarding nesting level, I think it's best leave it
> out.
> [1] https://news.ycombinator.com/item?id=21483256

Thanks for the link.

I have no proof as of yet, but I think it would be faster to parse
JSON text without streaming; to stay safe, though, such a reader must
have a nesting-level limit. So maybe there is a place for a
`json-read-fast` procedure?
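A reader along those lines might look like this Python sketch
(hypothetical: only nested arrays of integers are handled, error
handling for malformed input is elided, and the 501 default comes from
the limit discussed above):

```python
def read_nested(text, max_depth=501):
    """Minimal recursive reader for nested arrays of integers.

    A non-streaming reader recurses once per '[', so each nesting
    level consumes a host stack frame; without a limit, input like
    '[' * 100000 can crash the process instead of raising an error.
    """
    pos = 0

    def read_value(depth):
        nonlocal pos
        if depth > max_depth:
            # Mirrors the proposed behaviour: raise a catchable
            # error instead of blowing the call stack.
            raise ValueError("maximum nesting level exceeded")
        if text[pos] == '[':
            pos += 1
            items = []
            while text[pos] != ']':
                items.append(read_value(depth + 1))
                if text[pos] == ',':
                    pos += 1
            pos += 1              # consume ']'
            return items
        start = pos
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        return int(text[start:pos])

    return read_value(1)
```

The point of the limit is that deeply nested input becomes an ordinary,
catchable condition, which is what a json-error would give the Scheme
caller.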

> --
> Duy

--
Amirouche ~ https://hyper.dev