Re: GraphQL parser mostly working

Show/hide message thread
GraphQL parser mostly working Lassi Kortela (31 Jul 2019 09:55 UTC)
Re: GraphQL parser mostly working Peter Bex (31 Jul 2019 11:35 UTC)
Re: GraphQL parser mostly working John Cowan (31 Jul 2019 11:39 UTC)
Re: GraphQL parser mostly working Lassi Kortela (31 Jul 2019 12:05 UTC)
Re: GraphQL parser mostly working Lassi Kortela 31 Jul 2019 12:05 UTC
> Cool!  Just a quick note: that timing seemed to be a bit on the slow side
> so I checked and it seems you forgot to compile read-char-if into a
> library.  With ./test-read -:d you see these lines:
>
> ; loading ./read-char-if.import.scm ...
> ; including ./read-char-if.scm ...
>
> Running it in the profiler (add -:p to the invocation) also shows the
> majority of time being spent in <eval>, which is generally a bad sign :)
>
> With that eval, I get these timings here:
>
> ./test-read ../schema.public.graphql > /dev/null  2.78s user 0.05s system 70% cpu 4.012 total
>
> If I compile it into a .so file, I get better times:
>
> ./test-read ../schema.public.graphql > /dev/null  1.63s user 0.02s system 82% cpu 2.007 total
>
> Still far from great.  Most of the time is spent in string-index
> in srfi-13.
Awesome, thanks a lot for the help! I have no idea how to use the
Chicken compiler yet so I didn't know any of these things :)

string-index is only used to test whether a character belongs to an
ASCII span like A..Z. I changed those to test the character code using
`cond` and `<=` which shaved off 0.2 seconds from the original times.

That, and compiling libraries with:

$ csc -library read-char-if.import.scm
$ csc -library net.graphql.read.import.scm

brings the runtime down to 0.85 seconds. Here's the current profile:

(|##sys#write-char/port| 1 10.000000)
(|read-char-if.scm:20: loop| 1 10.000000)
(|graphql-read.scm:28: scheme#string->symbol| 1 10.000000)
(|graphql-read.scm:89: read-char-if#read-char?| 1 10.000000)
(|graphql-read.scm:40: read-char-if#read-char?| 1 10.000000)
(|##sys#read-char/port| 6 70.000000)
(|graphql-read.scm:39: scheme#string->symbol| 1 10.000000)
(|read-char-if.scm:2: k| 2 20.000000)
(|graphql-read.scm:28: scheme.base#char=?| 1 10.000000)
(|graphql-read.scm:210: packrat#make-result| 3 100.000000)
(|read-char-if.scm:14: chicken.base#open-output-string| 3 30.000000)
(|graphql-read.scm:28: scheme.base#string-map| 1 10.000000)
(|read-char-if.scm:9: match-char?| 11 110.000000)
(|test-read.scm:7: scheme#write| 1 40.000000)
(|graphql-read.scm:95: scheme#append| 1 10.000000)
(|<syntax>| 1 20.000000)
(|graphql-read.scm:210: g1285| 1 10.000000)
(|graphql-read.scm:105: scheme#append| 7 110.000000)
(|read-char-if.scm:9: scheme#peek-char| 16 170.000000)
(|read-char-if.scm:17: chicken.base#get-output-string| 3 30.000000)
(|graphql-read.scm:79: read-char-if#read-char?| 1 10.000000)
(|read-char-if.import.scm:2: chicken.load#load-extension| 1 10.000000)

John:
> You can merge the lexer into the packrat grammar: because of its infinite lookahead, there's no reason to have a separate lexer.

In the official GraphQL grammar the lexical syntax is indeed merged into
the parse grammar. I thought about doing that, but the lexer was already
done recursive-descent style and I wondered if the error messages are
going to be intelligible if the lexer is in the same grammar as the
parser. Of course, the lexer could be placed in its own packrat parser.
In case anyone wants to take a stab at this approach, be my guest :)

 From the profile, the time spent in 'append', 'get-output-string' and
'peek-char' suggest that lexing with packrat could easily make it
faster. Then again, I don't know whether packrat's internal data
structures are in fact faster than these. packrat#make-result takes up a
surprising amount of run time now. some of the 'string->symbol' calls
are spurious right now (single-char symbols; could just test the chars).

It is so nice to be back using a language where you can optimize only
the slow parts instead of being forced to write "optimized" code for
your entire program from the start (been writing a lot of C lately).

I can try the packrat lexer myself later if there's time, but the
current solution is fast enough for real work (GitHub's grammar is huge)
so I'd like to hook this up to Peter's Spiffy web server next.