Re: GraphQL client in Chicken: next steps

Re: GraphQL client in Chicken: next steps Lassi Kortela 31 Jul 2019 19:34 UTC
Thank you for the excellent notes, David!

> There are a couple of things to this that I would like to point out.
> First and foremost in terms of expectation management, I want to make
> clear that I'm happy to contribute with the knowledge
> I have gained in this realm in the form of conversations, giving
> guidance or making you aware of what to look out for and also on
> operational and design
> aspects of the engine. I can't currently contribute in terms of code,
> since that is a time commitment I can't make.
> I of course am also happy to look at PRs occasionally if my time permits.
> I hope that works for you.

That's perfectly fine. I'm a GraphQL noob so advice is very valuable
even without code. Thank you very much for being on board!

> There are a couple of things I'd like to understand about the concrete
> use-case.
> And I suggest to look at a concrete use-case first in order to have a
> minimal scope that you want to reach.
> Within the given scope there are a couple of things that can be
> considered and a conscious decision on what
> is part of the initial scope needs to be taken. The following is a
> non-exhaustive list of aspects I know are important.
> I suggest to pick a small subset out of those possibilities here that
> can be incrementally enhanced.
>
> Schema
> =======
> Will the schema be defined in terms of S-Expressions only or will you
> allow to also provide it via GraphQL SDL?
> I suggest to start with the former and then add the latter at a later point.

Both GraphQL SDL and a round-trip-compatible S-expression equivalent.
The parser is now mostly finished (there are still a few bugs and
missing parts of the grammar, but it can successfully parse the GitHub
API's huge schema which I used as a benchmark).

Next step is to design a good S-expression representation. I collected
all the prior art (Clojure, Emacs Lisp, CL) and will try to talk the
people who wrote those libraries into collaborating with me on a common
syntax.

> Supported operations
> ================
> What kind of operations do you want to first focus on? I suggest to go
> with queries only.

This is a good idea. I got the impression that queries are mostly
orthogonal to mutations and subscriptions. Please correct me if I'm wrong.

> Then as a second step add mutations and only if you need it add
> subscriptions.

Good plan.

> It is very hard to get subscriptions right beyond very basic integrations.
> That doesn't mean that you can't think about the delivery scheme of
> values but you shouldn't implement everything right away since this is
> simply a lot of effort with unclear benefit.

Some kind of subscription mechanism would potentially be very useful to
have in the Scheme API, but it's not clear that GraphQL subscriptions
(in their current form) are a good fit. We'd be more interested in
long-term subscriptions that bring infrequent news (similar to an RSS
feed). Is the GraphQL subscription mechanism more short-lived like long
polling?

Maybe we should literally have RSS/Atom feeds for the API.

Of course, if someone needs GraphQL subscriptions for some other Scheme
project, let's include them.

> Execution strategies
> ================
> What kind of executions do you support? I suggest to start with the
> usual way of a client just sending the query via HTTP and JSON.

Agreed.

Once we get the S-expression syntax done, we'd also like to support
S-expressions over HTTP. Semantics and performance ought to be much the
same as for JSON over HTTP.
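
For reference, here is a minimal sketch of what the JSON-over-HTTP request body could look like, written in Python only for brevity. The `query` and `variables` keys follow the usual GraphQL-over-HTTP convention. The `srfi` field and the helper name `build_request_body` are made up for illustration and are not part of our schema.

```python
import json


def build_request_body(query, variables=None):
    """Serialize a GraphQL query (and optional variables) as a JSON body."""
    body = {"query": query}
    if variables is not None:
        body["variables"] = variables
    return json.dumps(body)


# Hypothetical query; the field names are assumptions for illustration.
print(build_request_body("query ($n: Int) { srfi(number: $n) { title } }",
                         {"n": 1}))
```

An S-expression transport would presumably carry the same two pieces of information in a different encoding.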

> However any real world system I know has switched or is about to switch
> to a scheme where clients don't send the query but are able to refer to
> pre-existing queries.
> These are so-called persisted queries. I would keep that in mind but
> not worry about it in the beginning.

That's very interesting. I read about them cursorily but don't really
understand them. Are these similar in spirit to stored procedures in SQL
databases?

> Resolution strategies
> =================
> At its simplest a GraphQL query engine is just a tree traversal, where
> the engine calls a successor function on each field.
> These functions are called resolvers. So you could think about a schema
> that has a reference to a scheme procedure attached to the fields.
> This is a good first step and I suggest to start there.

Agreed. This is similar to the basic usage of the Node.js graphql library.
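
To check my understanding, here is a rough sketch of that traversal in Python (a Scheme version would attach procedures to fields in much the same way). The `RESOLVERS` and `FIELD_TYPES` tables, the `execute` procedure and the sample data are all hypothetical. A real engine would also handle arguments, lists, errors and so on.

```python
# Hypothetical schema: each field of each type has a resolver procedure.
# A resolver takes the parent value and returns the field's value.
RESOLVERS = {
    "Query": {
        "srfi": lambda parent: {"number": 1, "title": "List Library"},
    },
    "Srfi": {
        "number": lambda parent: parent["number"],
        "title": lambda parent: parent["title"],
    },
}

# Hypothetical type info: the object type returned by a non-scalar field.
FIELD_TYPES = {("Query", "srfi"): "Srfi"}


def execute(type_name, parent, selection):
    """Walk the selection tree, calling one resolver per field.

    `selection` maps field names to a nested selection dict,
    or to None for leaf (scalar) fields.
    """
    result = {}
    for field, sub_selection in selection.items():
        value = RESOLVERS[type_name][field](parent)
        if sub_selection is not None:
            child_type = FIELD_TYPES[(type_name, field)]
            value = execute(child_type, value, sub_selection)
        result[field] = value
    return result


# Roughly the query: { srfi { title } }
print(execute("Query", None, {"srfi": {"title": None}}))
```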

> However you will have to think about more sophisticated schemes that
> allow you to fetch data more efficiently. Mark already talked about
> the prevention of N+1 but there are also other things you need to
> consider and introduce extension points to build on.
> Of course you need to think about which part of the resolution can be
> run concurrently / asynchronously.
> There are other things like being able to project selection sets to a
> resolver that is about to resolve an object type.

I'm completely out of my depth here. Any help is appreciated :)

Anyway, this is all just optimizations, right? You can start by doing
the same stuff in a slow brute-force manner. We won't have a large mass
of data and much traffic for a while so it's fine to advance slowly.
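
As far as I understand the N+1 problem, the difference between the brute-force way and a batched way can be sketched like this in Python. Everything here (the `LIBRARIES` and `AUTHORS` data, `fetch_authors`, the `calls` counter) is hypothetical and only stands in for a real datastore.

```python
# Hypothetical data standing in for a datastore.
AUTHORS = {1: "Alice", 2: "Bob"}
LIBRARIES = [
    {"name": "a", "author_id": 1},
    {"name": "b", "author_id": 2},
    {"name": "c", "author_id": 1},
]

# Counts round trips to the pretend datastore.
calls = {"n": 0}


def fetch_authors(ids):
    """One round trip that returns the authors for all the given ids."""
    calls["n"] += 1
    return {i: AUTHORS[i] for i in ids}


def resolve_naive(libraries):
    """Brute force: one fetch per library (the N in N+1)."""
    names = []
    for lib in libraries:
        author_id = lib["author_id"]
        names.append(fetch_authors([author_id])[author_id])
    return names


def resolve_batched(libraries):
    """Collect the distinct ids first, then fetch them in one round trip."""
    ids = {lib["author_id"] for lib in libraries}
    authors = fetch_authors(ids)
    return [authors[lib["author_id"]] for lib in libraries]
```

Both procedures return the same list. The naive one makes one fetch per library, the batched one makes a single fetch regardless of the number of libraries.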

> Validation
> ========
> The GraphQL working group is moving in a direction where they, in order
> to keep the parser simpler, move detection of problems in documents
> (executable, or schema) into validations. There are some validations
> like making sure that overlapping fields can be merged that have very
> bad runtime complexity if you implement them as provided by the
> reference implementation. If you have public access to your GraphQL API
> you will have to think about that as part of your resilience measures,
> like introducing a deadline for requests and making sure work can be
> dropped without hogging resources in some pools.

Very interesting. Several Scheme implementations have internal thread
schedulers built into their runtimes. I don't know whether they support
resource limits easily. Racket has a "custodians" system that I think
does just that.

There's the possibility of launching a separate Unix process to handle
each request, with kernel-enforced resource limits. One could keep a
worker pool of subprocesses ready. Maybe use shared memory if it's not
too difficult. Lots of possibilities.
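
Short of threads or subprocesses, the request deadline you mention could also be checked cooperatively between resolver calls. A minimal Python sketch, assuming a flat table of zero-argument resolvers. The names `resolve_fields` and `DeadlineExceeded` are made up, and the check cannot interrupt a resolver that is already running.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when a request runs past its time budget."""


def resolve_fields(fields, deadline, clock=time.monotonic):
    """Call each resolver in turn, giving up once the deadline has passed.

    `fields` maps field names to zero-argument resolver procedures.
    `deadline` is a timestamp on the same scale as `clock()`.
    """
    results = {}
    for name, resolver in fields.items():
        # The check happens only between resolvers.
        if clock() >= deadline:
            raise DeadlineExceeded(name)
        results[name] = resolver()
    return results


# Usage: give the whole request a budget of 50 milliseconds.
fields = {"a": lambda: 1, "b": lambda: 2}
print(resolve_fields(fields, deadline=time.monotonic() + 0.05))
```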

Do you know if any of the Clojure GraphQL libraries already deal with
this stuff somehow?

> Ahead of time analysis
> ==================
> You want to be able to make sure that a query isn't harmful, which means
> it's not too complex or too deeply nested.
> There are a couple of validations that need to run ahead of time in
> order to do that.
>
> Access pattern
> ===========
> How is the API going to be accessed? Is it going to be fully publicly
> available, i.e. there is no session or token or something like that
> to be used for access/request control? How many requests do you expect
> and what are the primary clients?

A fully public read-only API is the starting point for us. We could
require users to register for an access token but that would lead to pretty
bad usability since there is no inherently user-specific data to serve.

It would be a better approach simply to have resource limits for each
query, so if it's too complex the server just kills it after a while.
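
A cheap ahead-of-time check could complement the kill-after-a-while approach by rejecting overly nested queries before any work is done. A rough Python sketch, assuming the parsed query is available as nested dicts where leaf fields map to None. The names `selection_depth` and `check_depth` and the limit value are hypothetical.

```python
def selection_depth(selection):
    """Nesting depth of a selection; leaf fields map to None."""
    if not selection:
        return 0
    return 1 + max(selection_depth(sub) for sub in selection.values())


def check_depth(selection, max_depth):
    """Return the depth, or raise ValueError if it exceeds the limit."""
    depth = selection_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")
    return depth


# Roughly the query: { srfi { title } }
print(check_depth({"srfi": {"title": None}}, max_depth=5))
```

Depth is only one possible measure. A complexity score that weights individual fields would need the schema as well.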

> Observability
> ===========
> Think about extension points to hook in observability. This includes
> being able to collect custom metrics and potentially forward information
> in open tracing format via GraphQL response extensions.

This is optional to us but it would definitely be cool to get some graphs.

> Use case
> ========
> Can you please shed a bit of light on the envisioned first scope? What
> is the first thing it's going to be used for?
> What kind of data needs to be resolved and where does the data
> come from? Are we talking to a datastore?
> Are we talking to other services? Are we serving static data?

The thing that spurred me to work on this is the idea of a Scheme API
serving metadata and documentation pertaining to different RnRS/SRFI
documents, implementations and libraries. This API could be used as a
backend for writing many kinds of client applications to improve
Scheme's user experience. The initial idea was to make a web-based
documentation browser, but it soon became apparent that since we're
collecting all that data into a machine-processable format anyway, why
not make it available to anyone who wants to use it? We have a server at
<api.schemers.org> for this purpose but it's not live yet. There's a
staging version at <https://api.staging.scheme.fi/graphql> and you can
query a bit of real data we have scraped.

The data is static for the purposes of the GraphQL server. The current
plan is to have a cron-like orchestrator that checks whether web pages
or git repos have changed, and runs scrapers to parse new data whenever
they have. But this is infrequent enough that the server could even be
restarted after each data update and it wouldn't really affect the
experience.

Datastore hasn't been decided yet. We're just starting with static files
since it's simple to write S-expressions into them.

> Schema contribution
> ================
> Is the schema going to be contributed by other people? Will there be
> public contributions via PRs or is the schema fixed?

It's expected to evolve in a backward-compatible way. The people who
maintain the API server should also maintain the schema according to
this principle. PRs are fine but should be overseen by maintainers.

> Let me also point you to a couple of really good implementations, that
> you might want to draw inspiration from:
>
> * https://sangria-graphql.org/ (this is the library we build our service
> with)
> * https://github.com/walmartlabs/lacinia (this is interesting because
> this is already a lisp and I think their way to encode the schema and
> the execution is natural and neat)

Cool, thanks for the recommendations!

I already tried Lacinia but had a lot of different kinds of problems
with it. I will study it some more since you are vouching for it.

Do you have an opinion of Alumbra, the other Clojure GraphQL stack?

> I hope that helps as a starting point and I suggest to take the time to
> set the goals straight before jumping to actions.
> Let me close by saying that I'm glad to see how much energy is put in
> here and that you are taking on the challenge to bring this to Scheme.
> Kudos!

Thanks a lot for the wisdom and perspective! It's much needed :)