Hi everyone,

Lassi Kortela <xxxxxx@lassi.io> wrote on Wed, 31 Jul 2019 at 21:35:
Thank you for the excellent notes David!

> There are a couple of things to this that I would like to point out.
> First and foremost, in terms of expectation management, I want to make
> clear that I'm happy to contribute the knowledge I have gained in this
> realm in the form of conversations, giving guidance, making you aware
> of what to look out for, and also advising on operational and design
> aspects of the engine. I can't currently contribute in terms of code,
> since that is a time commitment I can't make.
> I of course am also happy to look at PRs occasionally if my time permits.
> I hope that works for you.

That's perfectly fine. I'm a GraphQL noob so advice is very valuable
even without code. Thank you very much for being on board!

Cool. Thank you.



> There are a couple of things I'd like to understand about the concrete
> use-case.
> And I suggest looking at a concrete use-case first in order to have a
> minimal scope that you want to reach.
> Within the given scope there are a couple of things that can be
> considered, and a conscious decision needs to be made on what is part
> of the initial scope. The following is a non-exhaustive list of
> aspects I know are important.
> I suggest picking a small subset out of these possibilities that
> can be incrementally enhanced.
>
> Schema
> =======
> Will the schema be defined in terms of S-expressions only, or will you
> also allow providing it via GraphQL SDL?
> I suggest starting with the former and adding the latter at a later point.

Both GraphQL SDL and a round-trip-compatible S-expression equivalent.
The parser is now mostly finished (there are still a few bugs and
missing parts of the grammar, but it can successfully parse the GitHub
API's huge schema which I used as a benchmark).
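
For illustration, here is a small schema fragment in GraphQL SDL and one
possible round-trip S-expression encoding. The type, its fields, and the
exact S-expression shape are all illustrative, not the final encoding:

    type Srfi {
      number: Int!
      title: String!
      authors: [String!]!
    }

    (type Srfi
      (number (non-null Int))
      (title (non-null String))
      (authors (non-null (list-of (non-null String)))))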

That is very good progress. Great! I suggest also having a look at
https://github.com/graphql-cats/graphql-cats
This is a good way to make sure the implementation, including the parser,
complies with the spec. It will become more relevant once the engine is
able to resolve.


> Supported operations
> ================
> What kind of operations do you want to focus on first? I suggest going
> with queries only.

This is a good idea. I got the impression that queries are mostly
orthogonal to mutations and subscriptions. Please correct me if I'm wrong.

Yes, to a degree. I mainly suggested that to set a priority for the implementation: queries alone give a good deal of functionality without having to tackle everything at once.

For both mutations and subscriptions you need a working query execution scheme. These operations add things on top.
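
For example, a mutation still carries an ordinary selection set that is
executed by the same machinery as a query; the main addition is that its
top-level fields must run serially. (The field names here are hypothetical.)

    query    { srfi(number: 1) { title } }
    mutation { addSrfi(number: 158, title: "Example") { number title } }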


> It is very hard to get subscriptions right beyond very basic integrations.
> That doesn't mean that you can't think about the delivery scheme of
> values, but you shouldn't implement everything right away, since this is
> simply a lot of effort with unclear benefit.

Some kind of subscription mechanism would potentially be very useful to
have in the Scheme API, but it's not clear that GraphQL subscriptions
(in their current form) are a good fit. We'd be more interested in
long-term subscriptions that bring infrequent news (similar to an RSS
feed). Is the GraphQL subscription mechanism more short-lived, like long
polling?

Yes, they are more short-lived in that regard. Common implementations involve long polling or WebSockets.
RSS indeed seems better for your case, possibly combined with simple change information in the schema, which can be extracted relatively straightforwardly.
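
For contrast, a GraphQL subscription (hypothetical field name) keeps a
long-lived transport open, and the server pushes a fresh result each time
the event fires:

    subscription { srfiAnnounced { number title } }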



> Execution strategies
> ================
> What kind of executions do you want to support? I suggest starting with
> the usual way of a client just sending the query via HTTP and JSON.
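
For reference, the de-facto standard request shape is a JSON document
POSTed to a single endpoint (the operation and field names here are
hypothetical; the query/operationName/variables keys are the convention):

    POST /graphql
    Content-Type: application/json

    {
      "query": "query GetSrfi($n: Int!) { srfi(number: $n) { title } }",
      "operationName": "GetSrfi",
      "variables": { "n": 1 }
    }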


> However, any real-world system I know has switched, or is about to
> switch, to a scheme where clients don't send the query but are able to
> refer to pre-existing queries.
> These are so-called persisted queries. I would keep that in mind but
> not worry about it in the beginning.

That's very interesting. I read about them cursorily but don't really
understand them. Are these similar in spirit to stored procedures in SQL
databases?

Yes, that analogy would work. The clients don't send the query itself but rather refer to an existing query with an identifier. The server then takes care of finding and executing the correct query. These queries are either registered ahead of time or when they are first seen. Both schemes make sense depending on your needs.

The major advantage is that clients don't have to send the query text over and over. Most queries are rather big and use variables to abstract over the varying parts, so clients really just need to send the query id plus the variable values.
Additionally, this allows optimizing execution, since queries that have been validated once do not need to go through all validations again.
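
A minimal sketch of that registration scheme in Scheme, assuming SRFI 69
hash tables and hypothetical parse-and-validate and execute procedures
from the engine:

    ;; id -> already-validated query document
    (define persisted-queries (make-hash-table))

    (define (register-query! id query-string)
      ;; Validate once, at registration time.
      (hash-table-set! persisted-queries id
                       (parse-and-validate query-string)))

    (define (handle-persisted-request id variables)
      (let ((doc (hash-table-ref/default persisted-queries id #f)))
        (if doc
            (execute doc variables)  ; no re-validation needed
            (error "Unknown query id:" id))))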



> However you will have to think about more sophisticated schemes that
> allow you to fetch data more efficiently. Mark already talked about
> the prevention of N+1 but there are also other things you need to
> consider and introduce extension points to build on.
> Of course you need to think about which part of the resolution can be
> run concurrently / asynchronously.
> There are other things like being able to project selection sets to a
> resolver that is about to resolve an object type.

I'm completely out of my depth here. Any help is appreciated :)


Anyway, this is all just optimization, right? You can start by doing
the same stuff in a slow brute-force manner. We won't have a large mass
of data or much traffic for a while, so it's fine to advance slowly.

That is a good plan, but it is not necessarily just optimization. A standard GraphQL implementation is expected to at least use parallel resolution where possible. Note that GraphQL is very well suited to this, since it is clear which fields are allowed to be resolved in parallel.
All fields on the same level in a selection set can be resolved in parallel, whereas hierarchies in the selections denote an explicit data dependency, which means those fields have to be resolved sequentially.
Or, in FP speak: an execution engine is a monad that sequences the resolution of each selection level, while resolving within the same level can be done in an applicative functor.
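
A sketch of that level-by-level scheme, assuming SRFI 18 threads and a
hypothetical resolve-field procedure:

    ;; Resolve all sibling fields of one selection level in parallel,
    ;; then wait for the whole level before descending into sub-selections.
    (define (resolve-level parent fields)
      (let ((threads (map (lambda (field)
                            (thread-start!
                             (make-thread
                              (lambda () (resolve-field parent field)))))
                          fields)))
        (map thread-join! threads)))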


> Validation
> ========
> The GraphQL working group is moving in a direction where, in order
> to keep the parser simpler, they move detection of problems in documents
> (executable or schema) into validations. There are some validations,
> like making sure that overlapping fields can be merged, that have very
> bad runtime complexity if you implement them as provided by the
> reference implementation. If you have public access to your GraphQL API
> you will have to think about that as part of your resilience measures,
> like introducing a deadline for requests and making sure work can be
> dropped without hogging resources in some pools.

Very interesting. Several Scheme implementations have internal thread
schedulers built into their runtimes. I don't know whether they support
resource limits easily. Racket has a "custodians" system that I think
does just that.

There's the possibility of launching a separate Unix process to handle
each request, with kernel-enforced resource limits. One could keep a
worker pool of subprocesses ready. Maybe use shared memory if it's not
too difficult. Lots of possibilities.

Yes, there are many ways to do that. The main point is that it needs to be considered, because it might need to be integrated with the execution steps (validation, resolution, marshaling).
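
To make the Racket custodian idea concrete, here is a minimal sketch that
runs one request under a memory cap and a deadline (handle-request is a
hypothetical procedure):

    ;; Racket
    (define (run-with-limits request)
      (define cust (make-custodian))
      (custodian-limit-memory cust (* 64 1024 1024)) ; 64 MB cap
      (define ch (make-channel))
      (parameterize ([current-custodian cust])
        (thread (lambda () (channel-put ch (handle-request request)))))
      (begin0
        (sync/timeout 5 ch)             ; #f if the 5-second deadline passes
        (custodian-shutdown-all cust))) ; reclaim threads, ports, etc.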


> Ahead of time analysis
> ==================
> You want to be able to make sure that a query isn't harmful, which means
> it's not too complex or too deeply nested.
> There are a couple of validations that need to run ahead of time in
> order to do that.
>
> Access pattern
> ===========
> How is the API going to be accessed? Is it going to be fully publicly
> available, i.e. there is no session or token or something like that
> to be used for access/request control. How many requests do you expect
> and what are the primary clients?

A fully public read-only API is the starting point for us. We could
require users to register for an access token, but that would lead to
pretty bad usability, since there is no inherently user-specific data
to serve.

A better approach would be simply to have resource limits for each
query, so if it's too complex the server just kills it after a while.

Complexity is a good measure, but it is probably not enough. A simple additional measure is to restrict the query size in bytes on top of complexity and depth. Ultimately, though, you also want to be able to discard running queries if they take too long.
On an operational level you can also think about rate limiting.
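
A sketch of such an ahead-of-time check, assuming the parsed selections
are nested (field-name . sub-selections) pairs (a hypothetical
representation):

    (define (selection-depth selections)
      (if (null? selections)
          0
          (+ 1 (apply max (map (lambda (sel) (selection-depth (cdr sel)))
                               selections)))))

    (define (check-query-limits! query-text selections)
      (when (> (string-length query-text) 10000)  ; size limit (characters)
        (error "Query too large"))
      (when (> (selection-depth selections) 10)   ; depth limit
        (error "Query too deeply nested")))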



> Observability
> ===========
> Think about extension points to hook in observability. This includes
> being able to collect custom metrics and potentially forward information
> in open tracing format via GraphQL response extensions.

This is optional for us, but it would definitely be cool to get some graphs.

Yes it is, but the execution engine should emit metrics. Whether and how those metrics are collected is up to specific users.
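
One way to leave that open is a hook the engine calls at instrumented
points; everything here is a hypothetical interface, not an existing API:

    ;; Default collector discards metrics; users install their own.
    (define current-metrics-collector
      (make-parameter (lambda (name value tags) #f)))

    (define (emit-metric! name value . tags)
      ((current-metrics-collector) name value tags))

    ;; Example call inside the engine:
    ;;   (emit-metric! 'resolve-duration-ms elapsed 'field "Srfi.title")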


> Use case
> ========
> Can you please shed a bit of light on the envisioned first scope? What
> is the first thing it's going to be used for?
> What kind of data needs to be resolved, and where does the data
> come from? Are we talking to a datastore?
> Are we talking to other services? Are we serving static data?

The thing that spurred me to work on this is the idea of a Scheme API
serving metadata and documentation pertaining to different RnRS/SRFI
documents, implementations and libraries. This API could be used as a
backend for writing many kinds of client applications to improve
Scheme's user experience. The initial idea was to make a web-based
documentation browser, but it soon became apparent that since we're
collecting all that data into a machine-processable format anyway, why
not make it available to anyone who wants to use it? We have a server at
<api.schemers.org> for this purpose but it's not live yet. There's a
staging version at <https://api.staging.scheme.fi/graphql> and you can
query a bit of real data we have scraped.

The data is static for the purposes of the GraphQL server. The current
plan is to have a cron-like orchestrator that checks whether web pages
or git repos have changed, and runs scrapers to parse new data whenever
they have. But this is infrequent enough that the server could even be
restarted after each data update and it wouldn't really affect the
experience.

Datastore hasn't been decided yet. We're just starting with static files
since it's simple to write S-expressions into them.

Ok great. Bear in mind that you will have to be able to find the data efficiently during the course of the query.
Plain S-expression files will probably need some restructuring/indexing to become efficient.
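
A minimal sketch of one-time indexing, assuming records in the file look
like (srfi (number 1) (title "Example")) (a hypothetical layout) and
SRFI 69 hash tables:

    ;; Read every record from an S-expression file and index it by a key field.
    (define (load-index filename key-field)
      (let ((index (make-hash-table)))
        (with-input-from-file filename
          (lambda ()
            (let loop ((record (read)))
              (unless (eof-object? record)
                (let ((key (cadr (assq key-field (cdr record)))))
                  (hash-table-set! index key record))
                (loop (read))))))
        index))

    ;; Example: (hash-table-ref/default (load-index "srfis.scm" 'number) 1 #f)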


> Schema contribution
> ================
> Is the schema going to be contributed by other people? Will there be
> public contributions via PRs or is the schema fixed?

It's expected to evolve in a backward-compatible way. The people who
maintain the API server should also maintain the schema according to
this principle. PRs are fine but should be overseen by maintainers.

Ok good, then SDL support would be great. S-expressions might work but are not as approachable for people who are new to Scheme. That might not be a concern for this specific use case.

> Let me also point you to a couple of really good implementations, that
> you might want to draw inspiration from:
>
> * https://sangria-graphql.org/ (this is the library we build our service
> with)
> * https://github.com/walmartlabs/lacinia (this is interesting because
> it is already in a Lisp, and I think their way of encoding the schema and
> the execution is natural and neat)

Cool, thanks for the recommendations!

I already tried Lacinia but had a lot of different kinds of problems
with it. I will study it some more since you are vouching for it.

Do you have an opinion on Alumbra, the other Clojure GraphQL stack?

No I haven’t looked at that. We evaluated different stacks in various languages (JS, Ruby, Elixir, Scala) in more depth.


Hope that helps.
Cheers 
David