The original version of Spiffy had something like this as well.
It is very user-friendly and convenient, but it makes it quite difficult
to remove dispatchers.
Why would that be?
It also makes it impossible to run multiple servers in different threads in the same process, which handle different applications.
Shuttle is multi-threaded (per request), although I admit that I don't handle multiple domains directly right now, instead leaving that up to Nginx, which I use as a proxy server. Nginx forwards each domain to a different TCP/IP port on localhost, and Shuttle uses the port to tell the applications apart. But it would be trivial to add a host parameter to make-dispatcher so that it could dispatch first on the host, then on the rest.
The current version of Spiffy uses a vhost map (which has turned out to be
a needless distraction): a ((host . handler) ...) alist stored in
a SRFI-39 / R7RS parameter.
The handler receives the "previous" handler (a sort of continuation),
which initially is just a thing that renders a minimal 404 page.
Then, when you add a new handler, it can inspect the path, query
parameters and so on to decide whether it should handle the request; if
not, it calls the continuation (which may already be a chain of existing
handlers).
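To make the chaining concrete, here is a minimal Python sketch of handlers that each receive the "previous" handler as a continuation. Requests are plain dicts, and `not_found` and `serve_path` are hypothetical names for illustration, not Spiffy's API.

```python
# Continuation-style handler chaining. Requests are plain dicts; every name
# here (not_found, serve_path) is hypothetical, not Spiffy's API.

def not_found(request):
    """Initial 'previous' handler: render a minimal 404 page."""
    return (404, "<h1>Not Found</h1>")

def serve_path(path, body, previous):
    """Return a handler that serves one exact path and delegates the rest."""
    def handler(request):
        if request["path"] == path:
            return (200, body)
        # Not ours: hand the request to the rest of the chain.
        return previous(request)
    return handler

# Each new handler wraps the chain built so far.
chain = not_found
chain = serve_path("/", "home page", chain)
chain = serve_path("/about", "about page", chain)

print(chain({"path": "/about"}))    # (200, 'about page')
print(chain({"path": "/missing"}))  # (404, '<h1>Not Found</h1>')
```

A request for an unknown path falls through every wrapper until it reaches the initial 404 handler.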
What I like about this approach is that the request *dispatching* is
completely separated from the request/response *handling*. There are in
fact multiple libraries which implement dispatching. But in practice,
most people either do it by hand or use the uri-dispatch egg[1].
I like the idea of exposing that abstraction, just in case people need it, but it's too low-level for daily use, and it encourages serial dispatching, where each handler in the chain is tried in turn, which could become a performance problem.
When a new dispatcher is added, its path is added to a trie of dispatcher criteria, so that all the top-level path components are dispatched in a single operation, and then any remaining components can be disambiguated at the next level. This could, underneath, be translated into the next-handler approach you describe, but it's much more convenient for typical use. Not only that, but it makes the dispatch path first class, which facilitates things like a control panel that helps one visualize what the server's current routes are. That has proven fantastic for debugging.
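A rough Python sketch of a path trie, assuming paths are already split into tuples of components. `Router`, `add`, `dispatch` and `routes` are hypothetical names, not Shuttle's actual interface. Lookup costs one dictionary access per path component instead of one test per registered dispatcher, and because the routes live in a data structure they can be listed, as a route-inspection panel would need.

```python
# A path trie: one node per path component. Router, add, dispatch and
# routes are hypothetical names used only for this sketch.

class Node:
    def __init__(self):
        self.children = {}   # path component -> Node
        self.handler = None  # dispatcher registered at exactly this path

class Router:
    def __init__(self):
        self.root = Node()

    def add(self, path, handler):
        node = self.root
        for component in path:
            node = node.children.setdefault(component, Node())
        node.handler = handler

    def dispatch(self, path):
        node = self.root
        for component in path:
            node = node.children.get(component)
            if node is None:
                return None  # no dispatcher below this point
        return node.handler

    def routes(self, node=None, prefix=()):
        """List every registered path: the routes are first-class data."""
        if node is None:
            node = self.root
        found = [prefix] if node.handler is not None else []
        for component, child in sorted(node.children.items()):
            found.extend(self.routes(child, prefix + (component,)))
        return found

router = Router()
router.add(("users",), "list-users")
router.add(("users", "new"), "new-user-form")
router.add(("about",), "about-page")

print(router.dispatch(("users", "new")))  # new-user-form
print(router.routes())  # [('about',), ('users',), ('users', 'new')]
```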
After thinking about this, I think the vhost can also be part of the
handler instead. It can just look at the "host" header (or the host in
the URL, if it's absolute and we're in HTTP/1.0) and decide if it wants
to handle this particular host.
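A small Python sketch of a handler that checks the host itself and otherwise passes the request on. It prefers the host of an absolute request URI and falls back to the Host header. The request layout and the names `request_host` and `for_host` are assumptions for illustration.

```python
from urllib.parse import urlsplit

def request_host(request):
    """Host a request is addressed to: the host of an absolute request URI
    if there is one, otherwise the Host header. Lower-cased, port dropped."""
    uri = request.get("uri", "")
    if uri.startswith(("http://", "https://")):
        return urlsplit(uri).hostname
    header = request.get("headers", {}).get("host")
    # Parse "name:port" by treating it as a network location.
    return urlsplit("//" + header).hostname if header else None

def for_host(host, inner, previous):
    """Hypothetical helper: serve `host` with `inner`, delegate the rest."""
    def handler(request):
        if request_host(request) == host:
            return inner(request)
        return previous(request)
    return handler

def not_found(request):
    return (404, "Not Found")

site = for_host("example.org", lambda request: (200, "example.org"), not_found)
print(site({"uri": "/", "headers": {"host": "Example.ORG:8080"}}))  # (200, 'example.org')
print(site({"uri": "http://other.net/"}))                           # (404, 'Not Found')
```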
Yes, exactly.
I like the fact you can handle query parameters like this. It's probably
smart to add header dispatching too. That way, you can for example have
different handlers for different content types, or for different host
headers.
That would be really cool. With header dispatching, would we want some way to specify what takes priority, or would it be safe to say that one always dispatches on host first, then headers, then path, then query parameters? One could always fall back to the next-handler continuation approach for cases where that priority scheme is insufficient.
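One way to read the fixed priority, sketched in Python: collect every route whose criteria all match, then prefer the one most specific about host, then headers, then path, then query parameters, and fall back to a next handler when nothing matches. The route format and the names `matches`, `specificity` and `dispatch` are hypothetical; this is one possible rule, not a settled design.

```python
# Among matching routes, prefer the one most specific about host, then
# headers, then path, then query. Route format and names are hypothetical.

def matches(route, request):
    if route.get("host") not in (None, request["host"]):
        return False
    if route.get("path") not in (None, request["path"]):
        return False
    for key, value in route.get("headers", {}).items():
        if request["headers"].get(key) != value:
            return False
    for key, value in route.get("query", {}).items():
        if request["query"].get(key) != value:
            return False
    return True

def specificity(route):
    # Tuples compare left to right, so a host constraint outweighs any
    # number of header constraints, and so on down the list.
    return (route.get("host") is not None,
            len(route.get("headers", {})),
            route.get("path") is not None,
            len(route.get("query", {})))

def dispatch(routes, request, next_handler):
    candidates = [route for route in routes if matches(route, request)]
    if not candidates:
        return next_handler(request)  # fall back to the continuation style
    return max(candidates, key=specificity)["handler"](request)

routes = [
    {"path": "/feed", "handler": lambda r: "feed as HTML"},
    {"path": "/feed", "headers": {"accept": "application/json"},
     "handler": lambda r: "feed as JSON"},
    {"host": "api.example.org", "handler": lambda r: "API catch-all"},
]

def make_request(host, accept, path="/feed"):
    return {"host": host, "path": path, "headers": {"accept": accept}, "query": {}}

fallback = lambda request: "404"
print(dispatch(routes, make_request("www.example.org", "application/json"), fallback))
# feed as JSON
```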
This record looks a lot like intarweb's request object. I think that
means we're on the right track :)
Cool.
> * Paths are matched by patterns that contain strings, which must match
> exactly, and variables, expressed as `(? variable)', which match anything
> between slashes.
In Intarweb, we parse URI paths as lists, so matching like this becomes
quite trivial; it is quite common to use Andrew Wright's "match" macro to
match path components, which is of course very similar to your path
matching syntax.
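Since paths are lists of components, pattern matching reduces to a walk over two lists. The Python sketch below loosely mirrors the `(? variable)` syntax with `("?", name)` tuples; it is a simplified stand-in, not Wright's match macro or Intarweb's path parser, and `split_path` and `match_path` are hypothetical names.

```python
# Patterns are lists whose items are either literal strings, which must
# match a component exactly, or ("?", name), which matches any single
# component and binds it. Loosely mirrors the (? variable) syntax.

def split_path(path):
    """Simplified splitter: "/users/42/" -> ["users", "42"]."""
    return [component for component in path.split("/") if component]

def match_path(pattern, components):
    """Return a dict of variable bindings, or None if there is no match."""
    if len(pattern) != len(components):
        return None
    bindings = {}
    for item, component in zip(pattern, components):
        if isinstance(item, tuple) and item[0] == "?":
            bindings[item[1]] = component  # variable: bind this component
        elif item != component:
            return None                    # literal: must match exactly
    return bindings

pattern = ["users", ("?", "id"), "posts"]
print(match_path(pattern, split_path("/users/42/posts")))  # {'id': '42'}
print(match_path(pattern, split_path("/users/42")))        # None
```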
Yes, I use lists, too, but it would be nice to make the syntax match the well-understood and widely used Wright match macro.
In Spiffy, we use a "response" record type. It is quite similar to the
http-request record, but contains different fields like the response
status code, headers and the port to write to. The response object
is initially populated by the web server, and the user can manipulate
it to add headers etc.
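A rough Python analogue of such a response record, assuming a text port for simplicity (a real server would write bytes to a socket). The field names and `write_head` are illustrative, not Spiffy's actual record definition.

```python
import io
from dataclasses import dataclass, field
from typing import TextIO

@dataclass
class Response:
    """Hypothetical response record: status, headers and the output port."""
    status: int = 200
    reason: str = "OK"
    headers: dict = field(default_factory=dict)
    port: TextIO = field(default_factory=io.StringIO)

def write_head(response):
    """Write the status line and headers to the response's port."""
    response.port.write(f"HTTP/1.1 {response.status} {response.reason}\r\n")
    for name, value in response.headers.items():
        response.port.write(f"{name}: {value}\r\n")
    response.port.write("\r\n")

# The server creates the record with its defaults...
resp = Response(headers={"Server": "example"})
# ...and the handler adjusts it before the head is written.
resp.status, resp.reason = 404, "Not Found"
resp.headers["Content-Type"] = "text/plain"
write_head(resp)
```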
That's a nice idea. I've never been entirely comfortable with basing that part on multiple values.
> output: a thunk that will write the output, or false if there is no
> output. Content-Length will be computed from this automatically.
That's a nice way to do it. Does it support streamed responses of
unknown length? In Spiffy, you have to set a few headers by hand, write
the response object yourself, and only then write the content body.
Depending on some magic, the response will automatically be chunked.
I really dislike how this works (I always forget how it works, even
though I came up with it), so perhaps we can come up with a different
way.
I never implemented HTTP/1.1, so I never implemented chunked replies. However, I was trying to anticipate it, so I think the API will still handle it. The idea is that the thunk can write for as long as it likes, and the server itself can read that port and convert the results into chunks, which it then delivers. But if the response is short enough, it can skip all that and just deliver the result directly, with just a little buffering. It leaves those decisions to the server, which seems okay.
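The server-side decision described above can be sketched in Python: the server installs a buffering port as the current output port (here via `contextlib.redirect_stdout`), calls the thunk, and sends a Content-Length response if the output stays under a limit, switching to HTTP/1.1 chunked transfer coding otherwise. `BufferingBody`, `run_thunk` and the tiny 16-byte limit are assumptions chosen to keep the example short.

```python
import contextlib
import io

class BufferingBody:
    """Hypothetical output port installed as the current output port.

    Output up to `limit` bytes is buffered; if the thunk finishes within
    that, the body goes out with Content-Length. Otherwise the port
    switches to chunked transfer coding and streams from then on.
    """
    def __init__(self, sink, limit):
        self.sink = sink  # binary stream standing in for the socket
        self.limit = limit
        self.buffer = bytearray()
        self.chunked = False

    def write(self, text):
        data = text.encode("utf-8")
        if self.chunked:
            self._send_chunk(data)
        else:
            self.buffer += data
            if len(self.buffer) > self.limit:
                # Too long to buffer: commit to a chunked response.
                self.chunked = True
                self.sink.write(b"Transfer-Encoding: chunked\r\n\r\n")
                self._send_chunk(bytes(self.buffer))
                self.buffer.clear()
        return len(text)

    def flush(self):
        pass  # print(..., flush=True) calls this; nothing extra to do here

    def _send_chunk(self, data):
        if data:  # an empty chunk would mark the end of the body
            self.sink.write(b"%x\r\n%s\r\n" % (len(data), data))

    def finish(self):
        if self.chunked:
            self.sink.write(b"0\r\n\r\n")  # last-chunk, no trailers
        else:
            self.sink.write(b"Content-Length: %d\r\n\r\n" % len(self.buffer))
            self.sink.write(bytes(self.buffer))

def run_thunk(thunk, limit=16):
    """Call `thunk` with stdout redirected, returning the raw response."""
    sink = io.BytesIO()
    sink.write(b"HTTP/1.1 200 OK\r\n")
    body = BufferingBody(sink, limit)
    with contextlib.redirect_stdout(body):
        thunk()
    body.finish()
    return sink.getvalue()

short = run_thunk(lambda: print("hi"))
print(short)  # b'HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nhi\n'

def long_thunk():
    for _ in range(3):
        print("x" * 10)

streamed = run_thunk(long_thunk)
```

The thunk never needs to know which framing was chosen, which keeps that decision entirely on the server side.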
I think streamed responses are important, because they make WebSocket
support and "long polling" possible, as well as sending large files
(which do have a known length, of course).
I agree.
Perhaps you can return either a string which will be returned directly,
or a "writer" procedure which receives a port which, when written to,
sends the payload wrapped in a "chunk" of a chunked encoding response.
Yes, that's a nice idea. I actually used to take either a string or a thunk, but found that using the thunk consistently was easier and seemed to have no drawbacks. But taking a procedure that takes a port might be better than taking a thunk. I just assume that the server sets the current output port when it calls the thunk.
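For comparison, a Python sketch of the "string or writer" variant, where the writer receives an explicit port and every write becomes one chunk. It uses bytes instead of strings to sidestep encoding questions; `ChunkedPort` and `send_body` are hypothetical names.

```python
import io

class ChunkedPort:
    """Hypothetical port given to a writer: each write becomes one chunk."""
    def __init__(self, sink):
        self.sink = sink

    def write(self, data):
        if data:  # skip empty writes; an empty chunk would end the body
            self.sink.write(b"%x\r\n%s\r\n" % (len(data), data))

    def close(self):
        self.sink.write(b"0\r\n\r\n")

def send_body(body, sink):
    """`body` is either bytes, sent as-is, or a writer taking a port."""
    if callable(body):
        sink.write(b"Transfer-Encoding: chunked\r\n\r\n")
        port = ChunkedPort(sink)
        body(port)
        port.close()
    else:
        sink.write(b"Content-Length: %d\r\n\r\n" % len(body))
        sink.write(body)

def greeting_writer(port):
    port.write(b"hello, ")
    port.write(b"world")

fixed, streamed = io.BytesIO(), io.BytesIO()
send_body(b"hello", fixed)
send_body(greeting_writer, streamed)
```

Compared with a thunk that writes to the current output port, passing the port explicitly makes the dependency visible in the writer's signature.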
I'm not sure this is the perfect solution, because you ideally want to
automatically support HEAD and Range requests too. And if possible,
the caching headers should be set automatically as well.
I'm assuming that you mean automatically supporting HEAD if no specific dispatcher is specified for it, just running the GET handler and truncating the result. Then you just don't run the writer, right? Since the headers are returned separately from the writer, they can still be delivered. Content-Length might not be known, but HEAD isn't required to deliver that, anyway, as far as I understand.
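A Python sketch of that HEAD fallback, assuming handlers return `(status, headers, writer)`: if no HEAD handler exists, the GET handler runs, its status and headers are kept, and the writer is simply never called. The handler table keyed by `(method, path)` and the name `handle` are assumptions for illustration.

```python
import io

def handle(method, path, handlers):
    """handlers maps (method, path) -> handler returning (status, headers, writer)."""
    handler = handlers.get((method, path))
    if handler is None and method == "HEAD":
        handler = handlers.get(("GET", path))  # fall back to the GET handler
    if handler is None:
        return 404, {}, b""
    status, headers, writer = handler()
    if method == "HEAD":
        return status, headers, b""  # headers still go out; writer never runs
    port = io.BytesIO()
    writer(port)
    return status, headers, port.getvalue()

calls = []

def page():
    def writer(port):
        calls.append("ran")
        port.write(b"<p>hello</p>")
    return 200, {"Content-Type": "text/html"}, writer

handlers = {("GET", "/"): page}
head_result = handle("HEAD", "/", handlers)
writer_calls_after_head = list(calls)
get_result = handle("GET", "/", handlers)
```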
Does anyone successfully use Range requests? They always seemed impossible to fit nicely into any kind of general framework.
Using the GitHub repo wiki that Lassi set up, I've put together these notes on what we've been discussing:
Please feel free to add, change, or edit that page.
I propose that we continue this discussion off of srfi-discuss so as not to pester our other subscribers. (But they should have email clients with Mute buttons, right?) Let's include Lassi and Shiro and anyone else who asks to be added, too. Does that sound reasonable?