On Mon, Mar 25, 2019 at 7:56 PM Lassi Kortela <xxxxxx@lassi.io> wrote:

> I use a Linode instance for srfi.schemers.org. It's a virtual Linux instance. I use MIT
> Scheme for my own work, but I'd be happy to try Racket for this
> purpose. We could probably run it all there.
>
> Another possibility would be to convert everything to statically served
> files so that Nginx could work its speed magic on them. Since SRFIs
> change rarely, that would be practical. It also reduces the attack
> surface and maintenance burden of the server.

I picked Racket so that as much as possible works out of the box (this is
quite a big volunteer effort, so the faster we can develop, the better).
For example, GitHub signs its webhook payloads with HMAC using SHA-1
hashing, and Racket ships with ready-made HMAC and SHA-1 procedures.
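For illustration, here is a minimal sketch in Python (Racket's own HMAC and SHA-1 procedures would play the same role) of checking such a signature. It assumes GitHub's `X-Hub-Signature` header, whose value is `sha1=` followed by the hex HMAC-SHA1 of the raw request body keyed with the webhook secret. The function name is hypothetical.

```python
import hashlib
import hmac

SIGNATURE_PREFIX = "sha1="


def verify_github_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Return True if `signature_header` matches HMAC-SHA1(secret, body).

    `body` must be the raw request body exactly as received, before any JSON parsing.
    """
    if not signature_header.startswith(SIGNATURE_PREFIX):
        return False
    expected = hmac.new(secret, body, hashlib.sha1).hexdigest()
    received = signature_header[len(SIGNATURE_PREFIX):]
    # compare_digest's running time doesn't depend on where the values differ,
    # so an attacker can't recover the signature byte by byte via timing.
    return hmac.compare_digest(expected.encode(), received.encode())
```

The handler would call this with the shared secret configured on GitHub and reject the request before touching the database if it returns False.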

At first glance static files sound simpler than a SQL database, but
somewhat surprisingly they turn out not to be. The big thing is that
databases guarantee consistency with essentially no effort, and we can't
accidentally write outside the bounds of our allotted file system space
if we fail to sanitize our filenames, etc. That being said, there are no
special requirements for the database: an ordinary SQLite DB would do
just fine. I haven't used SQLite yet because Heroku offers only Postgres.
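A minimal sketch of the consistency argument, using Python's standard `sqlite3` module: all of one SRFI's files are replaced inside a single transaction, so a reader never sees a half-updated set, and filenames are only table keys, never file system paths. The table layout and function names are hypothetical.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS srfi_file (
               srfi_number INTEGER NOT NULL,
               filename    TEXT    NOT NULL,
               contents    BLOB    NOT NULL,
               PRIMARY KEY (srfi_number, filename))"""
    )
    return conn


def replace_srfi_files(conn: sqlite3.Connection, srfi_number: int, files: dict) -> None:
    """Atomically replace every stored file of one SRFI with `files` (name -> bytes).

    A hostile name like "../../etc/passwd" is just a string in a column here;
    it can't make us write anywhere on the file system.
    """
    with conn:  # one transaction: commits on success, rolls back on any exception
        conn.execute("DELETE FROM srfi_file WHERE srfi_number = ?", (srfi_number,))
        conn.executemany(
            "INSERT INTO srfi_file (srfi_number, filename, contents) VALUES (?, ?, ?)",
            [(srfi_number, name, data) for name, data in files.items()],
        )
```

If any insert fails (say, a NOT NULL violation), the DELETE is rolled back too and the previous files stay intact.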

If speed is an issue, the server should probably cache all the file
contents in RAM instead of reading them from disk (even if the files in
RAM were compressed, it would still likely be faster than disk). However,
I'd guess it won't be much of an issue: Racket and PostgreSQL respond
instantaneously on a free Heroku vhost no matter how sloppy my code is.
Linode is probably as fast or faster, since you are paying for it.
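If we ever need the in-RAM cache, it could be as simple as this Python sketch, which keeps each file zlib-compressed in a dictionary and decompresses on every read. The class and method names are hypothetical, and compression level 9 is an arbitrary choice.

```python
import zlib


class CompressedCache:
    """In-memory cache that stores each file's contents zlib-compressed."""

    def __init__(self) -> None:
        self._entries = {}  # path -> compressed bytes

    def put(self, path: str, contents: bytes) -> None:
        self._entries[path] = zlib.compress(contents, 9)

    def get(self, path: str):
        """Return the decompressed contents, or None if `path` isn't cached."""
        compressed = self._entries.get(path)
        return None if compressed is None else zlib.decompress(compressed)

    def compressed_size(self) -> int:
        """Total bytes held in RAM, useful for checking the cache stays small."""
        return sum(len(c) for c in self._entries.values())
```

The cache would be filled at startup from the database and refreshed whenever the webhook writes new contents.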

The attack surface is really the big question. Racket comes with so much
convenience stuff that I wouldn't run it in the same file system
namespace as other services (then again, I wouldn't run any dynamic
language that way). So if we host it on Linode, maybe we should make that
Docker container after all to get file system isolation. It could use
SQLite so you don't need to configure PostgreSQL, which is heavy for such
a simple app.

If you're good at configuring servers (I'm lousy at it...), there's
almost certainly also a way to set up Nginx proxying so that it caches
the responses from Racket. I can add HTTP Last-Modified and cache headers
in Racket, and Nginx can probably use those to decide when to serve cached
content. Nginx's HTTP implementation is almost certainly more robust
against HTTP vulnerabilities than Racket's (though that shouldn't be a big
issue if Racket is isolated in its own container).
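To make the header idea concrete, here is a Python sketch (the Racket version would be analogous) of emitting `Last-Modified` and `Cache-Control` and answering a conditional GET. The function names and the one-hour `max_age` default are hypothetical choices; `last_modified` is assumed to be a UTC datetime, e.g. when the SRFI was last updated.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def cache_headers(last_modified: datetime, max_age: int = 3600) -> dict:
    """Response headers for a page last changed at `last_modified` (UTC)."""
    return {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Cache-Control": f"public, max-age={max_age}",
    }


def is_not_modified(if_modified_since, last_modified: datetime) -> bool:
    """True if a conditional GET may be answered with 304 Not Modified."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable header: just send the full response
    if since.tzinfo is None:  # a "-0000" zone parses as naive; treat it as UTC
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so ignore sub-second precision.
    return last_modified.replace(microsecond=0) <= since
```

For a page last changed at 19:56 UTC on 25 March 2019, `cache_headers` yields `Last-Modified: Mon, 25 Mar 2019 19:56:00 GMT`, and a request repeating that date gets a 304.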

Finally, the solution could be split into two web servers that share the
same database: one listens for the GitHub webhook and writes to the
database, and the other has only read-only access to the database and
serves its contents over the web. But in practice I think this would just
be more complex to code and deploy, and more resource-intensive on the
server, with no real-world gain in security.
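If we did go that way, SQLite can enforce the read-only half itself through its `mode=ro` URI parameter. A Python sketch follows; the table, sample row, and function names are hypothetical.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(path: str) -> sqlite3.Connection:
    """Open an existing SQLite database so that every write attempt fails."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "srfi.db")
        # Writer side (the webhook listener) creates and fills the database.
        writer = sqlite3.connect(path)
        with writer:
            writer.execute("CREATE TABLE srfi (number INTEGER PRIMARY KEY, title TEXT)")
            writer.execute("INSERT INTO srfi VALUES (1, 'List Library')")
        writer.close()
        # Reader side (the public web server) can query but not modify.
        reader = open_read_only(path)
        try:
            title = reader.execute("SELECT title FROM srfi WHERE number = 1").fetchone()[0]
            try:
                reader.execute("DELETE FROM srfi")
                outcome = "write allowed"
            except sqlite3.OperationalError:  # "attempt to write a readonly database"
                outcome = "write refused"
        finally:
            reader.close()
        return f"{title}: {outcome}"
```

`demo()` returns `"List Library: write refused"`. This only protects the data, not the server process, so it doesn't change the conclusion above.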

The nice thing is, everything in the database is a cached version of
something that can be derived automatically from the SRFI origin repos.
So if a security breach or a software/hardware fault erases the
database, it can be rebuilt from scratch by traversing all the GitHub repos.
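A sketch in Python of what a rebuild could look like: build a fresh SQLite file next to the live one, then swap it in with `os.replace`, so new connections see either the old database or the complete new one. Walking the GitHub repos is left out; the `repos` argument (SRFI number to `{filename: contents}`) is a hypothetical stand-in for its result, and the schema is illustrative.

```python
import os
import sqlite3
import tempfile


def rebuild_database(db_path: str, repos: dict) -> None:
    """Rebuild the SRFI cache from scratch and atomically replace `db_path`."""
    directory = os.path.dirname(os.path.abspath(db_path))
    # Same directory, so the final rename stays on one file system (atomic on POSIX).
    fd, tmp_path = tempfile.mkstemp(suffix=".db", dir=directory)
    os.close(fd)  # SQLite treats the empty file as a new database
    try:
        conn = sqlite3.connect(tmp_path)
        try:
            with conn:
                conn.execute(
                    "CREATE TABLE srfi_file (srfi_number INTEGER NOT NULL,"
                    " filename TEXT NOT NULL, contents BLOB NOT NULL,"
                    " PRIMARY KEY (srfi_number, filename))"
                )
                conn.executemany(
                    "INSERT INTO srfi_file VALUES (?, ?, ?)",
                    [(number, name, data)
                     for number, files in repos.items()
                     for name, data in files.items()],
                )
        finally:
            conn.close()
        os.replace(tmp_path, db_path)
    except Exception:
        os.remove(tmp_path)  # never leave a half-built file behind
        raise
```

Because the database is only a cache, running this on a schedule or after a suspected compromise is always safe.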