Re: Database connections as subprocesses


Database connections as subprocesses Lassi Kortela (14 Sep 2019 07:30 UTC)
Re: Database connections as subprocesses John Cowan (15 Sep 2019 01:06 UTC)
Re: Database connections as subprocesses Lassi Kortela (15 Sep 2019 06:28 UTC)
Re: Database connections as subprocesses John Cowan (15 Sep 2019 23:02 UTC)
Re: Database connections as subprocesses Lassi Kortela (16 Sep 2019 08:22 UTC)
Binary S-expressions Lassi Kortela (16 Sep 2019 17:49 UTC)
(missing)
Re: Binary S-expressions Lassi Kortela (17 Sep 2019 09:46 UTC)
Re: Binary S-expressions Alaric Snell-Pym (17 Sep 2019 11:33 UTC)
Re: Binary S-expressions Lassi Kortela (17 Sep 2019 12:05 UTC)
Re: Binary S-expressions Alaric Snell-Pym (17 Sep 2019 12:23 UTC)
Re: Binary S-expressions Lassi Kortela (17 Sep 2019 13:20 UTC)
Re: Binary S-expressions Lassi Kortela (17 Sep 2019 13:48 UTC)
Re: Binary S-expressions Alaric Snell-Pym (17 Sep 2019 15:52 UTC)
Re: Binary S-expressions hga@xxxxxx (17 Sep 2019 16:25 UTC)
Re: Binary S-expressions rain1@xxxxxx (17 Sep 2019 09:28 UTC)
Re: Binary S-expressions Lassi Kortela (17 Sep 2019 10:05 UTC)
Python library for binary S-expressions Lassi Kortela (17 Sep 2019 21:51 UTC)
R7RS library for binary S-expressions Lassi Kortela (17 Sep 2019 23:56 UTC)
Re: Database connections as subprocesses Alaric Snell-Pym (16 Sep 2019 08:40 UTC)
Re: Database connections as subprocesses Lassi Kortela (16 Sep 2019 09:22 UTC)
Re: Database connections as subprocesses Alaric Snell-Pym (16 Sep 2019 11:28 UTC)
Re: Database connections as subprocesses hga@xxxxxx (16 Sep 2019 13:28 UTC)
Re: Database connections as subprocesses Lassi Kortela (16 Sep 2019 13:50 UTC)
Re: Database connections as subprocesses hga@xxxxxx (17 Sep 2019 13:59 UTC)
Re: Database connections as subprocesses John Cowan (16 Sep 2019 22:41 UTC)
Re: Database connections as subprocesses Lassi Kortela (17 Sep 2019 09:57 UTC)
Re: Database connections as subprocesses Lassi Kortela (17 Sep 2019 10:22 UTC)

Re: Database connections as subprocesses Lassi Kortela 16 Sep 2019 08:22 UTC

>> Correct. And the driver lives in the subprocess written in C.
>
> That doubles the amount of traffic,

Yes. This is intended to be convenient to implementors of small Schemes,
who can plug into a subprocess with a standardized interface using the
IPC primitives they already have instead of writing new DB-specific FFI
stubs / C modules and linking them in.
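
To make the idea concrete, here is a minimal sketch in Python of a
client talking to a driver subprocess over pipes. Everything in it is
an assumption for illustration: the stand-in driver just echoes each
request line with a "reply " prefix, and the line-per-request framing is
not a settled protocol. A real driver would be a separate program that
speaks the database's own protocol.

```python
import subprocess
import sys

# Hypothetical stand-in for a real DB driver program: it answers every
# request line with "reply <line>" and flushes after each answer.
FAKE_DRIVER = r"""
import sys
for line in sys.stdin:
    sys.stdout.write("reply " + line)
    sys.stdout.flush()
"""


def start_driver():
    """Spawn the driver with pipes attached to its stdin and stdout."""
    return subprocess.Popen(
        [sys.executable, "-c", FAKE_DRIVER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def request(proc, line):
    """Send one request line, flush, and read one response line."""
    proc.stdin.write(line + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().rstrip("\n")


def stop_driver(proc):
    """Close the request pipe so the driver sees EOF, then reap it."""
    proc.stdin.close()
    proc.wait()
    proc.stdout.close()
```

The client side needs nothing beyond process spawning and pipe I/O,
which is the point being made above.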

Hopefully the solution will be convenient to users as well (just install
an extra program that's trivial to compile, and can hopefully be added
to OS package managers).

Another point is that if we do a good enough job with the subprocesses,
we can try to entice other language communities to adopt them too. Then
maintenance won't rest on our shoulders alone. This would really be the
ideal outcome: make something generic enough that any fledgling language
implementation that wants easy DB access can adopt the programs.

For situations where performance is more important than convenience,
you'd want to use a fast compile-to-C/native-code Scheme and a dedicated
library that speaks the Postgres/MySQL protocol and lives in the Scheme
process. It's a different scenario with opposite incentives.

> though presumably the subprocess will
> live on the same host as the application.

Yes. The point is that we need some component to speak the arcane DB
protocols, and in some situations it's easier or more reliable to put
that component in a subprocess instead of using the FFI or linking extra
C modules into the implementation itself.

I once jumped through hoops to install a binary-blob Oracle client for
work, so my general distrust of DB client C libraries shows here :)
I'd quarantine them in their own subprocess unless performance demands
otherwise.

I plan to keep UpScheme's "no external library dependencies" promise,
but am exploring standardized subprocess interfaces for some features so
users can install extra C programs if they want extra features. DBs
would be one obvious scenario where this would make sense. Another would
be accessing remote file systems (SFTP, FTP, 9P, etc.)

Obviously it would be a big win if these subprocesses are not tied to
Scheme so other people can help maintain them and write new programs to
support more protocols. If there's ever a cottage industry of making

> Given that, I see no reason to
> write the subprocesses in C.  Let's use Python or something similar, which
> already has plenty of drivers, is much easier to program in correctly, and
> can handle a simple protocol straightforwardly.

That's an interesting idea. It might be expedient, at the cost of adding
a Python dependency and making the subprocess bigger (another
interpreter). The best part is that, since the pipe protocol does not
depend on the implementation language, the drivers can be prototyped in
Python, or some drivers can be written in Python and others in C.

If writing in C, it might make sense to generate the IPC parsing/quoting
part with a Scheme script. That depends on how complex the IPC protocol
ends up being. Ideally, code generation could be avoided altogether.

> While I'm at it, how about this extremely simple protocol:
>
> Request:
>
> 1. Bindings in the form name=value, one per line, where value can be a
> quoted string with R7RS \-escapes allowed (required, if the string contains
> a newline character), a number, a bytevector in R7RS notation, or
> (unquoted) null.  Blank lines are ignored.
>
> 2. A query as a triple-quoted string.  The first line starts with """,
> possibly with leading whitespace; the last line ends with """, possibly
> with trailing whitespace.  No escapes are allowed.
>
> 3. Flush.

> Response (in Lisp format):
>
> 1. A list of symbols wrapped in vertical bars, the column names.
>
> 2. Any number of lists whose elements are strings, numbers, bytevectors,
> and/or the symbol null.
>
> 3. The symbol "end".
>
> 4. Flush.
>
> Possible addition: some wrapper around ISO 8601 strings representing dates,
> times, timestamps.
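
For reference, the request half of the quoted proposal could be
serialized roughly as follows. This is a sketch in Python under several
assumptions that the proposal does not settle: the function names are
made up, only a small set of R7RS string escapes is emitted, numbers
are limited to Python ints and floats, and the `:id` placeholder syntax
in the usage example is hypothetical.

```python
def quote_string(s):
    """Quote a string using R7RS-style backslash escapes."""
    out = ['"']
    for ch in s:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\t":
            out.append("\\t")
        elif ch == "\r":
            out.append("\\r")
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("\\x%x;" % ord(ch))  # R7RS inline hex escape
        else:
            out.append(ch)
    out.append('"')
    return "".join(out)


def format_value(value):
    """Render one binding value: null, number, bytevector or string."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        raise TypeError("booleans are not part of the proposal")
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, (bytes, bytearray)):
        return "#u8(" + " ".join(str(b) for b in value) + ")"
    if isinstance(value, str):
        return quote_string(value)
    raise TypeError("unsupported value: %r" % (value,))


def format_request(query, bindings):
    """Build the request text: name=value lines, then the query."""
    # No escapes are allowed inside the query, so it must not contain
    # the delimiter or end with a quote that would run into it.
    if '"""' in query or query.endswith('"'):
        raise ValueError("query cannot be triple-quoted safely")
    lines = ["%s=%s" % (name, format_value(value))
             for name, value in bindings.items()]
    lines.append('"""' + query + '"""')
    return "\n".join(lines) + "\n"
```

The caller would write the returned text to the driver's pipe and then
flush, which corresponds to step 3 of the request.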

Might I interest you in a simple "binary S-expressions" format that can
be converted into textual S-expressions and back? I've been wanting to
make one for a long time and this would be a perfect application :)

Using varints for all numerical quantities, and encoding all strings
with a varint length prefix, one gets trivial code with no endianness,
escaping or whitespace issues and no size limits. Embedding binary blobs
is trivial. A varint type tag can distinguish between different kinds of
objects. There are ways to wring more bit density out of all this stuff
but I would favor simplicity.
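
A minimal sketch of what I mean, in Python. The tag numbers, the
function names and the choice of base-128 varints (low seven bits first,
high bit set on all but the last byte) are assumptions for illustration,
not a finished design. Integers are limited to non-negative values to
keep the sketch short.

```python
import io

# Hypothetical type tags; nothing here is standardized.
TAG_INT, TAG_STRING, TAG_BYTES, TAG_LIST = 0, 1, 2, 3


def write_varint(out, n):
    """Write a non-negative integer as a base-128 varint."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.write(bytes([byte | 0x80]))  # more bytes follow
        else:
            out.write(bytes([byte]))
            return


def read_varint(inp):
    """Read a base-128 varint; there is no upper size limit."""
    result = 0
    shift = 0
    while True:
        b = inp.read(1)
        if not b:
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def read_exact(inp, n):
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("truncated payload")
    return data


def write_obj(out, obj):
    """Write a type tag, then a varint value or length-prefixed body."""
    if isinstance(obj, bool):
        raise TypeError("booleans are not covered by this sketch")
    if isinstance(obj, int):
        write_varint(out, TAG_INT)
        write_varint(out, obj)
    elif isinstance(obj, str):
        data = obj.encode("utf-8")
        write_varint(out, TAG_STRING)
        write_varint(out, len(data))
        out.write(data)
    elif isinstance(obj, (bytes, bytearray)):
        write_varint(out, TAG_BYTES)
        write_varint(out, len(obj))
        out.write(bytes(obj))
    elif isinstance(obj, list):
        write_varint(out, TAG_LIST)
        write_varint(out, len(obj))
        for item in obj:
            write_obj(out, item)
    else:
        raise TypeError("unsupported object: %r" % (obj,))


def read_obj(inp):
    tag = read_varint(inp)
    if tag == TAG_INT:
        return read_varint(inp)
    if tag == TAG_STRING:
        return read_exact(inp, read_varint(inp)).decode("utf-8")
    if tag == TAG_BYTES:
        return read_exact(inp, read_varint(inp))
    if tag == TAG_LIST:
        return [read_obj(inp) for _ in range(read_varint(inp))]
    raise ValueError("unknown type tag: %d" % tag)


def encode(obj):
    out = io.BytesIO()
    write_obj(out, obj)
    return out.getvalue()


def decode(data):
    return read_obj(io.BytesIO(data))
```

Since strings and blobs carry their length up front, the reader never
scans for a terminator and no escaping is needed.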

Float encoding requires some thought. Don't some databases have decimal
floats and currency too? Maybe floats should just be sent as digit strings.
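
A small Python illustration of why digit strings are attractive for
decimal and currency columns. The function names are hypothetical, and
using `decimal.Decimal` on the client side is just one possible choice.

```python
from decimal import Decimal


def encode_number(value):
    """Render a Decimal as a plain digit string, without an exponent."""
    return format(value, "f")


def decode_number(text):
    """Parse a digit string back into an exact decimal value."""
    return Decimal(text)


# Binary floats cannot represent most decimal fractions exactly.
binary_sum_is_exact = (0.1 + 0.2 == 0.3)

# Digit strings survive the round trip, trailing zeros included.
price = decode_number(encode_number(Decimal("19.990")))
```

Here `binary_sum_is_exact` is False, while `price` still carries all
three decimal places.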

Timestamps also need thought, as you say. There's a temptation to have a
simple Unix timestamp data type, but it may be too simple.

For this application, I'd make a very simple command language from the
start (if the command is an s-expression, just prepend a symbol saying
which command it is). There are so many databases that if we succeed,
someone will want to add extensions at some point.
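
A rough sketch of that kind of command dispatch, in Python. Lists stand
in for S-expressions and strings for symbols; the command names
("query", "ping") and the shape of the replies are invented for the
example.

```python
HANDLERS = {}


def command(name):
    """Register a handler under a command name (the leading symbol)."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register


@command("ping")
def handle_ping():
    return ["ok"]


@command("query")
def handle_query(sql, *params):
    # A real driver would run the query; this stub echoes its input.
    return ["ok", sql, list(params)]


def dispatch(message):
    """Look at the first element to decide which command this is."""
    if (not isinstance(message, list) or not message
            or not isinstance(message[0], str)):
        return ["error", "malformed command"]
    name, args = message[0], message[1:]
    handler = HANDLERS.get(name)
    if handler is None:
        # Unknown commands get an error reply instead of a crash, so a
        # client can probe for extensions that a driver may not have.
        return ["error", "unknown command", name]
    return handler(*args)
```

Adding an extension is then a matter of registering one more handler
under a new name.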

Thanks for considering all this stuff.