Re: Database connections as subprocesses Alaric Snell-Pym 16 Sep 2019 08:40 UTC
On 16/09/2019 00:02, John Cowan wrote:
> On Sun, Sep 15, 2019 at 2:28 AM Lassi Kortela <xxxxxx@lassi.io> wrote:
>
>
>> Correct. And the driver lives in the subprocess written in C.
>>
>
> That doubles the amount of traffic, though presumably the subprocess will
> live on the same host as the application.  Given that, I see no reason to
> write the subprocesses in C.  Let's use Python or something similar, which
> already has plenty of drivers, is much easier to program in correctly, and
> can handle a simple protocol straightforwardly.

Well, if it's a protocol, then drivers can be written in anything - I
suspect the point was that lots of databases have C client libraries, so
writing a protocol driver around them in C would be the easiest option.

But: I suspect this is a distraction. Linking to C libraries in-process
versus writing a C wrapper that we talk to in a separate process is
perhaps more about how an FFI to C is implemented, than inherent to the
design of a database interface... I'd suggest we standardise a Schemely
interface between database drivers and any intermediate infrastructure,
then individual drivers could be:

1. SQLite-style in-process database engines in native Scheme (I can dream!)
2. Scheme implementations of wire protocols for databases such as MySQL,
PostgreSQL, etc
3. Scheme adapters to in-process FFIs for libpq et al
4. Scheme code to talk a standardised subprocess protocol to external
driver binaries, written in whatever language

...(4) is a useful option, but I think one that needs to be defined
separately from the Scheme-level API first, so that (1)-(3) can also be
done. As a portable library, there will be value in having a
separate-process driver for, eg, PostgreSQL that can be used in any
Scheme where subprocesses are a thing and the driver can be compiled
(eg, on Linux), while Schemes with a suitable FFI can choose to use a
less-portable but more efficient binding to libpq, etc. Let a community
of different drivers bloom!
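
To make the layering concrete, here is a rough sketch in Python (used only because it is easy to run; the real thing would be a Scheme API). Everything named here is hypothetical: `Driver`, `SqliteDriver`, `StubDriver` and `first_column` are made up for illustration. `SqliteDriver` plays the role of kind (3), since Python's sqlite3 module wraps the in-process SQLite C library, and `StubDriver` merely stands in for a driver of kind (2) or (4).

```python
import sqlite3
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical common interface that every kind of driver implements."""

    @abstractmethod
    def query(self, sql, params=()):
        """Run a query and return an iterator over result rows (tuples)."""


class SqliteDriver(Driver):
    """Kind (3): adapter over an in-process C library (SQLite via sqlite3)."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)

    def query(self, sql, params=()):
        return iter(self._conn.execute(sql, params))


class StubDriver(Driver):
    """Stand-in for a wire-protocol or subprocess driver: canned answers only."""

    def __init__(self, canned):
        self._canned = canned  # maps SQL text to a list of row tuples

    def query(self, sql, params=()):
        return iter(self._canned[sql])


def first_column(driver, sql):
    """Application code: depends only on the Driver interface."""
    return [row[0] for row in driver.query(sql)]


from_sqlite = first_column(SqliteDriver(), "SELECT 41 + 1")
from_stub = first_column(StubDriver({"SELECT 41 + 1": [(42,)]}), "SELECT 41 + 1")
```

The application function never learns which kind of driver it was handed, which is the property that would let (1)-(4) coexist.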

> While I'm at it, how about this extremely simple protocol:
[snip]
> Comments?

I'm not sure how the wire protocols work for MySQL, PostgreSQL, etc. but
I seem to recall that the library interfaces at least in principle allow
you to do something like:

rs1 = query("SELECT * FROM very_big_table");
rs2 = query("SELECT ... FROM t1, t2, t3, t4, t5 WHERE ...join conditions...");

repeat {
   r1 = rs1.read_row();
   r2 = rs2.read_row();
   ...some process involving r1 and r2...
} until rs1.eof() || rs2.eof();

...without consuming unbounded local storage or deadlocking. That is,
you can have two queries streaming results back at the same time over
the same "connection".
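
As one concrete data point, the lockstep loop above can be run against SQLite through Python's standard sqlite3 module, which lets two cursors on one connection be read alternately. This is only a sketch: the tables `a` and `b` and their contents are invented for the example, and it says nothing about how the MySQL or PostgreSQL client libraries behave.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (n INTEGER)")
conn.execute("CREATE TABLE b (s TEXT)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO b VALUES (?)", [("x",), ("y",)])

# Two result sets open at once on the same connection.
rs1 = conn.execute("SELECT n FROM a ORDER BY n")
rs2 = conn.execute("SELECT s FROM b ORDER BY s")

pairs = []
while True:
    r1 = rs1.fetchone()  # one row from the first query
    r2 = rs2.fetchone()  # one row from the second query
    if r1 is None or r2 is None:
        break  # either result set is exhausted
    pairs.append((r1[0], r2[0]))

conn.close()
```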

A few years back, I worked for a SQL database company; we had a wire
protocol that our ODBC driver used to talk to the database servers, and
the API structure was roughly like this:

- StartQuery(query text) -> query ID
- FetchResults(query ID) -> maybe(list(record))
- GetMetadata(query ID) -> list(column name/type declaration)
- CancelQuery(query ID)

...plus a bunch of boring server/connection-level metadata get/set
operations, and some other stuff for setting up prepared statements that
I never looked at.

But the key part was, you could issue a whole bunch of StartQuery
operations then call FetchResults on the resulting query IDs with an
arbitrary interleaving; each would return an arbitrarily-sized chunk of
records, or Nothing if the query was finished and no more records would
ever be returned. The ODBC client buffered those chunks as the
application requested a record at a time, to amortize the network round
trip overheads.
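
A toy model of that shape, in Python, might look like the following. It is a sketch from memory of the API outline above, not the actual protocol: `FakeServer`, its canned "tables", the `chunk_size` parameter and the `rows` helper are all invented for illustration, and GetMetadata is left out.

```python
import itertools


class FakeServer:
    """Toy stand-in for a database server; a 'query' just names a canned table."""

    def __init__(self, tables, chunk_size=2):
        self._tables = tables
        self._chunk_size = chunk_size
        self._ids = itertools.count(1)
        self._open = {}  # query ID -> iterator over the rows not yet sent

    def start_query(self, query_text):
        qid = next(self._ids)
        self._open[qid] = iter(self._tables[query_text])
        return qid

    def fetch_results(self, qid):
        """Return the next chunk of rows, or None once the query is finished."""
        remaining = self._open.get(qid)
        if remaining is None:
            return None
        chunk = list(itertools.islice(remaining, self._chunk_size))
        if not chunk:
            del self._open[qid]
            return None
        return chunk

    def cancel_query(self, qid):
        self._open.pop(qid, None)


def rows(server, qid):
    """Client-side buffering: yield one row at a time, fetching a chunk at a time."""
    while True:
        chunk = server.fetch_results(qid)
        if chunk is None:
            return
        yield from chunk


server = FakeServer({"big": [1, 2, 3, 4, 5], "small": ["a", "b", "c"]})
q1 = server.start_query("big")
q2 = server.start_query("small")

# Interleave the two queries row by row; zip stops when the shorter one ends.
pairs = list(zip(rows(server, q1), rows(server, q2)))

# The longer query still has rows pending, so cancel it explicitly.
server.cancel_query(q1)
after_cancel = server.fetch_results(q1)
```

Because each query ID has its own server-side cursor, the client is free to interleave fetches in any order, and only one chunk per query is ever held locally.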

(What that database server did at the backend is another, fascinating,
story I summarised in a blog post:
http://www.snell-pym.org.uk/archives/2016/12/11/cool-things-i-have-worked-on-clustered-analytic-database/
)

But: I think a general database access protocol needs to support this,
as some backends are capable of that sort of query interleaving.

--
Alaric Snell-Pym   (M7KIT)
http://www.snell-pym.org.uk/alaric/