Re: Database connections as subprocesses

Re: Database connections as subprocesses Alaric Snell-Pym 16 Sep 2019 11:27 UTC

On 16/09/2019 10:22, Lassi Kortela wrote:

[snip]
> This is exactly what I intended with the proposal :)
Ah, good!

> That's very cool, but probably also highly non-trivial to implement
> reliably.

Oh, I dunno; you can easily implement something like:

>> - StartQuery(query text) -> query ID
>> - FetchResults(query ID) -> maybe(list(record))
>> - GetMetadata(query ID) -> list(column name/type declaration)
>> - CancelQuery(query ID)

...on top of something like
https://www.postgresql.org/docs/9.5/libpq-exec.html or
https://sqlite.org/cintro.html or whatever, that have the notion of a
"result handle" returned by a function that accepts some SQL, and can be
used to fetch rows from it until there's no more. It's just a matter of
reflecting the structure of the API in the protocol :-)
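
As a rough illustration of "reflecting the structure of the API in the protocol", here is a minimal sketch in Python on top of the standard-library sqlite3 module, where a cursor plays the role of the result handle. The class name QueryServer, its method names, and the page_size parameter are made up for this example and are not part of any existing protocol; the sketch also assumes SELECT statements and only reports column names, since sqlite3 cursors do not expose declared column types.

```python
import itertools
import sqlite3


class QueryServer:
    """Hypothetical sketch: maps protocol-level query IDs onto sqlite3 cursors."""

    def __init__(self, conn, page_size=100):
        self.conn = conn
        self.page_size = page_size      # assumed chunk size per FetchResults call
        self.cursors = {}               # query ID -> open cursor ("result handle")
        self.ids = itertools.count(1)

    def start_query(self, sql):
        # StartQuery(query text) -> query ID
        cursor = self.conn.execute(sql)
        query_id = next(self.ids)
        self.cursors[query_id] = cursor
        return query_id

    def get_metadata(self, query_id):
        # GetMetadata(query ID) -> list of column names (no types available here)
        description = self.cursors[query_id].description or []
        return [column[0] for column in description]

    def fetch_results(self, query_id):
        # FetchResults(query ID) -> a chunk of rows, or None once the query is
        # finished or cancelled. Unknown IDs are also treated as finished.
        cursor = self.cursors.get(query_id)
        if cursor is None:
            return None
        rows = cursor.fetchmany(self.page_size)
        if not rows:
            self.cancel_query(query_id)
            return None
        return rows

    def cancel_query(self, query_id):
        # CancelQuery(query ID): drop the handle and release the cursor.
        cursor = self.cursors.pop(query_id, None)
        if cursor is not None:
            cursor.close()


# Set up a small in-memory table and start two queries on one connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
server = QueryServer(conn, page_size=2)
a = server.start_query("SELECT n FROM t ORDER BY n")
b = server.start_query("SELECT n * 10 FROM t ORDER BY n")
```

With both queries started, fetch_results(a) and fetch_results(b) can be called in any order; each call advances only its own cursor. A real subprocess would additionally need to serialise these calls over a pipe, which is left out here.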

>> But the key part was, you could issue a whole bunch of StartQuery
>> operations then call FetchResults on the resulting query IDs with an
>> arbitrary interleaving; each would return an arbitrarily-sized chunk of
>> records, or Nothing if the query was finished and no more records would
>> ever be returned. The ODBC client buffered those chunks as the
>> application requested a record at a time, to amortize the network round
>> trip overheads.
>
> In principle, this would likely be relatively easy to do with a
> subprocess (compared to threads and FFI). But since parallel queries are
> for high-performance situations, people might want to go with the
> pure-Scheme clients anyway. This is just speculation; since you've
> actually worked with this stuff in a production setting, feel free to
> offer counterpoints.

In our application, the only performance issues that anybody really
cared about were how fast one query could stream results back to the
client - and how quickly the first result got back (often, clients
would do a query that would return a bazillion rows, then cancel the
query after they'd received ten) - so the "return results in pages"
thing arose out of the requirement to support that! Being able to
interleave multiple queries on one connection was mainly something that
database APIs tend to let users do anyway (with the whole
query-function-returns-a-result-set thing), so we had to support that
without crashing or deadlocking.
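
The "fetch in pages, hand out one record at a time, cancel early" pattern on the client side might look roughly like the following Python sketch. Everything here is hypothetical: BufferedResults, fetch_page, cancel, and make_fake_query are names invented for illustration, and the fake query stands in for a real network round trip so the example can count how many requests are made.

```python
import itertools
from collections import deque


class BufferedResults:
    """Hypothetical sketch: yields records one at a time, fetching a page at a time."""

    def __init__(self, fetch_page, cancel):
        self.fetch_page = fetch_page  # callable() -> list of records, or None when finished
        self.cancel = cancel          # callable() that tells the server to stop the query
        self.buffer = deque()
        self.done = False

    def __iter__(self):
        return self

    def __next__(self):
        # Only go back to the server when the local buffer has run dry.
        while not self.buffer and not self.done:
            page = self.fetch_page()
            if page is None:
                self.done = True
            else:
                self.buffer.extend(page)
        if not self.buffer:
            raise StopIteration
        return self.buffer.popleft()

    def close(self):
        # Abandon the query early, discarding any rows buffered but not yet read.
        if not self.done:
            self.done = True
            self.buffer.clear()
            self.cancel()


def make_fake_query(total_rows, page_size):
    """Stand-in for a server-side query; counts round trips for the demo."""
    rows = iter(range(total_rows))
    stats = {"round_trips": 0, "cancelled": False}

    def fetch_page():
        stats["round_trips"] += 1
        page = list(itertools.islice(rows, page_size))
        return page or None

    def cancel():
        stats["cancelled"] = True

    return fetch_page, cancel, stats


# A query that could return a million rows, of which the client reads ten.
fetch_page, cancel, stats = make_fake_query(total_rows=1_000_000, page_size=4)
results = BufferedResults(fetch_page, cancel)
first_ten = list(itertools.islice(results, 10))
results.close()
```

Reading ten records with a page size of four costs three round trips rather than ten, and close() cancels the query without ever pulling the remaining rows.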

>> (What that database server did at the backend is another, fascinating,
>> story I summarised in a blog post:
>> http://www.snell-pym.org.uk/archives/2016/12/11/cool-things-i-have-worked-on-clustered-analytic-database/
>>
>> )
>
> That is some seriously impressive database wizardry.

That was the last job I had where I actually got to use the word
"algorithm" in earnest (the fun I had elbow-deep in the guts of the join
order planner!), and I miss it :-(

Thanks,

--
Alaric Snell-Pym   (M7KIT)
http://www.snell-pym.org.uk/alaric/