Database connections as subprocesses
Lassi Kortela
(14 Sep 2019 07:30 UTC)
|
||
Re: Database connections as subprocesses
John Cowan
(15 Sep 2019 01:06 UTC)
|
||
Re: Database connections as subprocesses
Lassi Kortela
(15 Sep 2019 06:28 UTC)
|
||
Re: Database connections as subprocesses
John Cowan
(15 Sep 2019 23:02 UTC)
|
||
Re: Database connections as subprocesses
Lassi Kortela
(16 Sep 2019 08:22 UTC)
|
||
Binary S-expressions
Lassi Kortela
(16 Sep 2019 17:49 UTC)
|
||
(missing)
|
||
Re: Binary S-expressions
Lassi Kortela
(17 Sep 2019 09:46 UTC)
|
||
Re: Binary S-expressions
Alaric Snell-Pym
(17 Sep 2019 11:33 UTC)
|
||
Re: Binary S-expressions
Lassi Kortela
(17 Sep 2019 12:05 UTC)
|
||
Re: Binary S-expressions
Alaric Snell-Pym
(17 Sep 2019 12:23 UTC)
|
||
Re: Binary S-expressions
Lassi Kortela
(17 Sep 2019 13:20 UTC)
|
||
Re: Binary S-expressions
Lassi Kortela
(17 Sep 2019 13:48 UTC)
|
||
Re: Binary S-expressions
Alaric Snell-Pym
(17 Sep 2019 15:52 UTC)
|
||
Re: Binary S-expressions
hga@xxxxxx
(17 Sep 2019 16:25 UTC)
|
||
Re: Binary S-expressions
rain1@xxxxxx
(17 Sep 2019 09:28 UTC)
|
||
Re: Binary S-expressions
Lassi Kortela
(17 Sep 2019 10:05 UTC)
|
||
Python library for binary S-expressions
Lassi Kortela
(17 Sep 2019 21:51 UTC)
|
||
R7RS library for binary S-expressions
Lassi Kortela
(17 Sep 2019 23:56 UTC)
|
||
Re: Database connections as subprocesses Alaric Snell-Pym (16 Sep 2019 08:40 UTC)
|
||
Re: Database connections as subprocesses
Lassi Kortela
(16 Sep 2019 09:22 UTC)
|
||
Re: Database connections as subprocesses
Alaric Snell-Pym
(16 Sep 2019 11:28 UTC)
|
||
Re: Database connections as subprocesses
hga@xxxxxx
(16 Sep 2019 13:28 UTC)
|
||
Re: Database connections as subprocesses
Lassi Kortela
(16 Sep 2019 13:50 UTC)
|
||
Re: Database connections as subprocesses
hga@xxxxxx
(17 Sep 2019 13:59 UTC)
|
||
Re: Database connections as subprocesses
John Cowan
(16 Sep 2019 22:41 UTC)
|
||
Re: Database connections as subprocesses
Lassi Kortela
(17 Sep 2019 09:57 UTC)
|
||
Re: Database connections as subprocesses
Lassi Kortela
(17 Sep 2019 10:22 UTC)
|
On 16/09/2019 00:02, John Cowan wrote: > On Sun, Sep 15, 2019 at 2:28 AM Lassi Kortela <xxxxxx@lassi.io> wrote: > > >> Correct. And the driver lives in the subprocess written in C. >> > > That doubles the amount of traffic, though presumably the subprocess will > live on the same host as the application. Given that, I see no reason to > write the subprocesses in C. Let's use Python or something similar, which > already has plenty of drivers, is much easier to program in correctly, and > can handle a simple protocol straightforwardly. Well, if it's a protocol, then drivers can be written in anything - I suspect the point was that lots of databases have C client libraries, so writing a protocol driver around them in C would be the easiest option. But: I suspect this is a distraction. Linking to C libraries in-process versus writing a C wrapper that we talk to in a separate process is perhaps more about how an FFI to C is implemented, than inherent to the design of a database interface... I'd suggest we standardise a Schemely interface between database drivers and any intermediate infrastructure, then individual drivers could be: 1. SQLite-style in-process database engines in native Scheme (I can dream!) 2. Scheme implementations of wire protocols for databases such as MySQL, PSQL, etc 3. Scheme adapters to in-process FFIs for libpq et al 4. Scheme code to talk a standardised subprocess protocol to external driver binaries, written in whatever language ...(4) is a useful option, but I think one that needs to be defined separately from the Scheme-level API first, so that (1)-(3) can also be done. As a portable library, there will be value in having a separate-process driver for, eg, PostgreSQL that can be used in any scheme where subprocesses are a thing and the driver can be compiled (eg, on Linux), while schemes with a suitable FFI can choose to use a less-portable but more efficient binding to libpq, etc. Let a community of different drivers bloom! > While I'm at it, how about this extremely simple protocol: [snip] > Comments? I'm not sure how the wire protocols work for MySQL, PostgreSQL, etc. but I seem to recall that the library interfaces at least in principle allow you to do something like: rs1 = query("SELECT * FROM very_big_table"); rs2 = query("SELECT ... FROM t1, t2, t3, t4, t5 WHERE ...join conditions..."); repeat { r1 = rs1.read_row(); r2 = rs2.read_row(); ...some process involving r1 and r2... } until rs1.eof() || rs2.eof(); ...without consuming unbounded local storage or deadlocking. Eg, have two queries streaming results back at the same time over the same "connection". A few years back, I worked for a SQL database company; we had a wire protocol that our ODBC driver used to talk to the database servers, and the API structure was roughly like this: - StartQuery(query text) -> query ID - FetchResults(query ID) -> maybe(list(record)) - GetMetadata(query ID) -> list(column name/type declaration) - CancelQuery(query ID) ...plus a bunch of boring server/connection-level metadata get/set operations, and some other stuff for setting up prepared statements that I never looked at. But the key part was, you could issue a whole bunch of StartQuery operations then call FetchResults on the resulting query IDs with an arbitrary interleaving; each would return an arbitrarily-sized chunk of records, or Nothing if the query was finished and no more records would ever be returned. The ODBC client buffered those chunks as the application requested a record at a time, to amortize the network round trip overheads. (What that database server did at the backend is another, fascinating, story I summarised in a blog post: http://www.snell-pym.org.uk/archives/2016/12/11/cool-things-i-have-worked-on-clustered-analytic-database/ ) But: in general, I think a general database access protocol needs to support this, as some backends will be capable of doing that sort of query interleaving. -- Alaric Snell-Pym (M7KIT) http://www.snell-pym.org.uk/alaric/