Re: Database connections as subprocesses
Lassi Kortela (16 Sep 2019 09:22 UTC)
> Well, if it's a protocol, then drivers can be written in anything - I
> suspect the point was that lots of databases have C client libraries,
> so writing a protocol driver around them in C would be the easiest
> option.

Exactly :)

> But: I suspect this is a distraction. Linking to C libraries
> in-process versus writing a C wrapper that we talk to in a separate
> process is perhaps more about how an FFI to C is implemented, than
> inherent to the design of a database interface...

Indeed, this approach is more for situations where an implementation
doesn't have an FFI, or you don't want to use it. I propose it as a
complementary approach to FFI wrappers, not as a replacement.

To the Scheme application programmer, it would be best if any
subprocess approach offered an identical (or nearly identical)
interface to an FFI-based approach; the difference would just be an
implementation detail. Ideally the Scheme interface would be specified
so that it supports both implementation styles (or a mix of the two
for different databases, depending on what happens to be available or
expedient on the host system). We could start implementations using
both approaches early, to make sure the spec takes all scenarios into
account.

> I'd suggest we standardise a Schemely interface between database
> drivers and any intermediate infrastructure, then individual drivers
> could be:
>
> 1. SQLite-style in-process database engines in native Scheme (I can
> dream!)

There are some SQLite-style DBs in pure Go, but last I checked they
don't support SQLite files as backing stores; I'm not sure they even
support SQL. I assume other languages are further behind (though Rust
may now be Go's equal, given all the excitement around it). SQLite is
very hard to replicate, and its test suite is legendary.

> 2. Scheme implementations of wire protocols for databases such as
> MySQL, PSQL, etc
> 3. Scheme adapters to in-process FFIs for libpq et al
> 4. Scheme code to talk a standardised subprocess protocol to external
> driver binaries, written in whatever language

Agreed. I like all of these alternatives. They suit different
scenarios.

> ...(4) is a useful option, but I think one that needs to be defined
> separately from the Scheme-level API first, so that (1)-(3) can also
> be done. As a portable library, there will be value in having a
> separate-process driver for, eg, PostgreSQL that can be used in any
> scheme where subprocesses are a thing and the driver can be compiled
> (eg, on Linux), while schemes with a suitable FFI can choose to use a
> less-portable but more efficient binding to libpq, etc. Let a
> community of different drivers bloom!

This is exactly what I intended with the proposal :)

>> While I'm at it, how about this extremely simple protocol:
> [snip]
>> Comments?
>
> I'm not sure how the wire protocols work for MySQL, PostgreSQL, etc.
> but I seem to recall that the library interfaces at least in
> principle allow you to do something like:
>
> rs1 = query("SELECT * FROM very_big_table");
> rs2 = query("SELECT ... FROM t1, t2, t3, t4, t5 WHERE ...join
> conditions...");
>
> repeat {
>   r1 = rs1.read_row();
>   r2 = rs2.read_row();
>   ...some process involving r1 and r2...
> } until rs1.eof() || rs2.eof();
>
> ...without consuming unbounded local storage or deadlocking. Eg, have
> two queries streaming results back at the same time over the same
> "connection".

That's very cool, but probably also highly non-trivial to implement
reliably.
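To make the Scheme side of that concrete, the interleaved loop might
look roughly like this under a hypothetical API (dbi-query,
dbi-read-row and dbi-done? are placeholder names, not a settled
interface, and conn is an already-open connection):

    ;; Two result sets streaming over the same connection.
    (let ((rs1 (dbi-query conn "SELECT * FROM very_big_table"))
          (rs2 (dbi-query conn "SELECT ... FROM t1, t2, t3, t4, t5 WHERE ...")))
      (let loop ()
        (let ((r1 (dbi-read-row rs1))
              (r2 (dbi-read-row rs2)))
          ;; ...some process involving r1 and r2...
          (unless (or (dbi-done? rs1) (dbi-done? rs2))
            (loop)))))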
Given that the subprocess approach is probably more suited to
situations that are not performance-sensitive, it might be best to go
for simplicity and permit only one query at a time.

As for the in-process implementations: if they are FFI-based, I
imagine the underlying C library has to do some non-trivial polling
and/or threading to juggle the concurrent queries. In that case, it
might be most reliable to use a fast Scheme that handles polling and
threads in its own runtime, and write a pure-Scheme driver for the
Postgres/MySQL socket protocol. That driver could then juggle the
concurrent queries itself, relying on Scheme's safety guarantees.

> A few years back, I worked for a SQL database company; we had a wire
> protocol that our ODBC driver used to talk to the database servers,
> and the API structure was roughly like this:
>
> - StartQuery(query text) -> query ID
> - FetchResults(query ID) -> maybe(list(record))
> - GetMetadata(query ID) -> list(column name/type declaration)
> - CancelQuery(query ID)
>
> ...plus a bunch of boring server/connection-level metadata get/set
> operations, and some other stuff for setting up prepared statements
> that I never looked at.

Nice! So you've basically done this before :) That command language
seems quite reasonable.

> But the key part was, you could issue a whole bunch of StartQuery
> operations then call FetchResults on the resulting query IDs with an
> arbitrary interleaving; each would return an arbitrarily-sized chunk
> of records, or Nothing if the query was finished and no more records
> would ever be returned. The ODBC client buffered those chunks as the
> application requested a record at a time, to amortize the network
> round trip overheads.

In principle, this would likely be relatively easy to do with a
subprocess (compared to threads and FFI). But since parallel queries
are for high-performance situations, people might want to go with the
pure-Scheme clients anyway. This is just speculation; since you've
actually worked with this stuff in a production setting, feel free to
offer counterpoints.

> (What that database server did at the backend is another,
> fascinating, story I summarised in a blog post:
> http://www.snell-pym.org.uk/archives/2016/12/11/cool-things-i-have-worked-on-clustered-analytic-database/
> )

That is some seriously impressive database wizardry.

> But: in general, I think a general database access protocol needs to
> support this, as some backends will be capable of doing that sort of
> query interleaving.

This should be addressed in the Scheme-side interface that is visible
to application programmers. Figuring out how to do it in a manner that
is portable across implementations will be interesting. It may be best
to have two SRFIs: a simple serial one and a complex one that supports
interleaving.
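For the simple serial one, here is a minimal sketch of the kind of
thing I have in mind, layered on your StartQuery/FetchResults command
language. Everything here is hypothetical: the connection is a pair of
textual ports to the driver subprocess, and each request and reply is
a single S-expression, so plain write and read do the framing (a real
protocol would presumably use the binary S-expressions discussed
elsewhere in this thread):

    ;; Hypothetical serial driver protocol. (car conn) reads from the
    ;; driver subprocess's stdout; (cdr conn) writes to its stdin.
    (define (send-command conn request)
      (write request (cdr conn))
      (newline (cdr conn))
      (flush-output-port (cdr conn))
      (read (car conn)))

    ;; Run one query and apply PROC to each row in order. The driver
    ;; is assumed to reply (rows (<row> ...)) until it finally replies
    ;; (no-more-rows).
    (define (for-each-row proc conn sql)
      (send-command conn `(start-query ,sql))
      (let loop ()
        (let ((reply (send-command conn '(fetch-results))))
          (unless (equal? reply '(no-more-rows))
            (for-each proc (cadr reply))
            (loop)))))

Since only one query can be in flight at a time, the query ID from
your protocol is omitted; the interleaving SRFI would put it back.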