SQLite subprocess working Lassi Kortela (17 Sep 2019 17:50 UTC)
Buffaloed and dogpiled (was: SQLite subprocess working) John Cowan (17 Sep 2019 19:07 UTC)
Re: Buffaloed and dogpiled Lassi Kortela (17 Sep 2019 21:01 UTC)
(missing)
Re: Buffaloed and dogpiled Lassi Kortela (19 Sep 2019 09:09 UTC)
Re: Buffaloed and dogpiled John Cowan (20 Sep 2019 15:25 UTC)

Had to look up buffaloed and dogpiled in Urban Dictionary. I'm sorry you feel that way :(

> I admire the ability to get a POC running fast (I wish I had it). But I
> feel a bit buffaloed here, and I think the purpose of the database
> subprocess idea has been lost sight of. Can we take a step back here?

Thank you. TBH it's mostly time and high pain tolerance... We can take a step back, but to where?

> This is what I thought the subprocess design was all about:
>
> 0) Simplicity trumps efficiency

Agreed. I think the root of our disagreement, to the extent there is such, is whether or not text formats are simple. Personally I'm of the opinion that there is no such thing as a simple text format, and I've designed, specified, generated and parsed many more of them than I can remember. When people say that text formats are simpler because you can parse and generate them using existing tools, I think of all the little problems that those tools don't address, and which binary formats don't have.

In the present case for example, any performance/reliability problems with a binary format are going to come from blob columns in databases. Transferring a blob is:

    (write-varint (bytevector-length blob))
    (write-bytevector blob)

    (read-bytevector (read-varint))

Strings and symbols work the exact same way as blobs.

The only things at all complex in this whole arrangement are read-varint and write-varint. But varints are so cheap and so universally useful that IMHO it'd be fruitful to add those two trivial primitives to any implementation lacking them. It's like 30 lines of C. And even if you don't have them in C, they are easy enough to implement in Scheme.

Text formats (in any language implementation that doesn't have the S-expression syntax we want as its native one) are going to be in the same ballpark performance-wise.
But huge blobs might be slower, and reliability is usually hampered by escaping, exotic characters and the rest :/

> 1) Minimum requirements on the Scheme client, preferably no more than R5RS
> or R7RS-small.

The current one should work with R7RS-small.

> 2) Ability to implement the subprocess in any language that can talk to
> the database.

Basically yes, but in practice, C/Java/Python probably covers all databases.

> 3) The subprocess is in the same security context with the client, and they
> can trust each other, at least as much as anything else one loads from a
> package manager or srfi.schemers.org. Subprocesses are not normally
> exposed to malicious users, as the database server may be.

Agreed, but I'd still put good error handling in the subprocess. It's not a ton of extra effort.

> To me, these mean:
>
> Bit-diddling is possible in any Scheme with arithmetic operations (see the
> completely portable implementation of SRFI 151), but it involves a bunch of
> divides and modulos and looking things up in vector tables. (All kudos to
> Aubrey Jaffer for providing bitwise-and, bitwise-or, arithmetic-shift, and
> the rest of the bitwise core: see
> <https://github.com/scheme-requests-for-implementation/srfi-151/blob/master/srfi-151/bitwise-core.scm>.)
> So from an efficiency standpoint, especially in an interpreted Scheme,
> using the built-in I/O like read and write makes a lot of sense. (See Note
> 1 and Note 2.) So: standard S-expressions both ways.

I would file this performance concern under "profile before optimizing", and I'd consider choosing text for this reason to be optimizing the wrong thing. The current binary-sexp implementation depends only on these I/O primitives:

* read one byte or EOF
* read bytevector of exactly N bytes
* write one byte
* write bytevector of exactly N bytes

Bitwise operations are only used by `read-varint` and `write-varint`. This may change with floats; we have to think hard about those.
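
To make the size of that layer concrete, here is a minimal sketch of length-prefixed blob transfer, written in Python as one of the plausible subprocess languages. It is not the actual binary-sexp code. It assumes a LEB128-style unsigned varint (7 bits per byte, low group first, high bit set on every byte except the last); the thread does not say which varint encoding is used. The names `read_exactly`, `write_blob` and `read_blob` are hypothetical.

```python
import io


def read_exactly(inp, count):
    """Read exactly `count` bytes or raise EOFError (hypothetical helper)."""
    data = inp.read(count)
    if len(data) != count:
        raise EOFError("stream ended before %d bytes were read" % count)
    return data


def write_varint(out, n):
    """Write a non-negative integer as a LEB128-style varint (assumed encoding)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    while True:
        group = n & 0x7F              # low 7 bits of what is left
        n >>= 7
        if n:
            out.write(bytes([group | 0x80]))  # high bit: more bytes follow
        else:
            out.write(bytes([group]))         # last byte: high bit clear
            return


def read_varint(inp):
    """Read a LEB128-style varint, one byte at a time."""
    result = 0
    shift = 0
    while True:
        byte = read_exactly(inp, 1)[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def write_blob(out, blob):
    """Length prefix, then the raw bytes. No escaping is needed."""
    write_varint(out, len(blob))
    out.write(blob)


def read_blob(inp):
    """Read the length prefix, then exactly that many bytes."""
    return read_exactly(inp, read_varint(inp))
```

Usage, with an in-memory buffer standing in for the pipe:

```python
buf = io.BytesIO()
write_blob(buf, b"hello \x00 world")
buf.seek(0)
assert read_blob(buf) == b"hello \x00 world"
```

Only the two varint functions touch bitwise operations; everything else is the four byte-level I/O primitives listed above.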
My point is that this is a very thin layer over just constructing sexps by hand by calling `cons` and such. Having fast text and grade-school math, but slow binary and bitwise ops, is an abstraction inversion. We have to talk to those implementors :)

More generally: If you worry that things are slow on a particular implementation, let me check out that implementation and see what can be done. I'd bet it's not a significant problem in a realistic scenario.

> The Simplest Thing That Could Possibly Work is to have an outside manager
> (a bash script will do) create two named pipes and pass their names to both
> the client program and the subprocess (no longer "sub-") on the command
> line or in environment variables. (Note that there are named pipes on
> Windows.) They open their files, the client writes to one while the
> subprocess reads it until it's done, then they reverse roles for the
> reply. *Any* R5RS Scheme can do this.

This should work already. Haven't tried named pipes, but the subprocess just reads from stdin and writes to stdout.

> The client can even close and
> reopen the "request" named pipe to force flushing if it doesn't have R7RS
> flush-output.

Portability is important but in my opinion it has to be a two-way street: the target platform needs to provide basic amenities from CS / operating systems 101. Why worry about producing/consuming database drivers in an environment that doesn't have ordinary binary I/O and bitwise arithmetic?

> Assuming Python or Perl is safe on any Posix system, and it's actually
> easier to get going on Windows than a C development environment, especially
> since Visual C++ basically only has C99 (and until 2017 only C89). So if a
> subprocess is supplied in any of these languages, it's enough. Perl has a
> DBI module to provide the user interface and DBD modules to plug in (see
> Note Python just makes its DB-API a specification that all drivers are by
> convention expected to support.

Agreed.
I think people should be able to implement subprocesses in any language they please (as is customary on Unix) and whoever puts in the work to implement a driver is free to pick the language to use. If someone else wants to re-implement a Perl/Python driver in C, they are also free to do so, as long as everyone conforms to the IPC protocol. Python and Perl are not as taken-for-granted on BSDs and Windows as they are on Linux/Mac, and they add a lot of weight to Docker containers.

> Note 1: I accept as friendly amendments the use of (name . "value") instead
> of name=value, and of ordinary double-quoted strings instead of
> triple-quoted ones, in my proposed textual protocol.

Thanks :) (name "value") might be even more portable, I just put in the consing dot to get a minimal example.

> Note 2: Writing a portable implementation of read that handles *exactly*
> what is described by the BNF for <datum> in R7RS-small, and lets you set
> limits on list/vector/bytevector length, on nesting depth, and on numeric
> tower support, would be a Good Thing. It could be packaged with a similar
> portable implementation of write to make a good SRFI with safe-read and
> safe-write, if anyone feels ambitious. I have implemented a zillion such
> parsers in the last 45 years, and I'm thoroughly sick of writing them.

Libraries to read/write safe, well-defined S-expressions, whether as text or binary, are a good thing. I'd like to have a generic S-expr library that can be configured for the different flavors, since there is so much commonality. I can explore something in this direction, but can't promise a timeline. Since Alaric is converting S-exprs in his Magic Pipes package, maybe we can work something out.
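
As a rough illustration of what a limit-enforcing reader involves, here is a minimal Python sketch of the text side, as a subprocess might implement it. It is not the proposed safe-read: it covers only a small subset of the R7RS `<datum>` grammar (proper lists, decimal integers, symbols, and strings with `\"` and `\\` escapes), and it does not handle dotted pairs, vectors or bytevectors. The names `safe_read`, `Symbol`, `ReadLimitError`, `max_depth` and `max_length` are all hypothetical.

```python
import re
from dataclasses import dataclass

DELIMITERS = set(' \t\r\n()"')
INTEGER = re.compile(r"[+-]?[0-9]+")


class ReadLimitError(ValueError):
    """Raised when the input exceeds a configured limit."""


@dataclass(frozen=True)
class Symbol:
    name: str


def safe_read(text, max_depth=32, max_length=1000):
    """Parse exactly one datum from `text`, enforcing nesting and length limits."""
    pos = 0

    def skip_whitespace():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def read_string():
        nonlocal pos
        pos += 1                      # skip the opening quote
        chars = []
        while pos < len(text):
            ch = text[pos]
            if ch == '"':
                pos += 1
                return "".join(chars)
            if ch == "\\":            # only \" and \\ are accepted here
                pos += 1
                if pos >= len(text) or text[pos] not in '"\\':
                    raise ValueError("unsupported escape")
                ch = text[pos]
            chars.append(ch)
            pos += 1
        raise ValueError("unterminated string")

    def read_atom():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in DELIMITERS:
            pos += 1
        token = text[start:pos]
        return int(token) if INTEGER.fullmatch(token) else Symbol(token)

    def read_datum(depth):
        nonlocal pos
        skip_whitespace()
        if pos >= len(text):
            raise ValueError("unexpected end of input")
        ch = text[pos]
        if ch == "(":
            if depth >= max_depth:
                raise ReadLimitError("nesting too deep")
            pos += 1
            items = []
            while True:
                skip_whitespace()
                if pos >= len(text):
                    raise ValueError("unterminated list")
                if text[pos] == ")":
                    pos += 1
                    return items
                if len(items) >= max_length:
                    raise ReadLimitError("list too long")
                items.append(read_datum(depth + 1))
        if ch == ")":
            raise ValueError("unexpected ')'")
        if ch == '"':
            return read_string()
        return read_atom()

    value = read_datum(0)
    skip_whitespace()
    if pos != len(text):
        raise ValueError("trailing data after datum")
    return value
```

For example, `safe_read('(name "value")')` returns `[Symbol(name='name'), 'value']`, while `safe_read("((()))", max_depth=2)` raises `ReadLimitError`. The limits are checked before recursing or appending, so oversized input is rejected without first building the oversized structure.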
> Note 4: I'd love to have a pluggable architecture, what Java calls an SPI
> (Service Provider Interface)
> But I have thought about it a lot, and I don't see how you can make one of
> those work in Scheme, because there is no way to ask a procedure what it
> expects at even the simplest level of "how many arguments". Procedures are
> opaque: all you can do is call them, with completely unpredictable
> results. If anyone has ideas, please pass them along.

I tried to read that page but didn't really understand what they are talking about. There's a subclass of a class/interface, and an XML file. What's the difference between this and ordinary class inheritance?