Re: Buffaloed and dogpiled

SQLite subprocess working Lassi Kortela (17 Sep 2019 17:50 UTC)
Buffaloed and dogpiled (was: SQLite subprocess working) John Cowan (17 Sep 2019 19:07 UTC)
Re: Buffaloed and dogpiled Lassi Kortela (17 Sep 2019 21:01 UTC)
Re: Buffaloed and dogpiled Lassi Kortela (19 Sep 2019 09:09 UTC)
Re: Buffaloed and dogpiled John Cowan (20 Sep 2019 15:25 UTC)
Re: Buffaloed and dogpiled Lassi Kortela 17 Sep 2019 21:01 UTC
Had to look up buffaloed and dogpiled in Urban Dictionary. I'm sorry you
feel that way :(

> I admire the ability to get a POC running fast (I wish I had it).   But I
> feel a bit buffaloed here, and I think the purpose of the database
> subprocess idea has been lost sight of.  Can we take a step back here?

Thank you. TBH it's mostly time and high pain tolerance...

We can take a step back, but to where?

> This is what I thought the subprocess design was all about:
>
> 0) Simplicity trumps efficiency

Agreed.

I think the root of our disagreement, to the extent there is one, is
whether or not text formats are simple. Personally I'm of the opinion
that there is no such thing as a simple text format, and I've designed,
specified, generated and parsed many more of them than I can remember.

When people say that text formats are simpler because you can parse and
generate them using existing tools, I think of all the little problems
that those tools don't address, and which binary formats don't have.

In the present case for example, any performance/reliability problems
with a binary format are going to come from blob columns in databases.
Transferring a blob is:

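     ;; Sender: length prefix as a varint, then the raw bytes.
     (write-varint (bytevector-length blob))
     (write-bytevector blob)

     ;; Receiver: read the length, then exactly that many bytes.
     (read-bytevector (read-varint))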

Strings and symbols work the exact same way as blobs. The only things at
all complex in this whole arrangement are read-varint and write-varint.
But varints are so cheap and so universally useful that IMHO it'd be
fruitful to add those two trivial primitives to any implementation
lacking them. It's like 30 lines of C. And even if you don't have them
in C, they are easy enough to implement in Scheme.
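
To make "easy enough to implement in Scheme" concrete, here is a minimal sketch in R7RS-small that uses only `quotient` and `modulo`, so it needs no bitwise library. It assumes an unsigned LEB128-style encoding (seven payload bits per byte, least significant group first, high bit set on every byte but the last); the encoding actually used by the prototype is not stated in this message. The explicit `port` argument and the names `write-blob` and `read-blob` are also assumptions made for the example.

```scheme
(import (scheme base))

;; Write a non-negative exact integer as an unsigned varint.
;; Each byte carries 7 bits; adding 128 sets the "more bytes follow" flag.
(define (write-varint n port)
  (let loop ((n n))
    (let ((low (modulo n 128))
          (rest (quotient n 128)))
      (if (= rest 0)
          (write-u8 low port)
          (begin
            (write-u8 (+ low 128) port)
            (loop rest))))))

;; Read an unsigned varint; signal an error if input ends mid-number.
(define (read-varint port)
  (let loop ((result 0) (scale 1))
    (let ((byte (read-u8 port)))
      (cond ((eof-object? byte)
             (error "read-varint: unexpected end of input"))
            ((< byte 128)
             (+ result (* byte scale)))
            (else
             (loop (+ result (* (- byte 128) scale))
                   (* scale 128)))))))

;; Length-prefixed blob transfer, as in the two-line example above.
(define (write-blob blob port)
  (write-varint (bytevector-length blob) port)
  (write-bytevector blob port))

(define (read-blob port)
  (let ((len (read-varint port)))
    (if (= len 0)
        (bytevector)   ; avoid asking read-bytevector for zero bytes
        (let ((blob (read-bytevector len port)))
          (if (or (eof-object? blob)
                  (< (bytevector-length blob) len))
              (error "read-blob: input truncated")
              blob)))))
```

With this encoding, 300 goes out as the two bytes 172 and 2, and anything below 128 is a single byte.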

Text formats (in any language implementation that doesn't have the
S-expression syntax we want as its native one) are going to be in the
same ballpark performance-wise. But huge blobs might be slower, and
reliability is usually hampered by escaping, exotic characters and the
rest :/

> 1) Minimum requirements on the Scheme client, preferably no more than R5RS
> or R7RS-small.

The current one should work with R7RS-small.

> 2) Ability to implement the subprocess in any language that can talk to
> the database.

Basically yes, but in practice, C/Java/Python probably covers all databases.

> 3) The subprocess is in the same security context with the client, and they
> can trust each other, at least as much as anything else one loads from a
> package manager or srfi.schemers.org.  Subprocesses are not normally
> exposed to malicious users, as the database server may be.

Agreed, but I'd still put good error handling in the subprocess. It's
not a ton of extra effort.

> To me, these mean:
>
> Bit-diddling is possible in any Scheme with arithmetic operations (see the
> completely portable implementation of SRFI 151), but it involves a bunch of
> divides and modulos and looking things up in vector tables.  (All kudos to
> Aubrey Jaffer for providing bitwise-and, bitwise-or, arithmetic-shift, and
> the rest of the bitwise core:  see <
> https://github.com/scheme-requests-for-implementation/srfi-151/blob/master/srfi-151/bitwise-core.scm>.)
> So from an efficiency standpoint, especially in an interpreted Scheme,
> using the built-in I/O like read and write makes a lot of sense.  (See Note
> 1 and Note 2.)  So: standard S-expressions both ways.

I would file this performance concern under "profile before
optimizing", and choosing text for this reason under "optimizing the
wrong thing".

The current binary-sexp implementation depends only on these I/O primitives:

* read one byte or EOF
* read bytevector of exactly N bytes
* write one byte
* write bytevector of exactly N bytes

Bitwise operations are only used by `read-varint` and `write-varint`.
This may change with floats; we have to think hard about those. My point
is that this is a very thin layer over just constructing sexps by hand
by calling `cons` and such.
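
For reference, the four primitives listed above map almost directly onto R7RS-small procedures. The sketch below is not the prototype's code; `read-exactly` is a hypothetical helper name. It is needed because `read-bytevector`, as R7RS-small specifies it, returns a shorter bytevector (or an end-of-file object) when the input ends early, rather than failing.

```scheme
(import (scheme base))

;; Mapping of the four primitives onto R7RS-small:
;;   read one byte or EOF           -> (read-u8 port)
;;   read bytevector of N bytes     -> (read-exactly n port), defined below
;;   write one byte                 -> (write-u8 byte port)
;;   write bytevector of N bytes    -> (write-bytevector bytes port)

;; Read exactly n bytes, or signal an error if the input is truncated.
(define (read-exactly n port)
  (if (= n 0)
      (bytevector)   ; nothing to read; do not touch the port
      (let ((bytes (read-bytevector n port)))
        (if (or (eof-object? bytes)
                (< (bytevector-length bytes) n))
            (error "read-exactly: input truncated, wanted bytes:" n)
            bytes))))

;; Small demonstration on an in-memory port.
(define demo-port (open-input-bytevector (bytevector 10 20 30 40)))
(define first-three (read-exactly 3 demo-port))   ; three bytes: 10 20 30
(define last-byte (read-u8 demo-port))            ; the remaining byte, 40
```

Treating a short read as an error in one place keeps the rest of the decoder free of length checks.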

Having fast text and grade-school math, but slow binary and bitwise ops,
is an abstraction inversion. We have to talk to those implementors :)

More generally: If you worry that things are slow on a particular
implementation, let me check out that implementation and see what can be
done. I'd bet it's not a significant problem in a realistic scenario.

> The Simplest Thing That Could Possibly Work is to have an outside manager
> (a bash script will do) create two named pipes and pass their names to both
> the client program and the subprocess (no longer "sub-") on the command
> line or in environment variables.  (Note that there are named pipes on
> Windows.)  They open their files, the client writes to one while the
> subprocess reads it until it's done, then they reverse roles for the
> reply.  *Any* R5RS Scheme can do this.

This should work already. Haven't tried named pipes, but the subprocess
just reads from stdin and writes to stdout.
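
A rough sketch of the client side of that named-pipe arrangement, untested against real pipes. It assumes the outside manager has already created both pipes and passed their names in, and that the other process reads one datum from the request pipe and writes one datum to the reply pipe. `send-request`, `request-path` and `reply-path` are hypothetical names. Apart from the `import` form, it uses only procedures that R5RS also provides.

```scheme
(import (scheme base) (scheme file) (scheme read) (scheme write))

;; Send one S-expression request and wait for one S-expression reply.
(define (send-request request request-path reply-path)
  ;; Write the request, then close the port. Closing flushes the data
  ;; and lets the reader on the other end see end-of-file.
  (let ((out (open-output-file request-path)))
    (write request out)
    (newline out)
    (close-output-port out))
  ;; Now reverse roles: read the reply from the second pipe.
  (let* ((in (open-input-file reply-path))
         (reply (read in)))
    (close-input-port in)
    reply))
```

On POSIX, opening a FIFO normally blocks until the other end is opened too, so the client and the database process pace each other without extra signalling. How `open-output-file` treats a path that already exists is unspecified in R7RS-small, so this relies on the implementation simply opening the pipe for writing.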

> The client can even close and
> reopen the "request" named pipe to force flushing if it doesn't have R7RS
> flush-output.

Portability is important but in my opinion it has to be a two-way
street: the target platform needs to provide basic amenities from CS /
operating systems 101. Why worry about producing/consuming database
drivers in an environment that doesn't have ordinary binary I/O and
bitwise arithmetic?

> Assuming Python or Perl is safe on any Posix system, and it's actually
> easier to get going on Windows than a C development environment, especially
> since Visual C++ basically only has C99 (and until 2017 only C89).  So if a
> subprocess is supplied in any of these languages, it's enough.  Perl has a
> DBI module to provide the user interface and DBD modules to plug in (see
> Note Python just makes its DB-API a specification that all drivers are by
> convention expected to support.

Agreed. I think people should be able to implement subprocesses in any
language they please (as is customary on Unix) and whoever puts in the
work to implement a driver is free to pick the language to use. If
someone else wants to re-implement a Perl/Python driver in C, they are
also free to do so, as long as everyone conforms to the IPC protocol.

Python and Perl are not as taken-for-granted on BSDs and Windows as they
are on Linux/Mac, and they add a lot of weight to Docker containers.

> Note 1: I accept as friendly amendments the use of (name . "value") instead
> of name=value, and of ordinary double-quoted strings instead of
> triple-quoted ones, in my proposed textual protocol.

Thanks :)

(name "value") might be even more portable, I just put in the consing
dot to get a minimal example.
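
To show the difference between the two spellings, here is a small sketch; the `rows` key and the accessor names are made up for the example. Both forms work as association lists on the Scheme side. The only change is `cdr` versus `cadr` when fetching the value. The list form avoids dotted-pair syntax, which a reader written in another language would otherwise have to support.

```scheme
(import (scheme base))

;; The same two fields in each spelling.
(define dotted '((name . "value") (rows . 3)))
(define listed '((name "value") (rows 3)))

;; Dotted pairs: the value is the cdr of the entry.
(define (dotted-ref key alist)
  (let ((entry (assq key alist)))
    (and entry (cdr entry))))

;; Two-element lists: the value is the second element of the entry.
(define (listed-ref key alist)
  (let ((entry (assq key alist)))
    (and entry (cadr entry))))
```

Both `(dotted-ref 'name dotted)` and `(listed-ref 'name listed)` return "value", and both return #f for a missing key.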

> Note 2:  Writing a portable implementation of read that handles *exactly*
> what is described by the BNF for <datum> in R7RS-small, and lets you set
> limits on list/vector/bytevector length, on nesting depth, and on numeric
> tower support, would be a Good Thing.  It could be packaged with a similar
> portable implementation of write to make a good SRFI with safe-read and
> safe-write, if anyone feels ambitious.  I have implemented a zillion such
> parsers in the last 45 years, and I'm thoroughly sick of writing them.

Libraries to read/write safe, well-defined S-expressions, whether as
text or binary, are a good thing. I'd like to have a generic S-expr
library that can be configured for the different flavors, since there is
so much commonality. I can explore something in this direction, but
can't promise a timeline. Since Alaric is converting S-exprs in his
Magic Pipes package, maybe we can work something out.

> Note 4:  I'd love to have a pluggable architecture, what Java calls an SPI
> (Service Provider Interface).
> But I have thought about it a lot, and I don't see how you can make one of
> those work in Scheme, because there is no way to ask a procedure what it
> expects at even the simplest level of "how many arguments".  Procedures are
> opaque: all you can do is call them, with completely unpredictable
> results.  If anyone has ideas, please pass them along.

I tried to read that page but didn't really understand what they are
talking about. There's a subclass of a class/interface, and an XML file.
What's the difference between this and ordinary class inheritance?