Re: Encoding projects to kick off this year

Re: Encoding projects to kick off this year Alaric Snell-Pym 10 Jul 2020 16:36 UTC

On 08/07/2020 15:13, Lassi Kortela wrote:
> Things are converging such that I need to start putting more time into
> encodings again.
>
>
> == Subprocess protocol
>
[...]
> Basically, a program `parent` would run another program `child` as a
> subprocess, with a binary pipe to and from the child's stdin/stdout. The
> pipe would speak a standard, very lightweight messaging/RPC protocol
> (not yet decided which one).

Yeah! I did exactly this for my content-addressable storage system,
Ugarit. The reasoning was that (a) you get a "plugin" system for
third-party storage backends without needing to deal with shared
libraries portably, and (b) you can build pipelines with ssh to access
backends on remote systems.

To clarify the latter: the configuration lets you specify a command line
for the storage backend process, which might be:

"ugarit-foo-backend /path/to/directory"

or it might be:

"ssh xxxxxx@server 'ugarit-foo-backend /path/to/directory/on/server'"
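
As a rough illustration of that configuration idea (Python is used only
for illustration; this is not Ugarit's code), a parent can split the
configured command line into an argument vector and start it with
binary pipes on stdin/stdout. The names spawn_backend, round_trip and
ECHO_CHILD are hypothetical, and the stand-in child simply echoes its
input. The sketch assumes POSIX-style shell quoting.

```python
import shlex
import subprocess
import sys


def spawn_backend(command_line):
    """Start a backend from a configured command line, with binary pipes."""
    argv = shlex.split(command_line)  # POSIX shell-style splitting
    return subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)


# Hypothetical stand-in for a real backend: copies stdin to stdout.
ECHO_CHILD = shlex.join([
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
])


def round_trip(command_line, payload):
    """Send payload to the child, close its stdin, and collect its output."""
    child = spawn_backend(command_line)
    output, _ = child.communicate(payload)
    return output
```

Because the parent only ever sees a pair of pipes, the local form and
the ssh form look the same to it. The ssh example above splits into
three arguments, the last being the remote command as a single string.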

I used a basic protocol I hacked together for the purpose. Requests
and responses are (depending on type) either just an s-expression or an
s-expression including a length followed by that many bytes of raw data
(as Ugarit's all about throwing around large blocks of data), so it is
clearly very tailored to that one task.
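
To make the framing concrete, here is a minimal Python sketch of a
protocol of that general shape. The actual Ugarit wire format is not
spelled out here, so everything specific is an assumption: the message
names "put" and "get", the one-line header, and the rule that the last
field of a payload-carrying message is the byte count. The header
parser only handles a flat list of atoms, not full s-expressions.

```python
import io

# Hypothetical message types whose last header field is a payload length.
TYPES_WITH_PAYLOAD = {"put"}


def write_message(stream, fields, payload=None):
    """Write a one-line header, then the raw payload bytes if there are any."""
    if payload is not None:
        fields = fields + [str(len(payload))]
    stream.write(("(" + " ".join(fields) + ")\n").encode("ascii"))
    if payload is not None:
        stream.write(payload)


def read_exactly(stream, count):
    """Read exactly count bytes, failing if the stream ends early."""
    chunks = []
    while count > 0:
        chunk = stream.read(count)
        if not chunk:
            raise EOFError("stream ended inside a payload")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def read_message(stream):
    """Read one header line and, depending on its type, the payload after it."""
    line = stream.readline()
    if not line:
        raise EOFError("no more messages")
    text = line.decode("ascii").strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("malformed header: %r" % text)
    fields = text[1:-1].split()
    if fields and fields[0] in TYPES_WITH_PAYLOAD:
        length = int(fields.pop())
        return fields, read_exactly(stream, length)
    return fields, None
```

The point of the length prefix is that the payload is never parsed: it
may contain newlines or parentheses without confusing the reader.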

However, I want to expand it to support TCP sockets and UNIX-domain
sockets as well as just subprocesses (mainly because using ssh as a way
to access remote servers introduces some messy latency at times), so I
intend to extend a separate project of mine, "bokbok", which provides
RPC services over TCP or UNIX-domain sockets, to also support
subprocesses in the same framework, and port Ugarit to use Bokbok.

This would also let me make Ugarit more efficient, as Bokbok's
protocol allows multiple requests to be in progress at once; the old
Ugarit encoding was strictly "send a request, wait for the response",
and Ugarit could benefit from increased parallelism in its operations.
The high latency of ssh means that when backing up to a remote backend,
a lot of time is spent with the Ugarit frontend and backend processes
doing nothing other than waiting for ssh/the network to handle a small
request.
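
A small Python sketch of why this matters, under stated assumptions:
Bokbok's wire format is not described here, so the request IDs, the
PendingRequests class and the two timing functions are hypothetical,
and the timing model is deliberately crude (a fixed round-trip latency
plus a fixed per-request service time).

```python
class PendingRequests:
    """Track several in-flight requests and match replies by request ID."""

    def __init__(self):
        self._next_id = 0
        self._callbacks = {}

    def send(self, body, on_reply):
        """Register a request and return the frame that would go on the wire."""
        request_id = self._next_id
        self._next_id += 1
        self._callbacks[request_id] = on_reply
        return (request_id, body)

    def receive(self, frame):
        """Dispatch a reply frame; replies may arrive in any order."""
        request_id, body = frame
        self._callbacks.pop(request_id)(body)

    def in_flight(self):
        return len(self._callbacks)


def sequential_time(count, latency, service):
    """Each request waits for the previous reply before being sent."""
    return count * (latency + service)


def pipelined_time(count, latency, service):
    """All requests are sent at once, so the latency is paid only once."""
    return latency + count * service
```

With 100 small requests, 50 ms of round-trip latency and 1 ms of
service time each, the model gives about 5.1 s for strict
request/response against about 0.15 s with all requests in flight.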

So, yeah, this is cool, but I recommend NOT making it just specific to
subprocesses, and to share the work of defining request/response/error
formats with a more general RPC system!

Relatedly, I've been working on the binary encoding of Scheme values
that bokbok uses on the wire. I didn't want to just use read/write as
the Ugarit protocol does, because I want to use bokbok in environments
with less trust, and read/write are too complicated to secure
(especially in implementations with reader extensions that can execute
arbitrary code). So I implemented a subset of John's ASN.1-based binary
sexpr thing, which I'd like to spin off as a separate project that
becomes a full implementation one day!

> Binary S-expressions need to be done. John is rooting for ASN.1; still
> an open question whether it is the best foundation to build on.

I think that actual compatibility with ASN.1 as specified isn't a very
interesting goal (who, exactly, wants to interoperate between Scheme and
existing ASN.1 things using this?), but using ASN.1 BER as an
inspiration to draw upon (and to copy the useful parts of when there's
no downside to doing so) is... pragmatic. Some thought was put into the
basic tagged value structure of BER, so why not use it? I have a few
issues with specifics of John's proposal, though, which I can expound
upon when the time comes!
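
For readers who haven't met it, the BER tagged value structure is
tag, length, content. Below is a minimal Python sketch of just that
framing. It is not John's proposal and not bokbok's encoding; it is
restricted to single-octet tags and definite lengths, and the function
names are made up for this example. The two tags used are the standard
universal ones for INTEGER (0x02) and OCTET STRING (0x04).

```python
TAG_INTEGER = 0x02
TAG_OCTET_STRING = 0x04


def encode_length(n):
    """Short form below 128; otherwise 0x80|k followed by k length octets."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_tlv(tag, content):
    """Wrap content bytes in a single-octet tag and a definite length."""
    return bytes([tag]) + encode_length(len(content)) + content


def integer_content(value):
    """Minimal two's-complement big-endian content octets for an INTEGER."""
    magnitude = value if value >= 0 else ~value
    size = magnitude.bit_length() // 8 + 1
    return value.to_bytes(size, "big", signed=True)


def decode_tlv(data, offset=0):
    """Return (tag, content, next_offset) for the value starting at offset."""
    tag = data[offset]
    first = data[offset + 1]
    offset += 2
    if first < 0x80:
        length = first
    else:
        count = first & 0x7F
        if count == 0:
            raise ValueError("indefinite length is not handled in this sketch")
        length = int.from_bytes(data[offset:offset + count], "big")
        offset += count
    end = offset + length
    if end > len(data):
        raise ValueError("truncated value")
    return tag, data[offset:end], end
```

A reader that doesn't understand a tag can still skip the value using
the length alone, which is the property that makes the structure
attractive as a base for a binary S-expression format.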

ABS

--
Alaric Snell-Pym   (M7KIT)
http://www.snell-pym.org.uk/alaric/