Encoding projects to kick off this year

Encoding projects to kick off this year Lassi Kortela 08 Jul 2020 14:13 UTC

Things are converging such that I need to start putting more time into
encodings again.

== Subprocess protocol

Based on that database subprocess thing we did, I have a general idea to
establish a protocol for subprocess servers of all kinds. It would be
programming language agnostic: there's no need to tie it to Lisp/Scheme.

Basically, a program `parent` would run another program `child` as a
subprocess, with a binary pipe to and from the child's stdin/stdout. The
pipe would speak a standard, very lightweight messaging/RPC protocol
(not yet decided which one). The protocol would have standard data types
(approximately the same set that JSON has); standard ways to mark
messages as command, answer, and error; and a standard facility for
reflection (i.e. finding out which messages are supported). Of course,
since it's just a pipe, one can transparently switch to a socket
(perhaps with TLS) instead.
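
To make the parent/child idea concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration, since the encoding is not yet decided: it uses one JSON object per line as the framing, the field names `command`, `args`, `answer` and `error`, and a `list-commands` message for reflection. It also opens the pipe in text mode for brevity, where the real protocol would use a binary pipe.

```python
import json
import subprocess
import sys

# Hypothetical child server: reads one JSON message per line from stdin and
# writes one JSON reply per line to stdout. The framing and the field names
# are assumptions made for this sketch only.
CHILD_SOURCE = r'''
import json, sys

def add(a, b):
    return a + b

COMMANDS = {"add": add}

for line in sys.stdin:
    msg = json.loads(line)
    name, args = msg.get("command"), msg.get("args", [])
    if name == "list-commands":      # reflection: which messages exist?
        reply = {"answer": sorted(COMMANDS) + ["list-commands"]}
    elif name in COMMANDS:           # ordinary command -> answer
        reply = {"answer": COMMANDS[name](*args)}
    else:                            # anything else -> error
        reply = {"error": "unknown command: %s" % name}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
'''

def start_child():
    """Run the child as a subprocess with pipes on its stdin and stdout."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

def call(child, command, *args):
    """Send one command message and wait for the single reply line."""
    message = {"command": command, "args": list(args)}
    child.stdin.write(json.dumps(message) + "\n")
    child.stdin.flush()
    return json.loads(child.stdout.readline())

def stop_child(child):
    """Closing stdin ends the child's read loop; then reap the process."""
    child.stdin.close()
    child.wait()
    child.stdout.close()
```

A parent would call `start_child()`, then something like `call(child, "add", 2, 3)`, and finally `stop_child(child)`. Since `call` only needs a writable and a readable stream, the same function would work over a socket.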

Subprocesses like this could be made for file systems, file formats,
databases, query engines, data sources, and anything else one can think
of. Most acutely we'd need the databases for Schemepersist, but this
feels like the kind of thing that would turn into a thriving cottage
industry as long as the protocol is simple and we seed it with a set of
useful programs.

There are a zillion encodings that could be used for the protocol:
JSON-RPC, MessagePack, S-expressions, ASN.1, etc. etc. Suggestions
gratefully accepted. Two things I don't like about most of the alternatives:

* Too many data types that are only marginally useful.
* Arbitrary limits or complexities in integer encoding.

One main question about the protocol is how to represent messages:
Lisp-style lists (message-type . args) or JSON-style objects
{"message-type": "foo", "args" ...}.
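
The two candidate shapes carry the same information, which a small Python sketch can show. The keys `message-type` and `args` are taken from the example above; representing `args` as a JSON array is an assumption of this sketch, as are the function names.

```python
def list_to_object(message):
    """Turn a Lisp-style (message-type . args) list into a JSON-style object."""
    message_type, *args = message
    # Assumption: the arguments are stored as an array under "args".
    return {"message-type": message_type, "args": args}

def object_to_list(obj):
    """Turn the JSON-style object back into a Lisp-style list."""
    return [obj["message-type"], *obj["args"]]
```

For example, `list_to_object(["foo", 1, 2])` gives `{"message-type": "foo", "args": [1, 2]}`, and `object_to_list` reverses it. The list form is more compact; the object form leaves room for extra keys without changing the position of existing fields.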

== Cataloguing S-expression variants

I want to start an "encyclopedia" of every variant of S-expressions
ever. It can be seeded with the syntaxes of the major Lisp dialects.

I'd also like to accumulate a library capable of reading and writing all
of them (not that hard, since there are so many commonalities that it
can be made out of reusable building blocks).
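
As a rough illustration of the "reusable building blocks" idea, the Python sketch below parameterizes one reader by a small dialect description. The `Dialect` class and its two fields are hypothetical, and the reader handles only symbols, nested lists and line comments (no strings, quotes or numbers), so a real library would need far more knobs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dialect:
    """Hypothetical per-dialect settings; real dialects need many more."""
    brackets: dict            # opening bracket -> closing bracket
    line_comment: str = ";"   # character that starts a comment

SCHEME_LIKE = Dialect(brackets={"(": ")", "[": "]"})
PARENS_ONLY = Dialect(brackets={"(": ")"})

def tokenize(text, dialect):
    """Split text into bracket tokens and atom tokens, dropping comments."""
    closers = set(dialect.brackets.values())
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == dialect.line_comment:
            while i < len(text) and text[i] != "\n":
                i += 1
        elif ch in dialect.brackets or ch in closers:
            tokens.append(ch)
            i += 1
        else:
            start = i
            while (i < len(text) and not text[i].isspace()
                   and text[i] not in dialect.brackets
                   and text[i] not in closers
                   and text[i] != dialect.line_comment):
                i += 1
            tokens.append(text[start:i])
    return tokens

def _read_datum(tokens, pos, dialect):
    """Parse one datum starting at tokens[pos]; return (datum, next_pos)."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    token = tokens[pos]
    if token in dialect.brackets:
        closer = dialect.brackets[token]
        items, pos = [], pos + 1
        while pos < len(tokens) and tokens[pos] != closer:
            item, pos = _read_datum(tokens, pos, dialect)
            items.append(item)
        if pos >= len(tokens):
            raise ValueError("missing " + closer)
        return items, pos + 1
    if token in dialect.brackets.values():
        raise ValueError("unexpected " + token)
    return token, pos + 1

def read(text, dialect):
    """Parse exactly one datum into nested Python lists of atom strings."""
    tokens = tokenize(text, dialect)
    datum, pos = _read_datum(tokens, 0, dialect)
    if pos != len(tokens):
        raise ValueError("trailing tokens after datum")
    return datum
```

The same `read` function treats `[` as a list opener under `SCHEME_LIKE` and as an ordinary symbol character under `PARENS_ONLY`, which is the kind of variation the encyclopedia would document.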

Once we have that library, it can be plugged into a universal Lisp
pretty-printer. From there we can build a code formatter that handles
every Lisp dialect. The hard parts are parsing while preserving comments,
deciding where to put line breaks, and customizing indentation for
macros. It makes sense to do the job right once. We can start with Marc
Feeley's popular pretty-printer as well as these papers:

* Strictly Pretty (Lindig 2000) -- Arthur has implemented this one. The
paper translates some Haskell work to OCaml, which should in turn be
translatable to Scheme. The Haskell folks discovered a small set of
generic combinators that can act as the backbone of pretty-printers for
arbitrary languages.

* AI Memo 279: Pretty-Printing (Goldstein 1973) -- Describes the classic
GRIND algorithm from the MIT Lisp community.
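
To show why line breaking is the hard part, here is a deliberately naive Python sketch. It is not Lindig's algorithm, GRIND, or Marc Feeley's printer; it only applies the basic choice those approaches share, namely to print a list flat if it fits in the remaining width and otherwise to break it. The two-space indent and the function names are assumptions of the sketch, and the data is nested Python lists of strings.

```python
def flat(datum):
    """Render a datum on a single line."""
    if isinstance(datum, list):
        return "(" + " ".join(flat(item) for item in datum) + ")"
    return datum

def pretty(datum, width=40, indent=0):
    """Render a datum whose first line starts at column `indent`.

    Continuation lines carry their own absolute indentation. Trailing
    closing parentheses are not counted in the fit check, so the last
    line can run slightly past `width`.
    """
    text = flat(datum)
    if not isinstance(datum, list) or not datum or indent + len(text) <= width:
        return text
    head, *rest = datum
    # Too wide: keep the head on the opening line and give every
    # remaining element its own line, indented by two columns.
    lines = ["(" + pretty(head, width, indent + 1)]
    for item in rest:
        lines.append(" " * (indent + 2) + pretty(item, width, indent + 2))
    return "\n".join(lines) + ")"
```

With a width of 40, a factorial definition comes out with `(define` and `(if` alone on their lines, which no Lisp programmer would write by hand. That gap between "fits or breaks" and idiomatic layout is exactly what per-macro indentation rules have to close.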

== Defining portable variants of S-expressions

We should formally define some known-portable variants that people can
use when they want assurance of cross-Lisp interoperability. LOSE
(line-oriented s-expressions) is a start, as is Core S-expressions that
John drafted based on discussions on the schemepersist list last year.

Binary S-expressions need to be done. John is rooting for ASN.1; it is
still an open question whether that is the best foundation to build on.

== Schemas and translators

There needs to be a generic language to easily write schemas for
S-expression-based formats. Discovering generic rules to translate
between S-expression-like and JSON-like formats would be very useful, as
S-expressions are quite fringe.
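
One possible translation rule, sketched in Python purely as an assumption to make the problem concrete: treat an association list with string keys as a JSON object and every other list as a JSON array. In this sketch a Python tuple `(key, value)` stands for a dotted pair `(key . value)` and a Python list stands for a proper list; both conventions and the function names are hypothetical.

```python
def sexp_to_json(datum):
    """Map an S-expression-like value to a JSON-like value."""
    if isinstance(datum, tuple):
        raise ValueError("a pair outside an association list is not handled")
    if isinstance(datum, list):
        is_alist = bool(datum) and all(
            isinstance(item, tuple) and len(item) == 2
            and isinstance(item[0], str)
            for item in datum)
        if is_alist:
            # Association list with string keys -> object.
            return {key: sexp_to_json(value) for key, value in datum}
        # Any other list -> array.
        return [sexp_to_json(item) for item in datum]
    return datum

def json_to_sexp(value):
    """Map a JSON-like value back to an S-expression-like value."""
    if isinstance(value, dict):
        return [(key, json_to_sexp(item)) for key, item in value.items()]
    if isinstance(value, list):
        return [json_to_sexp(item) for item in value]
    return value
```

Even this small rule is lossy at the edges: an empty object and an empty array both become the empty list, so `{}` does not survive a round trip. Cases like that are where a schema would have to supply the missing information.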