Re: Attempt at a stack of data formats to make everyone happy

Re: Attempt at a stack of data formats to make everyone happy Lassi Kortela 19 Sep 2019 19:43 UTC

 From John:

On Thu, Sep 19, 2019 at 1:28 PM Lassi Kortela <xxxxxx@lassi.io> wrote:

 > OK, let's concede that this Scheme database mailing list has turned into
 > the de facto Scheme data encoding mailing list as well :) We can
 > rationalize this by saying that both topics deal with persistence.
 >

+1

 > - Text is advantageous in some situations, binary in others.
 >

+1.  Mini-rant about the horrible awfulness of text-with-counted-strings,
something discovered when the first Fortran program was written.  At that
time the only strings were format strings, and instead of escaping the
format directives with % or ~ or whatnot, the literal parts were marked
by a length (typically 1-3 digits) followed by H followed by the literal
characters, with no trailing delimiter at all.  That *never* worked reliably;
it was first deprecated and then removed in later Fortran standards, but
implementations keep it alive so that dusty decks keep running.  Be warned.
Don't go there.
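To make the failure mode concrete, here is a rough sketch in Python of splitting a simplified, Fortran-like format body that uses Hollerith-style nH literals. The grammar and the `split_format` name are assumptions for illustration, not real Fortran. Because only the count says where a literal ends, a miscount raises no error at all; it silently moves the boundary, turning literal text into directives or swallowing separators.

```python
def split_format(fmt: str) -> list[tuple[str, str]]:
    """Split a simplified format body into ("lit", text) and ("dir", text) items.

    A literal is written as nH followed by exactly n characters; nothing marks
    where it ends except the count.  Directives are separated by commas.
    """
    items: list[tuple[str, str]] = []
    i = 0
    while i < len(fmt):
        if fmt[i] == ",":              # separator between items
            i += 1
            continue
        j = i
        while j < len(fmt) and fmt[j].isdigit():
            j += 1
        if j > i and j < len(fmt) and fmt[j] in "Hh":
            n = int(fmt[i:j])          # the count is the only boundary marker
            items.append(("lit", fmt[j + 1:j + 1 + n]))
            i = j + 1 + n
        else:
            k = fmt.find(",", i)       # a directive runs up to the next comma
            k = len(fmt) if k == -1 else k
            items.append(("dir", fmt[i:k]))
            i = k
    return items


print(split_format("5HHELLO,I5"))  # correct count: one literal, one directive
print(split_format("4HHELLO,I5"))  # count too small: "O" becomes a bogus directive
print(split_format("6HHELLO,I5"))  # count too large: the comma joins the literal
```

With a delimited or escaped literal, the same typo would at least produce an unterminated string that a parser can report.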

 > - The data models are most naturally approached as a stack of growing
 > complexities. For simple jobs, desiring simple fast implementations and
 > the widest interoperability, it's nice to have a data model like JSON's
 > (or even simpler) with a small handful of universally useful data types.
 >

I don't actually agree with this.  The importance of JSON is that it is
standardized, both de facto and de jure, so we need an implementation of
it.  Otherwise, it's much better *not* to have different rigid levels, but
rather to have a single representation that scales smoothly and can, as
uniformly as possible, skip what it does not understand.  ASN.1 is ideal
here for binary (but I would say that, wouldn't I).  For text, we'll need
to have a notion of core S-expressions, perhaps R5RS or R7RS <datum> and
nothing more, that everyone has to understand.  (Interchanging ratios and
complex numbers is tricky, though, even with something as close as Common
Lisp, which writes #c(1.0 2.0) instead of 1.0+2.0i.)
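On the binary side, the property that lets a representation "skip what it does not understand" is that every element carries its own length. A minimal Python sketch, assuming a tiny subset of BER: single-octet tags, definite lengths only, and only INTEGER, OCTET STRING, UTF8String and SEQUENCE understood. The `read_tlv` and `decode` names are hypothetical and not taken from any ASN.1 library.

```python
from typing import Any


def read_tlv(buf: bytes, pos: int) -> tuple[int, bytes, int]:
    """Read one tag-length-value element at pos; return (tag, value, next_pos)."""
    tag = buf[pos]
    if (tag & 0x1F) == 0x1F:
        raise ValueError("multi-octet tags are not handled in this sketch")
    first = buf[pos + 1]
    pos += 2
    if first < 0x80:            # short form: this octet is the length
        length = first
    elif first == 0x80:         # indefinite length is left out of this sketch
        raise ValueError("indefinite lengths are not handled in this sketch")
    else:                       # long form: low 7 bits count the length octets
        n = first & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    value = buf[pos:pos + length]
    if len(value) != length:
        raise ValueError("truncated element")
    return tag, value, pos + length


def decode(buf: bytes) -> list[Any]:
    """Decode the elements we understand; step over everything else by its length."""
    out: list[Any] = []
    pos = 0
    while pos < len(buf):
        tag, value, pos = read_tlv(buf, pos)
        if tag == 0x02:                      # INTEGER (two's complement)
            out.append(int.from_bytes(value, "big", signed=True))
        elif tag == 0x04:                    # OCTET STRING
            out.append(value)
        elif tag == 0x0C:                    # UTF8String
            out.append(value.decode("utf-8"))
        elif tag == 0x30:                    # SEQUENCE: recurse into its contents
            out.append(decode(value))
        # Any other tag is unknown here; its length already told us where it
        # ends, so the loop simply continues after it.
    return out


# SEQUENCE { INTEGER 5, [APPLICATION 7] <2 opaque octets>, UTF8String "hi" }
msg = bytes([0x30, 0x0B,
             0x02, 0x01, 0x05,
             0x47, 0x02, 0xAA, 0xBB,
             0x0C, 0x02, 0x68, 0x69])
print(decode(msg))  # [[5, 'hi']]
```

The unknown application-tagged element is dropped without the decoder needing to know anything about it, which is exactly what a rigid "level" scheme cannot do gracefully.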

A very very rough sketch of my idea for extending S-expressions smoothly:

On input, a list whose car is a symbol beginning with "." is mapped through
a procedure associated with that symbol by the caller, and whatever the
procedure returns replaces that part of the input.  This is effectively
macroexpansion at the lexical level.  On output, you need a mapping from
type predicates to procedures that build such lists.  The virtue of this
design is that it works *on top of* `read` and `write`, and is therefore
portable, except for people who use symbols beginning with a dot (hopefully
few to none, but substitute another symbol character if you want, just not
colon).  Yes, it's a kludge.
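A rough sketch of the idea in Python, operating on data an ordinary reader has already produced. Nested lists plus a `Symbol` class stand in for Scheme data here; `expand`, `contract`, and the `.set` tag are hypothetical names chosen for illustration. The caller supplies the handlers on input and the (predicate, encoder) pairs on output, so nothing is registered globally.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Symbol:
    """Stand-in for a Scheme symbol in data that a plain reader produced."""
    name: str


Handlers = dict[str, Callable[..., Any]]
Encoders = list[tuple[Callable[[Any], bool], Callable[[Any], list]]]


def expand(datum: Any, handlers: Handlers) -> Any:
    """After read: replace (.tag arg ...) with whatever the tag's handler returns."""
    if not isinstance(datum, list):
        return datum
    items = [expand(x, handlers) for x in datum]  # expand innermost forms first
    head = items[0] if items else None
    if isinstance(head, Symbol) and head.name.startswith("."):
        handler = handlers.get(head.name)
        if handler is not None:
            return handler(*items[1:])
    return items  # unknown dotted tags pass through as ordinary lists


def contract(value: Any, encoders: Encoders) -> Any:
    """Before write: turn values matching a type predicate back into (.tag ...) lists."""
    for is_type, encode in encoders:
        if is_type(value):
            return [contract(x, encoders) for x in encode(value)]
    if isinstance(value, list):
        return [contract(x, encoders) for x in value]
    return value


handlers: Handlers = {".set": lambda *xs: frozenset(xs)}
encoders: Encoders = [
    (lambda v: isinstance(v, frozenset), lambda s: [Symbol(".set"), *sorted(s)]),
]

datum = [Symbol("point"), [Symbol(".set"), 3, 1, 2]]
value = expand(datum, handlers)        # the inner list becomes a frozenset
round_trip = contract(value, encoders)  # and turns back into a (.set ...) list
print(value)
print(round_trip)
```

Leaving unknown dotted tags as plain lists is one possible policy; a stricter caller could raise instead.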

The alternative is to use SRFI-10 syntax:  #,(tag datum ...).  This is safe,
but it requires modifications to read and write, and is inherently
unportable.  (The associated procedure define-reader-ctor is inappropriately
global and shouldn't be used.)
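For comparison, a sketch of the reader change that hash-comma syntax needs: recognizing `#,(tag datum ...)` and calling a constructor for the tag. Here the constructor table is passed to each `read` call instead of being registered globally, which is one way around the objection above. The tokenizer is deliberately minimal (integers, bare symbols represented as Python strings, lists), and `tokenize`, `parse`, `read` and the `set` tag are all hypothetical names for this illustration.

```python
import re
from typing import Any, Callable

Ctors = dict[str, Callable[..., Any]]

# "#,(" must be tried before the catch-all atom pattern.
TOKEN = re.compile(r"\s*(#,\(|\(|\)|[^\s()]+)")


def tokenize(text: str) -> list[str]:
    tokens: list[str] = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"cannot tokenize at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(tokens: list[str], pos: int, ctors: Ctors) -> tuple[Any, int]:
    tok = tokens[pos]
    if tok in ("(", "#,("):
        items: list[Any] = []
        pos += 1
        while tokens[pos] != ")":
            item, pos = parse(tokens, pos, ctors)
            items.append(item)
        pos += 1  # consume ")"
        if tok == "(":
            return items, pos
        tag, *args = items  # #,(tag datum ...): hand the data to the constructor
        if tag not in ctors:
            raise KeyError(f"no reader constructor for {tag!r}")
        return ctors[tag](*args), pos
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(tok), pos + 1
    except ValueError:
        return tok, pos + 1  # anything else is a symbol, represented as a str


def read(text: str, ctors: Ctors) -> Any:
    tokens = tokenize(text)
    datum, pos = parse(tokens, 0, ctors)
    if pos != len(tokens):
        raise SyntaxError("trailing input after datum")
    return datum


print(read("(point #,(set 3 1 2) label)", {"set": lambda *xs: frozenset(xs)}))
```

Even with a per-call table, every consumer of the data still needs a reader that understands `#,(`, which is the portability cost mentioned above.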

 > - Application-specific formats should be built on top of the above
 > generic formats. There's a kind of Zawinski's law at work here:
 > non-hierarchical application formats inevitably expand into hierarchical
 > ones; those which cannot so expand are replaced by ones which can (or,
 > more frequently, hierarchical extensions are violently shoehorned onto
 > the staunchly non-hierarchical ones to create weird franken-formats).
 >

Start with hierarchy, then!

 > - Survey all the existing data representations around Lisp (at which
 > task John has already made a fine start with his spreadsheet).
 >

Thank you, thank you, I'm here all year!

 > Opinions?
 >

Do it once and do it right, leveraging standards as hard as we can: the
people who designed them had different world-views from some of us, but
they weren't dumb.

John Cowan          http://vrici.lojban.org/~cowan        xxxxxx@ccil.org
Awk!" sed Grep. "A fscking python is perloining my Ruby; let me bash
     him with a Cshell!  Vi didn't I mount it on a troff?" --Francis Turner