Re: Attempt at a stack of data formats to make everyone happy

Re: Attempt at a stack of data formats to make everyone happy Lassi Kortela 19 Sep 2019 19:44 UTC

 > Mini-rant about the horrible awfulness of text-with-counted-strings,

True for sure. I don't suppose anyone intends such formats to be edited
nowadays. I imagine the dusty decks are not a mere figure of speech :)

 > it's much better *not* to have different rigid levels, but
 > rather to have a single representation that scales smoothly and can, as
 > uniformly as possible, skip what it does not understand.

I was being imprecise; if the simple formats can be a straight subset of
the complex uber-format, so much the better. They shouldn't be
gratuitously different.

My point (which I didn't state clearly) is that a simple spec that
covers everything the implementor needs to know (think: JSON) is hugely
valuable. The reason smart programmers give up on ASN.1 (apart from its
security reputation) is that they try to read some spec or intro to get
the gist of it, and get sucked into this hole where there's no end in
sight to the complexity. People want to gauge how much work is ahead of
them, so giant specs are disorienting and demoralizing for most jobs
where you use maybe 10% of that stuff.

People often pick a format on the basis of "I have a simple job; can I
get it done by this evening?" Things like ASN.1 and SGML are the
anti-format from this standpoint. XML tried to fix that for SGML, but
the result still wasn't easy enough for simple jobs; hence MicroXML is a
good idea, and JSON, which is even simpler, took over the world.

JSON is so appealing because the spec says "can you believe this is all
you get?" Classic S-expressions have the same appeal. That's a relief to
programmers already buried under complexity all day.

So if the simple format is a subset of ASN.1 we should still write a
separate spec for it, with a maximum of 5 easy-to-read pages and a
promise that "this is all there is". It's nice for the upgrade path to a
bigger standard to be as smooth as possible, but in my opinion
simplicity is even more important.

 > ASN.1 is ideal here for binary (but I would say that, wouldn't I).

I have no problem with you advocating it. You have a convincing
rationale for your preference which you have explained clearly.

 > For text, we'll need
 > to have a notion of core S-expressions, perhaps R5RS or R7RS <datum> and
 > nothing more, that everyone has to understand.  (Interchanging ratios and
 > complex numbers is tricky, though, even with something as close as Common
 > Lisp, which does #c(1.0 2.0) instead of 1.0+2.0i.

I'd also like everything to be S-expression-based, though I'll listen to
counterarguments.

There's no hope of specifying a widely compatible subset of
S-expressions for intricate stuff, but we can ship read/write libraries
for every popular Lisp dialect. That's almost as good.
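
To make the complex-number case quoted above concrete, here is a rough
sketch of what the writer half of such a library might do. Python is
used only for illustration; the function name `write_complex` and the
dialect names are made up here, and only finite values are handled.

```python
def write_complex(z: complex, dialect: str) -> str:
    """Render a finite complex number as a literal for one Lisp dialect.

    The dialect names "scheme" and "common-lisp" are hypothetical labels
    chosen for this sketch, not part of any existing library.
    """
    if dialect == "scheme":
        # Scheme rectangular notation: the imaginary part always carries
        # an explicit sign, e.g. 1.0+2.0i or 1.0-2.0i.
        return f"{z.real!r}{z.imag:+}i"
    if dialect == "common-lisp":
        # Common Lisp reader syntax: #c(real imag).
        # Caveat: a CL reader treats 1.0 as a single-float by default,
        # so precision may differ from the Scheme side.
        return f"#c({z.real!r} {z.imag!r})"
    raise ValueError(f"unknown dialect: {dialect}")


print(write_complex(complex(1.0, 2.0), "scheme"))       # 1.0+2.0i
print(write_complex(complex(1.0, 2.0), "common-lisp"))  # #c(1.0 2.0)
```

The in-memory value is the same in both cases; only the printed form
differs per dialect, which is the part a per-dialect library would hide.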

 >> non-hierarchical application formats inevitably expand into hierarchical
 >> ones
 >
 > Start with hierarchy, then!

I'm happy with hierarchy but other programmers keep wanting anarchy ;-)

 > Do it once and do it right, leveraging standards as hard as we can: the
 > people who designed them had different world-views from some of us, but
 > they weren't dumb.

For years I thought that using a standard solution where available was
the most important thing. A couple of years ago I had some
transformative programming experiences (can't remember which, sadly)
that showed me that simplicity is more important than standardization.
Following complex standards, apart from the immediate practical problems
that stem from complexity, also perpetuates the spread of complexity.

Now I no longer hesitate to make a simple re-invention of the wheel when
all the standard wheels are complex. There's usually an immediate
feeling of relief in shedding the complexity.

Of course, we shouldn't try to shed intrinsic complexity which, as Matz
says, always finds a way to pop up somewhere; the longer you delay
tackling it, the more extrinsic complexity grows on top of it.

But many applications are intrinsically simple, and are well served by
simple subsets or adaptations of complex things.