From John: On Thu, Sep 19, 2019 at 1:28 PM Lassi Kortela <xxxxxx@lassi.io> wrote:

> OK, let's concede that this Scheme database mailing list has turned into
> the de facto Scheme data encoding mailing list as well :) We can
> rationalize this by saying that both topics deal with persistence.

+1

> - Text is advantageous in some situations, binary in others.

+1. Mini-rant about the horrible awfulness of text-with-counted-strings,
something discovered when the first Fortran program was written: at that
time the only strings were format strings, and instead of escaping the
format directives with % or ~ or whatnot, the literal parts were marked
by a length (typically 1-3 digits), followed by H, followed by the
literal characters, with no trailing indication at all. That *never*
worked reliably; it was first deprecated and then removed in later
Fortran standards, but implementations keep it alive to keep dusty decks
running. Be warned. Don't go there.

> - The data models are most naturally approached as a stack of growing
> complexities. For simple jobs, desiring simple fast implementations and
> the widest interoperability, it's nice to have a data model like JSON's
> (or even simpler) with a small handful of universally useful data types.

I don't actually agree with this. The importance of JSON is that it is
standardized, both de facto and de jure, so we need an implementation of
it. Otherwise, it's much better *not* to have different rigid levels,
but rather to have a single representation that scales smoothly and can,
as uniformly as possible, skip what it does not understand. ASN.1 is
ideal here for binary (but I would say that, wouldn't I). For text,
we'll need a notion of core S-expressions, perhaps R5RS or R7RS <datum>
and nothing more, that everyone has to understand. (Interchanging ratios
and complex numbers is tricky, though, even with something as close as
Common Lisp, which writes #c(1.0 2.0) instead of 1.0+2.0i.)
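To make the rant concrete: a Hollerith field is a decimal count, the letter H, then exactly that many literal characters, with nothing marking the end. The toy reader below (a hypothetical sketch; `read-hollerith` is an invented name, not any standard procedure) shows the mechanism, and also why it is fragile: if the count is off by one, the reader silently consumes the wrong characters and there is no way to detect it.

```scheme
(import (scheme base) (scheme char))

;; Parse a Hollerith-style counted literal such as "13HHELLO, WORLD!"
;; starting at index START in string S.  Returns two values: the
;; literal text and the index just past it.  Purely illustrative.
(define (read-hollerith s start)
  (let loop ((i start) (n 0))
    (cond ((char-numeric? (string-ref s i))
           ;; Accumulate the decimal count before the H.
           (loop (+ i 1)
                 (+ (* n 10) (- (char->integer (string-ref s i))
                                (char->integer #\0)))))
          ((char=? (string-ref s i) #\H)
           ;; Take exactly N characters after the H -- no terminator,
           ;; so a wrong count corrupts everything that follows.
           (values (substring s (+ i 1) (+ i 1 n)) (+ i 1 n)))
          (else (error "not a Hollerith field" s)))))

;; (read-hollerith "13HHELLO, WORLD!" 0)  => "HELLO, WORLD!" and 16
```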
A very very rough sketch of my idea for extending S-expressions
smoothly: on input, a list whose car is a symbol beginning with "." is
mapped through a procedure associated with that symbol by the caller,
and whatever the procedure returns replaces that part of the input.
This is effectively macroexpansion at the lexical level. On output, you
need a mapping from type predicates to such lists. The virtue of this
design is that it works *on top of* `read` and `write`, and is
therefore portable, except to people who use symbols with dots in front
(hopefully few to none, but substitute another symbol character if you
want, just not colon). Yes, it's a kludge.

The alternative is to use SRFI 10 syntax: #,(tag datum ...). This is
safe, but it requires modifications to read and write and is inherently
unportable. (The associated define-reader-ctor form is inappropriately
global and shouldn't be used.)

> - Application-specific formats should be built on top of the above
> generic formats. There's a kind of Zawinski's law at work here:
> non-hierarchical application formats inevitably expand into
> hierarchical ones; those which cannot so expand are replaced by ones
> which can (or, more frequently, hierarchical extensions are violently
> shoehorned onto the staunchly non-hierarchical ones to create weird
> franken-formats). Start with hierarchy, then!

> - Survey all the existing data representations around Lisp (at which
> task John has already made a fine start with his spreadsheet).

Thank you, thank you, I'm here all year!

> Opinions?

Do it once and do it right, leveraging standards as hard as we can: the
people who designed them had different world-views from some of us, but
they weren't dumb.

John Cowan          http://vrici.lojban.org/~cowan        xxxxxx@ccil.org
"Awk!" sed Grep. "A fscking python is perloining my Ruby; let me bash
him with a Cshell! Vi didn't I mount it on a troff?"  --Francis Turner
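The dot-symbol sketch above might look roughly like this in portable Scheme. All the names here (`*readers*`, `expand-datum`, `.complex`) are invented for illustration; the sketch assumes proper lists, an implementation with complex-number support, and a reader that accepts leading-dot identifiers (the table uses `string->symbol` so the code itself stays portable even where the reader does not).

```scheme
(import (scheme base) (scheme complex))

;; Caller-supplied table: dot-symbol -> procedure over the cdr.
;; Here (.complex RE IM) expands to a complex number.
(define *readers*
  (list (cons (string->symbol ".complex")
              (lambda (args)
                (make-rectangular (car args) (cadr args))))))

;; Does this symbol begin with "." (and have more after it)?
(define (dot-symbol? x)
  (and (symbol? x)
       (let ((s (symbol->string x)))
         (and (> (string-length s) 1)
              (char=? (string-ref s 0) #\.)))))

;; Walk a datum bottom-up; whenever a list's car is a registered
;; dot-symbol, the associated procedure's result replaces that subform.
;; This sits entirely on top of plain `read`, as described above.
(define (expand-datum x)
  (if (pair? x)
      (let* ((y (map expand-datum x))
             (entry (and (dot-symbol? (car y)) (assq (car y) *readers*))))
        (if entry ((cdr entry) (cdr y)) y))
      x))

;; Reading "(point (.complex 1.0 2.0))" and expanding it would yield
;; (point 1.0+2.0i).  The output side is the reverse mapping, from a
;; type predicate (e.g. complex?) back to a (.complex re im) list.
```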