I should start by mentioning the locations of my pre-pre-SRFIs for S-expression and binary interchange:

https://bitbucket.org/cowan/r7rs-wg1-infra/src/default/CoreSexps.md 

https://bitbucket.org/cowan/r7rs-wg1-infra/src/default/CoreAsn1.md  

For the record, I call them "mine" not because I own them, but because these are my proposals as distinct from competing proposals.  I'll be extending the ASN.1 to specify extended formats for covering more of the Lisp data types, but they'll be in their own section(s).

 

On Tue, Sep 24, 2019 at 5:29 AM Lassi Kortela <xxxxxx@lassi.io> wrote:

I would also like to remove all hints at numerical limits from even the
simplest specs. That makes life so much simpler, because any limits we
suggest are arbitrary and tied to the particular decade in which we are
writing the spec.

In fact binary64 floats are extremely stable: the format has been with us since 1981 and standardized since 1985.  I'm not worried about them being replaced any time soon.   Similarly, though 128-bit integer libraries are at least as old as the VAX, hardware support (which was an add-on anyway) seems to have died with that platform.  I think the 64-bit recommendation is fine.

Implementors normally have a particular set of number
types to work with anyway, handed to them by the C compiler or Lisp
system; nothing we can write in the spec will change that set of types.

Just so, but what we can do is set expectations.  JSON writers, for example, can write out numbers with any magnitude or precision they wish, but the RFC warns them that they cannot count on the receiver interpreting those numbers correctly if they exceed the limits of a binary64 float.  Analogously, Unicode processes are not required to normalize their output; what they are not allowed to do is use the distinction between normalized and unnormalized forms to carry out-of-band information.  (I'm speaking of the C and D normalization forms here; KC and KD are a different beast that does throw away information.)
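A quick Python illustration of that binary64 warning (this example is mine, not from the RFC): the writer may emit any integer, but a receiver that maps JSON numbers to doubles, as most JavaScript engines do, silently loses anything past 2^53. Python's json module lets us model such a receiver with its parse_int hook.

```python
import json

# 2**53 + 1 is the first positive integer binary64 cannot represent exactly.
big = 2**53 + 1           # 9007199254740993
text = json.dumps(big)    # a JSON writer is free to emit this

# A receiver that reads all numbers as binary64 floats rounds it away.
as_float = json.loads(text, parse_int=float)
print(as_float)           # 9007199254740992.0 -- the +1 is gone

# Python's default reader uses bignums, so it keeps the value exact.
print(json.loads(text))   # 9007199254740993
```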

IMHO we should take a page from "The Right Thing" here, and specify it
so the interface is simple at the expense of implementation complexity.
So bignums are encoded exactly the same way as fixnums, using decimal
digits.

I don't disagree, which is why I worded the above quite carefully.
 
This syntax is quite nice, but I'd think about it some more. In
particular, Racket already has a different read syntax for hash-tables,
as does Clojure's EDN.

I'm fine with that.  I just hastily wrote in the simplest thing that could possibly work, but I'd rather follow precedent.  There's none in any Scheme standard, nor yet in Common Lisp (there is a proposal for #H, but I don't think it's gone much past the proposer).  It's too easy to roll your own in CL and not worry about compatibility.
 
With curly braces, there's also the usual problem with sets vs maps.
Braces naturally represent both, so a fight ensues, and no solution is
typically ideal.

While in a sense sets are more fundamental (a map being a set of key-value pairs with a uniqueness constraint on the keys), maps are in far wider use.  I do like the Python method (if it has colons, it's a dictionary), but it requires a special case for the empty set: there is no lexical syntax for it; you have to call a procedure.
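For reference, the Python behavior described above:

```python
# Colons make a literal a dictionary; bare elements make it a set.
d = {"a": 1, "b": 2}
s = {"a", "b"}

# The special case: {} is the empty *dictionary*, so the empty set
# has no literal syntax and must be built with a procedure call.
empty_dict = {}
empty_set = set()

print(type(empty_dict).__name__)  # dict
print(type(empty_set).__name__)   # set
```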
 
That being said, we should ship a standard test suite so new
implementations can be verified quickly. Code written in half an hour is
generally not bulletproof :)

Of course.  If there are no tests, there is no implementation (no smiley).
 
> So I think it should be "compatible with s-expressions" for *human*
> purposes (not needing to learn a new language), and perhaps to allow the
> s-expression syntax of RnRS to become a superset of it in time (we can't
> back-fill a written syntax for hash tables into R7RS now, alas)

The very last project on the R7RS-large list is a comprehensive extension of Scheme's lexical syntax, but it can't be done until we know what all the datatypes are.
 
> So my suggestion would be:
>
> 1) Take the s-expression syntax from R7RS, which IIRC has no remote code
> execution defined in the standard (as opposed to CL's); but remove the '
> ` , ,@ syntactic sugars that just expand into (quote ...) and friends
> anyway.

Too much.  As just one example, many languages no longer distinguish between characters and 1-character strings, and the character syntax is fugly because it has no uniform delimiter, only single characters or names of varying lengths.  My approach was data-centric: what data do we need to generally interchange, and then how should core S-expressions do it?  I still prefer that.

I also like #t and #f (which are also in John's current spec). It's not
ideal but the alternatives are much worse. I just implemented the binary
sexp library for Common Lisp and the NIL/()/false problem came out
immediately. It's always nice not to have to go there :)

Yes, CL will insist on mixing up () and #f, so there needs to be some way of dictating in each particular case which should be written and possibly which was read.  Continuable exceptions triggered by an optional argument are probably the right thing here.
 
This is a hard problem. Dotted pairs look and act a bit dodgy, but on
the other hand, a cons cell can be considered a fundamental building
block of hierarchical information, with not too much hyperbole.

Again, I don't agree; improper lists are mostly implementation fallout from a day when core was expensive.
 
Dotted pairs also make interoperability with just about every
non-Lisp/non-functional language harder, since those seldom have native
cons cells. In my Python reader, I just read successive conses into a
Python list and raise an error when encountering an improper list.
That's not too bad, since the writer can opt to not send any improper
lists, but it was the only tricky part in the reader.

I think that's exactly the Right Thing.
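A sketch of that strategy (this is my illustration, not Lassi's actual reader): walk a chain of cons cells into a Python list, and reject any improper list whose final cdr is not the empty list.

```python
class Cons:
    """Minimal cons cell, standing in for whatever the reader builds."""
    def __init__(self, car, cdr):
        self.car, self.cdr = car, cdr

NIL = None  # stand-in for the empty list

def cons_to_list(obj):
    """Flatten a proper list of conses into a Python list;
    raise on an improper (dotted) list."""
    items = []
    while isinstance(obj, Cons):
        items.append(obj.car)
        obj = obj.cdr
    if obj is not NIL:
        raise ValueError("improper list: dotted tail %r" % (obj,))
    return items

print(cons_to_list(Cons(1, Cons(2, NIL))))  # [1, 2]
try:
    cons_to_list(Cons(1, 2))                # (1 . 2) is rejected
except ValueError as e:
    print(e)
```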
 
Could we leave them out at first, write a few programs that use core
sexps, and find out if we miss them?

Sure.  My original idea was to represent dictionaries in alist syntax (as opposed to allowing the representation of actual alists), making sure that a -> (b c d) is always written out as (a . (b c d)) and not (a b c d).   Overall though I like special dictionary syntax better.
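The catch, which a toy writer makes visible, is that (a . (b c d)) and (a b c d) are the very same cons structure, so a writer that wants the dotted form for dictionary entries has to force it explicitly rather than rely on the default list printer. A hypothetical helper:

```python
def write_alist_entry(key, value_list):
    """Hypothetical writer helper: always emit the explicit dot,
    even though the cdr is itself a proper list and the default
    printer would collapse (a . (b c d)) into (a b c d)."""
    return "(%s . (%s))" % (key, " ".join(value_list))

print(write_alist_entry("a", ["b", "c", "d"]))  # (a . (b c d))
```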

Improper lists will be in the extended ASN.1 format; the last element is always the tail.
 
> 2) Add syntax for arbitrary types, perhaps of the form #NAME{ ... };
> where NAME is a registry extended via SRFIs... hash tables are
> important/common enough to claim the empty NAME and be written as #{key
> val key val}, time objects can get #time{TYPE SECONDS NANOSECONDS}, etc.

I have to think about this further.  I don't think it is required for the MVP.  I'd be okay with a special case for ISO 8601 timestamps, since they cannot be mistaken for either numbers or identifiers.
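A quick check of why an ISO 8601 token is unambiguous (my illustration): it parses as a timestamp, fails as a number, and its leading digit rules it out as an identifier in most lexers.

```python
from datetime import datetime

token = "2019-09-24T05:29:00"

# Parses cleanly as a timestamp...
ts = datetime.fromisoformat(token)
print(ts.year)   # 2019

# ...but is not a valid number...
try:
    float(token)
except ValueError:
    print("not a number")

# ...and begins with a digit, so most lexers reject it as an identifier.
print(token[0].isdigit())  # True
```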
 
> How can we tell if an alist is an alist when writing?

Quite so.  I was talking about writing out dictionaries *as* alists per the above.

Lassi's current implementation allows vertical bars.  I am against these, because a lot of systems can't handle arbitrary characters in their identifier-analogues, which makes schema languages and code generators for static languages hard.  Indeed, vertical bars are older in Lisp than strings: in Maclisp you see code like (print '|fool, fool, back to the beginning is the rule|) because there were no strings yet.  I think strings were designed to make such symbols unnecessary, and that's why they weren't in Scheme until R6RS.

Finally, we may be able to get away without limits on S-expression size, but omitting limits from the binary format is an invitation to be DoSed.  I think both size (in bytes or objects) and depth should be limitable.
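To make the two limits concrete, here is a sketch of a reader for an invented toy binary format (not the CoreAsn1 wire format) that enforces a byte budget and a nesting-depth cap, so hostile input fails fast instead of exhausting memory or the stack:

```python
import io
import struct

class LimitedReader:
    """Toy length-prefixed format, for illustration only:
    tag b'i' = 4-byte big-endian integer;
    tag b'L' = list, followed by a one-byte element count."""

    def __init__(self, data, max_bytes=1 << 20, max_depth=64):
        self.stream = io.BytesIO(data)
        self.remaining = max_bytes
        self.max_depth = max_depth

    def take(self, n):
        # Charge every read against the byte budget before consuming it.
        self.remaining -= n
        if self.remaining < 0:
            raise ValueError("size limit exceeded")
        buf = self.stream.read(n)
        if len(buf) != n:
            raise ValueError("truncated input")
        return buf

    def read(self, depth=0):
        if depth > self.max_depth:
            raise ValueError("depth limit exceeded")
        tag = self.take(1)
        if tag == b"i":
            return struct.unpack(">i", self.take(4))[0]
        if tag == b"L":
            count = self.take(1)[0]
            return [self.read(depth + 1) for _ in range(count)]
        raise ValueError("unknown tag %r" % tag)

data = b"L\x02i\x00\x00\x00\x07i\x00\x00\x00\x08"
print(LimitedReader(data).read())  # [7, 8]
```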


John Cowan          http://vrici.lojban.org/~cowan        xxxxxx@ccil.org
After fixing the Y2K bug in an application:
        WELCOME TO <censored>
        DATE: MONDAK, JANUARK 1, 1900