>> https://bitbucket.org/cowan/r7rs-wg1-infra/src/default/CoreSexps.md is the
>> next stab at core S-expressions.

Thank you for writing it up. Did I review it already? Here is some
riffing on Alaric's comments:

> I'd be inclined to remove the thing that numbers outside of ranges may
> not interoperate.

I would also like to remove all hints at numerical limits from even the
simplest specs. That makes life so much simpler, because any limits we
suggest are arbitrary and tied to the particular decade in which we are
writing the spec. Implementors normally have a particular set of number
types to work with anyway, handed to them by the C compiler or Lisp
system; nothing we can write in the spec will change that set of types.

Concretely, I'd remove all mentions of "may not interoperate".

> 1. How SHOULD one represent arbitrary numbers when they crop up in the
> problem domain, then? Define a bignum format as a list of 64-bit
> integers and have code to convert between them and proper numbers? Ugh!
>
> 2. People will forget about the restriction when using systems that
> support bignums, which will work happily in their testing, but break in
> undefined ways when interoperated with arbitrary third-party systems. Ugh!

I agree with both ughs :) IMHO we should take a page from "The Right
Thing" here, and specify it so the interface is simple at the expense of
implementation complexity. So bignums are encoded exactly the same way
as fixnums, using decimal digits.

> Now, given that CoreSexps adds a new syntax #{ ... },

This syntax is quite nice, but I'd think about it some more. In
particular, Racket already has a different read syntax for hash tables,
as does Clojure's EDN. With curly braces, there's also the usual problem
with sets vs maps. Braces naturally represent both, so a fight ensues,
and typically no solution is ideal.
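
Back on the bignum point: a minimal sketch in Python of what "encoded
exactly the same way as fixnums" could look like. The names
write_integer and read_integer are made up for illustration and come
from no spec; the accepted syntax (optional sign, then ASCII decimal
digits) is an assumption too.

```python
import re

# Assumed integer syntax: optional sign followed by ASCII decimal digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def write_integer(n):
    """Write any integer, small or huge, as plain decimal digits."""
    return str(n)


def read_integer(text):
    """Read decimal digits back into an integer of whatever size is needed."""
    if not _INTEGER.fullmatch(text):
        raise ValueError(f"not a decimal integer: {text!r}")
    # Python's int is arbitrary precision, so no range check is needed here.
    return int(text)
```

The same two functions serve 42 and 2**200 alike; a language without
native bignums would need a bignum library behind read_integer, which is
the implementation complexity being traded for a simple interface.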
> I don't think
> there's any point in trying to make it "compatible" with (read) on any
> existing Lisp by avoiding syntax that "might cause problems"; arbitrary
> data shouldn't be fed into (read) in most cases due to syntax in very
> many lisp implementations that can execute arbitrary code!

I agree with this stance. It's nice to be compatible where possible (so
that core S-exps can be fed to "read" when working with known files you
happen to have at hand). But for systems dealing with unknown data,
under no circumstances would I recommend using (read) to read core
sexps.

I mean, our own reader is going to be like 100 lines of Lisp. We can
just ship it as a library for every dialect. For the binary varint
sexps, it took half an hour to write a library for a new language! I'd
be surprised if textual ones take more than an hour per language.

That being said, we should ship a standard test suite so new
implementations can be verified quickly. Code written in half an hour is
generally not bulletproof :)

> So I think it should be "compatible with s-expressions" for *human*
> purposes (not needing to learn a new language), and perhaps to allow the
> s-expression syntax of RnRS to become a superset of it in time (we can't
> back-fill a written syntax for hash tables into R7RS now, alas) so that
> CoreSexp literals can be written as-is in RnRS programs. But trying to
> find a lowest common denominator of s-expression syntaxes is, I think, a
> flawed approach, even if we then didn't leap straight out of that subset
> by extending it with #{ ... }!

Very well thought out. I completely agree with all points.

> So my suggestion would be:
>
> 1) Take the s-expression syntax from R7RS, which IIRC has no remote code
> execution defined in the standard (as opposed to CL's); but remove the '
> ` , ,@ syntactic sugars that just expand into (quote ...) and friends
> anyway.

I also like #t and #f (which are also in John's current spec).
It's not ideal but the alternatives are much worse. I just implemented
the binary sexp library for Common Lisp and the NIL/()/false problem
came up immediately. It's always nice not to have to go there :)

> 1a) I'm not sure if we should remove improper lists from the syntax...
> It would be nice to be able to have non-Lisp implementations of this
> model able to assume that lists are proper lists and can map to their
> own list types.

This is a hard problem. Dotted pairs look and act a bit dodgy, but on
the other hand, a cons cell can be considered a fundamental building
block of hierarchical information, with not too much hyperbole. Dotted
pairs also make interoperability with just about every
non-Lisp/non-functional language harder, since those seldom have native
cons cells.

In my Python reader, I just read successive conses into a Python list
and raise an error when encountering an improper list. That's not too
bad, since the writer can opt to not send any improper lists, but it was
the only tricky part in the reader.

Could we leave them out at first, write a few programs that use core
sexps, and find out if we miss them?

> 2) Add syntax for arbitrary types, perhaps of the form #NAME{ ... };
> where NAME is a registry extended via SRFIs... hash tables are
> important/common enough to claim the empty NAME and be written as #{key
> val key val}, time objects can get #time{TYPE SECONDS NANOSECONDS}, etc.

I'd also tentatively vote for #name. We're going to need full words
after the sharpsign -- one letter won't cut it :)

> 3) Define an SRFI with "safe" read and write procedures that read and
> write exactly this language, and also with a procedure to register
> arbitrary type readers/writers so the arbitrary type list can be
> extended by portable SRFI implementations.
>
> Other languages can have their own implementations like that SRFI, doing
> their best to map from our types into theirs.
+1

>>> Here are my suggestions for rock-bottom S-expressions:
>>>
>>> Proper lists as we know them. They might turn into vectors in non-Lisp
>>> systems.
>>>
>>> Alists as we know them. They might turn into hashtables or dictionaries
>>> in non-Lisp systems. We always format an alist element (1 2 3) as (1 . (2
>>> 3)).
>
> How can we tell if an alist is an alist when writing? It's all just cons
> cells and atoms... I'd prefer to use hash tables here, which can be
> unambiguously detected and written as #{ ... } under CoreSexps syntax.

I would leave out all the magic about special handling for alists.
That's domain knowledge, higher-level than this encoding IMHO.
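
To give a feel for how small a proper-lists-only textual reader can be,
here is a rough sketch in Python. It is not the binary reader mentioned
above and not a proposal for the spec: the names (SexpError, tokenize,
read_sexp) are hypothetical, strings and #{ ... } are left out, symbols
are returned as plain Python strings, and any dot inside a list is
rejected outright, even where the tail would itself be a list.

```python
import re


class SexpError(ValueError):
    """Raised for input outside the small subset this sketch accepts."""


# Assumed token grammar: a paren, or a run of non-space, non-paren characters.
_TOKEN = re.compile(r"[()]|[^\s()]+")
_INTEGER = re.compile(r"[+-]?[0-9]+")


def tokenize(text):
    return _TOKEN.findall(text)


def _parse_atom(tok):
    if tok == "#t":
        return True
    if tok == "#f":
        return False
    if _INTEGER.fullmatch(tok):
        return int(tok)  # arbitrary precision, no numeric limits
    return tok  # assumption: a symbol is represented as a plain str


def _parse(tokens, pos):
    """Parse one datum starting at tokens[pos]; return (value, next_pos)."""
    if pos >= len(tokens):
        raise SexpError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        items = []
        pos += 1
        while True:
            if pos >= len(tokens):
                raise SexpError("missing closing paren")
            if tokens[pos] == ")":
                return items, pos + 1
            if tokens[pos] == ".":
                # Proper lists only: refuse dotted pairs instead of guessing.
                raise SexpError("improper list (dotted pair) not supported")
            item, pos = _parse(tokens, pos)
            items.append(item)
    if tok == ")":
        raise SexpError("unexpected closing paren")
    if tok == ".":
        raise SexpError("dot outside of a list")
    return _parse_atom(tok), pos + 1


def read_sexp(text):
    """Read exactly one datum from text; lists become Python lists."""
    tokens = tokenize(text)
    value, pos = _parse(tokens, 0)
    if pos != len(tokens):
        raise SexpError("trailing data after datum")
    return value
```

With this, read_sexp("(1 (2 #t) foo)") gives a nested Python list, while
read_sexp("(1 . 2)") raises SexpError. Whether (1 . (2 3)) should be
accepted as the proper list it denotes is exactly the kind of question a
shared test suite would have to settle.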