On Wed, Sep 25, 2019 at 3:58 PM Lassi Kortela <xxxxxx@lassi.io> wrote:

I'd just like to avoid arbitrary limits on the range of values within
each data type. Things like numerical limits, or limits on what
characters can appear in symbols. In order for the text format to have a
simple correspondence with its dual binary format, it should do a little
extra work to be able to represent things like symbols with weird chars.

All right.  The numerical stuff is only a warning anyway; I'm willing to make similar recommendations/warnings for the others.  "You can ignore this, but things may go wrong at the other end; there are no guarantees."  A similar recommendation that strings and symbols not be longer than 2^31-1 characters would be good as well.

The only thing that continues to trouble me is the symbol nil (case-insensitive).  The overload of #f and () is bad enough without a symbol that normally nobody ever uses *as* a symbol.

And the reason the text and binary formats should have 100% equal data
models, is simplicity for users - the proper aim of abstraction.

Agreed.
 
I'd like to have 100% text/binary equivalence and a rich set of data
types for the same reason. Just simple (read) and (write) with a choice
of text and binary, no need to be concerned with other options,

I continue to think that not letting (read) limit the amount of input is Very Bad Indeed.  Not all programming languages are memory safe, far from it.  Not even all Scheme or CL implementations if you set the compiler options correctly.
 
So I would like the formats to provide "mechanism, not policy".

I agree with this policy.  :-)
 
Would it appease you if we include all the dodgy stuff but using
sharpsign syntax so they act like third-party user extensions instead of
bloating up the lexical core of the syntax?

Here's my current idea. 

First of all, I want a more compact syntax for bytevectors.  My current notion is for them to match /\[([0-9A-Fa-f][0-9A-Fa-f]-?)*\]/.  That is, pairs of hex digits with an optional hyphen after each byte so you can group things as you like, all wrapped in square brackets.  I'm not particular about the square brackets.
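
For concreteness, here is a rough Python sketch of reading and writing that bytevector syntax. It assumes the hyphen after each byte is optional, as the prose describes; the names parse_bytevector and format_bytevector and the group parameter are made up for illustration and are not part of any proposal.

```python
import re

# Assumption: pairs of hex digits, each optionally followed by a hyphen,
# wrapped in square brackets.
BYTEVECTOR_RE = re.compile(r"\[((?:[0-9A-Fa-f]{2}-?)*)\]")


def parse_bytevector(text: str) -> bytes:
    """Turn text such as "[DE-AD-BE-EF]" into a bytes object."""
    match = BYTEVECTOR_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a bytevector literal: {text!r}")
    # Hyphens are only visual grouping, so drop them before decoding.
    return bytes.fromhex(match.group(1).replace("-", ""))


def format_bytevector(data: bytes, group: int = 2) -> str:
    """Write bytes as hex, with a hyphen after every `group` bytes."""
    hexed = data.hex().upper()
    step = group * 2  # two hex digits per byte
    chunks = [hexed[i:i + step] for i in range(0, len(hexed), step)]
    return "[" + "-".join(chunks) + "]"
```

Note that a pattern of this shape also accepts a trailing hyphen before the closing bracket; whether that should be allowed is an open question.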

After that, the content of each ASN.1 LER object is one of three things: bytes, characters, or sub-objects.  So let's write # followed by either a registered name or hex digits that represent the type code, followed by one of a string, a bytevector, or a list.  So a vector would be #vec(1 2 3) or #20(1 2 3), a duration would be #dur"1Y2M35D" or #1F22"1Y2M35D", and float 0.0 would be #float[0000-0000-0000-0000] or #DBt[0000-0000-0000-0000], although a decimal float would be more interoperable.  I have some registered names in the new column B of <http://tinyurl.com/asn1-ler>, but this would allow private-use typecodes, which don't have registered names, to be encoded as text.
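
As a sketch of how a reader might split such a form into its type code and payload kind, here is some hypothetical Python. The REGISTERED table holds only the two name/code pairs implied by the examples above (vec = 20, dur = 1F22); the function name split_tagged and the rule "a tag made entirely of hex digits is a type code" are my assumptions, and a registered name consisting only of hex digits would be ambiguous under that rule.

```python
import re

# Only the name/code pairs implied by the examples in this message.
REGISTERED = {"vec": 0x20, "dur": 0x1F22}

HEAD_RE = re.compile(r"#([0-9A-Za-z]+)")
HEX_RE = re.compile(r"[0-9A-Fa-f]+")

# The character after the tag decides which kind of payload follows.
PAYLOAD_KINDS = {'"': "string", "[": "bytevector", "(": "list"}


def split_tagged(text: str):
    """Return (type_code, payload_kind, raw_payload_text) for one form."""
    match = HEAD_RE.match(text)
    if match is None:
        raise ValueError("expected '#' followed by a name or hex type code")
    tag = match.group(1)
    if HEX_RE.fullmatch(tag):
        code = int(tag, 16)          # assumption: all-hex tags are type codes
    elif tag in REGISTERED:
        code = REGISTERED[tag]
    else:
        raise ValueError(f"unregistered type name: {tag!r}")
    rest = text[match.end():]
    kind = PAYLOAD_KINDS.get(rest[:1])
    if kind is None:
        raise ValueError("tag must be followed by a string, bytevector, or list")
    return code, kind, rest
```

With this, #vec(1 2 3) and #20(1 2 3) come out identical, and a private-use code with no registered name still reads fine.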

To make this work on the procedure side, read can be passed a procedure that accepts a type code and a bytevector/string/list and returns the proper internal representation; on the write side, write would be passed a procedure that accepts an object and returns two values, a type code and a bytevector/string/list.  The invocations would have to be bottom-up.
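
To show what "bottom-up" means on the read side, here is a minimal Python sketch, not a proposed implementation. Everything in it is an assumption for illustration: the names read_tagged and example_decode, hex-only type codes, bare integers as the only atoms, and strings without escapes. The point is just that the user procedure sees inner objects already converted before it is called on the enclosing one.

```python
HEX_DIGITS = "0123456789ABCDEFabcdef"


def _skip_ws(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def _read(text, pos, decode):
    """Read one datum starting at pos; return (value, next_pos)."""
    pos = _skip_ws(text, pos)
    if pos >= len(text):
        raise ValueError("unexpected end of input")
    ch = text[pos]
    if ch == "#":                      # tagged object: #<hex code><payload>
        end = pos + 1
        while end < len(text) and text[end] in HEX_DIGITS:
            end += 1
        if end == pos + 1:
            raise ValueError("missing type code")
        if end >= len(text) or text[end] not in '"[(':
            raise ValueError("expected string, bytevector, or list")
        code = int(text[pos + 1:end], 16)
        payload, pos = _read(text, end, decode)
        # The payload is fully read (and its children decoded) first.
        return decode(code, payload), pos
    if ch == '"':                      # string, no escapes in this sketch
        end = text.index('"', pos + 1)
        return text[pos + 1:end], end + 1
    if ch == "[":                      # bytevector, hyphens ignored
        end = text.index("]", pos)
        return bytes.fromhex(text[pos + 1:end].replace("-", "")), end + 1
    if ch == "(":                      # list of nested data
        items, pos = [], pos + 1
        while True:
            pos = _skip_ws(text, pos)
            if pos >= len(text):
                raise ValueError("unterminated list")
            if text[pos] == ")":
                return items, pos + 1
            item, pos = _read(text, pos, decode)
            items.append(item)
    end = pos                          # otherwise: a bare integer
    while end < len(text) and not text[end].isspace() and text[end] not in '()[]"#':
        end += 1
    return int(text[pos:end]), end


def read_tagged(text, decode):
    """Read exactly one datum, calling decode(type_code, payload) bottom-up."""
    value, pos = _read(text, 0, decode)
    if text[pos:].strip():
        raise ValueError("trailing input")
    return value


def example_decode(code, payload):
    """Hypothetical user procedure: 0x20 becomes a tuple, the rest stay tagged."""
    if code == 0x20:
        return tuple(payload)
    return (code, payload)
```

For example, reading #20(1 2 #20(3 4)) with example_decode calls the procedure on the inner (3 4) first and then on the outer list, which by then already contains the inner tuple. The write side would run the same way in reverse, with the user procedure returning a type code and payload for each object.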


Comments?

(Arthur, have you looked at CapnProto?  Same guy, fixing the problems.)