Core lexical syntax Lassi Kortela (25 Sep 2019 10:15 UTC)
Re: Core lexical syntax John Cowan (25 Sep 2019 14:09 UTC)
Machines vs humans Lassi Kortela (25 Sep 2019 14:25 UTC)
Re: Core lexical syntax Alaric Snell-Pym (25 Sep 2019 15:44 UTC)
Re: Core lexical syntax John Cowan (25 Sep 2019 14:13 UTC)
Re: Core lexical syntax John Cowan (25 Sep 2019 19:18 UTC)
Mechanism vs policy Lassi Kortela (25 Sep 2019 19:58 UTC)
Re: Mechanism vs policy Arthur A. Gleckler (25 Sep 2019 21:17 UTC)
Re: Mechanism vs policy Lassi Kortela (26 Sep 2019 07:40 UTC)
Re: Mechanism vs policy John Cowan (25 Sep 2019 22:25 UTC)
Re: Mechanism vs policy Arthur A. Gleckler (26 Sep 2019 01:34 UTC)
Limits, symbols and bytevectors, ASN.1 branding Lassi Kortela (26 Sep 2019 08:23 UTC)
Re: Limits, symbols and bytevectors, ASN.1 branding Alaric Snell-Pym (26 Sep 2019 08:56 UTC)
Re: Limits, symbols and bytevectors, ASN.1 branding John Cowan (27 Sep 2019 02:38 UTC)
ASN.1 branding Lassi Kortela (27 Sep 2019 14:56 UTC)
Re: ASN.1 branding Alaric Snell-Pym (27 Sep 2019 15:24 UTC)
Re: ASN.1 branding Lassi Kortela (27 Sep 2019 18:54 UTC)
Re: Limits, symbols and bytevectors, ASN.1 branding John Cowan (27 Sep 2019 01:57 UTC)
Re: Limits, symbols and bytevectors, ASN.1 branding Lassi Kortela (27 Sep 2019 16:24 UTC)
Re: Limits, symbols and bytevectors, ASN.1 branding John Cowan (27 Sep 2019 17:37 UTC)
Re: Limits, symbols and bytevectors, ASN.1 branding Lassi Kortela (27 Sep 2019 18:28 UTC)
Re: Limits, symbols and bytevectors, ASN.1 branding John Cowan (27 Sep 2019 18:39 UTC)
Re: Limits, symbols and bytevectors, ASN.1 branding Lassi Kortela (27 Sep 2019 18:46 UTC)
Re: Limits, symbols and bytevectors, ASN.1 branding John Cowan (27 Sep 2019 21:19 UTC)
Re: Mechanism vs policy Alaric Snell-Pym (26 Sep 2019 08:45 UTC)
Implementation limits Lassi Kortela (26 Sep 2019 08:57 UTC)
Re: Implementation limits Alaric Snell-Pym (26 Sep 2019 09:09 UTC)
Re: Implementation limits Lassi Kortela (26 Sep 2019 09:51 UTC)
Meaning of the word "format" Lassi Kortela (26 Sep 2019 10:31 UTC)
Stacking it all up Lassi Kortela (26 Sep 2019 11:05 UTC)
Brief spec-writing exercise Lassi Kortela (26 Sep 2019 11:46 UTC)
Re: Brief spec-writing exercise John Cowan (26 Sep 2019 15:45 UTC)
Standards vs specifications Lassi Kortela (26 Sep 2019 21:24 UTC)
Re: Standards vs specifications John Cowan (27 Sep 2019 04:29 UTC)
Re: Standards vs specifications Lassi Kortela (27 Sep 2019 13:47 UTC)
Re: Standards vs specifications John Cowan (27 Sep 2019 14:53 UTC)
Re: Meaning of the word "format" John Cowan (26 Sep 2019 20:59 UTC)
Re: Meaning of the word "format" Lassi Kortela (26 Sep 2019 21:09 UTC)
Re: Meaning of the word "format" John Cowan (27 Sep 2019 02:44 UTC)
Length bytes and lookahead in ASN.1 Lassi Kortela (27 Sep 2019 13:58 UTC)
Re: Length bytes and lookahead in ASN.1 John Cowan (27 Sep 2019 14:22 UTC)
Re: Length bytes and lookahead in ASN.1 Alaric Snell-Pym (27 Sep 2019 15:02 UTC)
Re: Length bytes and lookahead in ASN.1 hga@xxxxxx (27 Sep 2019 15:26 UTC)
(missing)
Fwd: Length bytes and lookahead in ASN.1 John Cowan (27 Sep 2019 16:40 UTC)
Re: Fwd: Length bytes and lookahead in ASN.1 Alaric Snell-Pym (27 Sep 2019 16:51 UTC)
Re: Fwd: Length bytes and lookahead in ASN.1 John Cowan (27 Sep 2019 17:18 UTC)
Length bytes and lookahead in ASN.1 hga@xxxxxx (27 Sep 2019 16:58 UTC)
Re: Length bytes and lookahead in ASN.1 John Cowan (27 Sep 2019 17:21 UTC)
Re: Mechanism vs policy John Cowan (27 Sep 2019 03:52 UTC)
Re: Core lexical syntax Alaric Snell-Pym (26 Sep 2019 08:36 UTC)

Limits, symbols and bytevectors, ASN.1 branding Lassi Kortela 26 Sep 2019 08:23 UTC

>> And the reason the text and binary formats should have 100% equal data
>> models, is simplicity for users - the proper aim of abstraction.
>
> Agreed.
>
>> So I would like the formats to provide "mechanism, not policy".
>
> I agree with this policy.  :-)

We are in agreement about the most important points :)

I value these discussions a lot. They take a lot of time and energy,
but few complex inventions bear fruit without a number of people
reaching agreement. It's just not possible to do much for the world by
oneself.

> All right.  The numerical stuff is only a warning anyway; I'm willing to
> make similar recommendations/warnings for the others.  "You can ignore
> this, but things may go wrong at the other end; there are no guarantees."
> A similar recommendation that strings and symbols not be longer than 2^31-1
> characters would be good as well.
>
> The only thing that continues to trouble me is the symbol nil
> (case-insensitive).  The overload of #f and () is bad enough without a
> symbol that normally nobody ever uses *as* a symbol.

Nil is indeed going to be a problem no matter what we do.

I already ported the current reader to Common Lisp
(https://github.com/lispunion/universal-encoding-cl), and plan to keep
the port up to date with our Scheme work, porting the binary format as well.

Perhaps the CL writer should write out NIL and T as something like
#!cl:nil and #!cl:t. Alternatively, since many kinds of symbols are
problematic, there could be a way to encode a symbol with a "From:"
field, which the writer would fill in for the problematic ones, e.g.
#symbol-from{cl NIL} and #symbol-from{cl T}. Something like that.

As we discussed, CL symbols also have packages. I think they should be
representable, on "throw some code into the format at the end of a
tired workday" grounds. But again this can be a special form:
#package-symbol{"CL-USER" FOOBAR}.

Uninterned symbols could be #package-symbol{#!null FOOBAR} or
#uninterned-symbol{FOOBAR}. Some Schemes also have uninterned symbols,
so a common solution needs to be found.

Keywords could be :keyword or #keyword{FOOBAR}. The :keyword syntax is
nice, but it's a bit misleading: in CL a packaged symbol is written
package:symbol, so :keyword looks as if keywords live in a package
whose name is the empty string. They are actually in a package named
"KEYWORD".

The Scheme writer can just write #t and #f (or #!true and #!false, or
whatever we pick), plus ordinary booleans in the binary format.
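
To make that concrete, here's a rough Scheme sketch of how a writer
might render some of those forms. The record type and every name below
are invented for illustration; nothing here is settled:

  (import (scheme base))

  (define-record-type <foreign-symbol>
    (make-foreign-symbol package name)
    foreign-symbol?
    (package foreign-symbol-package)  ; package name string, #f if uninterned
    (name    foreign-symbol-name))    ; symbol name string

  (define (foreign-symbol->text sym)
    (let ((pkg  (foreign-symbol-package sym))
          (name (foreign-symbol-name sym)))
      (cond ((not pkg)                   ; uninterned
             (string-append "#uninterned-symbol{" name "}"))
            ((string=? pkg "KEYWORD")    ; CL keyword
             (string-append ":" name))
            (else                        ; ordinary packaged symbol
             (string-append "#package-symbol{\"" pkg "\" " name "}")))))

  ;; Example: (foreign-symbol->text (make-foreign-symbol "COMMON-LISP" "NIL"))
  ;; yields the text #package-symbol{"COMMON-LISP" NIL}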

> I continue to think that not letting (read) limit the amount of input is
> Very Bad Indeed.  Not all programming languages are memory safe, far from
> it.  Not even all Scheme or CL implementations if you set the compiler
> options correctly.

Agreed. I was probably unclear: options are great, but they should
stay optional, not required :) If the format can represent anything,
that lets us offer a simple (read) and (write). If the reader and
writer can also take lots of options via some unobtrusive means like
keyword arguments, that's a good thing.

Limits on recursion depth, number size, etc. are probably good for
production apps. For example, PHP's standard JSON parser has a depth
limit: <https://www.php.net/manual/en/function.json-decode.php>.
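
To sketch what I mean by unobtrusive options, here's a toy
depth-limited reader. Everything in it is illustrative: the option
name, the alist calling convention, and the parser itself, which
handles only lists and atoms:

  (import (scheme base) (scheme char))

  (define (read-limited port . options)
    ;; options is a list of (name . value) pairs, e.g. '(max-depth . 64).
    (define max-depth
      (cond ((assq 'max-depth options) => cdr)
            (else #f)))
    (define (skip-whitespace)
      (let ((c (peek-char port)))
        (when (and (char? c) (char-whitespace? c))
          (read-char port)
          (skip-whitespace))))
    (define (parse depth)
      (when (and max-depth (> depth max-depth))
        (error "read-limited: nesting depth limit exceeded" max-depth))
      (skip-whitespace)
      (let ((c (peek-char port)))
        (cond ((eof-object? c) c)
              ((char=? c #\() (read-char port) (parse-list depth))
              (else (parse-atom)))))
    (define (parse-list depth)
      (skip-whitespace)
      (let ((c (peek-char port)))
        (cond ((eof-object? c) (error "read-limited: unterminated list"))
              ((char=? c #\)) (read-char port) '())
              (else (let ((head (parse (+ depth 1))))
                      (cons head (parse-list depth)))))))
    (define (parse-atom)
      (let loop ((chars '()))
        (let ((c (peek-char port)))
          (if (or (eof-object? c) (char-whitespace? c)
                  (char=? c #\() (char=? c #\)))
              (let ((text (list->string (reverse chars))))
                (or (string->number text) (string->symbol text)))
              (loop (cons (read-char port) chars))))))
    (parse 0))

  ;; (read-limited (open-input-string "(1 (2 (3)))"))                  => (1 (2 (3)))
  ;; (read-limited (open-input-string "(1 (2 (3)))") '(max-depth . 2)) => error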

> Here's my current idea.
>
> First of all, I want a more compact syntax for bytevectors.  My current
> notion is for them to match /\[([0-9A-Fa-f][0-9A-Fa-f]-?)*\]/.  That is,
> hex digits with optional hyphens between each byte so you can group things
> as you like, and then wrapped in square brackets.  I'm not particular about
> the square brackets.

I like the hex digits idea. I might even go with base64; I'm quite
neutral on it.

What's your opinion of simply using strings for the hex? #u8"abcdef1234"
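
Whichever delimiters we end up with, the decoding helper stays tiny.
Here's a rough R7RS sketch (not from any existing implementation) that
turns hex content with optional grouping hyphens into a bytevector:

  (import (scheme base))

  (define (hex-digit-value c)
    (cond ((char<=? #\0 c #\9) (- (char->integer c) (char->integer #\0)))
          ((char<=? #\a c #\f) (+ 10 (- (char->integer c) (char->integer #\a))))
          ((char<=? #\A c #\F) (+ 10 (- (char->integer c) (char->integer #\A))))
          (else (error "hex-string->bytevector: not a hex digit" c))))

  (define (hex-string->bytevector str)
    ;; Accepts e.g. "abcdef1234" or "DEAD-BEEF"; hyphens are cosmetic.
    (let loop ((chars (string->list str)) (bytes '()))
      (cond ((null? chars) (apply bytevector (reverse bytes)))
            ((char=? (car chars) #\-) (loop (cdr chars) bytes))
            ((null? (cdr chars))
             (error "hex-string->bytevector: odd number of hex digits" str))
            (else (loop (cddr chars)
                        (cons (+ (* 16 (hex-digit-value (car chars)))
                                 (hex-digit-value (cadr chars)))
                              bytes))))))

  ;; (hex-string->bytevector "DEAD-BEEF") => #u8(222 173 190 239)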

> After that, the content of each ASN.1 LER object is one of three things:

This may come across as nitpicking, but there's a deeper point behind
it: I'd change the name "LER" to something else. I know it's consistent
with other ASN.1 names like BER and DER, but to a normal person those
are hard to remember and (correctly) give a bureaucratic impression. If
you want to spread ASN.1 to mortals, every aspect of it needs to be
made more approachable than it currently is, even if that means
breaking with convention. One natural name would be "Lisp ASN.1".

The name "ASN.1" itself is oddly cool for something that was presumably
birthed in large conference rooms. It sounds like Formula 1, itself a
cool name for an auto-racing contest. They did something right :)

> bytes, characters, or sub-objects.  So let's write # followed by either a
> registered name or hex digits that represent the type code, followed by one
> of a string, a bytevector, or a list.  So a vector would be #vec(1 2 3) or
> #20(1 2 3), a duration would be #dur"1Y2M35D" or #1F22"1Y2M35D", and float
> 0.0 would be #float[0000-0000-0000-0000] or #DBt[0000-0000-0000-0000],
> although a decimal float would be more interoperable.  I have some
> registered names in the new column B of <http://tinyurl.com/asn1-ler>, but
> this would allow private-use typecodes, which don't have registered names,
> to be encoded as text.

I like this. I don't mind if vectors are #vec(...) instead of #(...).

Both binary and decimal floats would be nice to have. It's good to have
binary floats in the text format too, since the binary format has them.
Likewise, people are going to write 123.45 in the text format, so it's
good to have decimal floats in the binary format.

The problem with the [0000-0000] encodings is that we'd need to
introduce extra square-bracket lexical syntax for something that could
already be represented as a string: "0000-0000". A bare 0000-0000 could
also be lexed as a symbol, but if we want to forbid symbols that start
with a digit unless they are vertical-bar-escaped, that will be a
problem.
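
Incidentally, the string and list content cases need almost no new
lexical machinery; only the bracketed byte syntax does. A rough sketch
of reading the tagged forms (names invented, bracket case punted on):

  (import (scheme base) (scheme read))

  ;; Read the tag after "#": everything up to the opening delimiter,
  ;; e.g. vec, dur, float, 1F22.
  (define (read-tag-name port)
    (let loop ((chars '()))
      (let ((c (peek-char port)))
        (if (or (eof-object? c) (memv c '(#\( #\[ #\")))
            (string->symbol (list->string (reverse chars)))
            (loop (cons (read-char port) chars))))))

  ;; Called with the port positioned just after the "#".  Returns a
  ;; (tag . content) pair; a real reader would hand these to a
  ;; decoding hook instead.
  (define (read-tagged port)
    (let ((tag (read-tag-name port)))
      (case (peek-char port)
        ((#\( #\") (cons tag (read port)))  ; list or string content
        ((#\[) (error "read-tagged: bracketed bytes not handled here"))
        (else (error "read-tagged: expected (, [ or \" after tag" tag)))))

  ;; (read-tagged (open-input-string "vec(1 2 3)"))     => (vec 1 2 3)
  ;; (read-tagged (open-input-string "dur\"1Y2M35D\"")) => (dur . "1Y2M35D")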

> To make this work on the procedure side, read can be passed a procedure
> that accepts a type code and a bytevector/string/list and returns the proper
> internal representation; on the write side, it would accept an object and
> return two values, type code and bytevector/string/list.  The invocations
> would have to be bottom-up.

LGTM.
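
To check my understanding, the two hooks might pair up roughly like
this (every name below is invented for illustration):

  (import (scheme base))

  (define (my-decode type-code content)
    ;; content is a string, a bytevector or a list, depending on how
    ;; the object was written out.
    (case type-code
      ((vec) (list->vector content))      ; #vec(1 2 3) -> a Scheme vector
      (else  (cons type-code content))))  ; keep unknown types tagged

  (define (my-encode obj)
    ;; Returns two values: the type code and a string/bytevector/list.
    (cond ((vector? obj)
           (values 'vec (vector->list obj)))
          ((and (pair? obj) (symbol? (car obj)))
           (values (car obj) (cdr obj)))  ; round-trip tagged objects
          (else
           (error "my-encode: no type code for" obj))))

How these get passed to (read) and (write), keyword arguments or
otherwise, is a separate question.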