Re: Limits, symbols and bytevectors, ASN.1 branding

Re: Limits, symbols and bytevectors, ASN.1 branding Lassi Kortela 27 Sep 2019 16:24 UTC

>> Perhaps the CL writer should write out NIL and T as something like
>> #!cl:nil and #!cl:t.
>
> The CL writer isn't the problem: it's the CL reader [...] could return
> one or two artificial objects, but the result would just be an
> annoyance to the CL programmer.

I don't see a way to escape the problem either. The phrase "lost in
translation" comes to mind.

What the CL reader can have is a parameter to pick which representation
of ()/NIL/false/null will be returned at each call to (read). Since we ship
the reader in our own library, it's not much effort to add options.
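
A minimal sketch of such a reader parameter, written in Python only to keep it runnable. The names ReadOptions and read_atom and the token spellings "#f" and "#!null" are assumptions made for illustration, not part of any existing reader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadOptions:
    """Hypothetical options: what each 'nothing-like' token reads as."""
    empty_list: object = ()
    false: object = False
    null: object = None


def read_atom(token: str, opts: ReadOptions = ReadOptions()):
    """Map one already-tokenized atom to a host value (sketch only)."""
    if token == "()":
        return opts.empty_list
    if token == "#f":
        return opts.false
    if token == "#!null":
        return opts.null
    return token  # every other token is passed through unchanged here


# A CL-flavoured caller can collapse all three onto a single object,
# while a Scheme-flavoured caller keeps the defaults and sees them apart.
NIL = "NIL"
cl_opts = ReadOptions(empty_list=NIL, false=NIL, null=NIL)
```

With cl_opts every one of the three tokens comes back as the same NIL object; with the defaults they stay distinct.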

> What's wrong with the symbol cluser:foobar on the wire?

It's fine with me :) I thought you didn't like it. Any symbol notation
is probably fine with me, so long as some equivalent of vertical-bar
notation is available even in the simplest syntax so arbitrary symbol
names can be sent.

> The CL reader can
> either boot if it finds an unknown package, as CL's native reader does, or
> create it on the fly with make-package.

Yes. The reader might also need user choice for what to do upon
encountering a non-existent package.
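
That user choice could be a single reader option. The sketch below is in Python and every name in it (resolve_package, on_unknown, UnknownPackage) is hypothetical; "create" stands in for calling make-package on the fly, "error" for signalling the way CL's native reader does.

```python
class UnknownPackage(Exception):
    """Raised when the package is missing and the policy is 'error'."""


def resolve_package(name: str, packages: dict, on_unknown: str = "error") -> dict:
    """Look up a package by name, applying the caller's policy if it is missing."""
    if name in packages:
        return packages[name]
    if on_unknown == "create":
        packages[name] = {}  # stand-in for creating the package on the fly
        return packages[name]
    if on_unknown == "error":
        raise UnknownPackage(name)
    raise ValueError(f"unknown policy: {on_unknown!r}")
```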

> There is an issue that cluser:foobar, |cluser:foobar|, |cluser|:foobar,
> cluser:|foobar|, and |cluser|:|foobar| all mean different things.

IMHO we should dictate that symbols are always case-sensitive.
Converting bare symbols to uppercase is anachronistic, and converting
them to lowercase is not really necessary.

Case-sensitive symbols mean |cluser|:foobar and cluser:|foobar| and
|cluser|:|foobar| are equal.

I would read |cluser:foobar| as a symbol named "cluser:foobar" in the
default package (i.e. the symbol has no package prefix).
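
A minimal sketch of that reading rule in Python, under these assumptions: vertical bars only toggle quoting, an unquoted colon separates package from name, no case folding happens anywhere, and backslash escapes and the double-colon form are ignored. The function name parse_symbol is made up for this example.

```python
def parse_symbol(text: str):
    """Return (package, name); package is None when there is no prefix."""
    parts, buf, quoted = [], [], False
    for ch in text:
        if ch == "|":
            quoted = not quoted         # bars toggle quoting and are dropped
        elif ch == ":" and not quoted:
            parts.append("".join(buf))  # unquoted colon ends the package part
            buf = []
        else:
            buf.append(ch)              # characters are kept exactly as written
    if quoted:
        raise ValueError("unterminated |...| in symbol")
    parts.append("".join(buf))
    if len(parts) == 1:
        return (None, parts[0])
    if len(parts) == 2:
        return (parts[0], parts[1])
    raise ValueError("more than one package separator")
```

Under these rules cluser:foobar, |cluser|:foobar, cluser:|foobar| and |cluser|:|foobar| all parse to the same pair, while |cluser:foobar| parses to a prefix-less symbol whose name contains the colon.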

> My
> personal attitude to that problem is "Who cares?"  AFAIK we are not trying
> to provide serializations for all possible CL data structures.  Let the CL
> community do that work, and if it fits into our framework (very unlikely,
> politically), so much the better.

I still visit CL regularly and try hard to reduce unnecessary
divergence. Lisp needs people who go the extra mile to reach across
dialect boundaries.

Separately from that issue, a practical serialization of all CL data is
probably an intractable problem even with the best of intentions. But we
can provide enough extensibility that people can try, and gradually add
more types. Those should be identical to Scheme types where it makes sense.

> In any case, anyone who quotes : or uses
> packages that differ only in case deserves to lose.

Case-sensitive symbols and/or vertical-bar notation (or equivalent, e.g.
symbol name as double-quoted string with a special # prefix) should
solve that problem easily.

Even CL symbols are case-sensitive under the hood, as you know, so the
only thing that folds case is the CL reader with its default settings
(and a diminishing number of Scheme readers).

>> Uninterned symbols could be #package-symbol{#!null FOOBAR} or
>> #uninterned-symbol{FOOBAR}. Some Schemes also have uninterned symbols so
>> a common solution needs to be found.
>
> We'll discuss that, but I am not a fan of it.  Most Schemes won't have any
> such thing, and anyway, different instances of #:foo are all different, so
> they might as well be strings.

Again, I'll argue that Lisp can simply give you an uninterned symbol from
somewhere. It's easier to support them as an extension than have people
filter them out from all data they ever write.

If you want to banish these weird symbols behind a # extension instead
of convoluting the format's basic lexical syntax: enthusiastically agreed.

>>> First of all, I want a more compact syntax for bytevectors.  My current
>>> notion is for them to match /\[([0-9A-Fa-f][0-9A-Fa-f][-])*\]/.

All of that is good.

What about using underscore as the digit separator? Dash brings to mind
subtraction and Lisp symbols / Scheme identifiers, though there is
probably no serious risk of confusion.
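
To make the two separator choices concrete, here is a sketch of a parser for the square-bracket notation. It is an assumption-laden variant, not the quoted regex: it accepts either "-" or "_" between pairs of hex digits, treats the separator as optional, and rejects a trailing separator. The name parse_bytevector is hypothetical.

```python
import re

# Assumed grammar: "[" then zero or more hex pairs, with an optional
# "-" or "_" between consecutive pairs, then "]".
_BYTEVECTOR = re.compile(r"\[(?:[0-9A-Fa-f]{2}(?:[-_]?[0-9A-Fa-f]{2})*)?\]")


def parse_bytevector(text: str) -> bytes:
    """Decode a bracketed hex bytevector into bytes."""
    if not _BYTEVECTOR.fullmatch(text):
        raise ValueError(f"not a bytevector: {text!r}")
    digits = text[1:-1].replace("-", "").replace("_", "")
    return bytes.fromhex(digits)
```

Both [dead-beef] and [dead_beef] decode to the same four bytes, so the choice of separator is purely a readability question for the text format.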

> Base64 fails on both human-readability and bit compactness.  The *only*
> reason it exists at all is because email channels were originally 7-bit
> only.   Indeed, they still are unless both ends negotiate otherwise.

Well, I hate to admit this but base64 is still convenient. One of the
latest examples is storing image files in data: URIs.

Still no strong opinion on whether base64 or only hex. I'll defer to others.
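
For a rough feel of the trade-off, this Python sketch compares the text length of the same payload in hex and in base64, and builds a data: URI from it. The helper names and the application/octet-stream media type are chosen only for this example.

```python
import base64


def encoded_lengths(payload: bytes):
    """Return (hex_length, base64_length) in characters for one payload."""
    hex_text = payload.hex()
    b64_text = base64.b64encode(payload).decode("ascii")
    return len(hex_text), len(b64_text)


def data_uri(payload: bytes, media_type: str = "application/octet-stream") -> str:
    """Build a base64 data: URI for the payload."""
    b64_text = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{b64_text}"


hex_len, b64_len = encoded_lengths(bytes(range(30)))
```

Hex always costs two characters per byte, while base64 costs four characters per three bytes (rounded up to a multiple of four with padding), so 30 bytes come out as 60 hex characters against 40 base64 characters.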

>> What's your opinion of simply using strings for the hex? #u8"abcdef1234"
>
> I want a stand-alone way to do bytevectors that doesn't involve #name
> prefixes, because I want to use it as part of #name notations.

Ah, ok. I'll wait for your #name syntax before evaluating the #u8"".

> ASN.1 standard floats (nobody uses them AFAIK) can handle exponent bases of
> 2, 8, 10, or 16.  But the normal textual notation of floats, like "3.1415",
> should correspond to the normal binary notation, IEEE binary64.  I'll worry
> about a specialized type for decimal floats when some Lisp provides them.

OK, maybe loss of float precision is fine. Caring about bit-identical
floats is an anti-pattern (though I expect there are some fringe
applications where that kind of expediency is needed).

Then it's probably also fine to convert freely between binary floats
(for the binary format) and decimal floats (for the text format).
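
As a side note, the conversion between binary64 and decimal text does not have to lose anything if the writer emits enough digits. The sketch below uses Python, whose repr() produces the shortest decimal string that reads back to the same binary64 value; the four helper names are hypothetical and big-endian byte order is an arbitrary choice for the example.

```python
import struct


def to_binary(x: float) -> bytes:
    """Encode a float as 8 bytes of big-endian IEEE binary64."""
    return struct.pack(">d", x)


def from_binary(data: bytes) -> float:
    """Decode 8 bytes of big-endian IEEE binary64."""
    return struct.unpack(">d", data)[0]


def to_text(x: float) -> str:
    """Shortest decimal text that round-trips to the same binary64 value."""
    return repr(x)


def from_text(text: str) -> float:
    """Parse decimal text to the nearest binary64 value."""
    return float(text)
```

A value can travel text to binary to text under these helpers and come back spelled the same way, which is one way to check that nothing was lost on a given platform.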

The big concern is representing exact quantities of money. Databases
have had to deal with this problem for ages; what do they do?
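
SQL databases commonly offer fixed-point NUMERIC/DECIMAL column types for this. The Python sketch below illustrates the underlying issue and one possible wire approach, assuming money travels as decimal text and is read into a decimal type instead of a binary float. The function names are hypothetical.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
binary_sum = 0.1 + 0.2

# Decimal arithmetic on the same amounts stays exact.
decimal_sum = Decimal("0.10") + Decimal("0.20")


def money_to_wire(amount: Decimal) -> str:
    """Write an exact amount as decimal text."""
    return str(amount)


def money_from_wire(text: str) -> Decimal:
    """Read decimal text back without passing through a binary float."""
    return Decimal(text)
```

The same idea works with integers counting the smallest currency unit, which needs no decimal type at all.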

> There are libraries for Python, R, C/C++, and a few other languages, but no
> one has them integrated into the language's numeric tower (Python comes
> closest, but not very close).  Hardware support is caught in a
> chicken-and-egg problem: languages don't support them, so chips don't
> implement them (except for the IBM POWER and the z/Series mainframes), so
> languages don't support them.  They have been characterized as a solution
> in search of a problem.

Interesting.

>> The problem with the [0000-0000] encodings is that we need to introduce
>> extra square-bracket lexical syntax for something that could already be
>> represented as a string: "0000-0000".
>
> I want to distinguish in the text format between ASN.1 types that are
> basically ASCII and those that are basically binary.  Bytevectors vs.
> strings is the natural way to do that.

Sure. I meant something like #u8"1234abcd", i.e. a string prefixed by a
tag. But I'll wait for your bytevectors in #names.