Re: Request for review of my binary encoding proposal

Re: Request for review of my binary encoding proposal Alaric Snell-Pym 18 Sep 2019 10:08 UTC

On 18/09/2019 01:35, Lassi Kortela wrote:
>> The only deviation from DER is that sets do not have to be sorted into
>> binary lexicographic order.
>
> This all sounds like a very good idea, as does ASN.1 in general.
> Unfortunately the devil is in the details: ASN.1 has a reputation as
> badly over-engineered (even when limited to its binary encodings, not
> the XML one), and there have been numerous bugs (many of them with
> security implications) in parsers in times past. I tried to understand
> the format once but it was so complex for what it did that I gave up.

Heheh, ASN.1 is IMHO an interesting thing to study, learn from, and
borrow the best bits of, rather than something to use directly... I was
on the ASN.1 ISO/ITU-T working group for a while (during the development
of XER), and I felt there was perhaps a bit of pressure from WG members
who had successful businesses selling ASN.1 tools to keep it complicated
so that tools would be required :-) There was some talk of trying to
trim down a useful subset that could be implemented without them ("SGML
is to XML as ASN.1 is to... ?"), but nothing came of it!

> So a Scheme implementation may be a very good idea if we need it for a
> particular purpose (and someone volunteers to write all that code). But
> if we don't, then I'd recommend favoring simpler protocols. If ASN.1 is
> needed, it may be worth considering wrapping one of the many C
> implementations.

These days, I don't see ASN.1 being proposed for new things; I've seen
implementations of it done pretty much purely to talk protocols that
already use it (X.509 / SNMP).

> Your list of data types looks good if a comprehensive format is needed.
> But I would leave out all of the space-optimized ones unless someone has
> measured that some specific task is too slow. I'd go with varints for
> all numbers. For space savings without performance penalty, the recent
> crop of fast compression algorithms (LZ4, Zstandard, Snappy) is amazing.
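
For readers unfamiliar with the term, "varints for all numbers" can be
sketched roughly as follows. This is only an illustration: the thread
does not say which varint layout is meant, so the LEB128-style layout
(7 data bits per byte, high bit set on every byte except the last) and
the names encode_varint / decode_varint are assumptions.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:      # high bit clear marks the final byte
            return result, pos
        shift += 7
```

Small numbers cost one byte and the width grows only as needed, which
is why a single varint type could stand in for several fixed-width
integer types.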

<aside>As an aside (please take it offlist if anyone wants to talk about
this!) I'm very interested in designing data encodings that take account
of, or indeed directly drive, compression algorithms rather than being
defined purely as a string of bytes - I blogged about this:
http://www.snell-pym.org.uk/archives/2006/08/24/splay-trees-compression-encryption-and-embedding/3/
</aside>

> I don't have a good opinion on the bag, mapping, range etc. types. My
> intuition favors simplicity, because the value of data is in exchange,
> and the more intricate a format is, the fewer environments it can be
> exchanged with. Porting S-expressions to a new environment, you have to
> implement lists, symbols, strings and integers. Every type you add to
> that means more porting work, which usually means people bother to port
> to fewer environments. It may make sense for some applications to add
> more data types if they are really useful. But generally I'd bet against
> it (no concrete arguments here, just intuition). For anything subject to
> network effects, a bigger network adds value much faster than technical
> sophistication, as Lispers are all too aware.

I think a general Scheme-value interchange format needs to have some
idea of how common SRFI abstract types are encoded - from timestamps to
hash tables and so on. Not even s-expressions really have that! Perhaps
if the Scheme community takes data portability more seriously, it will
become the norm for the SRFIs for those types of things to also define
s-expression and binary encodings, and the reference implementations
will be expected to register codecs for s-expression and binary
representations of them (with the thorny issue of how to ensure there
are no conflicts in the "type codes" used).
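
To make the "register codecs" idea concrete, here is a minimal sketch
in Python rather than Scheme. Everything in it is hypothetical: the
CodecRegistry class, the one-byte type-code prefix, and the codes 0x01
and 0x02 are illustrative choices of mine, not part of any proposal in
this thread. It only shows that a registry can refuse a second
registration of the same type code within one process; it does not
solve the harder problem of independent authors picking the same code.

```python
import struct
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Tuple

Encoder = Callable[[Any], bytes]
Decoder = Callable[[bytes], Any]


class CodecRegistry:
    """Maps one-byte type codes to encoder/decoder pairs (hypothetical layout)."""

    def __init__(self) -> None:
        self._by_code: Dict[int, Tuple[type, Decoder]] = {}
        self._by_type: Dict[type, Tuple[int, Encoder]] = {}

    def register(self, code: int, typ: type,
                 encode: Encoder, decode: Decoder) -> None:
        if not 0 <= code <= 0xFF:
            raise ValueError("type code must fit in one byte")
        if code in self._by_code:
            owner = self._by_code[code][0].__name__
            raise ValueError(f"type code {code:#04x} already used by {owner}")
        if typ in self._by_type:
            raise ValueError(f"{typ.__name__} already has a codec")
        self._by_code[code] = (typ, decode)
        self._by_type[typ] = (code, encode)

    def encode(self, value: Any) -> bytes:
        # Prefix the payload with the registered type code.
        code, enc = self._by_type[type(value)]
        return bytes([code]) + enc(value)

    def decode(self, data: bytes) -> Any:
        # The first byte selects the decoder; the rest is the payload.
        _, dec = self._by_code[data[0]]
        return dec(data[1:])


registry = CodecRegistry()
# Timestamps as whole seconds since the epoch, 8 bytes big-endian (an assumption).
registry.register(
    0x01, datetime,
    lambda t: struct.pack(">q", int(t.timestamp())),
    lambda b: datetime.fromtimestamp(struct.unpack(">q", b)[0], tz=timezone.utc),
)
registry.register(0x02, str,
                  lambda s: s.encode("utf-8"),
                  lambda b: b.decode("utf-8"))
```

A library defining a new abstract type would call register once at
load time, and a clash of type codes would then surface as an error
immediately instead of as silently misdecoded data.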

--
Alaric Snell-Pym   (M7KIT)
http://www.snell-pym.org.uk/alaric/