Re: Request for review of my binary encoding proposal


Request for review of my binary encoding proposal John Cowan (17 Sep 2019 22:39 UTC)

Re: Request for review of my binary encoding proposal Lassi Kortela (18 Sep 2019 00:35 UTC)

Re: Request for review of my binary encoding proposal Alaric Snell-Pym (18 Sep 2019 10:09 UTC)

Re: Request for review of my binary encoding proposal John Cowan (18 Sep 2019 23:48 UTC)

Re: Request for review of my binary encoding proposal Arthur A. Gleckler (18 Sep 2019 23:51 UTC)

Data type registry Lassi Kortela (19 Sep 2019 16:47 UTC)

Re: Data type registry John Cowan (19 Sep 2019 20:21 UTC)

Re: Data type registry Arthur A. Gleckler (19 Sep 2019 21:37 UTC)

Symbol registry Lassi Kortela (19 Sep 2019 21:46 UTC)

Re: Symbol registry Arthur A. Gleckler (19 Sep 2019 21:48 UTC)

Why ASN.1 is not, like, actually evil John Cowan (18 Sep 2019 12:24 UTC)

Re: Why ASN.1 is not, like, actually evil hga@xxxxxx (18 Sep 2019 13:43 UTC)

Re: Why ASN.1 is not, like, actually evil John Cowan (18 Sep 2019 21:13 UTC)

Re: Why ASN.1 is not, like, actually evil Lassi Kortela (19 Sep 2019 17:01 UTC)

Re: Why ASN.1 is not, like, actually evil John Cowan (19 Sep 2019 18:27 UTC)

Re: Why ASN.1 is not, like, actually evil Lassi Kortela (19 Sep 2019 21:53 UTC)

Re: Request for review of my binary encoding proposal John Cowan (18 Sep 2019 23:29 UTC)

Re: Request for review of my binary encoding proposal Lassi Kortela (19 Sep 2019 16:08 UTC)

Re: Request for review of my binary encoding proposal Lassi Kortela 19 Sep 2019 16:08 UTC

>> If ASN.1 is
>> needed, it may be worth considering wrapping one of the many C
>> implementations.
>
> You mean those horribly buggy, schema-dependent, statically typed,
> compile-time-specialized implementations?  Include me out!

I was thinking of them as "those already-done implementations". If you
want rich data types, it's probably better to roll a new one in Scheme
as you suggest. My impression is that most of the existing
implementations target SNMP, certificates, audio/video and the like.

> I think we do need a comprehensive format.  Part of the goal of R7RS-large
> is to make a whole new crop of standard datatypes all equal in the eyes of
> a Scheme programmer.  Lisp programmers have always said things like: "Well,
> it would be faster and better to use vectors here, but the vector
> procedures in R7RS-small (which has more than any of its predecessors) are
> too impoverished compared to SRFI 1, so I'll use lists" (and similarly for
> CL).  I want Scheme programmers to stop making those choices.  SRFI 133
> gives vectors substantial parity with lists, and including it in R7RS-large
> means that implementations will have them either as "batteries included" or
> as readily loadable from Snow etc., and as such, so that you can always
> reach for the right tool and not the most convenient one, because they are
> all equally convenient.

I like that goal. Haven't thought about it in relation to external
representations of those objects.

> Part of that inclusiveness of types is to be able to serialize *all* our
> data structures, not just the simple int/float/string/boolean ones.
> Backward compatibility is going to make it difficult to do that with
> S-expressions, though I have some ideas for portable extensions to the
> S-expression format (more on that later).

That's an interesting point. You're basically talking about something
like Python's pickle format, but with more data types. I think there are
three different jobs here:

1) Cross-language interchange. Simplicity wins the day. Things like JSON.

2) Saving all representable data using some close-enough representation
(rounding floats is ok, converting between lists/vectors is ok, etc.)
The S-expression syntax of most Lisps is quite like this.

3) Saving all representable data using the exact same representation
(floats remain bit-accurate, bitvectors remain bitvectors, hashtables
remain hashtables, etc.) Your ASN.1 extensions aim at this one.

All three are on the wish list for most tasks, and each format makes
different compromises between them.
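To make the difference between jobs 2 and 3 concrete, here is a minimal
sketch in Python (not a proposal for the actual format) of a type-tagged
binary encoding in which every value carries a one-byte tag, so a vector
decodes as a vector and a float keeps its exact IEEE bits. The tag values
are made up for illustration, a Python tuple stands in for a Scheme
vector, and integers are limited to 64 bits.

```python
import struct

# Hypothetical one-byte type tags, invented for this sketch only.
TAG_INT = 0x01     # signed 64-bit, big-endian (no bignums here)
TAG_FLOAT = 0x02   # IEEE 754 binary64, big-endian
TAG_LIST = 0x03    # Python list stands in for a Scheme list
TAG_VECTOR = 0x04  # Python tuple stands in for a Scheme vector


def encode(value):
    """Encode a value as tag byte + payload, preserving its type."""
    if isinstance(value, bool):  # bool is a subclass of int in Python
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return bytes([TAG_INT]) + struct.pack(">q", value)
    if isinstance(value, float):
        return bytes([TAG_FLOAT]) + struct.pack(">d", value)
    if isinstance(value, (list, tuple)):
        tag = TAG_LIST if isinstance(value, list) else TAG_VECTOR
        body = b"".join(encode(item) for item in value)
        # Element count as a 32-bit big-endian length prefix.
        return bytes([tag]) + struct.pack(">I", len(value)) + body
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(data, pos=0):
    """Decode one value starting at pos; return (value, next_pos)."""
    tag = data[pos]
    pos += 1
    if tag == TAG_INT:
        return struct.unpack_from(">q", data, pos)[0], pos + 8
    if tag == TAG_FLOAT:
        return struct.unpack_from(">d", data, pos)[0], pos + 8
    if tag in (TAG_LIST, TAG_VECTOR):
        (count,) = struct.unpack_from(">I", data, pos)
        pos += 4
        items = []
        for _ in range(count):
            item, pos = decode(data, pos)
            items.append(item)
        return (items if tag == TAG_LIST else tuple(items)), pos
    raise ValueError(f"unknown tag {tag:#x}")
```

For example, `decode(encode([1, (2.5, [3]), 0.1]))[0]` gives back a list
whose second element is still a tuple, and 0.1 comes back bit-for-bit.
A job-2 format could instead print the float in decimal and flatten the
tuple into a list.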

> By no means all Schemes use a
> readtable approach as CL does, and a lot of implementations of `read` would
> be quite hard to extend.  Interchanging a simple binary format makes things
> much more straightforward, as I pointed out in my not-evil post.

Fully agreed. This is one of the main reasons I favor binary formats in
general, and especially for IPC where the human is out of the loop.

> f32vectors aren't there because they are space-optimized, but because they
> are a data structure type of R7RS-large, per the vote on SRFI 160.  (Within
> a Scheme program, they are there because they are space-optimized and quite
> possibly more efficient to work on, if your implementation optimizes
> f32vector-map for certain known functional arguments.)  Given that
> everything in an f32vector is an IEEE float, and that (r6rs bytevectors)
> aka (scheme vector), which as I mentioned has a portable implementation
> posted in the SRFI 4 repo, it's totally easy to convert from a pile of
> bytes in big-endian to an f32vector and back.  (Well, we need bulk
> converters, for which I don't yet have even a pre-SRFI, but I will; it
> belongs to the tedious-but-trivial category.)

Does R7RS require floats to be IEEE? If you want to pickle floats
bit-accurately, requiring IEEE may be a reasonable wish.

It is indeed easy to keep IEEE floats around. It's the total scope of
the spec that always worries me in situations like this, not really any
individual part.
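As a rough illustration of the tedious-but-trivial bulk converter you
mention, here is a Python sketch that uses `array('f')` as a stand-in
for an f32vector. The function names are hypothetical, and the code
assumes the platform's C `float` is a 4-byte IEEE binary32, which holds
on mainstream platforms. Because the bytes are copied and swapped rather
than reparsed, the round trip is bit-exact, including -0.0 and
infinities.

```python
import sys
from array import array

# Assumption: C float is IEEE binary32 on this platform.
assert array("f").itemsize == 4


def f32vector_to_bytes(vec):
    """Return the elements of an array('f') as big-endian binary32 bytes."""
    tmp = array("f", vec)  # copy so the caller's array is not swapped
    if sys.byteorder == "little":
        tmp.byteswap()  # native little-endian -> big-endian
    return tmp.tobytes()


def bytes_to_f32vector(data):
    """Build an array('f') from big-endian binary32 bytes."""
    if len(data) % 4:
        raise ValueError("byte length is not a multiple of 4")
    vec = array("f")
    vec.frombytes(data)
    if sys.byteorder == "little":
        vec.byteswap()  # big-endian -> native little-endian
    return vec
```

On a big-endian host both byteswap steps are skipped, so the same code
produces the same wire bytes everywhere.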

> Varints are good for non-negative integers.  For other numbers, not so
> much.  There is actually a varint-like format for floats in ASN.1, but I
> left it out because interchanging in anything but IEEE format (which at
> most needs byte-swapping) is useless at this point.

It's not useless to use a format more general than IEEE, but you may be
right that it's too fancy. I'll read up on how complex the ASN.1
var-floats are.

How would you represent arbitrarily large ratios and complex numbers,
if not as groups of varints?
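Here is one possible answer, sketched in Python: a ratio is a
zigzag-mapped numerator varint followed by a denominator varint, and an
exact complex number is two such ratios. The varint is unsigned LEB128
(as used by Protocol Buffers), and the zigzag mapping is written
arithmetically so that it works for bignums, not just 64-bit integers.
The function names are hypothetical, and since Python has no exact
complex type, the sketch uses a (real, imag) pair of `Fraction`s.

```python
from fractions import Fraction


def write_uvarint(n, out):
    """Append n as unsigned LEB128: 7 bits per byte, high bit = more."""
    if n < 0:
        raise ValueError("uvarint must be non-negative")
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return


def read_uvarint(data, pos):
    """Read an unsigned LEB128 value; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def zigzag(n):
    """Map 0, -1, 1, -2, ... to 0, 1, 2, 3, ... for any size of n."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z):
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def write_ratio(q, out):
    q = Fraction(q)  # normalized; the denominator is always positive
    write_uvarint(zigzag(q.numerator), out)
    write_uvarint(q.denominator, out)


def read_ratio(data, pos):
    z, pos = read_uvarint(data, pos)
    d, pos = read_uvarint(data, pos)
    return Fraction(unzigzag(z), d), pos


def write_exact_complex(re, im, out):
    write_ratio(re, out)
    write_ratio(im, out)


def read_exact_complex(data, pos):
    re, pos = read_ratio(data, pos)
    im, pos = read_ratio(data, pos)
    return (re, im), pos
```

Integers fall out as ratios with denominator 1; a real format would
probably give them their own tag to save the extra byte.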

> Part of our whole raison d'etre is *not* to do what everyone else does: if
> you want that, use what everyone else uses, to resist Worse is Better.

I see more shades of grey here. There are plenty of situations in which one can
do what everyone else does without compromising on one's values. In data
interchange that is often the case.