On Tue, Sep 17, 2019 at 8:35 PM Lassi Kortela <xxxxxx@lassi.io> wrote:

This all sounds like a very good idea, as does ASN.1 in general.
Unfortunately the devil is in the details:

I agree. But I will, I hope, explain why the devil in this case is not as black as he is usually painted. Tl;dr version: the characteristics of the Lisp languages eliminate most of the sources of the bugs that have plagued ASN.1 implementations in the past.

ASN.1 has a reputation as
badly over-engineered (even when limited to its binary encodings, not
the XML one), and there have been numerous bugs (many of them with
security implications) in parsers in times past.

Absolutely. And here's why:

1) ASN.1 was designed in an environment where all languages were statically typed. As such, it was necessary for the sender and the receiver to share a common understanding of exactly what could and what could not be communicated so that the messages could be contained in statically typed structures. (Of course, the communication need not be so direct; it can be mediated by a file or a database entry or whatnot.)

2) This goal was achieved using the ASN.1 schema language, which is a DSL that prescribes the exact structure of what the sender will send and the receiver will expect to receive. Broadly, one party has to serialize data of a known structure and the other has to deserialize it into the same structure, modulo the details of the languages used. The schema language allowed for unions, which gave messages a limited degree of flexibility, normally to be used only where absolutely necessary.

3) ASN.1 was also born into a world where bytes were incredibly expensive by modern standards, either to transmit or to store. That included bytes used for programs. Consequently, the expected approach was that ASN.1 schemas would be transmitted out of band, and a parser-generator at each end would read the schema and construct, for the appropriate language, data structures and serialization and/or deserialization code for that specific schema only.

4) These parser-generators were compilers in fact, though nobody thought of them that way. They were generally written both in and for unsafe languages such as C, and not with the same degree of caution that programmers working on gcc or clang are expected to supply. Consequently, the generated deserializer was often naive, believing that the lengths in its input actually made sense and that the structure of received data actually did match the schema. When one or both of these was not the case, either because the message was malicious or because the generated serializer was buggy, the deserializer was often brittle and did random things to memory. Security holes and crashes were the inevitable result.

5) ASN.1 BER, the most general binary encoding, is usually what ASN.1 compilers target. It has (IMO gratuitous) encoding variants that are excluded from DER, which is what I am using as the basis for LER. Those variants add to the complexity of deserializers, raising the chances of bugs.
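
To make point 5 concrete, here is a minimal sketch of one such variant: the length field. BER lets the same length be written in several ways (short form, long form with padding, or indefinite length), while DER accepts only the shortest definite form. The sketch is in Python rather than Lisp purely for illustration, and the function name read_der_length is hypothetical, not part of any LER proposal.

```python
def read_der_length(buf, pos):
    """Return (length, next_pos) for a DER length field starting at buf[pos].

    Illustrative only: rejects the length encodings that BER permits
    but DER forbids.
    """
    if pos >= len(buf):
        raise ValueError("truncated length")
    first = buf[pos]
    if first < 0x80:
        # Short form: a single byte holding a value from 0 to 127.
        return first, pos + 1
    if first == 0x80:
        # Indefinite length (content ends at an end-of-contents marker): BER only.
        raise ValueError("indefinite length is not allowed in DER")
    n = first & 0x7F  # long form: n length bytes follow
    if pos + 1 + n > len(buf):
        raise ValueError("truncated length")
    raw = buf[pos + 1 : pos + 1 + n]
    if raw[0] == 0:
        raise ValueError("non-minimal length (leading zero byte)")
    length = int.from_bytes(raw, "big")
    if length < 0x80:
        raise ValueError("non-minimal length (short form required)")
    return length, pos + 1 + n
```

A BER deserializer has to accept all of the rejected forms as well, which is exactly the extra surface area the point is about.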

But Lisp in the modern world? Ho, a whole different matter! Not only does the industry as a whole transmit and store bytes insanely fast and cheaply by old standards, but:

1) ASN.1 DER is a very simple recursive type in essence: an object is represented by a type code followed by a length followed by a value, which can either be raw binary or a sequence of encoded sub-objects. Lisp programmers are good, to say the least, at handling recursive types and the resulting recursive data structures.

2) Because our language is dynamically typed, we can have *one* serializer and *one* deserializer, both of which can handle whatever structure you throw at them and construct the corresponding structure on the other side. Each of them is basically a typecase statement with a little bit of logic for each case, almost exactly like read and write, only *much* simpler.

3) Our implementations are (at least by default) safe. No fandangos on core <http://www.catb.org/jargon/html/F/fandango-on-core.html> for us! At worst, if a message is too big or is not as big as it is claimed to be, we get an out of memory exception or a hang on the port trying to read data that isn't there. The result of bad serialization may be semantic gibberish, but at least it won't damage the program; at the very worst, the program will fail with an exception. It's easy to build in defenses to limit the size of any object being received, and this should probably be a standard part of the API.

4) As a consequence of the above points, we need not write an ASN.1 schema compiler at all, though we have a better language to do it in if we wanted to. The most robust, reliable, and secure parts of any program are those that do not exist.
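
The four points above can be sketched in a few dozen lines. This is a hedged illustration, not LER itself: it is written in Python as a stand-in for a dynamically typed, memory-safe Lisp, it borrows a handful of standard ASN.1 universal tags, and the mapping from tags to host-language types, the names encode and decode, and the MAX_LEN default are all assumptions made for the example.

```python
MAX_LEN = 1 << 20  # hypothetical default cap on any single object's size


def _enc_len(n):
    # DER length: short form below 128, otherwise minimal long form.
    if n < 0x80:
        return bytes([n])
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(raw)]) + raw


def encode(obj):
    """One serializer: dispatch on the runtime type of obj."""
    if obj is None:
        tag, body = 0x05, b""                                # NULL
    elif isinstance(obj, bool):                              # before int: bool is an int
        tag, body = 0x01, (b"\xff" if obj else b"\x00")      # BOOLEAN
    elif isinstance(obj, int):
        size = (obj if obj >= 0 else ~obj).bit_length() // 8 + 1
        tag, body = 0x02, obj.to_bytes(size, "big", signed=True)  # INTEGER
    elif isinstance(obj, bytes):
        tag, body = 0x04, obj                                # OCTET STRING
    elif isinstance(obj, str):
        tag, body = 0x0C, obj.encode("utf-8")                # UTF8String
    elif isinstance(obj, (list, tuple)):
        tag, body = 0x30, b"".join(encode(x) for x in obj)   # SEQUENCE
    else:
        raise TypeError("cannot encode " + type(obj).__name__)
    return bytes([tag]) + _enc_len(len(body)) + body


def decode(buf, pos=0, limit=MAX_LEN):
    """One deserializer: return (object, next_pos); raise ValueError on bad input."""
    if pos + 2 > len(buf):
        raise ValueError("truncated header")
    tag, first = buf[pos], buf[pos + 1]
    pos += 2
    if first < 0x80:
        length = first
    else:
        n = first & 0x7F
        if n == 0 or pos + n > len(buf):
            raise ValueError("bad length field")
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    if length > limit:
        raise ValueError("object exceeds size limit")
    end = pos + length
    if end > len(buf):
        raise ValueError("declared length runs past end of input")
    body = buf[pos:end]
    if tag == 0x05:
        return None, end
    if tag == 0x01:
        if body not in (b"\x00", b"\xff"):
            raise ValueError("bad boolean")
        return body == b"\xff", end
    if tag == 0x02:
        return int.from_bytes(body, "big", signed=True), end
    if tag == 0x04:
        return body, end
    if tag == 0x0C:
        return body.decode("utf-8"), end
    if tag == 0x30:
        items, p = [], 0
        while p < len(body):          # children cannot read past the parent's body
            item, p = decode(body, p, limit)
            items.append(item)
        return items, end
    raise ValueError("unknown tag 0x%02x" % tag)
```

For example, decode(encode([1, "two", [b"3", None]])) hands back the same nested structure with no schema involved (tuples come back as lists). A length that lies about the data, or that exceeds the limit, produces an ordinary exception rather than a write through a bad pointer.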

I hope this argument is more or less convincing. I'll address "why ASN.1?" and your specific points in a later post.

John Cowan http://vrici.lojban.org/~cowan xxxxxx@ccil.org
On the Semantic Web, it's too hard to prove you're not a dog.
--Bill de hOra