On Thu, Sep 19, 2019 at 1:01 PM Lassi Kortela <xxxxxx@lassi.io> wrote:
 
> I don't have a good grasp on how complex ASN.1 is to implement if you
> leave out all the schema stuff and pick suitably simple encoding rules.
> It'd be interesting to see a prototype implementation.

Not up for that right now, but here are a few sample serializations (if I haven't screwed up):

5 => 02 (fixed integer) 01 (length) 05 (value)
-32767 => 02 (fixed integer) 02 (length) 80 01 (value, two's complement)
An exact hugenum => 02 (fixed integer) 82 (meta-length of 2: two length bytes follow) 01 00 (length of 256) xx xx xx ... (256 bytes of big-endian two's-complement value)
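A minimal Python sketch of the integer examples above, assuming single-byte tags and minimal (DER-style) encodings: the content is the shortest big-endian two's-complement form, and lengths of 128 or more use the long form, 0x80 ORed with the count of length bytes that follow. The function names are illustrative, not from any library.

```python
def encode_length(n: int) -> bytes:
    """Definite length: short form below 128, otherwise long form."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    # Long form: 0x80 ORed with the number of length bytes that follow
    # (the "meta-length"), then the length itself, big-endian.
    return bytes([0x80 | len(body)]) + body


def encode_integer(value: int) -> bytes:
    """Tag 02 (INTEGER) with minimal big-endian two's-complement content."""
    # A value needs magnitude.bit_length() bits plus one sign bit, where
    # magnitude is value itself or ~value (= -value - 1) for negatives.
    # ceil((bits + 1) / 8) == bits // 8 + 1.
    magnitude = value if value >= 0 else ~value
    size = magnitude.bit_length() // 8 + 1
    content = value.to_bytes(size, "big", signed=True)
    return b"\x02" + encode_length(len(content)) + content


print(encode_integer(5).hex(" "))       # 02 01 05
print(encode_integer(-32767).hex(" "))  # 02 02 80 01
```

For a 256-byte value the header comes out as 02 82 01 00, matching the hugenum example.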

1.22 => DB (IEEE float) 08 (length) 3F F3 85 1E B8 51 EB 85 (big-endian float64)
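The float example can be reproduced with the standard `struct` module. Note that DB is a private-class code chosen in this proposal; X.690's own REAL type (universal tag 9) uses a different, non-IEEE content encoding, so a generic ASN.1 decoder would just see an unknown private type here. A sketch:

```python
import struct

IEEE_FLOAT_TAG = 0xDB  # private-class code from the example above, not a standard X.690 type


def encode_float64(x: float) -> bytes:
    """Tag DB, length 8, then the IEEE 754 binary64 bits in big-endian order."""
    content = struct.pack(">d", x)
    return bytes([IEEE_FLOAT_TAG, len(content)]) + content


print(encode_float64(1.22).hex(" "))  # db 08 3f f3 85 1e b8 51 eb 85
```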

"abcäöü" => 0C (UTF-8) 09 (length) 61 62 63 C3 A4 C3 B6 C3 BC (UTF-8 representation)

xyz => DD (symbol) 03 (length) 78 79 7A (value)
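Strings and symbols differ only in their tag byte: both carry UTF-8 bytes as content, and the length counts bytes, not characters. A sketch assuming short-form lengths only (content under 128 bytes); SYMBOL_TAG (DD) is this proposal's private code, not a standard one.

```python
UTF8_STRING_TAG = 0x0C  # X.690 UTF8String (universal, primitive, 12)
SYMBOL_TAG = 0xDD       # private-class code proposed here for symbols


def encode_text(tag: int, text: str) -> bytes:
    """Encode a string-like value: tag, byte length, UTF-8 bytes."""
    content = text.encode("utf-8")
    if len(content) >= 0x80:
        raise ValueError("this sketch only handles short-form lengths")
    # "abcäöü" is 6 characters but 9 UTF-8 bytes, so its length byte is 09.
    return bytes([tag, len(content)]) + content


print(encode_text(UTF8_STRING_TAG, "abc\u00e4\u00f6\u00fc").hex(" "))
# 0c 09 61 62 63 c3 a4 c3 b6 c3 bc
print(encode_text(SYMBOL_TAG, "xyz").hex(" "))  # dd 03 78 79 7a
```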

(5 1.22 "abcäöü" xyz) => E0 (list) 1D (length of 29) 02 01 05 DB 08 3F F3 85 1E B8 51 EB 85 0C 09 61 62 63 C3 A4 C3 B6 C3 BC DD 03 78 79 7A

#(5 1.22 "abcäöü" xyz) => same, except the type code is 30, which is the standard X.690 SEQUENCE code, as opposed to E0, which is in the set of type codes reserved for private use.
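A list or vector is a constructed value whose content is just the concatenation of its complete element encodings, so its length is the sum of their full sizes (3 + 10 + 11 + 5 = 29 bytes here). A sketch with the elements passed in pre-encoded to keep it short; the names are illustrative.

```python
from typing import Iterable

LIST_TAG = 0xE0    # private class, constructed, tag number 0 (this proposal)
VECTOR_TAG = 0x30  # X.690 SEQUENCE (universal, constructed, 16)


def encode_length(n: int) -> bytes:
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_container(tag: int, encoded_elements: Iterable[bytes]) -> bytes:
    """Constructed value: content is the concatenation of element encodings."""
    body = b"".join(encoded_elements)
    return bytes([tag]) + encode_length(len(body)) + body


# Pre-encoded elements copied from the examples above.
elements = [
    bytes.fromhex("020105"),                  # 5
    bytes.fromhex("DB083FF3851EB851EB85"),    # 1.22
    bytes.fromhex("0C09616263C3A4C3B6C3BC"),  # "abcäöü"
    bytes.fromhex("DD0378797A"),              # xyz
]
print(encode_container(LIST_TAG, elements)[:2].hex(" "))    # e0 1d
print(encode_container(VECTOR_TAG, elements)[:2].hex(" "))  # 30 1d
```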

Note that a deserializer that doesn't understand symbols (code DD, also a private-use code) can safely skip over the 3 content bytes of the symbol without losing synchronization with whatever comes next.
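Skipping needs only the tag and the length: a reader decodes both and advances past the content without interpreting it. A sketch of such a walk over a flat run of encoded values, assuming single-byte tags (tag numbers below 31) and definite lengths; `decode_known` is a hypothetical helper name.

```python
from typing import List, Set, Tuple


def read_length(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (length, position just after the length field)."""
    first = buf[pos]
    pos += 1
    if first < 0x80:
        return first, pos
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite length is not handled in this sketch")
    return int.from_bytes(buf[pos:pos + count], "big"), pos + count


def decode_known(buf: bytes, known: Set[int]) -> List[Tuple[int, bytes]]:
    """Walk consecutive values; keep (tag, content) for known tags, skip the rest."""
    out = []
    pos = 0
    while pos < len(buf):
        tag = buf[pos]  # assumes a single-byte tag
        length, pos = read_length(buf, pos + 1)
        if pos + length > len(buf):
            raise ValueError("truncated value")
        if tag in known:
            out.append((tag, buf[pos:pos + length]))
        pos += length  # known or not, the length says where the next value starts
    return out


# A symbol (DD) this reader does not understand, followed by the integer 5.
data = bytes.fromhex("DD0378797A020105")
print(decode_known(data, {0x02}))  # [(2, b'\x05')]
```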

> Do you think the simplest case is simple enough that my separate binary
> S-exp format should be abandoned? I'm not quite convinced that something
> of approximately equal simplicity can be crafted by suitably subsetting
> ASN.1, but I do not know this.

Read this email and tell me what you think. It's not quite as simple as your idea, but it does play nicely with other people up to the limit of their understanding: they always know what they do not understand (see below) and can't be confused by it, not only at the level of whole objects but also at the level of parts of objects.
 
> Crucially, those binary S-expressions are also easy to use from C (well,
> as easy as any hierarchical dynamically-typed structure can be). I like
> to implement new formats in C; if the pain is tolerable there, it's
> tolerable anywhere.

I think these are too, modulo the same concerns.
 
> ASN.1 seems to use OIDs (dotted names). How much of a core part is that?
> With hindsight, we can now say that it might be simpler to repurpose
> the Internet DNS for a name hierarchy.

Not core at all.  OIDs are a way of assigning vectors of numbers to things or concepts in such a way that when lexicographically sorted they are grouped administratively, which is useful for the purposes ASN.1 is normally used for, but they are just another datatype.
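For illustration, an OBJECT IDENTIFIER (universal tag 06) is encoded like any other primitive value: the first two arcs are combined as 40 * first + second, and each resulting subidentifier is written in base 128, most significant group first, with the high bit set on every byte except the last of each subidentifier. A sketch assuming short-form lengths; the sample OID is the ViaThinkSoft free-OID arc from the next paragraph.

```python
def encode_oid(arcs) -> bytes:
    """Tag 06 (OBJECT IDENTIFIER) with base-128 subidentifiers."""
    if any(a < 0 for a in arcs):
        raise ValueError("arcs must be non-negative")
    if len(arcs) < 2 or arcs[0] > 2 or (arcs[0] < 2 and arcs[1] > 39):
        raise ValueError("invalid first two arcs")
    # The first two arcs share a single subidentifier.
    subids = [40 * arcs[0] + arcs[1]] + list(arcs[2:])
    content = bytearray()
    for sid in subids:
        groups = [sid & 0x7F]  # final 7-bit group: high bit clear
        sid >>= 7
        while sid:
            groups.append(0x80 | (sid & 0x7F))  # earlier groups: high bit set
            sid >>= 7
        content.extend(reversed(groups))  # most significant group first
    if len(content) >= 0x80:
        raise ValueError("this sketch only handles short-form lengths")
    return bytes([0x06, len(content)]) + bytes(content)


print(encode_oid([1, 3, 6, 1, 4, 1, 37476, 9000]).hex(" "))
# 06 0a 2b 06 01 04 01 82 a4 64 c6 28
```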

(You can get your very own official OID by going to <https://oidplus.viathinksoft.com/oidplus/?goto=oid%3A1.3.6.1.4.1.37476.9000> and asking for one at no charge. It will look like 1 (ISO) 3 (identified-organization) 6 (U.S. Department of Defense) 1 (Internet/IANA) 4 (private) 1 (enterprise) 37476 (ViaThinkSoft) 9000 (free OIDs) yournum (you). You can then assign and register any OID that begins with 1.3.6.1.4.1.37476.9000.yournum. The registry (which, unlike ViaThinkSoft, is not a registrar) is at <http://www.oid-info.com/>. The maximum safe value to use as an OID component is 2^32-1, so plain vectors or u32vectors would be the Right Thing for representing OIDs. That said, ViaThinkSoft's registration page seems to be non-working at the moment; fortunately there are other approaches.)
 
> Adding custom datatypes needs to be baked into any such format. It needs
> to be possible for an implementation to skip advanced datatypes that it
> doesn't understand. This means that values of those types need to be
> prefixed with their length somehow, or be built entirely from
> length-prefixed parts.

As shown above, every value is prefixed with its length in bytes (not counting the type code and the length field themselves), and for types with components, each component is itself a complete type-length-value triple, so it carries its own length too. There are plenty of codes we can register and plenty of purely private-use codes, as noted in one of my other postings.
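Every one-byte type code splits into a class (top two bits, where 11 means private), a primitive/constructed flag (the 0x20 bit), and a tag number (low five bits). Tag number 31 switches to the multi-byte high-tag-number form, which is what keeps the code space from running out. A sketch that decodes the codes used in the examples above:

```python
CLASSES = ("universal", "application", "context-specific", "private")


def describe_tag(code: int):
    """Split a one-byte identifier into (class, constructed?, tag number)."""
    tag_class = CLASSES[code >> 6]   # top two bits
    constructed = bool(code & 0x20)  # set for constructed values (lists, vectors)
    number = code & 0x1F             # low five bits
    if number == 0x1F:
        raise ValueError("multi-byte (high-tag-number) form not handled here")
    return tag_class, constructed, number


for code in (0x02, 0x0C, 0x30, 0xDB, 0xDD, 0xE0):
    print(f"{code:02X}", describe_tag(code))
# 02 ('universal', False, 2)
# 0C ('universal', False, 12)
# 30 ('universal', True, 16)
# DB ('private', False, 27)
# DD ('private', False, 29)
# E0 ('private', True, 0)
```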


John Cowan          http://vrici.lojban.org/~cowan        xxxxxx@ccil.org
"Repeat this until 'update-mounts -v' shows no updates.
You may well have to log in to particular machines, hunt down
people who still have processes running, and kill them."