Brief spec-writing exercise Lassi Kortela 26 Sep 2019 11:46 UTC

As an exercise in spec writing, here's a binary version of the .ini
format whose pitfalls we discussed earlier:

----------------------------------------------------------------------

BINI (binary .ini) format

A varint is a little-endian integer encoded into the low 7 bits of
each byte in a sequence. All but the last byte have the high bit set.

A varstring is a varint byte count followed by that many bytes giving
the string. UTF-8 character encoding should be assumed.

A BINI file is a sequence of zero or more sections.

A section is a varstring section title, followed by a varint entry
count, followed by the entries. Each entry is a varstring name
followed by a varstring value.

----------------------------------------------------------------------
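The spec above is short enough to sketch as a codec in a few dozen lines. The following Python is only one possible reading, not a reference implementation. It assumes "little-endian" means the least significant 7-bit group comes first, and it keeps strings as raw bytes, since the spec only says UTF-8 "should be assumed". All function names are made up for illustration.

```python
from typing import List, Tuple

Entry = Tuple[bytes, bytes]            # (name, value), kept as raw bytes
Section = Tuple[bytes, List[Entry]]    # (title, entries)


def encode_varint(n: int) -> bytes:
    """Emit 7 bits per byte, low group first; high bit set on all but the last."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next position). No maximum value is enforced."""
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def encode_varstring(s: bytes) -> bytes:
    return encode_varint(len(s)) + s


def decode_varstring(data: bytes, pos: int) -> Tuple[bytes, int]:
    length, pos = decode_varint(data, pos)
    end = pos + length
    if end > len(data):
        raise ValueError("truncated varstring")
    return data[pos:end], end


def encode_bini(sections: List[Section]) -> bytes:
    out = bytearray()
    for title, entries in sections:
        out += encode_varstring(title)
        out += encode_varint(len(entries))
        for name, value in entries:
            out += encode_varstring(name)
            out += encode_varstring(value)
    return bytes(out)


def decode_bini(data: bytes) -> List[Section]:
    """Read sections until the input is exhausted; order and duplicates are kept."""
    sections: List[Section] = []
    pos = 0
    while pos < len(data):
        title, pos = decode_varstring(data, pos)
        count, pos = decode_varint(data, pos)
        entries: List[Entry] = []
        for _ in range(count):
            name, pos = decode_varstring(data, pos)
            value, pos = decode_varstring(data, pos)
            entries.append((name, value))
        sections.append((title, entries))
    return sections
```

The decoder returns a list of pairs rather than a dictionary, so duplicate titles, duplicate names and the original ordering all survive a round trip. The only errors it raises are for truncated input, which is a physical problem rather than a logical one.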

The format is so simple that it's easy to notice where we'd drive off
into the weeds by trying to be stricter:

* We don't specify a maximum value for varints. It's easy to see that
   people could just violate such a requirement.

* We don't specify that sections or entries have to be in order,
   because obviously the format lets people write them in a different
   order.

* We don't specify what to do about duplicate section or entry names,
   since the format clearly lets people write duplicates and interpret
   them how they like.

* The most suspect part of the above spec is the part where it says
   strings are UTF-8. This is useful, but it's only a matter of time
   before someone finds it convenient to use BINI with some other
   character encoding.

* We don't specify anything about invalid characters and what to do
   when reading them, since clearly the format lets people write
   invalid characters and read them how they wish.

This, in my view, is what an honest format spec looks like. The main
feature is how little it says. Given that this says what the physical
structure permits, it would be enlightening to mull over how to give
advice about logical validation on top of it that actually works. I
suspect it's hard, but some reasonable approach can be found. The best
approach is to design a format that can't encode the values you don't
want in the first place. But then people are liable to switch to
another format that can encode those values.
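
One possible shape for such advice is a separate checker that runs on already-parsed data and reports problems instead of rejecting the file. The Python below is a hypothetical sketch, not something the spec defines. It assumes sections have been read into a list of (title, entries) pairs of byte strings, and the function name and the particular checks are invented for illustration.

```python
from typing import List, Tuple

Section = Tuple[bytes, List[Tuple[bytes, bytes]]]


def lint_bini(sections: List[Section]) -> List[str]:
    """Return advisory warnings; the caller decides what to do with them."""
    warnings: List[str] = []
    seen_titles = set()
    for title, entries in sections:
        if title in seen_titles:
            warnings.append(f"duplicate section {title!r}")
        seen_titles.add(title)
        seen_names = set()
        for name, value in entries:
            if name in seen_names:
                warnings.append(f"duplicate entry {name!r} in section {title!r}")
            seen_names.add(name)
            # The spec only says UTF-8 "should be assumed", so flag, don't fail.
            for label, raw in (("name", name), ("value", value)):
                try:
                    raw.decode("utf-8")
                except UnicodeDecodeError:
                    warnings.append(f"non-UTF-8 {label} in section {title!r}")
    return warnings
```

Keeping the checks out of the reader means an application that wants duplicates, or a different character encoding, can skip or replace this layer without touching the format itself.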