Re: Buffaloed and dogpiled Lassi Kortela 19 Sep 2019 09:09 UTC

>> I think the root of our disagreement, to the extent there is such, is
>> whether or not text formats are simple. Personally I'm of the opinion
>> that there is no such thing as a simple text format
>
> There are, actually.  Microsoft .INI format is simple, for example, though
> it has grown a lot of optional cruft over the years.  MicroXML is far more
> capable, is specified in 10 pages of prose at <
> https://dvcs.w3.org/hg/microxml/raw-file/tip/spec/microxml.html>, and is
> implemented in 428 SLOC of JavaScript at <
> https://github.com/jclark/microxml-js> (along with a lot of tests and test
> drivers).

With all due respect, since you are the editor of the MicroXML
specification, those two formats are good illustrations of where we
disagree. To a diligent implementor, both are far from simple. Taking
.ini as an example, here are some concerns off the top of my head:

Is this .ini section valid? What is its name:

[section name with a bracket[]

What about backslashes:

[section name with a backslash\]

What about whitespace before or within the brackets?

    [  section   ]

What about whitespace before the variable name or beside the equals sign?

name=    value
   name   = value

Is trailing whitespace trimmed from values or kept in?

What if the value contains an equals sign?

name==value

What if the value contains quotes?

name="value  "

What if there's text after the quotes?

name="value" and more value, or is there?

What if the name contains quotes?

"name is this" = value is that

Are comments permitted?

# is this a comment = or a name-value pair?

What about after the value?

name = value  # comment or value?

etc.

There are answers to all of the above questions, and they depend on who
you ask. But my point is that none of the above concerns should arise in
the first place if we are talking about a program sending data to
another program. There is no reason why a program should be concerned
with byte-order marks, whitespace, comments, escaping, line and token
delimiters to such a degree.
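
To illustrate "depends on who you ask", here is a minimal sketch showing
how one particular parser, Python's standard configparser with default
settings, answers three of the questions above. The section name [s] and
the keys a, b and c are made up for this example; other .ini parsers may
well answer differently.

```python
import configparser

# Hypothetical sample input reusing three of the ambiguous lines above.
SAMPLE = """\
[s]
a==value
b="value  "
c = value  # comment or value?
"""

# Default settings: no inline comments, no quote handling.
parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
answers = dict(parser["s"])

for key, value in answers.items():
    print(key, repr(value))
```

With these defaults the first equals sign ends the name, the quotes stay
part of the value, and the trailing "# comment" is kept as value text.
That is one set of answers among many.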

Delimiters in any format (text or binary) are suspect by default. Why
doesn't the format say up front how much data is about to come? Most of
the time, the sender knows well how much data it's sending.
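
As a sketch of the alternative, assuming a fixed 4-byte big-endian length
prefix (an arbitrary choice for this example, not something specified in
this thread), the sender states the size up front and the receiver never
scans for delimiters. The function names are hypothetical.

```python
import io
import struct

def write_frame(stream, payload: bytes) -> None:
    """Write the payload length as 4 big-endian bytes, then the payload."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)

def read_exact(stream, count: int) -> bytes:
    """Read exactly `count` bytes or raise EOFError on a short stream."""
    chunks = []
    while count > 0:
        chunk = stream.read(count)
        if not chunk:
            raise EOFError("stream ended in the middle of a frame")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)

def read_frame(stream) -> bytes:
    """Read one length-prefixed frame written by write_frame."""
    (length,) = struct.unpack(">I", read_exact(stream, 4))
    return read_exact(stream, length)

# Payloads may contain any bytes; nothing needs escaping.
messages = [b'name="value" # comment?\n', b"", b"[section]\r\n"]
buf = io.BytesIO()
for message in messages:
    write_frame(buf, message)
buf.seek(0)
received = [read_frame(buf) for _ in messages]
```

Quotes, newlines, brackets and comment characters pass through untouched
because the reader only counts bytes.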

.ini is simple, and MicroXML is moderately simple, if you start from the
assumption that text formats are simple to begin with. I start from bits
and bytes; from that perspective, text brings a great deal of complexity
for no reason.

For specific tasks where people need to read and write the stuff, there
can be an equivalent text format representing the same data model, as
for example with binary and text S-expressions. IPC doesn't have that
requirement, so text just brings more complexity. In the rare cases you
need to talk text to a binary-IPC program, just add a text filter to the
pipeline. The filter is optional complexity; I'd keep required
complexity to a minimum.

> But then James Clark is beyond brilliant, and even for him it
> isn't easy to design something simple.

This is the shortest proof of my point. By contrast, anyone who
understands varints can instantly design a simple binary format with
none of the above concerns that puzzle conscientious implementors of
something as pedestrian as .ini.
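
A minimal sketch of what such a format could look like, assuming
little-endian base-128 varints (the encoding used by Protocol Buffers,
chosen here only for illustration). The layout, a count followed by
length-prefixed names and values, and all function names are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("varints here are unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data: bytes, pos: int):
    """Return (value, next position). Truncated input raises IndexError."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_pairs(pairs) -> bytes:
    """Encode [(name, value), ...] where names and values are bytes."""
    out = bytearray(encode_varint(len(pairs)))
    for name, value in pairs:
        for field in (name, value):
            out += encode_varint(len(field)) + field
    return bytes(out)

def decode_pairs(data: bytes):
    """Inverse of encode_pairs."""
    count, pos = decode_varint(data, 0)
    pairs = []
    for _ in range(count):
        fields = []
        for _ in range(2):
            length, pos = decode_varint(data, pos)
            fields.append(data[pos:pos + length])
            pos += length
        pairs.append(tuple(fields))
    return pairs

pairs = [(b'"name is this"', b"value is that"), (b"name", b"=value  ")]
roundtrip = decode_pairs(encode_pairs(pairs))
```

Names with quotes, values with equals signs and trailing whitespace all
survive the round trip with no rules about trimming or escaping.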

Thank you for this discussion: it has made me realize something that I
had never realized in 15 years of thinking about this stuff. A generic,
repurposable binary format should have a text-based dual with the same
data model. That will satisfy scenarios where data needs to be
text-edited. The duality of text and binary S-expressions made it a
breeze to implement the IPC in the current database subprocesses.
The C side can be simple, dealing only with binary, and yet Scheme code
in text can be translated directly to IPC commands, and IPC responses
can be printed as Scheme.
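
The idea of one data model with two representations can be sketched as
follows. This is not the format used by the database subprocesses; the
tag bytes, the restriction to non-negative integers, strings and lists,
and all function names are assumptions made for this example.

```python
TAG_INT, TAG_STR, TAG_LIST = 0, 1, 2  # hypothetical tag values

def encode_varint(n: int) -> bytes:
    """Unsigned base-128 varint, low 7 bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)

def decode_varint(data: bytes, pos: int):
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def to_binary(value) -> bytes:
    """Binary form: a tag byte, then a varint (value, length or count)."""
    if isinstance(value, int):
        return bytes([TAG_INT]) + encode_varint(value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return bytes([TAG_STR]) + encode_varint(len(raw)) + raw
    items = b"".join(to_binary(item) for item in value)
    return bytes([TAG_LIST]) + encode_varint(len(value)) + items

def from_binary(data: bytes, pos: int = 0):
    """Return (value, next position) for the object starting at pos."""
    tag = data[pos]
    number, pos = decode_varint(data, pos + 1)
    if tag == TAG_INT:
        return number, pos
    if tag == TAG_STR:
        return data[pos:pos + number].decode("utf-8"), pos + number
    items = []
    for _ in range(number):
        item, pos = from_binary(data, pos)
        items.append(item)
    return items, pos

def to_text(value) -> str:
    """Text form: the same data printed as an S-expression."""
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    return "(" + " ".join(to_text(item) for item in value) + ")"

command = ["query", "select 1", 42]
decoded, _ = from_binary(to_binary(command))
printed = to_text(decoded)
```

Only to_text has to care about quoting and escaping; the binary side
stays free of those concerns, which is the division of labor described
above.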