String literals inside bytevector literals

String literals inside bytevector literals Lassi Kortela 25 Sep 2019 10:45 UTC

>>> (2) makes bits of a bytevector that happen to also be valid ascii or
>>> utf-8 text "readable", but is more complicated to generate/parse and
>>> ends up as a worse form of (1) for very unprintable stuff.
>>
>> The thing is that character encoding is easily messed up by running
>> "iconv". (This problem concerns ordinary strings too.)
>
> Yes. On a subtler level, it conflates the model of the s-expression as a
> sequence of *characters* - and it being a sequence of bytes. A bit of a
> layer violation!

Exactly - you express it much better than I could :) The particular
byte-level encoding of characters in a text file is not reliable.
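
To illustrate the point, here is a minimal sketch in Python (not part of the original discussion) of what an iconv-style conversion does to the bytes of a file. The helper name `reencode` and the sample text are made up for the example.

```python
def reencode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from one encoding and encode them in another, like iconv."""
    return data.decode(source).encode(target)


# The same four characters, stored as UTF-8 and then converted to Latin-1.
utf8_file = "caf\u00e9".encode("utf-8")
latin1_file = reencode(utf8_file, "utf-8", "latin-1")

print(list(utf8_file))    # the last character takes two bytes in UTF-8
print(list(latin1_file))  # the last character takes one byte in Latin-1
```

The characters are unchanged by the conversion, but the bytes are not. Only the ASCII part ("caf") keeps the same byte values in both files.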

And with Unicode encodings like UTF-8, many glyphs look identical to
each other while being encoded differently, or are even composed of
different character sequences. So Scheme's full string syntax is a
particularly unreliable foundation for putting things into bytevectors,
where presumably every byte value matters.
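
A small Python sketch of the problem, using only the standard `unicodedata` module. The sample strings are chosen for illustration: an "e with acute accent" written as one precomposed character and as a base letter plus a combining accent.

```python
import unicodedata

composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# Both normally render as the same glyph, yet they are different strings
# and produce different UTF-8 bytes.
composed_bytes = list(composed.encode("utf-8"))
decomposed_bytes = list(decomposed.encode("utf-8"))

# Normalization (here NFC) maps one form to the other, so a tool that
# normalizes text would silently change the bytes.
normalized = unicodedata.normalize("NFC", decomposed)

print(composed_bytes, decomposed_bytes, normalized == composed)
```

An editor or conversion tool that normalizes text would therefore alter a bytevector written with such a string, without any visible change on screen.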

Since ASCII is so simple and universal, we could make a concession for
trusting the byte-values of ASCII graphic characters.
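
As a rough sketch of what such a concession might look like, the Python below writes a bytevector with runs of ASCII graphic characters (0x21 through 0x7E) as quoted strings and every other byte as a decimal integer. The `#u8(...)` output syntax, the function names, and the choice to keep `"` and `\` numeric are all assumptions made for the example, not a proposal from this thread.

```python
def is_ascii_graphic(byte: int) -> bool:
    """True for ASCII graphic characters: 0x21 ('!') through 0x7E ('~')."""
    return 0x21 <= byte <= 0x7E


def render_bytevector(data: bytes) -> str:
    """Render bytes in a hypothetical #u8(...) syntax with embedded strings."""
    parts = []
    run = []  # current run of trusted ASCII graphic characters
    for byte in data:
        # Keep '"' (0x22) and '\' (0x5C) numeric so no escape syntax is needed.
        if is_ascii_graphic(byte) and byte not in (0x22, 0x5C):
            run.append(chr(byte))
            continue
        if run:
            parts.append('"' + "".join(run) + '"')
            run = []
        parts.append(str(byte))
    if run:
        parts.append('"' + "".join(run) + '"')
    return "#u8(" + " ".join(parts) + ")"


png_signature = b"\x89PNG\r\n\x1a\n"
print(render_bytevector(png_signature))
```

Space (0x20) is not a graphic character, so in this sketch it is written as a number like any control byte.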