String literals inside bytevector literals
Lassi Kortela 25 Sep 2019 10:45 UTC
>>> (2) makes bits of a bytevector that happen to also be valid ascii or
>>> utf-8 text "readable", but is more complicated to generate/parse and
>>> ends up as a worse form of (1) for very unprintable stuff.
>>
>> The thing is that character encoding is easily messed up by running
>> "iconv". (This problem concerns ordinary strings too.)
>
> Yes. On a subtler level, it conflates the model of the s-expression as a
> sequence of *characters* - and it being a sequence of bytes. A bit of a
> layer violation!

Exactly - you express it much better than I could :) The particular
byte-level encoding of characters in a text file is not reliable.

And with Unicode, whatever the encoding (UTF-8 included), many glyphs look
identical to each other while being encoded differently, or are even
composed of different sequences of characters. So Scheme's full string
syntax is a particularly unreliable foundation for putting things into
bytevectors, where presumably every byte value matters.
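
To make the "identical glyphs, different bytes" point concrete, here is a
minimal sketch. It assumes an R7RS implementation with full Unicode string
support; the names `precomposed` and `decomposed` are just illustrative.
Both strings normally render as the same accented letter, yet they turn
into different bytevectors:

```scheme
(import (scheme base) (scheme write))

;; Two spellings of what usually renders as the same glyph.
;; U+00E9 LATIN SMALL LETTER E WITH ACUTE (precomposed form)
(define precomposed "\xE9;")
;; U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT
(define decomposed "e\x301;")

(write (string->utf8 precomposed)) (newline)  ; #u8(195 169)
(write (string->utf8 decomposed))  (newline)  ; #u8(101 204 129)
```

An editor or tool that normalizes the source file could silently turn one
form into the other, and the bytes in the bytevector would change with it.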

Since ASCII is so simple and universal, we could make a concession and
trust the byte values of ASCII graphic characters.
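
As a rough sketch of what such a concession might look like in a reader,
the code below accepts a string only when every character is an ASCII
graphic character. Two assumptions that are not settled in this thread:
"graphic" is taken to mean code points #x21 through #x7E (so space is
excluded, as with C's isgraph), and the helper names `ascii-graphic?` and
`ascii-graphic-string->bytevector` are hypothetical.

```scheme
(import (scheme base))

;; Assumption: "ASCII graphic" means code points #x21..#x7E (space excluded).
(define (ascii-graphic? ch)
  (let ((cp (char->integer ch)))
    (and (>= cp #x21) (<= cp #x7E))))

;; Hypothetical helper: convert a string to a bytevector, but only if
;; every character is ASCII graphic; otherwise signal an error.
(define (ascii-graphic-string->bytevector s)
  (let loop ((i 0))
    (cond ((= i (string-length s))
           ;; For pure ASCII, UTF-8 yields exactly one byte per character,
           ;; equal to the character's code point.
           (string->utf8 s))
          ((ascii-graphic? (string-ref s i))
           (loop (+ i 1)))
          (else
           (error "not an ASCII graphic character" (string-ref s i))))))
```

Anything outside that range (including space, under this assumption) would
still have to be written as a numeric byte value.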