Re: SRFI 207: String-notated bytevectors

Re: SRFI 207: String-notated bytevectors Daphne Preston-Kendal 18 Aug 2020 16:38 UTC

On 18 Aug 2020, at 18:12, Marc Nieper-Wißkirchen <xxxxxx@nieper-wisskirchen.de> wrote:

> This is bad for implementations who want to extend SRFI 207 to the full Unicode repertoire of characters.

Implementations should not do this. (Or more accurately: implementations which
do this should not do so in the expectation that it will be interoperable.)

Bytes are bytes and codepoints are codepoints. Fairly often, when processing
human-readable strings in Latin script (generally short ones), the two happen
to roughly coincide, and for pure ASCII they coincide exactly. That is why this
SRFI exists: to let humans read and write such bytevectors in source code. But
when they don’t match up, that should be as visually obvious as possible.
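To make the distinction concrete, here is a minimal Python sketch (Python stands in for Scheme purely for illustration) comparing a string's codepoints with its UTF-8 bytes, which is what `string->utf8` returns. The helper names `codepoints` and `utf8_bytes` are made up for this example.

```python
def codepoints(s: str) -> list[int]:
    """Unicode scalar values of each character in s."""
    return [ord(c) for c in s]


def utf8_bytes(s: str) -> list[int]:
    """The bytes of s encoded as UTF-8, i.e. what string->utf8 would produce."""
    return list(s.encode("utf-8"))


for text in ["GET /", "café"]:
    cps, bs = codepoints(text), utf8_bytes(text)
    print(f"{text!r}: codepoints={cps} bytes={bs} same={cps == bs}")
```

For the ASCII request line the two lists are identical. For "café" the single codepoint 233 (U+00E9) becomes the two bytes 195 169 (C3 A9), and this is exactly the kind of mismatch that should not be papered over.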

> The above may not even work when the string lexer of the implementation is used and the implementation does not allow ASCII 0 as part of strings.

If we follow this line of argument, "\yC0" is an invalid string that the lexer
should reject, yet #u8"\yC0" (or #u8("\yC0"), in Alex’s spelling) is a perfectly
cromulent bytevector. (This assumes our goal is to make #u8 strings notationally
equivalent to a call to string->utf8.)
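Why a lone C0 byte cannot come out of `string->utf8` can be checked directly. The following Python sketch (the helper `is_valid_utf8` is hypothetical, and `bytes([0xC0])` stands in for whatever `#u8"\yC0"` would denote) shows that 0xC0 is never part of well-formed UTF-8: it could only lead an overlong two-byte encoding, which UTF-8 forbids, so no string encodes to a byte sequence containing it.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8 (strict is Python's default)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xC0 byte is a perfectly good bytevector element...
lone_c0 = bytes([0xC0])
print(len(lone_c0), is_valid_utf8(lone_c0))  # 1 False

# ...but as a UTF-8 lead byte it would start an *overlong* encoding
# (C0 80 would mean U+0000), which UTF-8 rejects.
print(is_valid_utf8(bytes([0xC0, 0x80])))  # False

# Exhaustive check over every Unicode scalar value (surrogates are
# excluded because they cannot be encoded as UTF-8 at all).
found = any(
    0xC0 in chr(cp).encode("utf-8")
    for cp in range(0x110000)
    if not 0xD800 <= cp <= 0xDFFF
)
print(found)  # False
```

So a bytevector notation defined purely as sugar for `string->utf8` could never denote this byte.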

The whole thing with \y (or whatever we’d call it) in normal strings reminds me
too much of how pre-ES6 JavaScript handled astral-plane characters. (You had to
escape the surrogate pairs!) Who is asking to be able to write regular strings
as hex-escaped UTF-8 byte sequences? And why on earth should a SRFI about
bytevector notation have to go in and change the definition of string notation
too?
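The analogy can be made concrete. This Python sketch computes the surrogate-pair escape that pre-ES6 JavaScript required for an astral-plane character, next to the hex-escaped UTF-8 spelling that a `\y`-style string escape would imply. Both helpers are hypothetical illustrations, and `\y` is only the placeholder name used in this thread, not settled syntax.

```python
def js_surrogate_escape(ch: str) -> str:
    """Escape a character the way pre-ES6 JavaScript required: astral-plane
    characters become a UTF-16 surrogate pair written as two \\uXXXX escapes."""
    cp = ord(ch)
    if cp < 0x10000:
        return f"\\u{cp:04X}"
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return f"\\u{high:04X}\\u{low:04X}"


def utf8_hex_escape(s: str, prefix: str = "\\y") -> str:
    """Spell a string as hex-escaped UTF-8 bytes using a hypothetical
    two-hex-digit escape such as the \\y discussed in this thread."""
    return "".join(f"{prefix}{b:02X}" for b in s.encode("utf-8"))


print(js_surrogate_escape("\U0001F600"))  # \uD83D\uDE00
print(utf8_hex_escape("é"))               # \yC3\yA9
```

In both cases the writer has to hand-encode a single character into units of some lower-level encoding, which is the burden being objected to.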

Even if the eventual notation is #u8() with string literals in it, I’m
following John and putting a firm no on the idea of calling the escape sequence
something different. This also, by implication, means equivalence with
string->utf8 won’t work. I think that idea was a non-starter.
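The "by implication" step can be spelled out. Assume the shared escape is written `\x<hex>;` as in R7RS strings. In an ordinary string the escape names a codepoint, while in a #u8 string it would name a raw byte, so the two readings agree only below 0x80. A minimal Python sketch, where both functions are hypothetical stand-ins for those two readings:

```python
def string_escape_to_utf8(cp_hex: str) -> bytes:
    """(string->utf8 "\\x<cp_hex>;"): in a string the escape names a codepoint."""
    return chr(int(cp_hex, 16)).encode("utf-8")


def bytevector_escape(byte_hex: str) -> bytes:
    """#u8"\\x<byte_hex>;": in a bytevector literal the escape names a byte."""
    value = int(byte_hex, 16)
    if not 0 <= value <= 0xFF:
        raise ValueError("a bytevector escape must denote a single byte")
    return bytes([value])


for h in ["41", "7F", "C0", "FF"]:
    as_string = string_escape_to_utf8(h)
    as_bytes = bytevector_escape(h)
    print(h, as_string.hex(), as_bytes.hex(), as_string == as_bytes)
```

For 41 and 7F both readings give the same single byte. For C0 the string reading yields c380 (the UTF-8 encoding of U+00C0) while the bytevector reading yields c0, so the same escape name cannot also preserve equivalence with `string->utf8`.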

Daphne