Re: SRFI 207: String-notated bytevectors


SRFI 207: String-notated bytevectors Arthur A. Gleckler (15 Aug 2020 23:29 UTC)
Re: SRFI 207: String-notated bytevectors Per Bothner (16 Aug 2020 00:31 UTC)
Re: SRFI 207: String-notated bytevectors Alex Shinn (16 Aug 2020 01:16 UTC)
Re: SRFI 207: String-notated bytevectors Lassi Kortela (16 Aug 2020 10:15 UTC)
(missing)
Re: SRFI 207: String-notated bytevectors Lassi Kortela (16 Aug 2020 10:40 UTC)
Re: SRFI 207: String-notated bytevectors John Cowan (17 Aug 2020 03:18 UTC)
bytestring procedure Lassi Kortela (17 Aug 2020 07:56 UTC)
Re: SRFI 207: String-notated bytevectors Marc Nieper-Wißkirchen (17 Aug 2020 16:10 UTC)
Re: SRFI 207: String-notated bytevectors Shiro Kawai (18 Aug 2020 00:19 UTC)
Re: SRFI 207: String-notated bytevectors Lassi Kortela (18 Aug 2020 06:51 UTC)
Re: SRFI 207: String-notated bytevectors Marc Nieper-Wißkirchen (18 Aug 2020 07:04 UTC)
Re: SRFI 207: String-notated bytevectors Daphne Preston-Kendal (18 Aug 2020 09:53 UTC)
Re: SRFI 207: String-notated bytevectors Marc Nieper-Wißkirchen (18 Aug 2020 10:14 UTC)
Re: SRFI 207: String-notated bytevectors Shiro Kawai (18 Aug 2020 10:50 UTC)
Re: SRFI 207: String-notated bytevectors Lassi Kortela (18 Aug 2020 10:57 UTC)
Re: SRFI 207: String-notated bytevectors Marc Nieper-Wißkirchen (18 Aug 2020 11:22 UTC)
Re: SRFI 207: String-notated bytevectors John Cowan (18 Aug 2020 15:49 UTC)
Re: SRFI 207: String-notated bytevectors Marc Nieper-Wißkirchen (18 Aug 2020 16:12 UTC)
Re: SRFI 207: String-notated bytevectors Daphne Preston-Kendal (18 Aug 2020 16:38 UTC)
Re: SRFI 207: String-notated bytevectors Marc Nieper-Wißkirchen (18 Aug 2020 17:00 UTC)
(missing)
Re: SRFI 207: String-notated bytevectors Marc Nieper-Wißkirchen (18 Aug 2020 18:49 UTC)
Re: SRFI 207: String-notated bytevectors John Cowan (18 Aug 2020 22:30 UTC)
Re: SRFI 207: String-notated bytevectors Shiro Kawai (19 Aug 2020 20:38 UTC)
Re: SRFI 207: String-notated bytevectors Marc Nieper-Wißkirchen (19 Aug 2020 20:44 UTC)
Re: SRFI 207: String-notated bytevectors John Cowan (19 Aug 2020 21:55 UTC)
Re: SRFI 207: String-notated bytevectors Shiro Kawai (20 Aug 2020 00:54 UTC)
Re: SRFI 207: String-notated bytevectors Daphne Preston-Kendal (20 Aug 2020 06:04 UTC)
Re: SRFI 207: String-notated bytevectors Shiro Kawai (20 Aug 2020 06:09 UTC)
Re: SRFI 207: String-notated bytevectors Marc Nieper-Wißkirchen (20 Aug 2020 06:33 UTC)
Re: SRFI 207: String-notated bytevectors Marc Nieper-Wißkirchen (18 Aug 2020 17:43 UTC)
Re: SRFI 207: String-notated bytevectors John Cowan (18 Aug 2020 17:49 UTC)
Re: SRFI 207: String-notated bytevectors Daphne Preston-Kendal (18 Aug 2020 18:31 UTC)
Re: SRFI 207: String-notated bytevectors Marc Nieper-Wißkirchen (18 Aug 2020 16:16 UTC)
Re: SRFI 207: String-notated bytevectors Daphne Preston-Kendal (18 Aug 2020 09:48 UTC)
Re: SRFI 207: String-notated bytevectors Marc Nieper-Wißkirchen (18 Aug 2020 10:02 UTC)
Re: SRFI 207: String-notated bytevectors Lassi Kortela (18 Aug 2020 10:27 UTC)
Re: SRFI 207: String-notated bytevectors Lassi Kortela (18 Aug 2020 10:28 UTC)
Re: SRFI 207: String-notated bytevectors Daphne Preston-Kendal (16 Aug 2020 10:31 UTC)
Re: SRFI 207: String-notated bytevectors Lassi Kortela (16 Aug 2020 10:10 UTC)

Re: SRFI 207: String-notated bytevectors Lassi Kortela 18 Aug 2020 06:51 UTC

>     I think the main reason is that if we allowed the full Unicode range
>     and specified UTF-8 encoding, sequences like \x80; would be ambiguous:
>     either the byte #x80 is meant or the bytes of the UTF-8
>     encoding of U+0080.
>
> I agree.  In this regard, I think the syntax Alex suggested earlier
> seems to work well.  This way, an octet sequence that's not valid UTF-8
> can be included without ambiguity:
>
> #u8("abcde" #x80 "efghi")
>
> For the string parts, we can specify either ASCII only or UTF-8-encoded strings.
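
A minimal Python sketch of the mixed form quoted above, assuming string parts are UTF-8 encoded (or rejected when restricted to ASCII) and integer parts contribute exactly one byte. The function name `mixed_bytevector` is hypothetical and models the proposal, not any existing Scheme API:

```python
def mixed_bytevector(*parts, ascii_only=False):
    # Hypothetical model of the proposed #u8("..." n "...") literal, not a real API.
    out = bytearray()
    for part in parts:
        if isinstance(part, str):
            # String parts contribute their encoded bytes; with ascii_only=True,
            # any non-ASCII character raises UnicodeEncodeError.
            out += part.encode("ascii" if ascii_only else "utf-8")
        elif isinstance(part, int) and 0 <= part <= 0xFF:
            # Integer parts contribute exactly one byte each.
            out.append(part)
        else:
            raise ValueError(f"not a string or byte: {part!r}")
    return bytes(out)


bv = mixed_bytevector("abcde", 0x80, "efghi")
print(bv)  # b'abcde\x80efghi'
```

The lone #x80 byte makes the result invalid UTF-8, yet nothing in the literal is ambiguous: the byte is written as a number, never as a string escape.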

That syntax also has the advantage of dodging the discrepancy between
R7RS #u8(...) and R6RS #vu8(...). Implementations would allow strings
inside whichever # prefix they currently use; those that support both
prefixes would allow them in both.

> In order to allow implementations to extend #u8"..." so that "..." can
> be any string allowed by the implementation (*), I'd like to suggest
> renaming the sequence "\xHH;" of this SRFI to something different, like
> "\yHH;". I think this is a good thing because "\xHH;" in strings and
> characters really means something different for values greater than
> #x7F as soon as the string is encoded in UTF-8 (which string->utf8 does).
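
The two readings of an escape like \x80; can be compared in a short Python sketch. Here `escape_meanings` is a hypothetical helper: the first reading treats the escape as the character U+00NN and then encodes it as UTF-8 (what string->utf8 would produce), while the second treats it as the raw byte NN:

```python
def escape_meanings(n):
    # Hypothetical helper: the two readings of an escape \xN; (0 <= n <= 0xFF).
    as_char_utf8 = chr(n).encode("utf-8")  # character U+00NN, then UTF-8 encoding
    as_raw_byte = bytes([n])               # the single byte NN
    return as_char_utf8, as_raw_byte


for n in (0x41, 0x7F, 0x80, 0xFF):
    utf8, raw = escape_meanings(n)
    print(f"\\x{n:02X};  char->UTF-8: {utf8.hex()}  raw byte: {raw.hex()}")
# \x41;  char->UTF-8: 41  raw byte: 41
# \x7F;  char->UTF-8: 7f  raw byte: 7f
# \x80;  char->UTF-8: c280  raw byte: 80
# \xFF;  char->UTF-8: c3bf  raw byte: ff
```

Both readings agree for every value up to #x7F and disagree for every value from #x80 upward, which is exactly the range where the escape becomes ambiguous.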

I'd vote to support fewer kinds of characters in bytestring literals to
avoid confusion. Here the #u8("foo") syntax has the advantage of giving
us a good excuse to drop \xHH; escapes completely :) The bytevector
#u8("Hello" #x20 "world" #x0a) would be equivalent to
#u8"Hello\x20;world\x0a;" in the current draft, without using any escapes.
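
A Python sketch can check that equivalence, assuming the draft's body consists of ASCII characters plus \xHH; escapes (other escapes are not modeled) and that the mixed form uses ASCII-only strings. Both `draft_bytestring` and `mixed_bytevector` are hypothetical names for illustration:

```python
import re

# Matches either a \xHH; escape (group 1) or any single character (group 2).
ESCAPE_OR_CHAR = re.compile(r"\\x([0-9A-Fa-f]{1,2});|(.)", re.DOTALL)


def draft_bytestring(body):
    # Hypothetical model of the draft's #u8"..." body: ASCII characters map to
    # their codes and \xHH; maps to byte HH. Other escapes are rejected.
    out = bytearray()
    for m in ESCAPE_OR_CHAR.finditer(body):
        hex_digits, char = m.group(1), m.group(2)
        if hex_digits is not None:
            out.append(int(hex_digits, 16))
        elif char == "\\" or ord(char) > 0x7F:
            raise ValueError(f"not handled by this model: {char!r}")
        else:
            out.append(ord(char))
    return bytes(out)


def mixed_bytevector(*parts):
    # Hypothetical model of #u8("..." n "..."): ASCII strings plus single bytes.
    out = bytearray()
    for part in parts:
        out += part.encode("ascii") if isinstance(part, str) else bytes([part])
    return bytes(out)


old = draft_bytestring(r"Hello\x20;world\x0a;")
new = mixed_bytevector("Hello", 0x20, "world", 0x0A)
print(old == new, new)  # True b'Hello world\n'
```

The mixed form needs no escape parser at all: each byte outside the string parts is simply written as a number.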

If \xHH; is not supported, confusion about its meaning and permitted
range of characters is also avoided.