Re: SRFI 207: String-notated bytevectors

Re: SRFI 207: String-notated bytevectors Marc Nieper-Wißkirchen 17 Aug 2020 16:10 UTC

On Sun, 16 Aug 2020 at 12:40, Lassi Kortela <xxxxxx@lassi.io> wrote:
>
> > My thoughts exactly. Allowing any non-ASCII character is liable to cause confusion at best and actively mislead at worst as to what bytes are actually in the bytevector, even if a standardized encoding were chosen.
>
> Strongly agreed.
>
> Even if we stay in the Latin-* (i.e. ISO-8859-*) range, different
> Unicode normalization forms will encode some of those characters
> differently. One would have to inspect the byte-level encoding of the
> Scheme source file from which a particular literal is read in order to
> figure out the bytes. A source file can also be re-encoded from say
> Latin-1 to UTF-8 which would again change the byte-level encoding while
> the display in a text editor keeps looking identical.

I think the main reason is that if we allowed the full Unicode range
and specified UTF-8 encoding, sequences like \x80; would be ambiguous:
either the single byte #x80 is meant, or the two bytes #xC2 #x80 of the
UTF-8 encoding of U+0080.
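
A minimal sketch of the two readings, assuming an R7RS implementation
whose strings can hold U+0080 (the names s, encoded and raw are only
illustrative):

```scheme
(import (scheme base) (scheme write))

;; In an R7RS string literal, \x80; denotes the character U+0080.
(define s "\x80;")

;; Reading 1: the UTF-8 encoding of that character (two bytes).
(define encoded (string->utf8 s))

;; Reading 2: a bytevector holding the single byte #x80.
(define raw (bytevector #x80))

(write (bytevector-length encoded)) ; 2
(newline)
(write (equal? encoded raw))        ; #f
(newline)
```

The two results differ, so a reader that accepted both conventions
inside #u8"..." would have to pick one of them for \x80;.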

To allow implementations to extend #u8"..." so that "..." can be any
string allowed by the implementation (*), I suggest renaming the
sequence "\xHH;" of this SRFI to something different, such as "\yHH;".
I think this is a good thing because "\xHH;" in strings and characters
means something different for values greater than #x7F as soon as
we encode it in UTF-8 (which string->utf8 does).
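
The #x7F boundary can be checked directly. The following is only a
sketch, assuming an R7RS implementation that supports the characters
U+0001 through U+00FF in strings; same-as-byte? and first-mismatch are
hypothetical helper names, and the scan starts at 1 because some
implementations forbid NUL in strings:

```scheme
(import (scheme base) (scheme write))

;; #t when the UTF-8 encoding of the character with code point n
;; is exactly the one byte n.
(define (same-as-byte? n)
  (equal? (string->utf8 (string (integer->char n)))
          (bytevector n)))

;; Smallest n in [1, 255] for which the two readings differ,
;; or #f if they never differ.
(define (first-mismatch)
  (let loop ((n 1))
    (cond ((> n 255) #f)
          ((same-as-byte? n) (loop (+ n 1)))
          (else n))))

(write (first-mismatch)) ; 128, i.e. #x80
(newline)
```

Up to #x7F the character reading and the byte reading coincide, which
is why the conflict only shows up for larger values.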

--

(*) In other words, we should at least allow implementations to extend
SRFI 207 so that if "..." is any string literal, #u8"..." is equal to
(string->utf8 "...").