On Tue, 18 Aug 2020 at 11:53, Daphne Preston-Kendal <xxxxxx@nonceword.org> wrote:

On 18 Aug 2020, at 02:18, Shiro Kawai <xxxxxx@gmail.com> wrote:

> I agree. In this regard, I think the syntax Alex suggested earlier seems to work well. This way, an octet sequence that's not valid UTF-8 can be included without ambiguity:
>
> #u8("abcde" #x80 "efghi")
>
> For the string part, we can specify either ASCII only or a UTF-8 encoded string.

My objection to this is that it looks like a vector of three items: a string,
an integer, and another string. But the same applies to John’s proposed

As a #u8 vector can contain only bytes, not strings, as single elements, I do not see a problem here. (Other programming languages, such as C, use the same convention: the tokens "Hello" ", " "World!\n" form a single string.)
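To illustrate the juxtaposition convention, here is a minimal sketch in Python, which (like C) joins adjacent literals into a single value. Python is used only because it already serves as the comparison point in this thread; the names greeting and data are made up for the example.

```python
# Adjacent string literals are joined into one string, as in C.
greeting = "Hello" ", " "World!\n"
assert greeting == "Hello, World!\n"

# The same holds for bytes literals, so a byte that is not valid UTF-8
# on its own (0x80) can sit between two ASCII runs.
data = b"abcde" b"\x80" b"efghi"
assert data == b"abcde\x80efghi"
assert len(data) == 11  # 5 + 1 + 5 bytes, not three separate elements
```

The result is one flat sequence of bytes, not a container with three elements, which is how I would read #u8("abcde" #x80 "efghi") as well.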

I still don’t get how \x80; in a #u8"…" is ambiguous. As I note in the
acknowledgements, Python has been doing exactly this for over a decade. But

Well, Python seems to be doing exactly what I have been proposing: in string literals, Python has "\xhh" and "\uxxxx", corresponding to the C11 escape sequences with similar names. In bytes literals, Python has just "\xhh".
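A small sketch of the Python behaviour described here, assuming Python 3 (the variable names are only for illustration):

```python
# In a str literal, \xhh and \uxxxx both name a code point.
s = "\x80"
assert s == "\u0080"
assert len(s) == 1 and ord(s) == 0x80

# In a bytes literal, \xhh names a single byte; \u is not an escape there.
b = b"\x80"
assert len(b) == 1 and b[0] == 0x80

# The character U+0080 encoded as UTF-8 is two bytes, not the byte 0x80.
encoded = s.encode("utf-8")
assert encoded == b"\xc2\x80"
assert encoded != b
```

So the same spelling "\x80" means a character in one kind of literal and a raw byte in the other, and the two differ as soon as the value exceeds the ASCII range.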

Now, Python's "\u..." is Scheme's "\x...;". So we need another escape code (if we do not want to follow Alex's suggestion) for Python's "\x...".

(Translated to Python, the way SRFI 207 is currently written would mean that only "\uxx" is allowed in Python's bytes literals, encoding the byte xx, while the same sequence in a Python string would encode the UTF-8 bytes of the character U+xx.)