I found this one: PEP 383, "Non-decodable Bytes in System Character Interfaces", https://www.python.org/dev/peps/pep-0383/

Would this be a viable option for Scheme as well? It would mean slightly extending the concept of a string, namely to strings that can contain raw bytes (in the range 128 to 255) as well as characters. Such strings can be encoded in UTF-8, but they may not be encodable in every other encoding scheme, for example UTF-16.
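For reference, a minimal Python sketch of what PEP 383 does, using Python's built-in "surrogateescape" error handler. Each undecodable byte 0x80 to 0xFF is carried in the string as a lone surrogate U+DC80 to U+DCFF, so one string holds both characters and raw bytes. The filename is a made-up example; the point is only that the string round-trips through UTF-8 but is rejected by UTF-16.

```python
# Hypothetical Latin-1 filename "café.txt"; the lone byte 0xE9 is invalid UTF-8.
raw = b"caf\xe9.txt"

# Decoding with surrogateescape maps the undecodable byte 0xE9 to U+DCE9.
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))  # 'caf\udce9.txt'

# Encoding back to UTF-8 with the same handler restores the original bytes.
assert name.encode("utf-8", "surrogateescape") == raw

# UTF-16 cannot represent a lone surrogate, so strict encoding fails.
try:
    name.encode("utf-16")
    utf16_ok = True
except UnicodeEncodeError as exc:
    utf16_ok = False
    print("not encodable in UTF-16:", exc.reason)
```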

On Sat, Dec 5, 2020 at 21:40, Shiro Kawai <xxxxxx@gmail.com> wrote:

On Sat, Dec 5, 2020 at 1:51 AM Marc Nieper-Wißkirchen <xxxxxx@nieper-wisskirchen.de> wrote:

Moreover, one would also want a "trivial encoding" that is, at the least, able to represent every pathname used by the underlying platform as a string.

When a zip file created on Windows is extracted on Linux, I sometimes get filenames that can't be interpreted in Gauche's native encoding. Automatically converting them to the native encoding doesn't cut it, because information is lost (e.g., if I'm copying a directory, I have to reproduce exactly the same filename in the destination, not a transcoded one).

At the moment I don't have a good solution for this. Gauche has so-called "incomplete strings", which can contain byte sequences that are not valid in the native encoding, and directory-files returns such strings for such filenames, but that's a half-baked solution (some string operations can't be applied to incomplete strings). I plan to embed invalid octets in a string using special markers, as suggested on this mailing list. If we call that a "trivial encoding", then yes, I think we would need it after all. But specifying such strings will be a pretty hairy issue.
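PEP 383's lone surrogates are one concrete form of such markers. A rough Python sketch of the directory-copy case follows, assuming a hypothetical filename written by a Windows tool in code page 932 (Shift_JIS) and read back as raw bytes on a UTF-8 Linux system. Replacing bad bytes with U+FFFD cannot reproduce the original name, while the marker approach can. Python's os.fsdecode and os.fsencode apply the same "surrogateescape" handler on POSIX systems.

```python
# Hypothetical filename bytes: "日本語.txt" as written by a Windows tool in cp932.
raw = "日本語.txt".encode("cp932")

# Lossy transcoding: undecodable bytes become U+FFFD and the original is gone.
lossy = raw.decode("utf-8", "replace")
print(lossy.encode("utf-8") == raw)  # False

# Marker embedding (PEP 383): each undecodable byte becomes a lone surrogate,
# so the exact original bytes can be reproduced for the destination filename.
escaped = raw.decode("utf-8", "surrogateescape")
print(escaped.encode("utf-8", "surrogateescape") == raw)  # True
```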