Bytevectors instead of strings in SRFI 170 Lassi Kortela (02 Aug 2020 19:30 UTC)
Re: Bytevectors instead of strings in SRFI 170 Marc Nieper-Wißkirchen (03 Aug 2020 20:17 UTC)
Re: Bytevectors instead of strings in SRFI 170 John Cowan (03 Aug 2020 21:23 UTC)
Re: Bytevectors instead of strings in SRFI 170 Marc Nieper-Wißkirchen (03 Aug 2020 21:28 UTC)
Re: Bytevectors instead of strings in SRFI 170 Lassi Kortela (03 Aug 2020 21:32 UTC)
Re: Bytevectors instead of strings in SRFI 170 Marc Nieper-Wißkirchen (03 Aug 2020 21:34 UTC)

Re: Bytevectors instead of strings in SRFI 170 Lassi Kortela 03 Aug 2020 21:32 UTC

>> Would using strings assume that the underlying character encoding of
>> the OS is UTF-8? Can we assume this in 2020? Or do we have to convert
>> from whatever local encoding to Unicode?
>>
>> Strings instead of bytevectors make some sense because the basic
>> R7RS-small procedures dealing with file names all take (Unicode)
>> strings as arguments.

I meant bytevectors as another alternative in addition to strings in SRFI 170. Strings would be the ordinary and preferred thing to use for most purposes, but bytevectors could be used where strings cannot represent everything you need. This is what the Python 3 "os" library offers.

> In both cases, it's possible to create pathnames that cannot be
> interpreted as a sequence of Unicode characters. This also means that
> Lassi's pre-SRFI must have some way of telling the caller whether names
> are 8-bit or 16-bit.

It could return u16vectors on Windows, but u16vectors can also be represented as bytevectors, which is good for uniformity across platforms. In order to translate either kind of vector into a string, you need to specify an encoding anyway. Every programming language has ready APIs to turn bytevectors into strings (using UTF-16LE or UCS-2 for example) but not necessarily u16vectors. Bytevectors also have the advantage over u16vectors that they are standard since R6RS.
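
For illustration only, here is a minimal Python 3 sketch of the two points above: the "os" library accepts either strings or bytes for path names and answers in the same type, and a bytes object holding 16-bit code units can be turned into a string once an encoding is named. The helper names `list_names`, `utf16le_bytes_to_str` and `demo_listdir`, and the file name `example.txt`, are hypothetical and are not part of SRFI 170 or of any pre-SRFI.

```python
import os
import tempfile


def list_names(directory):
    """Return the sorted entries of `directory`.

    os.listdir returns str names when given a str path
    and bytes names when given a bytes path.
    """
    return sorted(os.listdir(directory))


def utf16le_bytes_to_str(raw):
    """Decode bytes that hold UTF-16LE code units (hypothetical helper).

    The encoding has to be named explicitly; the bytes alone do not
    say whether they are 8-bit or 16-bit data.
    """
    return raw.decode("utf-16-le")


def demo_listdir():
    """List one temporary directory twice: by str path and by bytes path."""
    with tempfile.TemporaryDirectory() as tmp:
        # Create a single empty file so there is something to list.
        open(os.path.join(tmp, "example.txt"), "w").close()
        as_str = list_names(tmp)
        as_bytes = list_names(os.fsencode(tmp))
    return as_str, as_bytes
```

Calling `demo_listdir()` gives the same directory entry once as a string and once as a bytes object, and `os.fsdecode` converts the bytes form back using the file system encoding. With the default strict error handling, `utf16le_bytes_to_str` rejects input that is not well-formed UTF-16, which is the situation the quoted text describes for pathnames that cannot be read as Unicode characters.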