Re: Bytevectors instead of strings in SRFI 170
Lassi Kortela 03 Aug 2020 21:32 UTC
>> Would using strings assume that the underlying character encoding of
>> the OS is UTF-8? Can we assume this in 2020? Or do we have to convert
>> from whatever local encoding to Unicode?
>>
>> Strings instead of bytevectors make some sense because the basic
>> R7RS-small procedures dealing with file names all take (Unicode)
>> strings as arguments.
I meant bytevectors as an additional option in SRFI 170, not as a
replacement for strings. Strings would be the ordinary and preferred
thing to use for most purposes, but bytevectors could be used where
strings cannot represent everything you need. This is what the Python 3
"os" library offers.
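For illustration, here is roughly what that dual interface looks like in Python 3. `os.listdir` returns `str` names when given a `str` path and `bytes` names when given a `bytes` path, and `os.fsencode`/`os.fsdecode` convert between the two. This is only a sketch of Python's behavior, not a proposed SRFI 170 API; the helper `list_names` and the file name are made up for the example.

```python
import os
import tempfile

def list_names(directory):
    """Hypothetical helper: the result type follows the argument type."""
    return sorted(os.listdir(directory))

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "hello.txt"), "w").close()
    as_str = list_names(tmp)                 # str path in, str names out
    as_bytes = list_names(os.fsencode(tmp))  # bytes path in, bytes names out

print(as_str, as_bytes)
```

The string form is what most code would use; the bytes form is the escape hatch for names the string form cannot represent faithfully.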
> In both cases, it's possible to create pathnames that cannot be
> interpreted as a sequence of Unicode characters. This also means that
> Lassi's pre-SRFI must have some way of telling the caller whether names
> are 8-bit or 16-bit.
It could return u16vectors on Windows, but u16vectors can also be
represented as bytevectors, which is good for uniformity across
platforms. In order to translate either kind of vector into a string,
you need to specify an encoding anyway. Most programming languages have
ready-made APIs to turn bytevectors into strings (using UTF-16LE or UCS-2,
for example) but not necessarily u16vectors. Bytevectors also have the
advantage over u16vectors of having been standard since R6RS.
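To make the Windows case concrete, here is a rough Python sketch that stores 16-bit code units in a little-endian byte string (the "bytevector"), then decodes it as UTF-16LE. The helper `u16_to_bytevector` is hypothetical. The second half shows why raw bytes are still needed: a lone surrogate is not well-formed UTF-16, so strict decoding fails. Python's `surrogatepass` error handler is one way to keep such a name intact when converting to a string and back.

```python
import struct

def u16_to_bytevector(units):
    """Hypothetical helper: pack 16-bit code units as little-endian bytes."""
    return struct.pack("<%dH" % len(units), *units)

# "Å.txt" as UTF-16 code units, as Windows would report them (Å is U+00C5).
name_units = [0x00C5, 0x002E, 0x0074, 0x0078, 0x0074]
bv = u16_to_bytevector(name_units)
name = bv.decode("utf-16-le")          # ordinary string: "Å.txt"

# A lone high surrogate followed by "a": Windows allows it in a file name,
# but it is not well-formed UTF-16, so strict decoding fails.
bad = u16_to_bytevector([0xD800, 0x0061])
try:
    bad.decode("utf-16-le")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "surrogatepass" keeps the lone surrogate, so the name round-trips exactly.
lossless = bad.decode("utf-16-le", errors="surrogatepass")
assert lossless.encode("utf-16-le", errors="surrogatepass") == bad
```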