On Fri, Dec 4, 2020 at 11:11 PM Arthur A. Gleckler <xxxxxx@speechcode.com> wrote:
On Fri, Dec 4, 2020 at 1:30 PM Marc Nieper-Wißkirchen <xxxxxx@nieper-wisskirchen.de> wrote:
Non-ASCII is probably common because directory paths and file names are typical values of environment variables, and outside English-speaking countries these are usually not restricted to the ASCII range.

Environment variables that contain byte sequences not decodable in the current locale are very rare, I would say, because such a thing would usually hint at a misconfigured locale. It could happen when old media in some legacy encoding are mounted on a modern Unicode system.

That's what I expected, but I wasn't sure.  Thanks. 

But even if this is rare, it doesn't preclude a malicious person from trying to purposefully crash a Scheme program using this technique.

But this seems unavoidable.  After all, if the API returns byte vectors, the program is almost certainly immediately going to try to interpret it as a string, anyway.  The program is going to have to account for this possible failure mode either way, isn't it?

It depends. When the Scheme implementation allows filenames to be specified as bytevectors, all the processing can (and should) happen at the level of bytes. A conversion to a string would only be necessary for displaying the filenames to the user. Unfortunately, R7RS's `utf8->string` suffers from the same problem as `get-environment-variable`: anything may happen in case the bytevector cannot be decoded. R6RS specifies that a replacement character should be inserted instead.
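
To make the two "definite" behaviours concrete, here is a minimal sketch. It is written in Python rather than Scheme only because Python's `bytes.decode` exposes both policies under standard names; it is an analogy, not a description of what any Scheme implementation does. The function names `decode_strict` and `decode_replace` and the sample value are hypothetical and exist only for illustration.

```python
def decode_strict(raw: bytes) -> str:
    """Policy 1: signal a definite error when the bytes are not valid UTF-8."""
    return raw.decode("utf-8", errors="strict")


def decode_replace(raw: bytes) -> str:
    """Policy 2: substitute U+FFFD (the replacement character) for bad bytes."""
    return raw.decode("utf-8", errors="replace")


# Hypothetical value as it might come from a legacy Latin-1 medium:
# the byte 0xFC ("u" with diaeresis in Latin-1) is not valid UTF-8.
legacy = "M\u00fcller".encode("latin-1")

if __name__ == "__main__":
    print(repr(decode_replace(legacy)))
    try:
        decode_strict(legacy)
    except UnicodeDecodeError as exc:
        print("decoding signaled an error:", exc.reason)
```

Either policy gives the caller something it can rely on: an exception it can handle, or a string that is displayable but lossy. Neither leaves the outcome unspecified.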

I think the basic problem is that there are some situations where R7RS says "it is an error" even though the situation is not under the programmer's control. The philosophy should be that errors are signaled (or some other definite (!) action is taken) in case interactions with the outside world go wrong, while "it is an error" should be reserved for programming errors. But using `get-environment-variable` is hardly a programming error. And applying `utf8->string` to some user-supplied bytevector should not count as a programming error either.