Re: OS procedures - Simplelists

OS procedures Göran Weinholt (21 Apr 2020 12:35 UTC)
Re: OS procedures Lassi Kortela (21 Apr 2020 21:31 UTC)
Re: OS procedures Lassi Kortela 21 Apr 2020 21:31 UTC
> Good idea for a SRFI! I have some comments which are listed in order
> from practical to esoteric.

Great comments. Thank you as well for the thorough critique!

> * (os-command-line) is defined to return strings. I suggest returning
>    the OS's native data as bytevectors.
>
>    Linux file systems don't require valid UTF-8 and Windows doesn't
>    require valid UTF-16 (https://simonsapin.github.io/wtf-8/). We can
>    already get the friendly strings from (command-line), but don't have
>    any way to get the bytevectors. Having the command line in bytevector
>    format is one prerequisite for being able to open filenames that are
>    not valid UTF-8/UTF-16. More APIs would need to handle bytevectors,
>    but it's a start.

I was planning to have a follow-up SRFI that returns bytevectors instead
of strings. You may be right that a string version of `os-command-line`
is not that useful if we have a bytevector one from which strings can be
easily derived.
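
As a rough illustration of what the raw data looks like, here is a minimal Python sketch (Python is used only for illustration; nothing here is from the draft). It assumes a Linux system where /proc is mounted and /proc/self/cmdline holds the arguments as NUL-terminated byte strings. The names `split_raw_command_line` and `raw_command_line` are hypothetical.

```python
from typing import List, Optional


def split_raw_command_line(raw: bytes) -> List[bytes]:
    """Split a block of NUL-terminated arguments into one bytes object each."""
    if not raw:
        return []
    # Every argument ends with NUL, so drop the final terminator first;
    # otherwise split() would yield a spurious empty last element.
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    return raw.split(b"\0")


def raw_command_line() -> Optional[List[bytes]]:
    """Return this process's raw arguments, or None if /proc is unavailable."""
    try:
        with open("/proc/self/cmdline", "rb") as f:
            return split_raw_command_line(f.read())
    except OSError:
        # Assumption: treat "no /proc" the same as "unknown".
        return None
```

Each element can then be decoded into a string under whatever policy the caller prefers, while the bytes stay available for opening files whose names are not valid UTF-8.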

A further problem is that Windows stores the whole command line
internally as one long string (wchar vector to be precise). Ideally a
`raw-os-command-line` procedure would return a single bytevector on
Windows, and a list (or vector) of bytevectors on Unix. I designed a
procedure for the follow-up SRFI to take this discrepancy into account,
but it's not terribly easy to use.

Perhaps there should also be a procedure to get environment variables as
bytevectors instead of strings. It could go in the same SRFI as the
command-line bytevectors.
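
For comparison, Python already exposes both views of the environment on Unix. The sketch below is only an illustration of the idea, not a proposal for the SRFI's API; `raw_environment_variable` is a hypothetical name, and the fallback branch is an assumption about what a non-Unix implementation might do.

```python
import os
from typing import Optional


def raw_environment_variable(name: bytes) -> Optional[bytes]:
    """Return the variable's value as raw bytes, or None if it is unset."""
    if os.supports_bytes_environ:
        # os.environb exists only where the native environment is bytes (Unix).
        return os.environb.get(name)
    # Elsewhere (e.g. Windows) fall back to the string view and re-encode it.
    value = os.environ.get(os.fsdecode(name))
    return None if value is None else os.fsencode(value)
```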

> * The (command-line) procedure has converted the arguments to strings,
>    which is fine. But what happens to invalid bytes, are they replaced
>    with the replacement character (U+FFFD)?

The current draft does not say, but that seems like a good convention.
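
A small Python sketch of that convention, assuming the arguments are meant to be UTF-8 (`to_friendly_string` is a hypothetical name). It also shows why the friendly strings cannot replace the bytevectors: two different byte sequences can collapse into the same string.

```python
def to_friendly_string(raw: bytes) -> str:
    """Decode as UTF-8, replacing each invalid sequence with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# "café" encoded as Latin-1 is not valid UTF-8 ...
latin1_name = to_friendly_string(b"caf\xe9")
# ... and neither is this unrelated byte sequence.
other_name = to_friendly_string(b"caf\xff")

# Both decode to "caf" followed by U+FFFD, so the original file name
# can no longer be recovered from the string.
assert latin1_name == other_name == "caf\ufffd"
```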

> * You mention that argv[0] is not reliable, but even worse is that
>    execve() will let you pass NULL in argv[0]. The consequence would be
>    (os-command-line) => (). A lot of C programs out there segfault when
>    you start them that way. Do we want to keep trapping programs into
>    writing such bugs or should argv[0]==NULL give (os-command-line) =>
>    (#vu8())? (Assuming bytevectors are used, of course).

Whoa, I had no idea NULL is allowed :) When writing C I always wondered
what would happen if argc were zero. Or is that a different situation:
if argv[0] is NULL, does it count as a real argument, so that argc is
still >= 1?

Doesn't a zero-length bytevector introduce an ambiguity between argv[0]
being NULL and ""? If there is an ambiguity, the NULL case could be #f
and "" from C could be turned into a zero-length bytevector. Not that it
matters much. Turning both into a zero-length bytevector is fine with me.
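
The two options can be sketched in Python as follows. This is only a model of the mapping, not of any real implementation: an empty input list stands for argv[0] == NULL, `None` stands for `#f`, and `command_line_from_argv` and `null_as_false` are hypothetical names.

```python
from typing import List, Optional


def command_line_from_argv(
    argv: List[bytes], null_as_false: bool = False
) -> List[Optional[bytes]]:
    """Map a C-level argv (empty list = argv[0] is NULL) to a non-empty list."""
    if argv:
        # Normal case, including argv[0] == "", which stays b"".
        return list(argv)
    # argv[0] was NULL: either keep it distinguishable (None, i.e. #f)
    # or fold it into the same result as an empty argv[0].
    return [None] if null_as_false else [b""]
```

Either way the result always has a first element, so callers that index it blindly do not crash.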

> * I concur with Sebastien Marie and believe that (os-executable-file) is
>    probably not very usable in practice. Finding the first argument given
>    to execve(), even in the face of a brutal parent process, is not very
>    useful to the program author. Programs that need to load other files
>    from the file system tend to incorporate their paths into the binary
>    during a configuration step before compilation. If a program wants to
>    know the name of its own executable it can simply build that string
>    into the binary. If a program wants to have different personalities
>    based on how it was started (like e.g. Chez Scheme's scheme-script
>    binary) then argv[0] is all it needs, because it can assume a
>    friendly parent process. The parent process can always do something to
>    mess up the execution of the child anyway.

I think a lot of Windows programs do something like `os-executable-file`
to find out where they are. This is done when "portable" programs are
put on a USB stick and store their config files in the same directory
where the .exe file is, for example.

On Unix the practice is less common. All of the problems and the better
solutions you point out are valid. On Unix it's generally better to
hard-code the path into the executable, or take it relative to HOME or
another environment variable.
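
The two conventions look roughly like this in Python. The application name `myapp`, the file name `config.ini`, and both function names are made up for the example.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Mapping, Optional


def portable_config_path(executable_file: str) -> PureWindowsPath:
    """Windows "portable app" style: config lives next to the .exe file."""
    return PureWindowsPath(executable_file).parent / "config.ini"


def unix_config_path(env: Mapping[str, str]) -> Optional[PurePosixPath]:
    """Unix style: config lives under HOME, independent of the executable."""
    home = env.get("HOME")
    if not home:
        return None  # HOME unset or empty; the caller must pick a fallback
    return PurePosixPath(home) / ".myapp" / "config.ini"
```

Only the first function needs anything like `os-executable-file`; the second works from the environment alone.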

> * readlink("/proc/self/exe") on Linux is not 100% reliable. If the
>    binary is deleted then the symlink points to e.g. "/bin/bash
>    (deleted)". Programs can also be executed from a memfd and the symlink
>    then says "/memfd: (deleted)". It is also not certain that /proc is
>    mounted.

I didn't know about this behavior. Your comments and Sebastien's
definitely suggest that `os-executable-file` should just return the raw
string from the OS API.

Perhaps it should be left out of this SRFI altogether. I deliberately
specified it so implementations can always return `#f` to weasel out of
any difficult situations. But it seems the whole procedure is
questionable anyway since OSes can return weird strings.
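
A minimal Python sketch of the "raw string or `#f`" behavior, assuming a Linux-style /proc (the function name mirrors the draft's procedure but the code is only an illustration, with `None` standing for `#f`):

```python
import os
from typing import Optional


def os_executable_file() -> Optional[bytes]:
    """Return whatever the OS reports for /proc/self/exe, or None."""
    try:
        # A bytes path makes os.readlink return bytes, so nothing is decoded.
        return os.readlink(b"/proc/self/exe")
    except (OSError, AttributeError, NotImplementedError):
        # /proc not mounted, not Linux, or readlink unsupported.
        return None
```

The result is deliberately not interpreted: a trailing " (deleted)" could come from the kernel or from a file that really has that name, and the sketch makes no attempt to tell them apart.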

> * The ELF auxiliary vector has the executable filename.
>
>    I checked "info auxv" in gdb on FreeBSD, NetBSD and Linux (respectively):
>    15   AT_EXECPATH          Executable path                0x7fffffffefd8 "/bin/ls"
>    2014 AT_SUN_EXECNAME      Canonicalized file name given to execve 0x7f7fffcb74e0 "/bin/ls"
>    31   AT_EXECFN            File name of executable        0x7fffffffeff0 "/bin/ls"
>
>    I think only NetBSD canonicalizes it. OpenBSD omits this useful information.

Very interesting! I had no idea gdb had ready-made tools for low-level
ELF mining, but it makes sense.
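
The same entry can also be read from inside the process. Here is a hedged Python sketch that assumes Linux with a libc that exports getauxval(); the constant 31 is the AT_EXECFN number from the gdb output quoted above, and `execfn_from_auxv` is a hypothetical name. FreeBSD and NetBSD use different entries and are not covered.

```python
import ctypes
from typing import Optional

AT_EXECFN = 31  # Linux auxv type, as shown in the quoted gdb output


def execfn_from_auxv() -> Optional[bytes]:
    """Return the AT_EXECFN string from the auxiliary vector, or None."""
    try:
        libc = ctypes.CDLL(None)  # symbols already loaded into this process
        getauxval = libc.getauxval
    except (OSError, AttributeError, TypeError):
        return None  # no getauxval() on this platform
    getauxval.restype = ctypes.c_ulong
    getauxval.argtypes = [ctypes.c_ulong]
    address = getauxval(AT_EXECFN)
    if address == 0:
        return None  # entry not present
    # The value is a pointer to a NUL-terminated string in our address space.
    return ctypes.string_at(address)
```

As with /proc/self/exe, the bytes are returned as-is; on Linux this is the file name that was passed to execve(), not a canonicalized path.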