OS procedures Göran Weinholt (21 Apr 2020 12:35 UTC)
Re: OS procedures Lassi Kortela (21 Apr 2020 21:31 UTC)

OS procedures Göran Weinholt 21 Apr 2020 12:31 UTC

Hello Lassi,

Good idea for a SRFI! I have some comments which are listed in order
from practical to esoteric.

* (os-command-line) is defined to return strings. I suggest returning
  the OS's native data as bytevectors.

  Linux file systems don't require valid UTF-8 and Windows doesn't
  require valid UTF-16 (https://simonsapin.github.io/wtf-8/). We can
  already get the friendly strings from (command-line), but don't have
  any way to get the bytevectors. Having the command line in bytevector
  format is one prerequisite for being able to open filenames that are
  not valid UTF-8/UTF-16. More APIs would need to handle bytevectors,
  but it's a start.

* The (command-line) procedure has converted the arguments to strings,
  which is fine. But what happens to invalid bytes? Are they replaced
  with the replacement character (U+FFFD)?

* You mention that argv[0] is not reliable, but even worse is that
  execve() will let you pass NULL in argv[0]. The consequence would be
  (os-command-line) => (). A lot of C programs out there segfault when
  you start them that way. Do we want to keep trapping programs into
  writing such bugs or should argv[0]==NULL give (os-command-line) =>
  (#vu8())? (Assuming bytevectors are used, of course).

* I concur with Sebastien Marie and believe that (os-executable-file) is
  probably not very usable in practice. Finding the first argument given
  to execve(), even in the face of a brutal parent process, is not very
  useful to the program author. Programs that need to load other files
  from the file system tend to incorporate their paths into the binary
  during a configuration step before compilation. If a program wants to
  know the name of its own executable it can simply build that string
  into the binary. If a program wants to have different personalities
  based on how it was started (e.g. Chez Scheme's scheme-script
  binary) then argv[0] is all it needs, because it can assume a
  friendly parent process. The parent process can always do something to
  mess up the execution of the child anyway.

* readlink("/proc/self/exe") on Linux is not 100% reliable. If the
  binary is deleted then the symlink points to e.g. "/bin/bash
  (deleted)". Programs can also be executed from a memfd and the symlink
  then says "/memfd: (deleted)". It is also not certain that /proc is
  mounted.

* The ELF auxiliary vector has the executable filename.

  I checked "info auxv" in gdb on FreeBSD, NetBSD and Linux (respectively):
  15   AT_EXECPATH      Executable path                          0x7fffffffefd8 "/bin/ls"
  2014 AT_SUN_EXECNAME  Canonicalized file name given to execve  0x7f7fffcb74e0 "/bin/ls"
  31   AT_EXECFN        File name of executable                  0x7fffffffeff0 "/bin/ls"

  I think only NetBSD canonicalizes it. OpenBSD omits this useful information.
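
As a rough illustration of the last two points, here is a minimal sketch
in Python (chosen only for brevity, not Scheme and not proposed API) of
the two Linux lookups. The function names are hypothetical. It assumes a
C library that exports getauxval(), as glibc and musl do, and it treats
the " (deleted)" suffix as a heuristic only, since a file can really be
named that way.

```python
import ctypes
import os
import sys

AT_EXECFN = 31  # Linux auxv key, matching the gdb output above
DELETED_SUFFIX = b" (deleted)"


def execfn_from_auxv():
    """Return the AT_EXECFN string as bytes, or None if unavailable."""
    if not sys.platform.startswith("linux"):
        return None  # key 31 is only known to mean AT_EXECFN on Linux
    try:
        libc = ctypes.CDLL(None)
        getauxval = libc.getauxval
    except (OSError, AttributeError, TypeError):
        return None  # this C library does not export getauxval()
    getauxval.restype = ctypes.c_ulong
    getauxval.argtypes = [ctypes.c_ulong]
    address = getauxval(AT_EXECFN)
    if address == 0:
        return None  # entry missing from the auxiliary vector
    # The value is a pointer to a NUL-terminated string in our own memory.
    return ctypes.string_at(address)


def classify_proc_exe(target):
    """Split a /proc/self/exe link target into (path, looks_deleted).

    Heuristic: the kernel appends " (deleted)" for unlinked binaries, but
    a file whose real name ends that way looks exactly the same.
    """
    if target.endswith(DELETED_SUFFIX):
        return target[: -len(DELETED_SUFFIX)], True
    return target, False


def exe_from_proc():
    """Return (path, looks_deleted) from /proc/self/exe, or None."""
    try:
        target = os.readlink(b"/proc/self/exe")
    except OSError:
        return None  # /proc is not mounted, or this is not Linux
    return classify_proc_exe(target)
```

Both functions return bytes rather than strings, and both can return
None, so a caller still needs a fallback such as argv[0] or a path
configured at build time.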

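To make the first three points more concrete, here is a small sketch in
Python of what is lost when raw argument bytes are turned into strings,
and of the argv[0]==NULL suggestion. All function names are hypothetical.
The lossless variant uses Python's own "surrogateescape" error handler
(PEP 383), which is just one possible mechanism and not something the
SRFI specifies.

```python
def to_friendly_string(raw):
    """Lossy decode: each invalid byte becomes U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def to_roundtrip_string(raw):
    """Lossless decode: each invalid byte becomes a lone surrogate."""
    return raw.decode("utf-8", errors="surrogateescape")


def from_roundtrip_string(text):
    """Inverse of to_roundtrip_string: recover the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


def command_line_bytes(raw_argv):
    """Hypothetical bytevector-returning command line.

    An empty argv (argv[0] == NULL) is reported as a single empty
    element, so callers can always take the first element safely.
    """
    args = list(raw_argv)
    return args if args else [b""]


if __name__ == "__main__":
    name = b"caf\xe9.txt"  # Latin-1 encoded name, not valid UTF-8
    lossy = to_friendly_string(name)
    print(lossy.encode("utf-8") == name)  # False: original byte is gone
    print(from_roundtrip_string(to_roundtrip_string(name)) == name)  # True
    print(command_line_bytes([]))
```

Once U+FFFD has been substituted, the original file name cannot be
recovered from the string, which is why the raw bytes need to be
available somewhere.
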
Regards,

--
Göran Weinholt   | https://weinholt.se/
Debian developer | 73 de SA6CJK