OS procedures Göran Weinholt (21 Apr 2020 12:35 UTC)
Re: OS procedures Lassi Kortela (21 Apr 2020 21:31 UTC)

> Good idea for a SRFI! I have some comments which are listed in order
> from practical to esoteric.

Great comments, thanks for the thorough critique to you as well!

> * (os-command-line) is defined to return strings. I suggest returning
> the OS's native data as bytevectors.
>
> Linux file systems don't require valid UTF-8 and Windows doesn't
> require valid UTF-16 (https://simonsapin.github.io/wtf-8/). We can
> already get the friendly strings from (command-line), but don't have
> any way to get the bytevectors. Having the command line in bytevector
> format is one prerequisite for being able to open filenames that are
> not valid UTF-8/UTF-16. More APIs would need to handle bytevectors,
> but it's a start.

I was planning to have a follow-up SRFI that returns bytevectors instead of strings. You may be right that a string version of `os-command-line` is not that useful if we have a bytevector one from which strings can be easily derived.

A further problem is that Windows stores the whole command line internally as one long string (wchar vector to be precise). Ideally a `raw-os-command-line` procedure would return a single bytevector on Windows, and a list (or vector) of bytevectors on Unix. I designed a procedure for the follow-up SRFI to take this discrepancy into account, but it's not terribly easy to use.

Perhaps there should also be a procedure to get environment variables as bytevectors instead of strings. It could go in the same SRFI as the command-line bytevectors.

> * The (command-line) procedure has converted the arguments to strings,
> which is fine. But what happens to invalid bytes, are they replaced
> with the replacement character (U+FFFD)?

The current draft does not say, but that seems like a good convention.

> * You mention that argv[0] is not reliable, but even worse is that
> execve() will let you pass NULL in argv[0]. The consequence would be
> (os-command-line) => (). A lot of C programs out there segfault when
> you start them that way. Do we want to keep trapping programs into
> writing such bugs or should argv[0]==NULL give (os-command-line) =>
> (#vu8())? (Assuming bytevectors are used, of course).

Whoa, I had no idea NULL is allowed :) I always wondered when writing C what would happen if argc is zero. Or is that yet another different situation -- if argv[0] is NULL, does it count as a real argument and argc is still >= 1?

Doesn't a zero-length bytevector introduce an ambiguity between argv[0] being NULL and ""? If there is an ambiguity, the NULL case could be #f and "" from C could be turned into a zero-length bytevector. Not that it matters much. Turning both into a zero-length bytevector is fine with me.

> * I concur with Sebastien Marie and believe that (os-executable-file) is
> probably not very usable in practice. Finding the first argument given
> to execve(), even in the face of a brutal parent process, is not very
> useful to the program author. Programs that need to load other files
> from the file system tend to incorporate their paths into the binary
> during a configuration step before compilation. If a program wants to
> know the name of its own executable it can simply build that string
> into the binary. If a program wants to have different personalities
> based on how it was started (like e.g. Chez Scheme's scheme-script
> binary) then argv[0] is all it needs, because it can assume a
> friendly parent process. The parent process can always do something to
> mess up the execution of the child anyway.

I think a lot of Windows programs do something like `os-executable-file` to find out where they are. This is done when "portable" programs are put on a USB stick and store their config files in the same directory where the .exe file is, for example. On Unix the practice is less common.

All of the problems and the better solutions you point out are valid. On Unix it's generally better to hard-code the path into the executable, or take it relative to HOME or another environment variable.

> * readlink("/proc/self/exe") on Linux is not 100% reliable. If the
> binary is deleted then the symlink points to e.g. "/bin/bash
> (deleted)". Programs can also be executed from a memfd and the symlink
> then says "/memfd: (deleted)". It is also not certain that /proc is
> mounted.

I didn't know about this behavior. Your and Sebastien's comments definitely suggest that `os-executable-file` should just return the raw string from the OS API. Perhaps it should be left out of this SRFI altogether.

I deliberately specified it so implementations can always return `#f` to weasel out of any difficult situations. But it seems the whole procedure is questionable anyway since OSes can return weird strings.

> * The ELF auxiliary vector has the executable filename.
>
> I checked "info auxv" in gdb on FreeBSD, NetBSD and Linux (respectively):
> 15 AT_EXECPATH Executable path 0x7fffffffefd8 "/bin/ls"
> 2014 AT_SUN_EXECNAME Canonicalized file name given to execve 0x7f7fffcb74e0 "/bin/ls"
> 31 AT_EXECFN File name of executable 0x7fffffffeff0 "/bin/ls"
>
> I think only NetBSD canonicalizes it. OpenBSD omits this useful information.

Very interesting, I had no idea gdb has ready-made tools for low-level ELF mining but it makes sense!
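
To make the `/proc/self/exe` caveats above concrete, here is a minimal Python sketch of the "return the raw string from the OS, or `#f`" behavior discussed in the thread. It is an illustration, not part of the SRFI draft: the names `raw_executable_file` and `looks_reliable` are hypothetical, `None` stands in for Scheme's `#f`, and the reliability check is only a heuristic, since a genuine file name may itself end in " (deleted)".

```python
import os

# Suffix the thread says Linux shows once the running binary has been deleted.
DELETED_SUFFIX = b" (deleted)"


def raw_executable_file(link="/proc/self/exe"):
    """Return the raw link target as bytes, or None (Scheme's #f) on failure."""
    try:
        # Passing a bytes path makes os.readlink return bytes, so invalid
        # UTF-8 in the file name is preserved instead of being decoded.
        return os.readlink(os.fsencode(link))
    except OSError:
        # /proc is not mounted, the OS is not Linux, or the link is unreadable.
        return None


def looks_reliable(target):
    """Heuristic only: a real file name may also end in ' (deleted)'."""
    if target is None:
        return False
    return target.startswith(b"/") and not target.endswith(DELETED_SUFFIX)


if __name__ == "__main__":
    target = raw_executable_file()
    print(target, looks_reliable(target))
```

The sketch keeps the two concerns separate: the first function passes along whatever the OS reports without interpreting it, and the second is an optional caller-side check that can be skipped or replaced.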