Provide two versions of os-executable-file Sebastien Marie (20 Apr 2020 11:10 UTC)


Hi,

First, I think the SRFI covers the problem of os-executable-file relatively
well, namely how to return an absolute pathname pointing to the executable file
running the Scheme program on different OSes.

But I would add some elements.

First, why would a programmer want to use such a function, and what would not
be possible with argv[0]?

Usually it is because the program needs accurate and reasonably trustworthy
information (so as not to rely on a user-provided argv[0]).

From my experience, it could be to simply re-execute the program (a way to
"reload", possibly with new parameters, for example), or to execute a subpart
of the program as a separate system process (an alternative to simply calling
fork(2), to take advantage of the freshness provided by the OS; see fork+exec in
https://www.openbsd.org/innovations.html for details),

or to open the file and read it, for example to parse the ELF and get debug
symbol names to provide fancy backtraces.

In both cases (executing or reading), the accuracy of the result is very
important, else it could introduce subtle security issues: if at the time of
use the pathname points to something else (file removed and replaced by a
different program), the program will execute unexpected code, or it will read
and try to parse unexpected data.

The SRFI-169 shows OS-dependent methods to retrieve the pathname information,
and I think it is the responsibility of the OS to provide accurate information
(this is why OpenBSD doesn't provide it).

But I would add that (os-executable-file) should provide such information
*as-is*, and only provide a uniform way to access the information.
In particular, it should not try to resolve the path provided by the system.
Else it will introduce an inherent TOCTOU (time-of-check to time-of-use) race.

To illustrate my concern, I will take the Linux example. The kernel
provides a pathname "/proc/self/exe" which points to the right file at any time
(the kernel itself deals with file removal or renaming, if I recall correctly).

If (os-executable-file) returns the result of readlink("/proc/self/exe"), then
as soon as the function returns, the result could already be wrong. Someone
could remove the file and replace it with something else in the time between
the return of (os-executable-file) and the actual use of its result. Only
the kernel itself can provide the required atomicity.

But now, if (os-executable-file) returns only "/proc/self/exe", the information
is valid at any time for any use. The program could use it to re-execute itself
(I assume, I haven't checked in depth) or to read the file. The kernel itself
will ensure that the operation (execve(2) or open(2)) operates on the right file.
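
To make the difference concrete, here is a minimal sketch in Python (used only
as a stand-in; the same calls exist in C). It assumes a Linux procfs, and the
names GENERIC_EXE, resolved_executable and read_executable_header are
hypothetical, not part of the SRFI. The first function returns a snapshot
string that can go stale; the second hands the generic path straight to
open(2), so the kernel chooses the file.

```python
import os

# Assumption: Linux with procfs mounted. Other systems differ or have nothing.
GENERIC_EXE = "/proc/self/exe"


def resolved_executable():
    """Return the link target as a plain string, or None if unavailable.

    The string is only a snapshot: the file it names may be removed or
    replaced before the caller uses it (the TOCTOU window).
    """
    try:
        return os.readlink(GENERIC_EXE)
    except OSError:
        return None


def read_executable_header(nbytes=4):
    """Open the generic path itself and read the first bytes of the image.

    No pathname is resolved in user space; the kernel maps the generic
    path to the running executable at open(2) time.
    """
    try:
        with open(GENERIC_EXE, "rb") as f:
            return f.read(nbytes)
    except OSError:
        return None
```

Under Python the running executable is the interpreter, which is the analogue
of "the executable file running the Scheme program".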

So, as soon as the function returns a path which isn't the generic one, it is
already flawed for any real usage. This is why I think (os-executable-file)
shouldn't return a resolved path.

For this reason I would introduce two different functions:
- one to retrieve the "accurate" pathname (the generic one) or #f

- another to retrieve a "supposed right" pathname, which could always be
  implemented (even on OpenBSD) by duplicating the actions of the shell in
  searching for the executable file (using the PATH environment variable, or
  confstr(_CS_PATH) [the POSIX function to retrieve the default PATH], and
  iterating over the directories to find the program name or argv[0])
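
The shell-like search for the "supposed right" version could look roughly like
the sketch below. It is in Python as a stand-in, the name find_in_path is
hypothetical, and it simplifies what real shells do (no hashing, no builtins).
It falls back to confstr(_CS_PATH) when PATH is unset, as suggested above.

```python
import os


def find_in_path(name, path=None):
    """Best-effort lookup of an executable, roughly as a shell would do it.

    Returns an absolute pathname or None. The result is only a guess:
    nothing guarantees it names the file that is actually running.
    """
    def is_executable(candidate):
        return os.path.isfile(candidate) and os.access(candidate, os.X_OK)

    # A name containing a slash is not searched in PATH.
    if os.sep in name:
        return os.path.abspath(name) if is_executable(name) else None

    if path is None:
        path = os.environ.get("PATH")
    if path is None:
        # PATH is unset: fall back to the POSIX default search path.
        try:
            path = os.confstr("CS_PATH") or os.defpath
        except (AttributeError, ValueError, OSError):
            path = os.defpath

    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or os.curdir, name)
        if is_executable(candidate):
            return os.path.abspath(candidate)
    return None
```

For example, find_in_path("sh") would typically return something like
"/bin/sh", but the exact result depends on the PATH in effect.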

From the SRFI-169 list of OSes, only a few will be able to provide the
"accurate" version of (os-executable-file). But all will be able to provide the
"supposed-right" version.

And providing two versions makes the developer aware of the fact that
(os-executable-file/assumed-right) could return a pathname to something other
than the executable itself, and so should not be used without care.
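
As a rough picture of the resulting pair, here is a sketch in Python (again a
stand-in for Scheme). The function names are hypothetical, only the Linux
generic path is covered, and None plays the role of #f.

```python
import os
import shutil
import sys


def executable_file_accurate():
    """Return the generic, kernel-maintained path, or None if the OS has none.

    Assumption: only Linux's /proc/self/exe is handled in this sketch.
    The path is returned as-is and is never resolved.
    """
    generic = "/proc/self/exe"
    return generic if os.path.lexists(generic) else None


def executable_file_assumed_right(argv0=None):
    """Return a best-effort absolute pathname guessed from argv[0] and PATH.

    The result may name something other than the running executable.
    Note: in Python, sys.argv[0] is the script rather than the interpreter.
    """
    name = sys.argv[0] if argv0 is None else argv0
    if not name:
        return None
    found = shutil.which(name)
    return os.path.abspath(found) if found else None
```

A caller that needs to re-execute or open the file would use the first function
and refuse to continue on None, keeping the second for display or logging.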

Thanks.
--
Sebastien Marie