Provide two versions of os-executable-file Sebastien Marie (20 Apr 2020 11:10 UTC)
Re: Provide two versions of os-executable-file Lassi Kortela (21 Apr 2020 21:11 UTC)
Re: Provide two versions of os-executable-file John Cowan (23 Apr 2020 02:15 UTC)

Re: Provide two versions of os-executable-file Lassi Kortela 21 Apr 2020 21:11 UTC

Thank you for the detailed comments (and welcome to the SRFI lists in
case you are new here :)

> First, I think the srfi covers relatively well the problem with os-executable-file
> about "how returning absolute pathname pointing to the executable file running
> the Scheme program" on different OSes.
>
> But I would add some elements.
>
> First, why a programmer would want to use such function, and which would not be
> possible with argv[0] ?

argv[0] is often just the basename of the command, which means you'd
have to replicate a PATH search. That's complex and unreliable.

> Usually it is because it needs accuracy and somehow secure information (to not
> rely on user provided argv[0]).

The OS-dependent executable filename can be expected to be more
consistently accurate than the basename, since the basename can be in
many formats, whereas the executable filename is in one format per OS.

I wouldn't consider the executable filename safe in security-sensitive
contexts (as Göran detailed in his mail) - that was never the goal. The
SRFI should emphasize that more clearly; I'll edit the prose.

> From my experience, it could be to simply re-execute the program (a way to
> "reload", with eventually new parameters, for example), or to execute a subpart
> of the program as a separate system process (an alternative to simply calling
> fork(2) to take advantage of freshness provided by OS - see fork+exec inside
> https://www.openbsd.org/innovations.html for details)
>
> or to open the file to read it, for example to parse ELF and get debug symbols
> name to provide fancy backtraces.

Yes, that's one possible use for it.

> In both cases, the accuracy of the result is very important, else it could
> introduce subtle security issues: if, at the time of use, the pathname points to
> something else (file removed and replaced by different program) the program will
> execute unexpected code, or it will read and try to parse unexpected data.

You're exactly right. However, this security problem exists no matter
which filename you execute: even /bin/ls can be replaced by a malicious
program if an attacker has root access.

fexecve() is probably a little safer.
<https://pubs.opengroup.org/onlinepubs/9699919799/functions/fexecve.html>
SRFI 193 doesn't have a provision to open a file handle to the running
executable, and I'm not sure how portable such a feature would be.
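
To make the exec-by-descriptor idea concrete, here is a minimal Python
sketch (not something SRFI 193 specifies). It assumes a Unix-like system
where Python's os.execve accepts an open file descriptor, which is
checked through os.supports_fd. run_via_fd and demo are hypothetical
helper names used only for illustration.

```python
import os
import sys


def run_via_fd(program_path, argv):
    """Open program_path once, then exec the open descriptor in a child.

    Returns the child's exit code (or the negated signal number if it was
    killed by a signal). Returns None when the platform cannot exec by
    file descriptor.
    """
    if os.execve not in os.supports_fd:
        return None
    fd = os.open(program_path, os.O_RDONLY)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: exec the file we already hold open, not the pathname.
            try:
                os.execve(fd, argv, os.environ)
            finally:
                # Reached only if the exec failed.
                os._exit(127)
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status):
            return os.WEXITSTATUS(status)
        return -os.WTERMSIG(status)
    finally:
        os.close(fd)


def demo():
    """Run the current Python interpreter by descriptor; it exits with 7."""
    return run_via_fd(sys.executable,
                      [sys.executable, "-c", "raise SystemExit(7)"])
```

The pathname is consulted only once, at open time. Replacing the file at
that path afterwards does not change what the child executes, though
whoever controlled the path at open time still decides what gets opened.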

Perhaps a future SRFI should collect Unix security features (fexecve(),
issetugid(), pledge(), etc.). SRFI 170 is still open, but the security
APIs are less fundamental than most stuff in it and security is more of
a moving target.

> SRFI 193 shows OS-dependent methods to retrieve the pathname information,
> and I think it is the responsibility of the OS to provide accurate information
> (which is why OpenBSD doesn't provide it).
>
> But I would add that (os-executable-file) should provide such information
> *as-is*, and only provide a uniform way to access the information.
> Particularly, it should not try to resolve the path provided by the system. Else
> it will introduce an inherent TOCTOU.

Very good point - thank you for making it :)

I'll tone down the prose so that the OS APIs are not advertised as
"reliable" like they are in the current draft.

The point of resolving the pathname to an absolute one is that the
program can chdir() later, which would make the relative path wrong. But
your approach may be better.
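
The chdir() concern can be sketched as follows. This is a minimal Python
illustration under stated assumptions: "bin/prog" is a hypothetical
relative name as it might arrive in argv[0], and resolve_early and demo
are hypothetical helpers, not part of any SRFI.

```python
import os
import tempfile


def resolve_early(relative_path):
    """Resolve a relative path against the working directory as of now."""
    return os.path.abspath(relative_path)


def demo():
    """Resolve the same relative name before and after a chdir()."""
    start = os.getcwd()
    early = resolve_early("bin/prog")          # hypothetical relative name
    with tempfile.TemporaryDirectory() as elsewhere:
        os.chdir(elsewhere)
        try:
            late = os.path.abspath("bin/prog")  # same name, new directory
        finally:
            os.chdir(start)                     # restore before cleanup
    return early, late
```

The two results differ, which is the argument for resolving at startup.
The cost is the one raised in the quoted text: the resolved string is a
snapshot, and nothing keeps it pointing at the same file afterwards.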

> In order to illustrate my concern, I will take the Linux example. The kernel
> provides a pathname "/proc/self/exe" which points to the right file at anytime
> (the kernel itself deals with file removing or renaming, if I recall correctly).
>
> If (os-executable-file) returns the result of readlink("/proc/self/exe"), as
> soon as the function returns, the result could already be wrong. Someone could
> remove the file and replace it with something else, in the time between
> (os-executable-file) returns and the effective use of the function result. Only
> the kernel itself could provide the required atomicity.
>
> But now, if (os-executable-file) returns only "/proc/self/exe", the information
> is valid at any time for any use. The program could use it to re-execute (I
> assume, I have not checked in depth) or to read the file. The kernel itself
> will ensure the operation (execve(2) or open(2)) to operate on the right file.
>
> So, as long the function returns a path which isn't generic, it is already
> flawed for any real usage. It is why I think (os-executable-file) shouldn't
> return a resolved path.

Aha, you're thinking of returning a path to a file/symlink from which
the real path can be read. That's a different concern still.
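
The difference between the two approaches can be sketched in Python. This
is a Linux-specific illustration that assumes procfs is mounted; on other
systems both helpers simply return None. The function names are
hypothetical.

```python
import os

PROC_SELF_EXE = "/proc/self/exe"   # Linux procfs; absent on many systems


def resolved_snapshot():
    """Return readlink() of the magic link, or None if unavailable.

    The returned string is only a snapshot taken at call time.
    """
    try:
        return os.readlink(PROC_SELF_EXE)
    except OSError:
        return None


def read_magic(nbytes=4):
    """Open the magic path itself and read its first bytes, or None.

    The kernel resolves the path at open() time, so no stale string is
    involved.
    """
    try:
        with open(PROC_SELF_EXE, "rb") as f:
            return f.read(nbytes)
    except OSError:
        return None
```

On Linux the snapshot gets " (deleted)" appended once the executable has
been unlinked, so it is not even a usable pathname in that case, while
opening the magic path still reaches the running file.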

The trouble is that there are many non-procfs based approaches (sysctl
and custom C APIs as listed in the draft). I'm not sure what those APIs
do in case the executable is moved from the old path (i.e. whether they
update the internal information to point to the new path). I'd guess
they don't all update it.

> For this reason I would introduce two different functions:
> - one to retrieve an "accurate" pathname (generic one) or #f
>
> - another to retrieve a "supposed right" pathname, which could always be
>    implemented (even on OpenBSD) by duplicating the actions of the shell in
>    searching for the executable file (using PATH environment, or confstr(_CS_PATH)
>    [posix function to retrieve default PATH] and iterating on the directories to
>    find the program name or argv[0])

IMHO the path-search-for-self sounds a bit hacky to have in a SRFI.
However, a path-search-for-arbitrary-command procedure would probably be
useful for many things, and people could trivially combine it with (car
(os-command-line)) to find self.

There's a plan to write a process spawning SRFI as a continuation of
SRFI 170. (Subprocesses were left out of 170 since 170 was already so
big and we wouldn't have time to do justice to the many details.)
Perhaps the process SRFI should include a path search procedure.
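
As a rough idea of what such a path search procedure would do, here is a
minimal Python sketch. It is an approximation of shell behavior, not a
proposed SRFI API; search_path and default_path are hypothetical names.
The confstr fallback corresponds to the confstr(_CS_PATH) call mentioned
in the quoted text.

```python
import os
import sys


def default_path():
    """PATH from the environment, else the system default search path."""
    path = os.environ.get("PATH")
    if path:
        return path
    try:
        # POSIX default PATH, where the platform exposes it.
        return os.confstr("CS_PATH") or os.defpath
    except (AttributeError, ValueError, OSError):
        return os.defpath


def is_executable_file(candidate):
    """True if candidate is a regular file we have execute permission on."""
    return os.path.isfile(candidate) and os.access(candidate, os.X_OK)


def search_path(command, path=None):
    """Return the first executable match for command, or None."""
    if os.sep in command:
        # A name with a directory part is used as given, without a search.
        return command if is_executable_file(command) else None
    directories = (path if path is not None else default_path())
    for directory in directories.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or os.curdir, command)
        if is_executable_file(candidate):
            return candidate
    return None


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "sh"
    print(search_path(name))
```

The result is only a "supposed right" pathname in the sense of the quoted
proposal: PATH may have changed since the program started, and the file
found may not be the one that is actually running.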

A procedure that promises to get an accurate pathname is a bit
problematic as well - we should find out which OSes give an accurate
pathname and Scheme implementations should hardcode that knowledge. But
what if the internals of those operating systems are later changed so
that the information they return is no longer as accurate?

In light of these thoughts, it would probably be best to have a "here is
the raw executable filename from the OS - take it or leave it"
procedure. The `os-executable-file` in the current draft is like that,
but you are right that it shouldn't be advertised as reliable.

> From the SRFI 193 list of OSes, only a few will be able to provide the "accurate"
> version of (os-executable-file). But all will be able to provide the
> "supposed-right" version.
>
> And by providing two versions, it makes the developer aware about the fact that
> (os-executable-file/assumed-right) could return a pathname to something else
> than the executable itself, and so should not be used without care.
>
> Thanks.

Wonderful comments all - thank you for making them.