Re: Drop environment variable setters out of SRFI 170? hga@xxxxxx 31 Jul 2020 11:57 UTC

> From: Lassi Kortela <xxxxxx@lassi.io>
> Date: Friday, July 31, 2020 5:40 AM
>
> Section `3.11 Environment variables` is mainly my fault. As discussed
> earlier on this SRFI's mailing list, setting environment variables in
> Scheme can have complicated consequences. Hence, I would prefer to
> redact the procedures I added earlier (set-environment-variable! and
> delete-environment-variable!).
>
> get-environment-variables and get-environment-variable are definitely
> useful but they're copied from SRFI 98. Not sure if there's a particular
> benefit to having them in SRFI 170 as well. I'm fine with removing them
> as well.

I find them really useful in testing SRFI 170. They're POSIX calls;
see the apparently less controversial unsetenv:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/unsetenv.html

And setenv, which discusses these issues in its RATIONALE:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/setenv.html

> Unanticipated results may occur if setenv() changes the external
> variable environ. In particular, if the optional envp argument to
> main() is present, it is not changed, and thus may point to an
> obsolete copy of the environment (as may any other copy of
> environ). However, other than the aforementioned restriction, the
> standard developers intended that the traditional method of walking
> through the environment by way of the environ pointer must be
> supported.
>
> It was decided that setenv() should be required by this version
> because it addresses a piece of missing functionality, and does not
> impose a significant burden on the implementor.
>
> There was considerable debate as to whether the System V putenv()
> function or the BSD setenv() function should be required as a
> mandatory function. The setenv() function was chosen because it
> permitted the implementation of the unsetenv() function to delete
> environmental variables, without specifying an additional
> interface. The putenv() function is available as part of the XSI
> option.
>
> The standard developers considered requiring that setenv() indicate
> an error when a call to it would result in exceeding {ARG_MAX}. The
> requirement was rejected since the condition might be temporary,
> with the application eventually reducing the environment size. The
> ultimate success or failure depends on the size at the time of a
> call to exec, which returns an indication of this error condition.
>
> See also the RATIONALE section in getenv.

Here's the getenv RATIONALE:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/getenv.html

> The clearenv() function was considered but rejected. The putenv()
> function has now been included for alignment with the Single UNIX
> Specification.
>
> The getenv() function is inherently not thread-safe because it
> returns a value pointing to static data.
>
> Conforming applications are required not to directly modify the
> pointers to which environ points, but to use only the setenv(),
> unsetenv(), and putenv() functions, or assignment to environ itself,
> to manipulate the process environment. This constraint allows the
> implementation to properly manage the memory it allocates. This
> enables the implementation to free any space it has allocated to
> strings (and perhaps the pointers to them) stored in environ when
> unsetenv() is called. A C runtime start-up procedure (that which
> invokes main() and perhaps initializes environ) can also initialize
> a flag indicating that none of the environment has yet been copied
> to allocated storage, or that the separate table has not yet been
> initialized. If the application switches to a complete new
> environment by assigning a new value to environ, this can be
> detected by getenv(), setenv(), unsetenv(), or putenv() and the
> implementation can at that point reinitialize based on the new
> environment. (This may include copying the environment strings into
> a new array and assigning environ to point to it.)
>
> In fact, for higher performance of getenv(), implementations that do
> not provide putenv() could also maintain a separate copy of the
> environment in a data structure that could be searched much more
> quickly (such as an indexed hash table, or a binary tree), and
> update both it and the linear list at environ when setenv() or
> unsetenv() is invoked. On implementations that do provide putenv(),
> such a copy might still be worthwhile but would need to allow for
> the fact that applications can directly modify the content of
> environment strings added with putenv(). For example, if an
> environment string found by searching the copy is one that was added
> using putenv(), the implementation would need to check that the
> string in environ still has the same name (and value, if the copy
> includes values), and whenever searching the copy produces no match
> the implementation would then need to search each environment string
> in environ that was added using putenv() in case any of them have
> changed their names and now match. Thus, each use of putenv() to add
> to the environment would reduce the speed advantage of having the
> copy.
>
> Performance of getenv() can be important for applications which have
> large numbers of environment variables. Typically, applications like
> this use the environment as a resource database of user-configurable
> parameters. The fact that these variables are in the user's shell
> environment usually means that any other program that uses
> environment variables (such as ls, which attempts to use COLUMNS),
> or really almost any utility (LANG, LC_ALL, and so on) is similarly
> slowed down by the linear search through the variables.
>
> An implementation that maintains separate data structures, or even
> one that manages the memory it consumes, is not currently required
> as it was thought it would reduce consensus among implementors who
> do not want to change their historical implementations.
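
To make the "obsolete copy of the environment" caveat from the setenv
RATIONALE concrete, here is a minimal sketch. It uses Python purely for
illustration, since CPython's os.environ is documented to be a mapping
captured at startup that os.putenv() does not update, which is the same
kind of runtime-side copy a Scheme implementation might keep. The
variable name SRFI170_DEMO_VAR and the helpers c_getenv and demo are
made up for this example, and it assumes a POSIX system where
ctypes.CDLL(None) exposes the C library's getenv().

```python
import ctypes
import os

# Hypothetical variable name, chosen only for this demonstration.
NAME = "SRFI170_DEMO_VAR"

# Bind the C library's getenv() so we can inspect the real process
# environment rather than the interpreter's cached copy.
# Assumption: CDLL(None) resolves libc symbols (true on Linux and the BSDs).
libc = ctypes.CDLL(None)
libc.getenv.argtypes = [ctypes.c_char_p]
libc.getenv.restype = ctypes.c_char_p


def c_getenv(name):
    """Return the value the C getenv() sees, or None if it is unset."""
    raw = libc.getenv(name.encode())
    return None if raw is None else raw.decode()


def demo():
    """Return (cached value, C-level value) pairs after each step."""
    # Assigning through os.environ updates the cached mapping and the
    # C-level environment together.
    os.environ[NAME] = "first"
    in_sync = (os.environ.get(NAME), c_getenv(NAME))

    # os.putenv() changes only the C-level environment; the cached
    # mapping keeps the old value.
    os.putenv(NAME, "second")
    stale = (os.environ.get(NAME), c_getenv(NAME))

    # os.unsetenv() likewise bypasses the cached mapping.
    os.unsetenv(NAME)
    after_unset = (os.environ.get(NAME), c_getenv(NAME))

    # Clean up the cached entry so the demo leaves no trace.
    del os.environ[NAME]
    return in_sync, stale, after_unset
```

Calling demo() shows the cached mapping and the C-level environment
agreeing after the first step and diverging after the second and third.
The setters themselves behave as POSIX specifies; the surprise comes
only from a copy that is not refreshed.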

Are they really problematic in practice, on Linux, the BSDs, etc.?

- Harold