Names, c-memory-model, etc. Devon Schudy (18 Jul 2013 03:27 UTC)
Re: Names, c-memory-model, etc. John Cowan (18 Jul 2013 16:25 UTC)

Names, c-memory-model, etc. Devon Schudy 18 Jul 2013 03:19 UTC

Is it too late for comments?

The meaning of system-instance is not obvious from its name. I suspect
this is because "system" suggests "operating system", not "host", and
"instance" doesn't indicate that it's the *name* of the host. How
about something containing "name", such as machine-name or
machine-hostname or host-name or hostname?

I'm guessing os-type is supposed to return a human-readable name of
the OS ("Mac OS X", "Linux", "Microsoft Windows Vista", "Plan 9 from
Bell Labs") rather than a short machine-readable name of the OS family
("unix", "windows"), since the latter is covered better by (features).
In that case, it would be clearer to call it os-name. This would also
make os-{name,version} consistent with implementation-{name,version}.

What's the intended use of c-memory-model? It tells the machine's word
size, but it combines this with a description of which C types are the
same size as a pointer. That's useful for C programs or a C FFI, but
it's irrelevant to Scheme, so it probably shouldn't be in this SRFI
(nor in R7RS-large). If it's intended to tell the size of a Scheme
pointer, it could be called word-size (or word-width?). If it's
intended to tell the machine's largest available pointer size (which
is often different, e.g. a 32-bit Scheme on a 64-bit machine) it could
be called cpu-word-size. Both of these should presumably return exact
integers.

...however, these aren't suitable for some of the most obvious reasons
to care about word size:
 * Determining fixnum range: use fixnum-width, least-fixnum and
greatest-fixnum instead.
 * Determining whether it's safe to use more than a few GB of memory:
word size is not the only constraint here, so this should really be
done with some sort of memory-limit or physical-memory function.
 * Estimating memory use: is a Scheme pointer 4 or 8 bytes? For this
purpose, you typically want the result in bytes, not bits, although
that doesn't make sense on some architectures. Also, per-object
overhead matters (especially if you use a lot of pairs or boxed
floats). So does the x2 multiplier for a simple copying GC.

Other environment inquiries that might be worth including:
 * username: mostly covered by (get-environment-variable "USER"), so
maybe a separate function is not necessary.
 * user-home-directory (or just user-home?): ~. On single-user
systems, this should be the usual place for user files, e.g. the root
of the main drive on Mac OS Classic. Already sort of covered by
(get-environment-variable "HOME").
 * settings-directory (user-settings-directory?): where programs
should save their user-specific settings. On Unix, this is the same as
user-home-directory; on Mac OS X and Windows, it's a separate
subdirectory. Should be provided in addition to user-home-directory
for portability.
 * current-directory (working-directory?): sometimes convenient for
pathname resolution. On systems that don't have this concept, return
the root or other default directory.
 * locale: already covered by SRFI-29, and by (get-environment-variable "LANG").
 * number-of-cpus (cpu-count?): to determine how many threads to use.
This really belongs in a multithreading SRFI, but SRFI-21 doesn't have
it.
 * cpu-clock, cpu-frequency, cpu-clock-{speed, frequency}: to estimate
available power, e.g. to decide whether to turn on expensive optional
features. Returns a frequency in hertz (or MHz?).
 * physical-memory: how much memory does the machine have? Returns an
exact integer in bytes, not Scheme pointers, because that's the
conventional unit and because large data is often not composed of
pointers. This might not fit in a fixnum, so maybe it should use a
larger unit than bytes.
 * cache-memory (cache-size?): how large is the cache, if any? This is
often more important for performance than physical-memory. When
there's more than one level of cache, it should return the one with
the largest effect (L2, on present-day machines).
 * memory-used: how much (virtual) memory is the Scheme implementation
currently using? (Including garbage: this is about the process, not
the GC.)
 * memory-limit: how much (virtual) memory can the Scheme
implementation use? This is a bit hard to implement, because it
depends on word size, available swap, and quotas (e.g. on Unix, the
soft limit from getrlimit(RLIMIT_AS)).
 * disk-{free,available}: how much disk space can the program use?
This is important to avoid filling up disks with ever-growing data
like logs or caches. With one argument, which must be a pathname,
return the available space for that path's drive; with zero arguments,
return the space for some default drive. Should ideally take the user
quota into account, if any. The name should maybe not include "disk",
since that's sometimes inaccurate, but I can't think of an alternative
that's as easy to understand ("drive" isn't, and isn't much more
accurate).
* stack-limit: approximately how deeply can we nest non-tail calls
without overflowing the stack? This is often an issue for functional
programs. I'm not sure if the unit should be bytes or calls; the
latter is more useful but unreliable, since stack-frame sizes may vary
a lot. Returns #f if there's no limit but available memory. Ideally
this should return the remaining depth, so recursive functions can
bail or change strategies when they get close to the limit.