Re: Weakness of "non-object" types taylanbayirli@xxxxxx 04 Dec 2015 16:41 UTC

John Cowan <xxxxxx@mercury.ccil.org> writes:

> Taylan Ulrich Bayırlı/Kammer scripsit:
>
>> Booleans, characters, numbers, symbols, and the empty list object are
>> never stored weakly or ephemerally.
>
> I think it's bad to include numbers in this list.  The whole point of
> non-strong hashtables is to allow things to be garbage collected that
> you don't care about any more.  Booleans, characters, and the empty list
> are in practice represented either by immediates or by singletons, and in
> many Schemes, symbols are the values in a hidden strong hash table that
> maps strings to symbols.  (For standards purposes, uninterned symbols
> don't count as symbols).  So they will never be reclaimed anyway.
>
> But numbers are potentially unbounded in size, specifically bignums
> and ratnums or rectnums made from bignums.  Not allowing them to be
> held weakly means that they cannot easily be reclaimed when stored as
> inaccessible values in a hash table.  That leads to the memory exhaustion
> that non-strong hash tables are designed to prevent.
>
> Given that, I question the utility of pointing all this out, since it
> is an implementation detail for the types other than numbers, and is
> broken for numbers.

(As far as I know, eq? is ill-defined on characters as well as numbers,
because characters, too, sometimes end up being non-immediate.  Anyway,
whether characters fall into the same category doesn't matter much.)

The thing is, we have no proof that the programmer no longer cares about
a certain number (or character, symbol, ...).  A number is just a
number, no matter where and when and how many times it's been allocated
and deallocated.  Its representation in memory may die, but the number
never dies, and never becomes unreachable in the true sense, since it
can always pop up again as the result of some computation, gaining a new
representation in memory.

The RnRS go as far as saying there is no such thing as deallocation in
Scheme.  ("All objects created in the course of a Scheme computation
[...] have unlimited extent.  No Scheme object is ever destroyed.")
Garbage collection is an implementation detail which doesn't break that
abstraction.  Deallocating and reallocating numbers is fine because
they're immutable; the programmer can't tell the new instance apart from
the old.  Other objects, which cannot be recreated, are deallocated only
once no references to them remain, at which point they are truly
unreachable.

Now weak tables (and weak references generally) are supposed to build
upon this same abstraction, and keep up the illusion that there is no
such thing as deallocation.  Allowing number associations in hash tables
to be dropped on GC breaks that illusion.
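
As a rough analogy from outside Scheme, here is a minimal Python sketch of
the same distinction.  It is not Scheme and says nothing about how any
SRFI-126 implementation behaves; the helper names (make_bignum,
can_be_weakly_referenced, Record) are made up for illustration.  It
assumes CPython, where the standard weakref module refuses to create weak
references to integers, while instances of ordinary classes can be held
weakly.

```python
import weakref


def make_bignum(exponent):
    # Computed at run time, so each call may allocate a fresh object.
    return 10 ** exponent


def can_be_weakly_referenced(obj):
    # weakref.ref raises TypeError for types without weak-reference support.
    try:
        weakref.ref(obj)
        return True
    except TypeError:
        return False


class Record:
    # Hypothetical stand-in for a "real" object with identity.
    pass


a = make_bignum(100)
b = make_bignum(100)

# Value equality: the program cannot tell the two results apart by value
# (roughly the eqv? view of numbers).
same_value = (a == b)

# Identity: whether both names refer to one allocation is an
# implementation detail (roughly the eq? view).
same_object = (a is b)

number_is_weakable = can_be_weakly_referenced(a)
record_is_weakable = can_be_weakly_referenced(Record())

print(same_value, number_is_weakable, record_is_weakable)
```

The point of the sketch is only that a number recomputed later is the
"same" number by value, whatever happened to its earlier representation,
so there is no allocation whose death could meaningfully mean "this
number is no longer needed".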

What I'm still missing is a concrete example of a real problem that this
abstraction breach causes.  I could swear I saw a related bug report or
discussion in Guile once, but I can't find it right now.  I will look
around more.

>> The implementation will GC a symbol when there are no strong references
>> left to it.
>
> Not necessarily, no.

Right, only some implementations collect symbols.

By the way, MIT/GNU Scheme already documents that eqv hash tables hold
numbers and characters strongly.  Since eq? is ill-defined on these
types, the new prose in SRFI-126 makes no difference there, i.e. it
still only enforces MIT/GNU Scheme semantics on all SRFI-126
implementations.  (Of course one might still object to that, but at
least the decision has some history behind it.)

The significant difference concerns symbols, in implementations that GC
them.  (I'm assuming there are no serious implementations with
non-immediate Booleans or a non-immediate empty list.)  It's probably
best to be consistent here and include symbols in the list if the list
is there at all.

I'll look around some more and then either keep the current semantics
or drop the restriction entirely (not enforcing MIT/GNU Scheme semantics
at all)...  More opinions welcome.

Taylan