Re: Fwd: [srfi-126] Draft #2. (#1) Takashi Kato (10 Sep 2015 09:11 UTC)
Re: Fwd: [srfi-126] Draft #2. (#1) taylanbayirli@xxxxxx (10 Sep 2015 12:43 UTC)
Re: Fwd: [srfi-126] Draft #2. (#1) John Cowan (10 Sep 2015 13:25 UTC)
Re: Fwd: [srfi-126] Draft #2. (#1) Takashi Kato (10 Sep 2015 19:29 UTC)
Re: Fwd: [srfi-126] Draft #2. (#1) taylanbayirli@xxxxxx (10 Sep 2015 20:03 UTC)
Re: Fwd: [srfi-126] Draft #2. (#1) John Cowan (10 Sep 2015 21:10 UTC)
Re: Fwd: [srfi-126] Draft #2. (#1) John Cowan (10 Sep 2015 20:21 UTC)

Re: Fwd: [srfi-126] Draft #2. (#1) taylanbayirli@xxxxxx 10 Sep 2015 12:43 UTC

Takashi Kato <xxxxxx@ymail.com> writes:

> Couple of questions and suggestions:
>
> Questions:
> - Calling hashtable-copy with weak hashtable. How should this behave when
>   GC is invoked? Should both hashtables keys/values which are not referred
>   any place other than these tables be collected?

I'm not sure if I understand the question.  Before the copy, you have
one weak hashtable, after the copy you have two.  And maybe you have a
bunch of other weak hashtables somewhere in the program which also
happen to hold some of the same keys and values.  It doesn't matter in
how many places you have weak references to an object.  When there are
no non-weak references to the object anywhere, the GC can reclaim it,
and when it does so it should remove all entries of all hashtables in
the program which had the reclaimed object as their key or value.

Does that answer the question?  Feel free to elaborate.

Of course, in that explanation I'm assuming that GC happens either
before or after the copy, i.e. the copy is atomic.  That might not be
the case in truth.  An implementation should be careful so as not to
screw up if the GC runs during some part of the copy operation.  For
instance: 1. a bucket-vector of the same size as the source hashtable's
bucket vector is allocated, 2. this allocation triggers the GC, 3. the
source hashtable's bucket vector gets shrinked and rehashed because some
entries were removed due to being weak, 4. the algorithm naively tries
to copy over entries from the source bucket vector which it thinks is
longer than it really is now, 5. out-of-bounds error!  A slightly
different sequence of events may cause corrupted entries in the
destination bucket vector instead of a crash, which would be even more
nasty.

Similar concerns apply to hashtable-for-each and the like.

Multithreading / concurrent-GC might make things even more complicated.

I might need to ponder on these problems to make sure the specification
doesn't make anything overly difficult or impossible to implement
efficiently...

> - Also hashtable-copy. How should hashtables copied with different weakness
>   behave? For example, source weak hashtable A has weakness key and value,
>   and copied one B has only value, then when one of the keys of A is GCed
>   should B hold the key and associated value or the key should not be GCed
>   in first place?

If the GC runs first, the stale entries of A are removed and don't even
get into B; if the copy runs first, then all entries get into B, but if
after that they still don't have any non-weak references to them, they
then get collected.

As long as ephemerons don't come into play, the semantics can be
explained in terms of the number of non-weak references pointing to an
object in the whole program at any point in time; the copying operation
should be viewed as atomic.  (Keeping up the illusion of that when it's
not the case is the implementation's responsibility.)

> Suggestions:
> - hashtable-lookup: it would be convenient to have default value which can
>   be returned when the key is not found. (Then almost no difference between
>   hashtable-ref, though.)

If you want to use a default value, you probably don't care whether the
value you get was found in the table or is the default value.  If you do
for some reason, you can just default to that value "manually" after
using `hashtable-lookup`:

    (let-values (((result found?) (hashtable-lookup ht key)))
      (if found?
          (use result)
          (begin
            (do something)
            (use default))))

`Default` appears late there.  That should usually be fine, but it could
also be let-bound earlier if desired for stylistic reasons:

    (let-values (((result found?) (hashtable-lookup ht key)))
      (if found?
          (use result)
          (let ((result default))
            (do something)
            (use result))))

This makes the intent explicit and the code easy to read.  If the
default value mechanism were mixed together with the found/unfound
mechanism, it would make the code harder to read IMO.

> - hashtable-intern!: it would be convenient that default-proc takes the key
>   as its argument so that users can derive values from the key.

In that case they can just let-bind the key (most often it will already
be bound to a variable).  If the one argument were forced, then users
who want to use a nullary procedure would need to wrap it in a lambda to
ignore the one argument.  Do you have any data or experience on which
use-case is more common?  In the sources of MIT/GNU Scheme (search for
hash-table/intern!), the nullary variant seems used very often, and in
few cases the key (which is already bound to a variable) is used in the
body of the default-proc which is a lambda.

> - hashtable-for-each and hashtable-map!: what's the returning value of these
>   procedures? Given hashtable or an unspecified value?

Unspecified, of course.  Added to the spec.

> - I think it would be convenient to have hashtable-map which almost the same
>   as hashtable-map! but returns newly created hashtable. The type of returning
>   hashtable can be in sense of hashtable-copy. So simple implementation would
>   be the following:
>   (define (hashtable-map proc hashtable)
>     (let ((r (hashtable-copy hashtable #t)))
>       (hashtable-map! proc r)
>
>       r))

I like this, but do you have any data or experience on how often it's
really used?  I'd like to get a bit more input before adding another
procedure.

> Cheers,

Thanks for the review!  Looking forward to your reply,
Taylan