Re: Fwd: [srfi-126] Draft #2. (#1)
  Takashi Kato (10 Sep 2015 09:11 UTC)
  taylanbayirli@xxxxxx (10 Sep 2015 12:43 UTC)
  John Cowan (10 Sep 2015 13:25 UTC)
  Takashi Kato (10 Sep 2015 19:29 UTC)
  taylanbayirli@xxxxxx (10 Sep 2015 20:03 UTC)
  John Cowan (10 Sep 2015 21:10 UTC)
  John Cowan (10 Sep 2015 20:21 UTC)
Takashi Kato <xxxxxx@ymail.com> writes:

> Couple of questions and suggestions:
>
> Questions:
> - Calling hashtable-copy with weak hashtable. How should this behave when
>   GC is invoked? Should both hashtables keys/values which are not referred
>   any place other than these tables be collected?

I'm not sure I understand the question. Before the copy you have one weak hashtable; after the copy you have two. And maybe you have a bunch of other weak hashtables somewhere in the program which also happen to hold some of the same keys and values. It doesn't matter in how many places you have weak references to an object: when there are no non-weak references to the object anywhere, the GC can reclaim it, and when it does so it should remove all entries, in all hashtables in the program, whose key or value was the reclaimed object. Does that answer the question? Feel free to elaborate.

Of course, that explanation assumes that GC happens either before or after the copy, i.e. that the copy is atomic. That might not be the case in truth. An implementation should be careful not to misbehave if the GC runs during some part of the copy operation. For instance:

1. a bucket vector of the same size as the source hashtable's bucket vector is allocated,
2. this allocation triggers the GC,
3. the source hashtable's bucket vector gets shrunk and rehashed because some entries were removed due to being weak,
4. the algorithm naively tries to copy over entries from the source bucket vector, which it thinks is longer than it really is now,
5. out-of-bounds error!

A slightly different sequence of events may cause corrupted entries in the destination bucket vector instead of a crash, which would be even nastier. Similar concerns apply to hashtable-for-each and the like. Multithreading and concurrent GC might make things even more complicated.
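One way an implementation could sidestep the mid-copy hazard is to snapshot the source table's entries into fresh vectors before building the destination, so a GC that shrinks a weak source table cannot invalidate any index the copy loop is holding. A minimal sketch in plain R6RS terms (the name `hashtable-copy/snapshot` is hypothetical, and for simplicity it always builds an eqv-based, non-weak destination):

```scheme
(import (rnrs))

;; Hypothetical helper: copy HT by first snapshotting its entries.
;; hashtable-entries returns two parallel vectors; after that point the
;; loop below only touches the snapshot, so a GC that removes weak
;; entries from HT mid-copy cannot cause out-of-bounds accesses.
(define (hashtable-copy/snapshot ht)
  (let-values (((keys vals) (hashtable-entries ht)))
    (let ((new (make-eqv-hashtable (vector-length keys))))
      (do ((i 0 (+ i 1)))
          ((= i (vector-length keys)) new)
        (hashtable-set! new (vector-ref keys i) (vector-ref vals i))))))
```

A real implementation would of course also carry over the source's hash and equivalence functions and requested weakness; this sketch only illustrates the snapshot-first ordering.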
I might need to ponder on these problems to make sure the specification doesn't make anything overly difficult or impossible to implement efficiently...

> - Also hashtable-copy. How should hashtables copied with different weakness
>   behave? For example, source weak hashtable A has weakness key and value,
>   and copied one B has only value, then when one of the keys of A is GCed
>   should B hold the key and associated value or the key should not be GCed
>   in first place?

If the GC runs first, the stale entries of A are removed and don't even get into B; if the copy runs first, then all entries get into B, but if after that they still don't have any non-weak references to them, they then get collected. As long as ephemerons don't come into play, the semantics can be explained in terms of the number of non-weak references pointing to an object in the whole program at any point in time; the copying operation should be viewed as atomic. (Keeping up that illusion when it's not actually the case is the implementation's responsibility.)

> Suggestions:
> - hashtable-lookup: it would be convenient to have default value which can
>   be returned when the key is not found. (Then almost no difference between
>   hashtable-ref, though.)

If you want to use a default value, you probably don't care whether the value you get was found in the table or is the default value. If you do for some reason, you can just default to that value "manually" after using `hashtable-lookup`:

    (let-values (((result found?) (hashtable-lookup ht key)))
      (if found?
          (use result)
          (begin (do something)
                 (use default))))

`Default` appears late there. That should usually be fine, but it could also be let-bound earlier if desired for stylistic reasons:

    (let-values (((result found?) (hashtable-lookup ht key)))
      (if found?
          (use result)
          (let ((result default))
            (do something)
            (use result))))

This makes the intent explicit and the code easy to read.
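For anyone who wants to experiment with this pattern on a plain R6RS system, which has only the defaulting `hashtable-ref`, the two-valued interface can be approximated with a freshly allocated sentinel. This `hashtable-lookup` definition is just an illustrative shim, not the SRFI's reference implementation:

```scheme
(import (rnrs))

;; Illustrative shim: emulate a two-valued lookup on top of R6RS
;; hashtable-ref.  A freshly allocated pair is used as the default, so
;; it cannot be eq? to any value actually stored in the table.
(define (hashtable-lookup ht key)
  (let* ((fail (cons 'not-found '()))
         (v (hashtable-ref ht key fail)))
    (if (eq? v fail)
        (values #f #f)
        (values v #t))))
```

The sentinel trick is the standard way to distinguish "absent" from "present with value #f" when only a defaulting accessor is available.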
If the default value mechanism were mixed together with the found/unfound mechanism, it would make the code harder to read IMO.

> - hashtable-intern!: it would be convenient that default-proc takes the key
>   as its argument so that users can derive values from the key.

In that case they can just let-bind the key (most often it will already be bound to a variable). If the one argument were forced, then users who want to use a nullary procedure would need to wrap it in a lambda to ignore the one argument. Do you have any data or experience on which use-case is more common? In the sources of MIT/GNU Scheme (search for hash-table/intern!), the nullary variant seems to be used very often, and in a few cases the key (which is already bound to a variable) is used in the body of the default-proc, which is a lambda.

> - hashtable-for-each and hashtable-map!: what's the returning value of these
>   procedures? Given hashtable or an unspecified value?

Unspecified, of course. Added to the spec.

> - I think it would be convenient to have hashtable-map which almost the same
>   as hashtable-map! but returns newly created hashtable. The type of returning
>   hashtable can be in sense of hashtable-copy. So simple implementation would
>   be the following:
>
>     (define (hashtable-map proc hashtable)
>       (let ((r (hashtable-copy hashtable #t)))
>         (hashtable-map! proc r)
>         r))

I like this, but do you have any data or experience on how often it's really used? I'd like to get a bit more input before adding another procedure.

> Cheers,

Thanks for the review! Looking forward to your reply,
Taylan