Community preference so far? taylanbayirli@xxxxxx 26 Sep 2015 17:31 UTC
Re: Community preference so far? Alex Shinn 29 Sep 2015 02:43 UTC
Re: Community preference so far? taylanbayirli@xxxxxx 29 Sep 2015 09:17 UTC
Re: Community preference so far? taylanbayirli@xxxxxx 29 Sep 2015 11:00 UTC
Re: Community preference so far? Alex Shinn 30 Sep 2015 03:16 UTC
Re: Community preference so far? taylanbayirli@xxxxxx 30 Sep 2015 09:37 UTC
Re: Community preference so far? Alex Shinn 30 Sep 2015 14:02 UTC
Re: Community preference so far? taylanbayirli@xxxxxx 30 Sep 2015 20:44 UTC
Re: Community preference so far? taylanbayirli@xxxxxx 01 Oct 2015 08:35 UTC
Re: Community preference so far? taylanbayirli@xxxxxx 29 Sep 2015 11:36 UTC
Re: Community preference so far? taylanbayirli@xxxxxx 01 Oct 2015 12:53 UTC
Re: Community preference so far? Alex Shinn 30 Sep 2015 03:32 UTC
Re: Community preference so far? taylanbayirli@xxxxxx 30 Sep 2015 08:56 UTC
Re: Community preference so far? Alex Shinn 30 Sep 2015 09:38 UTC
Re: Community preference so far? taylanbayirli@xxxxxx 30 Sep 2015 09:46 UTC
Re: Community preference so far? taylanbayirli@xxxxxx 30 Sep 2015 10:03 UTC
Re: Community preference so far? Evan Hanson 30 Sep 2015 11:54 UTC
Re: Community preference so far? taylanbayirli@xxxxxx 30 Sep 2015 22:34 UTC
Re: Community preference so far? Per Bothner 29 Sep 2015 11:12 UTC
Re: Community preference so far? John Cowan 29 Sep 2015 12:07 UTC
Re: Community preference so far? Per Bothner 29 Sep 2015 12:47 UTC
Re: Community preference so far? Alex Shinn 30 Sep 2015 09:15 UTC

Re: Community preference so far? taylanbayirli@xxxxxx 30 Sep 2015 20:44 UTC

Alex Shinn <xxxxxx@gmail.com> writes:

> It's trivial to write a portable wrapper around either R6RS or
> SRFI-69 which ignores the salt. If you want any better compatibility
> than that I recommend using R6RS as is.

R6RS defines e.g.

    (string-hash string)

and if we now mandated it to take more arguments, we would be breaking
valid R6RS programs.  If we made the arguments optional, that would
yield some strange semantics...

Additionally, make-hash-table is specified to pass one argument to the
hash function it receives.  If we make it pass more arguments, again we
break valid R6RS programs.

Defining make-string-hash, and having the predefined string-hash be one
that was created with make-string-hash with a specific salt, is pretty
clean and has no such issues.

>     There's also the risk that programmers will be lazy and pass the
>     hash function the same salt every time, when they call them
>     themselves, thus doing more harm than good.
>
> The hash-table implementation passes the salt. This is written just
> once.

OK, I guess you could say the hash functions aren't ever meant to be
called by programmers themselves, except when using them as part of
extended hash functions where one would pass on the received salt...

>     I don't see the problem with an env var that affects both the
>     built-in hash functions and which users can obey in theirs?
>
> What is the env var? SRFI_126_HASH_SEED? How are you going to get both
> implementations and authors of portable hash functions
> (e.g. string-ni-hash) to respect this? Will it be specified in the
> SRFI?

Yes, I can specify the env var.  Then e.g. the built-in string-hash
would be the result of:

    (make-string-hash (get-environment-variable "SRFI_126_HASH_SEED"))

By the way I don't see how your proposal lifts the need for this.  The
hash table implementation passes the salt in your proposal; what salt
does it pass and how can I control it?  The seed for it will still
require the use of an env var from what I can tell.

I'll specify this env var in any case.  It appears useful either way.

>     A separate salt per table could be provided in the hash table
>     constructor, or hash function constructor, if really desired.
>
> The functions are fixed: string-hash, string-ci-hash, etc.  We don't
> have a make-string-ci-hash function. If we did, would we also
> recommend anyone who writes their own function to also provide a
> generator?

I was suggesting that we can define a make-string-ci-hash, and yes,
programmers would provide make-foo-hash.

>     Re. b., are there any known implementations doing that? It sounds
>     like an interesting idea, but I'd rather not complicate the API
>     for a use-case whose usefulness isn't demonstrated.
>
> The API is simpler and more elegant, given the alternative which is
> for everyone who wants to write their own hash function to manage
> their own seeds.

If I'm writing my own hash function from scratch, not calling any of the
built-in hash functions in it, then I may as well manage my own salting.

I don't see the proposed API being simpler or more elegant.  Probably a
subjective thing.  (Or maybe I don't get it yet.)

>     > 5. It allows automatically extending any hash function to become
>     > a set of hash functions by varying the salt.  In particular,
>     > this can be used for double hashing, cuckoo hashing, and bloom
>     > filters.
>
>     This too should be covered by hash function constructors if I'm
>     not mistaken, which is a less intrusive change to the API and can
>     be added post-hoc if really desired.
>
> The idea is to allow implementations of SRFI-12[56] to transparently
> make use of multiple hash functions, thus making double hashing and
> cuckoo hashing possible.  Since SRFI-12[56] hash table constructors
> only take a single hash function argument, this would otherwise not be
> possible.

OK, that seems to be a problem.

Backwards compatible solution: make-hash-table alternatively accepts a
pair of procedures for the hash-function argument, support make-foo-hash
constructors so users can easily create pairs of foo-hash functions.

This doesn't make it quite transparent, but it's a less intrusive
change, and more similar to what other hash table APIs out there do.

> Bloom filters admittedly would want for a separate API, which could
> take a generator or list of functions. Do we want to recommend
> everyone writing hash functions to provide generators as well? Do we
> want to provide system-defined generators for eq?/eqv? hashes? Oh
> wait, we can't do that...

If they want their hash functions usable with Bloom filters, sure, let
them provide what is necessary for that.

Let's please specify a sane and simple hash table API as soon as
possible, and leave Bloom filters, binary trees (motivation for SRFI-114
AFAIK), etc. for later.

> This proposal is all around more elegant, more secure, and more
> powerful than any I've seen. If I only cared about backwards
> compatibility I'd stick with SRFI-69, which is what all the chicken
> and chibi code I care about uses.

Obviously I don't *only* care about backwards compatibility, but I care
a lot about backwards compatibility especially in this case because we
already have two incompatible hash table APIs and having a third would
be a joke, unless there's a very very good reason for it.

> Re: the bound argument, it's a promise that we don't need values
> larger than the bound, allowing the hash function to use fixnum-only
> arithmetic in most cases. It applies not just in special cases like
> disk-based tables, but when the hash table consumes a large enough
> fraction of memory to want a bignum number of buckets. As discussed on
> the SRFI-125 list, this can happen on 32-bit architectures with 3 tag
> bits when the table takes around 1/4 of available memory. It's
> realistic to want a small server where the table takes up nearly 100%
> of memory.

OK, how about passing hash functions a second bound argument only when
the size of the table grows very big, like in SRFI-69?  This way
*almost* all R6RS programs keep working, and we still solve the issue.

To summarize:

- I will specify SRFI_126_HASH_SEED which should be obeyed by the
  built-in hash functions.

This covers testing purposes.

- I will specify make-hashtable to accept a pair of procedures in place
  of the hash-function argument, and specify hash function constructors.

This covers hashing strategies involving two hash functions, and
different salts per table.

- I will specify that the hash function passed to make-hashtable *may*
  be passed a second bound argument, and the hash functions will take
  optional bound arguments.

This covers the gigantic tables use-case.

I'm not entirely happy about the last one, but can't think of a better
solution.  The other two I'm fairly happy about, because they're very
non-intrusive changes and cleanly solve our problems.

Taylan