Re: Withdrawn SRFI 250: Insertion-ordered hash tables
Daphne Preston-Kendal 25 Sep 2024 21:52 UTC
On 25 Sep 2024, at 23:33, Shawn Wagner <xxxxxx@gmail.com> wrote:
> By storing records with the data and next/previous pointers for a doubly linked list that's used to track order. New elements inserted into the table are appended to the end of that list. Seemed like the obvious approach.
Okay, that’s the approach Ruby 1.9 used (except with a ‘normal’ hash table instead of a HAMT). It works, but the real magic is the approach the original sample implementation used: the actual entries live in a dynamically-sized array, appended in insertion order, and the hash table itself only points into a ‘sparse array’ whose elements are the indices of the corresponding actual entries within that dynamically-sized array. (‘Sparse array’ is a very bad name – it’s an integer array whose individual elements’ size increases according to the maximum number stored in the array, so 8 bits per element for up to 255 entries [already big enough for 99% of hash tables], 16 bits for up to 65,535, etc.)
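To make that layout concrete, here is a minimal sketch in Scheme. It is not the sample implementation: keys are restricted to exact integers so the key itself can serve as a toy hash, the index array is a plain vector with -1 marking empty slots rather than a width-scaled byte array, collisions are handled by linear probing, and resizing and deletion are omitted.

    (define-record-type <compact-table>
      (make-compact-table indices entries count)
      compact-table?
      (indices table-indices)   ; vector of entry indices; -1 = empty slot
      (entries table-entries)   ; (key . value) pairs, in insertion order
      (count table-count set-table-count!))

    (define (new-table capacity)
      (make-compact-table (make-vector capacity -1)
                          (make-vector capacity #f)
                          0))

    ;; Linear probing over the index array; an occupied slot holds the
    ;; position of the real entry in the insertion-ordered entries vector.
    ;; Callers must keep the table below capacity: there is no resizing.
    (define (probe tbl key found missing)
      (let* ((indices (table-indices tbl))
             (cap (vector-length indices)))
        (let loop ((slot (modulo key cap)))   ; toy hash: the key itself
          (let ((i (vector-ref indices slot)))
            (cond ((= i -1) (missing slot))
                  ((= (car (vector-ref (table-entries tbl) i)) key)
                   (found i))
                  (else (loop (modulo (+ slot 1) cap))))))))

    (define (table-set! tbl key value)
      (probe tbl key
             (lambda (i)          ; key present: update entry in place
               (set-cdr! (vector-ref (table-entries tbl) i) value))
             (lambda (slot)       ; key absent: append entry, record its index
               (let ((n (table-count tbl)))
                 (vector-set! (table-entries tbl) n (cons key value))
                 (vector-set! (table-indices tbl) slot n)
                 (set-table-count! tbl (+ n 1))))))

    (define (table-ref tbl key default)
      (probe tbl key
             (lambda (i) (cdr (vector-ref (table-entries tbl) i)))
             (lambda (slot) default)))

    ;; Iteration in insertion order is just a left-to-right walk over the
    ;; entries vector; the index array is not consulted at all.
    (define (table-for-each proc tbl)
      (do ((i 0 (+ i 1)))
          ((= i (table-count tbl)))
        (let ((e (vector-ref (table-entries tbl) i)))
          (proc (car e) (cdr e)))))

So after (table-set! t 42 'a) and (table-set! t 7 'b), table-for-each visits 42 before 7 regardless of where the two keys happen to hash, because iteration never touches the index array.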
This is how many major language runtimes now implement their hash tables (CPython’s dict has worked this way since 3.6, for example), because it’s more memory-efficient than the classical approach: for a table with n buckets that only contains m entries, you waste n - m bytes or 16-bit units, rather than that many full pointers or pairs of pointers. The extra indirection doesn’t cost you much, because the ‘sparse array’ is warm in the CPU’s cache whenever the regular hash table is. So you get insertion ordering for less than the memory cost of a non-insertion-ordered hash table, and you pay nearly nothing extra in time either.
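As a sketch of what the width scaling itself might look like in portable R7RS Scheme (R7RS-small only has 8-bit bytevector accessors, so wider elements are composed from individual bytes; a real implementation would also need an ‘empty’ sentinel, e.g. by storing index + 1 and reserving 0 for empty):

    ;; The index array stores small non-negative integers, so its element
    ;; width can grow with the number of entries it must be able to name.
    (define (make-index-array capacity max-entries)
      (let ((width (cond ((<= max-entries 255) 1)     ; fits in one byte
                         ((<= max-entries 65535) 2)   ; two bytes
                         (else 4))))                  ; four bytes
        (cons width (make-bytevector (* capacity width) 0))))

    ;; Big-endian read of one width-byte element.
    (define (index-array-ref ia slot)
      (let ((width (car ia)) (bv (cdr ia)))
        (do ((b 0 (+ b 1))
             (acc 0 (+ (* acc 256)
                       (bytevector-u8-ref bv (+ (* slot width) b)))))
            ((= b width) acc))))

    ;; Big-endian write of one width-byte element.
    (define (index-array-set! ia slot value)
      (let ((width (car ia)) (bv (cdr ia)))
        (do ((b (- width 1) (- b 1))
             (v value (quotient v 256)))
            ((< b 0))
          (bytevector-u8-set! bv (+ (* slot width) b)
                              (remainder v 256)))))

A native implementation would of course just switch between 8-, 16-, and 32-bit integer vectors rather than composing bytes by hand.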
Ideally the hash table should also be O(1), not O(log n) as a HAMT is. But I don’t think this is a showstopper for a SRFI sample implementation.
Daphne