Re: performance - Simplelists

Re: performance David Van Horn 18 Sep 2009 12:21 UTC
Taylor R Campbell wrote:
>    Date: Thu, 17 Sep 2009 20:53:55 -0400
>    From: David Van Horn <xxxxxx@cs.brandeis.edu>
>
>    Taylor R Campbell wrote:
>    > What space usage are random-access lists guaranteed to exhibit?
>    > Growth order and constant factors are both interesting -- growth order
>    > to specify in the SRFI, and constant factors to satisfy my curiosity.
>
>    Representing a random-access list of n elements takes O(n) space, just
>    as with sequential lists.
>
> I ought to have been more precise: while of course it takes O(n) space
> to store a sequence of n elements, I was interested more in the amount
> of extra space.  E.g., in most systems, a vector of length n will use
> a constant amount of extra space (a header with its length at the
> beginning), while a list of length n will use O(n) extra space (one
> car for each element, plus one cdr extra space for each element, and
> sometimes another word for a header or similar).

Yes, hence the following paragraph which gives the number of cons cells
need to represent a list of size n.

>    A random-access list is a forest of complete binary trees.  The forest
>    contains at most log n trees, and the space of each tree is proportional
>    to the number of elements it contains.  If you represent binary trees
>    using pairs [*], then it takes m-1 pairs to represent a complete binary
>    tree with m elements.  The forest can be represented as a list, and thus
>    takes a pair per tree, which is at most log n.  So the overall space
>    consumption need not be more than n + log n.  My implementation stores
>    the size of each tree in the forest making it n + 2 * log n.
>
> I'm a little puzzled by the formulae you've computed here.  If my
> cursory examination of the reference implementation is right, the
> trees have data associated with each internal node, not with the
> leaves.  So you need one location for each element, one location for
> each left branch, and one location for each right branch, for a total
> of 2 n extra locations in each tree, plus whatever constant overhead
> each tree has: + 1 for the size, + 1 for the node, + 1 for the link to
> the next tree?  This sounds like a total space usage of 3 n + 3 log n,
> neglecting the overhead of records, which means an extra space usage
> of 2 n + 3 log n = O(n).  I suppose I ought to have expected that,
> though.

The formula is given in terms of the number of pairs needed.  I figured
you could extrapolate from that.  Are you asking for something else?

You need not use records in an implementation, but there is no portable
way to define a pair type that is distinct from the built-in pair type.
  So if there is overhead for records, it can be avoided.

You are correct that the binary trees have data associated at each node
of the tree not just the leaves.  A tree of height 1 (a leaf) has no
overhead.  A tree of height 2 has 3 elements and requires 2 cons cells.
  A tree of height 3 has 7 elements and requires 6 cons cells.  In
general a tree of height h has (- (expt 2 h) 1) elements and requires (-
(expt 2 h) 2) cons cells.  Additionally, you need a list for the forest
and so a cons cell per tree, and my implementation stores the size, so
you need another cons cell per tree.  There are at most log n trees.
You can trade space for time in the case of the latter and compute the
size within the time bounds required in the SRFI.

In short, the space consumption has at worst a logarithmic overhead
compared to a plain ol' list.  Right?  I'm still not sure I'm answering
the question you're asking.

David