Re: Complexity of seq & related patterns

Show/hide message thread

Complexity of seq & related patterns Wolfgang Corcoran-Mathe (05 Nov 2025 19:37 UTC)

Re: Complexity of seq & related patterns Daphne Preston-Kendal (05 Nov 2025 20:05 UTC)

Re: Complexity of seq & related patterns Daphne Preston-Kendal (06 Nov 2025 20:07 UTC)

Re: Complexity of seq & related patterns Wolfgang Corcoran-Mathe (07 Nov 2025 18:22 UTC)

Re: Complexity of seq & related patterns Daphne Preston-Kendal (25 Nov 2025 10:37 UTC)

Re: Complexity of seq & related patterns Wolfgang Corcoran-Mathe (07 Jan 2026 22:49 UTC)

Re: Complexity of seq & related patterns Daphne Preston-Kendal 25 Nov 2025 10:37 UTC

On 7 Nov 2025, at 19:22, Wolfgang Corcoran-Mathe <xxxxxx@sigwinch.xyz> wrote:

> What bothers me, as I tried to express, is the mix of declarative
> (pattern-matching) & procedural (loop) idiom.  If I've understood
> correctly, 'seq' uses a description of an iterative process to construct
> an abstract sequence whose elements are matched against the <sequence
> pattern>s.  This avoids reifying new sequence types as old sequence
> types, which is a win for extending the matcher efficiently.
>
> But there are several ways to describe an abstract sequence.
> SRFI 257's analogous '~iterate' pattern uses something like an
> unfold.  The Haskell literature is full of ways to use coalgebras and
> fixpoints to describe the construction of inductively-defined types.
> (I would need Marc's help to make sense of much of that.)

Thanks; I understand the objection better now!

I think I have an idea to resolve this issue.

Instead of making ‘seq’, ‘seq*’, and ‘seq/unordered’ explicitly exposed patterns, I will create transformer-creating procedures which create transformers for each of these types. So instead of writing:

(define-pattern-syntax vector
  (syntax-rules ()
    ((_ subpat ...)
     (? vector?
        (seq vec ((idx 0 (+ idx 1)))
             (>= idx (vector-length vec))
             (vector-ref vec idx)
          subpat ...)))))

you would write something like:

(define-pattern-syntax vector
  (seq-pattern-transformer vector? vec
                           ((idx 0 (+ idx 1)))
                           (>= idx (vector-length vec))
                           (vector-ref vec idx)))

This also has the advantage that it will in theory be easier to optimize adjacent seq patterns of the same type – the matcher can recognize pattern syntax bound to the same transformer and take the union of their regular expressions.

I think it might be possible to improve on this even further, actually; more below.

> In more
> down-to-earth terms, we have indexed sequence types like vectors &
> inductively-defined sequences like lists; is it a good idea to use
> the same 'do'-like description for both varieties?  (I don't know.)

I think so; the vector style is arguably also inductively defined: the base case is Zero and the inductive case is Succ :-) We just happen to write Zero as 0 and Succ as (+ idx 1). More practically, this is how Schemers are used to writing their loops. (Usually with named let rather than do, but the basic idea is the same.) It’s maybe not the best way to write a loop (vide Olin’s paper) but it’s familiar and flexible for many data types.

What I’m more concerned about is data structures where iteration requires traversing a tree. On RRB-trees or Hickey persistent vectors (to take two popular examples where the sequence is still ordered, thus avoiding the inherent performance question marks of seq/unordered), the index style seems natural but is O(n log n) rather than O(n). The alternative is to create a cursor type which maintains an explicit stack, which generates (I think) O(n log n) garbage although actual run time ignoring garbage would be O(n). Other types like ralists (SRFI 101) or Guile’s vlists will also generate close to O(n) garbage due to cdr being a consing operating with those types.

In terms of recursion schemes, in theory a seq pattern is just a left fold; a seq* pattern is just a left fold over successive tails instead of over successive heads. Using fold would let implementations recurse as they wish, but the problem with using fold is that matching a seq pattern requires a lot more state than just the single accumulator variable. Of course it could use a list or something to maintain a lot of state, but that would also mean generating a lot of garbage.

As you know I’ve been working on a general solution to iteration, including for tree-based structures, based on a variant of fold that uses mutual recursion (for others: <https://codeberg.org/dpk/presrfis/src/branch/master/gold.md>) but how much garbage *that* generates when run on a real Scheme implementation remains to be seen. It’s definitely not ready to ship in SRFI 262 yet.

So maybe in fact what it needs is for seq-pattern-transformer to expand directly into Scheme code, allowing the code generated by the transformer to drive the loop. Backtracking would be by continuation capture. For the sample implementation, this is only tricky for seq/unordered; the others don’t backtrack. I tried to quickly sketch the code for this (like the above example) but got stuck on how to avoid the need for the pattern matcher to use side-effects. Delimited continuations would make this a heckuva lot easier, but I can’t yet depend on those being available in the Scheme implementation. (In concrete terms considering the needs of the sample implementation, Guile has them, Chez doesn’t yet, at least not without the metacontinuation-maintaining hack Marc used in SRFI 226’s sample implementation.)

Daphne