Dear Sergei,
I have been working on a macro-extensible pattern matching library on-and-off for over a year now. You can read my pre-SRFI at <https://codeberg.org/dpk/extensible-match>. It has lots of historical context and design notes, with more to come. I am currently working on an optimizer for the back end of the pattern compiler, and once that is done, I will submit it as a SRFI.
Some comments:
In one place you say your Kleene star equivalent has leftmost-longest semantics (under the ‘Backtracking’ heading), in another it says it’s unspecified (in the specification of the ~append pattern). (‘Leftmost-longest’ vs ‘leftmost-greedy’ (the usual opposition) is moot when every rule is only one token long anyway, but maybe this is not quite the case with how your ~etc and ~append and other patterns can be composed. That’s not exactly clear to me from reading so far.)
It’s also not clear to me what features actually require backtracking. I have implemented (ordered) sequence patterns as tagged NFA simulations without backtracking. Later (probably after the SRFI is finalized) I intend to convert the NFAs to DFAs in most cases, since (looking at how sequence patterns are used in practice, using GitHub Code Search) most uses can either be seen as a declarative way of writing of ‘map’ or, occasionally, ‘span’/‘break’, and a DFA compilation for those specific uses should not cause a huge explosion in the number of states, and eliminate the run-time overhead of the NFA simulation. I think the same techniques can be used for your patterns.
Looks like all sequences have to be converted to lists to be matched? That’s inefficient, especially for strings. I think it’s better to leave string pattern matching to SRFI 115 and real parsing libraries. While pattern vars in sequences end up being bound as lists anyway, I have allowed users to define their own sequence iteration protocols, so that e.g. vectors, rlists (SRFI 101), finger trees, hash tables, etc., don’t have to be converted to lists, even if their elements ultimately get gathered into another new list by the matcher.
Non-linear patterns don’t compose well with sequence patterns or ‘not’ patterns or ‘or’ patterns, as explained at too much length in my pre-SRFI. They add a lot of complication for little benefit. Wright’s original matcher didn’t support them, nor does Adam Nelson’s implementation of it. Only the Racket implementation and Alex Shinn’s implementation support it, and it’s my impression from talking to Sam Tobin-Hochstadt that he considers non-linear patterns to have been a mistake and wishes he didn’t have to support them. (He inherited them from Bruce Hauman who wrote the original version for PLT Scheme, as it was then.)
If you do want to support non-linear patterns, I highly recommend reading Christian Queinnec’s paper on his pattern compiler design, which actually seems quite close to yours (by accident, I assume, since you don’t cite him): <https://christian.queinnec.org/PDF/csm.pdf>
A minor thing: The name ‘WCS’ is wrong, because Robert Cartwright had almost nothing to do with that pattern matcher (but Bruce F. Duba did). I suggest simply calling it a ‘Wright-style’ pattern matcher.
I will add your pre-SRFI to the list of potential candidates for an R7RS Large (Batteries) pattern matching library, unless you object to that.
Daphne
> On 6 Dec 2024, at 05:14, Sergei Egorov <xxxxxx@acm.org> wrote:
>
> Dear Schemers,
>
> I have an idea for a new design for a pattern matching library. This library provides all common-denominator functionality described in SRFI-200, adding on top of it support for non-linear patterns and backtracking. Also, the proposed design is simple, modular, and allows for convenient extension via the define-match-pattern mechanism. Other pattern matchers can be implemented on top of the proposed abstractions, some in just a dozen lines of syntax-rules source code.
>
> http://malgil.com/esl/srfi/srfi-xm11.html
>
> I have used it in a few projects of my own, mainly to re-implement two existing matchers, with this one used as a common platform for both re-implementations. I would appreciate your feedback and comments -- perhaps someone else is working on a similar project? Is this something you would like to see formally submitted as a SRFI?
>
> Best regards,
> --Sergei
>
>
>
>
>