Another library (SRFI?) for pattern matching Sergei Egorov (06 Dec 2024 04:15 UTC)
Re: Another library (SRFI?) for pattern matching Daphne Preston-Kendal (07 Dec 2024 14:36 UTC)
Re: Another library (SRFI?) for pattern matching Sergei Egorov (07 Dec 2024 21:59 UTC)

Re: Another library (SRFI?) for pattern matching Sergei Egorov 07 Dec 2024 21:59 UTC

Dear Daphne,

Thanks for pointing me to your work, it is quite impressive. I have to spend more time to study it, but on a surface I can see both
similarities and dissimilarities. The most striking dissimilarity is that yours relies on identifier properties (SRFI 213), and thus
is dependent on synax-case -like expander. I understand that for many people syntax-case, with all related phasing machinery, is a
natural part of any "modern" Scheme, but I belong to the "small Scheme" school of thought that believes that syntax-rules -only
expander is good enough for most purposes. In my opinion, both small and large Scheme proponents deserve an extendable match
library, and mine is for the former crowd. I believe it can be implemented on any Scheme with a correct R4RS+ -compatible expander,
being purely syntax-rules -based. Both types of expanders have purpose in life, so this is not a critique of yours, which is indeed
well thought-out.

Answering some of your questions:

The specification for ~append pattern explicitly specifies that it is (iterative, greedy), and says "The solutions are tried
starting with ones maximizing the length of the leftmost segment, then the segment after it, etc.". It is indeed greedy, unlike
~append/ng, which is explicitly non-greedy.

The patterns requiring backtracking (meaning that they may introduce backtracking/choice points by themselves, in addition to
whatever their sub-patterns may do), are marked as (iterative) in their prototype lines. They are these: all ~appends except for
~append/t, ~or, ~iterate and its derivatives. Any sub-pattern may introduce its own backtracking/choice points, and its parent
pattern(s) do not prevent them from doing their work; the sample implementation contains a cut-like ~! pattern that does, but it is
experimental and is not listed in the specification.

I am for an efficient implementation, as long as it does not introduce a weirdly limited semantics. Compiling to DFA/NFA is
certainly a good idea, but probably too complicated for small systems I chose as my target.

No, all all sequences do not have to be converted to lists to be matched; there is enough core functionality to implement them
directly, using only functionality made public in the specification. Not all of possible variants are predefined, but this may be
changed.

Non-linear patterns require a bit of extra work, yes, but, in the end, they aren't so complicated. I made them work with sequence
(~etc) patterns with little effort. 'Not' is an outlier indeed, but not in this respect only. Anyway, if a user does not repeat
pattern variables, no extra code is generated, so it's not a feature affecting all scenarios. At least that's what I see in my
"small Scheme" world, and, since one of the matchers I had to re-implement on top of mine uses nonlinearity extensively, I thought
that supporting it would be a plus.

I haven't read Christian Queinnec’s paper, thanks for pointing to it. Thanks for your note on "WCS" name, I will follow your advice.
I hope you won't complain if I steal some names and ideas from your design (with proper attribution of course) -- with a goal to
make them more compatible in areas where their functionalities overlap. E.g. 'lset' is indeed a better name than list-no-order (BTW:
don't you have to have lset* to distinguish the one with the tail pattern?)

I do not object your including mine as a candidate for the Batteries, but perhaps it is better to have it a run through SRFI process
first, if no one objects.

Best Regards,
--Sergei

-----Original Message-----
From: Daphne Preston-Kendal
Sent: Saturday, December 07, 2024 9:36 AM
To: srfi-discuss
Subject: Re: Another library (SRFI?) for pattern matching

Dear Sergei,

I have been working on a macro-extensible pattern matching library on-and-off for over a year now. You can read my pre-SRFI at
<https://codeberg.org/dpk/extensible-match>. It has lots of historical context and design notes, with more to come. I am currently
working on an optimizer for the back end of the pattern compiler, and once that is done, I will submit it as a SRFI.

Some comments:

In one place you say your Kleene star equivalent has leftmost-longest semantics (under the ‘Backtracking’ heading), in another it
says it’s unspecified (in the specification of the ~append pattern). (‘Leftmost-longest’ vs ‘leftmost-greedy’ (the usual opposition)
is moot when every rule is only one token long anyway, but maybe this is not quite the case with how your ~etc and ~append and other
patterns can be composed. That’s not exactly clear to me from reading so far.)

It’s also not clear to me what features actually require backtracking. I have implemented (ordered) sequence patterns as tagged NFA
simulations without backtracking. Later (probably after the SRFI is finalized) I intend to convert the NFAs to DFAs in most cases,
since (looking at how sequence patterns are used in practice, using GitHub Code Search) most uses can either be seen as a
declarative way of writing of ‘map’ or, occasionally, ‘span’/‘break’, and a DFA compilation for those specific uses should not cause
a huge explosion in the number of states, and eliminate the run-time overhead of the NFA simulation. I think the same techniques can
be used for your patterns.

Looks like all sequences have to be converted to lists to be matched? That’s inefficient, especially for strings. I think it’s
better to leave string pattern matching to SRFI 115 and real parsing libraries. While pattern vars in sequences end up being bound
as lists anyway, I have allowed users to define their own sequence iteration protocols, so that e.g. vectors, rlists (SRFI 101),
finger trees, hash tables, etc., don’t have to be converted to lists, even if their elements ultimately get gathered into another
new list by the matcher.

Non-linear patterns don’t compose well with sequence patterns or ‘not’ patterns or ‘or’ patterns, as explained at too much length in
my pre-SRFI. They add a lot of complication for little benefit. Wright’s original matcher didn’t support them, nor does Adam Nelson’s
implementation of it. Only the Racket implementation and Alex Shinn’s implementation support it, and it’s my impression from talking
to Sam Tobin-Hochstadt that he considers non-linear patterns to have been a mistake and wishes he didn’t have to support them. (He
inherited them from Bruce Hauman who wrote the original version for PLT Scheme, as it was then.)

If you do want to support non-linear patterns, I highly recommend reading Christian Queinnec’s paper on his pattern compiler design,
which actually seems quite close to yours (by accident, I assume, since you don’t cite him):
<https://christian.queinnec.org/PDF/csm.pdf>

A minor thing: The name ‘WCS’ is wrong, because Robert Cartwright had almost nothing to do with that pattern matcher (but Bruce F.
Duba did). I suggest simply calling it a ‘Wright-style’ pattern matcher.

I will add your pre-SRFI to the list of potential candidates for an R7RS Large (Batteries) pattern matching library, unless you
object to that.

Daphne

> On 6 Dec 2024, at 05:14, Sergei Egorov <xxxxxx@acm.org> wrote:
>
> Dear Schemers,
>
> I have an idea for a new design for a pattern matching library. This library provides all common-denominator functionality
> described in SRFI-200, adding on top of it support for non-linear patterns and backtracking. Also, the proposed design is simple,
> modular, and allows for convenient extension via the define-match-pattern mechanism. Other pattern matchers can be implemented on
> top of the proposed abstractions, some in just a dozen lines of syntax-rules source code.
>
> http://malgil.com/esl/srfi/srfi-xm11.html
>
> I have used it in a few projects of my own, mainly to re-implement two existing matchers, with this one used as a common platform
> for both re-implementations. I would appreciate your feedback and comments -- perhaps someone else is working on a similar
> project? Is this something you would like to see formally submitted as a SRFI?
>
> Best regards,
> --Sergei
>
>
>
>
>