[cowan@mercury.ccil.org: Leftmost-longest behavior] John Cowan (27 Apr 2016 20:08 UTC)
Re: [cowan@mercury.ccil.org: Leftmost-longest behavior] Alex Shinn (27 Apr 2016 22:06 UTC)
Re: [cowan@mercury.ccil.org: Leftmost-longest behavior] Arthur A. Gleckler (27 Apr 2016 23:02 UTC)

Re: [cowan@mercury.ccil.org: Leftmost-longest behavior] Alex Shinn 27 Apr 2016 22:06 UTC

On Thu, Apr 28, 2016 at 5:08 AM, John Cowan <xxxxxx@mercury.ccil.org> wrote:
> The SRFI 115 sample implementation provides leftmost-longest behavior,
> but the SRFI does not specify it.

I can't remember now if this was intentional or not, but should have
stated so clearly in the SRFI.  The reason for leftmost-longest is
that it is POSIX and the only sensible specification anyway.  The
argument against is that many modern backtracking algorithms as
used in, e.g., Python and Perl do not follow these semantics, and
would have great difficulty doing so.

So leaving this unspecified is probably the right thing, but I think
including another feature to say whether POSIX semantics were
used would have been useful.  In the meantime it's simple enough
to test programmatically with your example - all of the backtracking
algorithms will get this wrong:

$ perl -e 'print join(" ","abcdefg"=~/a|bcdef|g|ab|c|d|e|efg|fg/g)."\n"'
a bcdef g

--
Alex

> This means that
>
> (regexp-partition '(or "a" "bcdef" "g" "ab" "c" "d" "e" "efg" "fg") "abcdefg")
>
> produces
>
> ("" "ab" "" "c" "" "d" "" "efg")
>
> whereas its Python analogue
>
> re.findall(r"(a|bcdef|g|ab|c|d|e|efg|fg)", "abcdefg")
>
> produces
>
> ['a', 'bcdef', 'g']
>
> The implementation's behavior agrees with egrep:
>
> $ echo 'abcdefg' | egrep -o '(a|bcdef|g|ab|c|d|e|efg|fg)'
> ab
> c
> d
> efg
>
> so it is not wrong, but it may be surprising.  (Example due to dpk.)
>
> --
> John Cowan          http://www.ccil.org/~cowan        xxxxxx@ccil.org
> Don't be so humble.  You're not that great.
>         --Golda Meir