Leftmost-longest behavior
John Cowan 27 Apr 2016 20:04 UTC
The SRFI 115 sample implementation uses leftmost-longest matching,
but the SRFI itself does not specify which matching rule applies
to alternations. As a result, in the sample implementation
(regexp-partition '(or "a" "bcdef" "g" "ab" "c" "d" "e" "efg" "fg") "abcdefg")
produces
("" "ab" "" "c" "" "d" "" "efg")
whereas its Python analogue, where the first alternative that matches wins,
re.findall(r"(a|bcdef|g|ab|c|d|e|efg|fg)", "abcdefg")
produces
['a', 'bcdef', 'g']
The implementation's behavior agrees with egrep:
$ echo 'abcdefg' | egrep -o '(a|bcdef|g|ab|c|d|e|efg|fg)'
ab
c
d
efg
so it is not wrong, but it may be surprising. (Example due to dpk.)
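
To reproduce the difference without a Scheme implementation at hand, here
is a minimal Python sketch. It assumes every alternative is a plain literal
string, which is true of the example above but not of regexps in general.
The function names leftmost_first and leftmost_longest are made up for
illustration; they are not part of SRFI 115 or of Python's re module. The
second function only emulates leftmost-longest by trying every alternative
at each position and keeping the longest hit, which is not necessarily how
the sample implementation or egrep does it.

```python
import re

ALTERNATIVES = ["a", "bcdef", "g", "ab", "c", "d", "e", "efg", "fg"]
TEXT = "abcdefg"


def leftmost_first(alternatives, text):
    """Python's native rule: at each position the first alternative
    (in written order) that matches is taken."""
    pattern = "|".join(re.escape(alt) for alt in alternatives)
    return re.findall(pattern, text)


def leftmost_longest(alternatives, text):
    """Emulated POSIX-style rule for literal alternatives only
    (an assumption of this sketch): at each position, try every
    alternative and keep the longest one that matches."""
    compiled = [re.compile(re.escape(alt)) for alt in alternatives]
    matches = []
    pos = 0
    while pos < len(text):
        hits = (rx.match(text, pos) for rx in compiled)
        candidates = [m.group() for m in hits if m]
        longest = max(candidates, key=len, default="")
        if longest:
            matches.append(longest)
            pos += len(longest)  # resume right after the match
        else:
            pos += 1  # nothing matches here; move on
    return matches


if __name__ == "__main__":
    print(leftmost_first(ALTERNATIVES, TEXT))
    print(leftmost_longest(ALTERNATIVES, TEXT))
```

The first call gives the Python result quoted above; the second gives the
same matches as egrep -o.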
--
John Cowan http://www.ccil.org/~cowan xxxxxx@ccil.org
Don't be so humble. You're not that great.
--Golda Meir