I propose to integrate PCREs by having the same API work for both. The
grammar for SREs would be changed to require that they are lists. This
would remove the ambiguity: strings are PCREs and lists are SREs.
<sre> ::=
| <cset-sre> ; A character set match.
| <outer-sre>
<outer-sre> ::=
| (* <inner-sre> ...) ; 0 or more matches.
| (+ <inner-sre> ...) ; 1 or more matches.
| (? <inner-sre> ...) ; 0 or 1 matches.
| (= <n> <inner-sre> ...) ; <n> matches.
| (>= <n> <inner-sre> ...) ; <n> or more matches.
| (** <n> <m> <inner-sre> ...) ; <n> to <m> matches.
| (| <inner-sre> ...) ; Alternation.
| (or <inner-sre> ...)
| (: <inner-sre> ...) ; Sequence.
| (seq <inner-sre> ...)
| ($ <inner-sre> ...) ; Numbered submatch.
| (submatch <inner-sre> ...)
| (=> <name> <inner-sre> ...) ; Named submatch. <name> is a symbol.
| (submatch-named <name> <inner-sre> ...)
| (w/case <inner-sre> ...) ; Introduce a case-sensitive context.
| (w/nocase <inner-sre> ...) ; Introduce a case-insensitive context.
| (w/unicode <inner-sre> ...) ; Introduce a unicode context.
| (w/ascii <inner-sre> ...) ; Introduce an ascii context.
| (word <inner-sre> ...) ; An SRE wrapped in word boundaries.
| (word+ <inner-cset-sre> ...) ; A single word restricted to a cset.
| word ; A single word.
| (?? <inner-sre> ...) ; Non-greedy 0 or 1 matches.
| (*? <inner-sre> ...) ; Non-greedy 0 or more matches.
| (**? <m> <n> <inner-sre> ...) ; Non-greedy <m> to <n> matches.
| (look-ahead <inner-sre> ...) ; Zero-width look-ahead assertion.
| (look-behind <inner-sre> ...) ; Zero-width look-behind assertion.
| (neg-look-ahead <inner-sre> ...) ; Zero-width negative look-ahead assertion.
| (neg-look-behind <inner-sre> ...) ; Zero-width negative look-behind assertion.
<inner-sre> ::=
| <outer-sre>
| <inner-cset-sre>
| <string> ; A literal string match.
| bos ; Beginning of string.
| eos ; End of string.
| bol ; Beginning of line.
| eol ; End of line.
| bog ; Beginning of grapheme cluster.
| eog ; End of grapheme cluster.
| grapheme ; A single grapheme cluster.
| bow ; Beginning of word.
| eow ; End of word.
| nwb ; A non-word boundary.
<cset-sre> ::=
| (<string>) ; literal char set
| (/ <range-spec> ...) ; ranges
| (or <inner-cset-sre> ...) ; union
| (and <inner-cset-sre> ...) ; intersection
| (- <inner-cset-sre> ...) ; difference
| (~ <inner-cset-sre> ...) ; complement of union
| (w/case <inner-cset-sre> ...) ; case and unicode toggling
| (w/nocase <inner-cset-sre> ...)
| (w/ascii <inner-cset-sre> ...)
| (w/unicode <inner-cset-sre> ...)
<inner-cset-sre> ::=
| <cset-sre>
| <char> ; literal char
| "<char>" ; string of one char
| <char-set> ; embedded SRFI 14 char set
| any | nonl | ascii | lower-case | lower
| upper-case | upper | alphabetic | alpha
| numeric | num | alphanumeric | alphanum | alnum
| punctuation | punct | symbol | graphic | graph
| whitespace | white | space | printing | print
| control | cntrl | hex-digit | xdigit
<range-spec> ::= <string> | <char>
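
To illustrate the rule that strings are PCREs and lists are SREs, a shared
entry point could dispatch on the type of its pattern argument. What follows
is only a sketch, assuming R7RS Scheme; pattern-kind is a hypothetical helper
named for this example and is not part of any existing library.

```scheme
(import (scheme base))

;; Hypothetical helper: classify a top-level pattern under the proposed
;; rule. A string is read as a PCRE; a non-empty list is read as an SRE.
(define (pattern-kind pat)
  (cond ((string? pat) 'pcre)
        ((pair? pat) 'sre)
        (else
         (error "pattern must be a string (PCRE) or a list (SRE)" pat))))

(pattern-kind "a+b")              ; => pcre  (one or more a, then b)
(pattern-kind '(: (+ "a") "b"))   ; => sre   (the same language as an SRE)
(pattern-kind '(: "a+b"))         ; => sre   (the literal text a+b)
```

Under this reading a literal string match has to be wrapped, for example as
(: "a+b"), because a bare string at the top level is taken as a PCRE. Bare
symbols such as bos or word are likewise only valid inside a list form.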