integrating PCREs Michael Montague (26 Nov 2013 03:30 UTC)
Re: integrating PCREs Alex Shinn (26 Nov 2013 12:42 UTC)
Re: integrating PCREs Michael Montague (26 Nov 2013 17:53 UTC)
Re: integrating PCREs Alex Shinn (26 Nov 2013 21:51 UTC)

integrating PCREs Michael Montague 26 Nov 2013 03:30 UTC
I propose to integrate PCREs by having the same API work for both. The grammar for SREs would be changed to require that they are lists. This would remove the ambiguity: strings are PCREs and lists are SREs.

<sre> ::=
 | <cset-sre>                              ; A character set match.
 | <outer-sre>

<outer-sre> ::=
 | (* <inner-sre> ...)                     ; 0 or more matches.
 | (+ <inner-sre> ...)                     ; 1 or more matches.
 | (? <inner-sre> ...)                     ; 0 or 1 matches.
 | (= <n> <inner-sre> ...)                 ; <n> matches.
 | (>= <n> <inner-sre> ...)                ; <n> or more matches.
 | (** <n> <m> <inner-sre> ...)            ; <n> to <m> matches.
 | (| <inner-sre> ...)                     ; Alternation.
 | (or <inner-sre> ...)
 | (: <inner-sre> ...)                     ; Sequence.
 | (seq <inner-sre> ...)
 | ($ <inner-sre> ...)                     ; Numbered submatch.
 | (submatch <inner-sre> ...)
 | (=> <name> <inner-sre> ...)             ; Named submatch. <name> is
 | (submatch-named <name> <inner-sre> ...) ; a symbol.
 | (w/case <inner-sre> ...)                ; Introduce a case-sensitive context.
 | (w/nocase <inner-sre> ...)              ; Introduce a case-insensitive context.
 | (w/unicode <inner-sre> ...)             ; Introduce a unicode context.
 | (w/ascii <inner-sre> ...)               ; Introduce an ascii context.
 | (word <inner-sre> ...)                  ; A sre wrapped in word boundaries.
 | (word+ <inner-cset-sre> ...)            ; A single word restricted to a cset.
 | word                                    ; A single word.
 | (?? <inner-sre> ...)                    ; A non-greedy pattern, 0 or 1 match.
 | (*? <inner-sre> ...)                    ; Non-greedy 0 or more matches.
 | (**? <m> <n> <inner-sre> ...)           ; Non-greedy <m> to <n> matches.
 | (look-ahead <inner-sre> ...)            ; Zero-width look-ahead assertion.
 | (look-behind <inner-sre> ...)           ; Zero-width look-behind assertion.
 | (neg-look-ahead <inner-sre> ...)        ; Zero-width negative look-ahead assertion.
 | (neg-look-behind <inner-sre> ...)       ; Zero-width negative look-behind assertion.

<inner-sre> ::=
 | <outer-sre>
 | <inner-cset-sre>
 | <string>                                ; A literal string match.
 | bos                                     ; Beginning of string.
 | eos                                     ; End of string.
 | bol                                     ; Beginning of line.
 | eol                                     ; End of line.
 | bog                                     ; Beginning of grapheme cluster.
 | eog                                     ; End of grapheme cluster.
 | grapheme                                ; A single grapheme cluster.
 | bow                                     ; Beginning of word.
 | eow                                     ; End of word.
 | nwb                                     ; A non-word boundary.

<cset-sre> ::=
 | (<string>)                              ; literal char set
 | (/ <range-spec> ...)                    ; ranges
 | (or <inner-cset-sre> ...)               ; union
 | (and <inner-cset-sre> ...)              ; intersection
 | (- <inner-cset-sre> ...)                ; difference
 | (~ <inner-cset-sre> ...)                ; complement of union
 | (w/case <inner-cset-sre> ...)           ; case and unicode toggling
 | (w/nocase <inner-cset-sre> ...)
 | (w/ascii <inner-cset-sre> ...)
 | (w/unicode <inner-cset-sre> ...)

<inner-cset-sre> ::=
 | <cset-sre>
 | <char>                                  ; literal char
 | "<char>"                                ; string of one char
 | <char-set>                              ; embedded SRFI 14 char set
 | any | nonl | ascii | lower-case | lower
 | upper-case | upper | alphabetic | alpha
 | numeric | num | alphanumeric | alphanum | alnum
 | punctuation | punct | symbol | graphic | graph
 | whitespace | white | space | printing | print
 | control | cntrl | hex-digit | xdigit

<range-spec> ::= <string> | <char>
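
To make the dispatch rule concrete, here is a minimal sketch of how an implementation might tell the two pattern kinds apart under this proposal. The procedure name pattern-kind is hypothetical and is not part of SRFI 115 or of any existing library; it only illustrates the "strings are PCREs, lists are SREs" rule stated above.

```scheme
;; Hypothetical helper (an assumption for illustration, not SRFI 115 API):
;; classify a pattern under the proposed rule.
(define (pattern-kind pat)
  (cond ((string? pat) 'pcre)   ; a top-level string is read as a PCRE
        ((pair? pat) 'sre)      ; a top-level list is read as an SRE
        (else (error "pattern must be a string (PCRE) or a list (SRE)" pat))))

(pattern-kind "fo+bar")                ; => pcre
(pattern-kind '(: "foo" (+ numeric)))  ; => sre
(pattern-kind '(: "fo+bar"))           ; => sre
```

The last call shows the practical consequence: a literal string match can no longer be written as a bare top-level string, so it has to be wrapped in a list form such as (: "fo+bar"), where the string is an <inner-sre>. The grammar also lists the bare symbol word under <outer-sre>; this sketch follows the prose rule and rejects it.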