integrating PCREs - Simplelists

integrating PCREs Michael Montague (26 Nov 2013 03:30 UTC)
Re: integrating PCREs Alex Shinn (26 Nov 2013 12:42 UTC)
Re: integrating PCREs Michael Montague (26 Nov 2013 17:53 UTC)
Re: integrating PCREs Alex Shinn (26 Nov 2013 21:51 UTC)
integrating PCREs Michael Montague 26 Nov 2013 03:30 UTC
I propose integrating PCREs by having the same API accept both PCREs
and SREs. The grammar for SREs would be changed to require that every
SRE is a list. This removes the ambiguity: strings are PCREs and lists
are SREs.
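
As a rough illustration of that rule, here is a minimal Python sketch of dispatching on the pattern's type alone. It is not part of the proposal: the name pattern_kind is hypothetical, and modelling an SRE as a Python list (or tuple) is an assumption made only for this example.

```python
def pattern_kind(pattern):
    """Classify a pattern purely by its type.

    Assumption for this sketch: a PCRE is a Python str and an SRE is a
    Python list or tuple standing in for a Scheme list.
    """
    if isinstance(pattern, str):
        return "pcre"  # strings are always PCREs
    if isinstance(pattern, (list, tuple)):
        return "sre"   # lists are always SREs
    raise TypeError("pattern must be a string (PCRE) or a list (SRE)")
```

Because the two cases never overlap, a single entry point can accept either kind of pattern without a flag or a separate procedure.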

<sre> ::=
      | <cset-sre>                  ; A character set match.
      | <outer-sre>

<outer-sre> ::=
      | (* <inner-sre> ...)               ; 0 or more matches.
      | (+ <inner-sre> ...)               ; 1 or more matches.
      | (? <inner-sre> ...)               ; 0 or 1 matches.
      | (= <n> <inner-sre> ...)           ; <n> matches.
      | (>= <n> <inner-sre> ...)          ; <n> or more matches.
      | (** <n> <m> <inner-sre> ...)      ; <n> to <m> matches.

      | (|  <inner-sre> ...)              ; Alternation.
      | (or <inner-sre> ...)

      | (:   <inner-sre> ...)             ; Sequence.
      | (seq <inner-sre> ...)
      | ($ <inner-sre> ...)               ; Numbered submatch.
      | (submatch <inner-sre> ...)
      | (=> <name> <inner-sre> ...)               ; Named submatch; <name> is a symbol.
      | (submatch-named <name> <inner-sre> ...)

      | (w/case   <inner-sre> ...)        ; Introduce a case-sensitive context.
      | (w/nocase <inner-sre> ...)        ; Introduce a case-insensitive context.

      | (w/unicode   <inner-sre> ...)     ; Introduce a unicode context.
      | (w/ascii <inner-sre> ...)         ; Introduce an ascii context.
      | (word <inner-sre> ...)            ; An SRE wrapped in word boundaries.
      | (word+ <inner-cset-sre> ...)      ; A single word restricted to a cset.
      | word                        ; A single word.

      | (?? <inner-sre> ...)                ; A non-greedy pattern, 0 or 1 match.
      | (*? <inner-sre> ...)                ; Non-greedy 0 or more matches.
      | (**? <m> <n> <inner-sre> ...)       ; Non-greedy <m> to <n> matches.
      | (look-ahead <inner-sre> ...)        ; Zero-width look-ahead assertion.
      | (look-behind <inner-sre> ...)       ; Zero-width look-behind assertion.
      | (neg-look-ahead <inner-sre> ...)    ; Zero-width negative look-ahead assertion.
      | (neg-look-behind <inner-sre> ...)   ; Zero-width negative look-behind assertion.

<inner-sre> ::=
      | <outer-sre>
      | <inner-cset-sre>
      | <string>                    ; A literal string match.
      | bos                         ; Beginning of string.
      | eos                         ; End of string.

      | bol                         ; Beginning of line.
      | eol                         ; End of line.

      | bog                         ; Beginning of grapheme cluster.
      | eog                         ; End of grapheme cluster.
      | grapheme                    ; A single grapheme cluster.

      | bow                         ; Beginning of word.
      | eow                         ; End of word.
      | nwb                         ; A non-word boundary.

<cset-sre> ::=
      | (<string>)                  ; literal char set
      | (/ <range-spec> ...)        ; ranges
      | (or <inner-cset-sre> ...)         ; union
      | (and <inner-cset-sre> ...)        ; intersection
      | (- <inner-cset-sre> ...)          ; difference
      | (~ <inner-cset-sre> ...)          ; complement of union
      | (w/case <inner-cset-sre> ...)     ; case and unicode toggling
      | (w/nocase <inner-cset-sre> ...)
      | (w/ascii <inner-cset-sre> ...)
      | (w/unicode <inner-cset-sre> ...)

<inner-cset-sre> ::=
      | <cset-sre>
      | <char>                      ; literal char
      | "<char>"                    ; string of one char
      | <char-set>                  ; embedded SRFI 14 char set
      | any | nonl | ascii | lower-case | lower
      | upper-case | upper | alphabetic | alpha
      | numeric | num | alphanumeric | alphanum | alnum
      | punctuation | punct | symbol | graphic | graph
      | whitespace | white | space | printing | print
      | control | cntrl | hex-digit | xdigit

<range-spec> ::= <string> | <char>
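
To make the list-only grammar more concrete, here is a small Python sketch that translates a subset of the <outer-sre> forms above into regex syntax. Everything in it is an assumption for illustration: the name sre_to_regex is hypothetical, SREs are modelled as Python lists whose first element is the operator name, Python's re module stands in for a PCRE engine, and only literal strings, sequences, alternation, numbered submatches and the repetition operators are handled (no csets, named classes, or boundary symbols).

```python
import re

# Repetition operators that map directly onto a regex suffix.
SIMPLE_REPEATS = {"*": "*", "+": "+", "?": "?", "*?": "*?", "??": "??"}


def _seq(items):
    """Concatenate the translations of several inner SREs."""
    return "".join(sre_to_regex(item) for item in items)


def _group(items):
    """Wrap a sequence in a non-capturing group."""
    return "(?:" + _seq(items) + ")"


def sre_to_regex(sre):
    """Translate a small subset of list-form SREs to a regex string."""
    if isinstance(sre, str):
        # In an inner position a string is a literal match, so escape it.
        return re.escape(sre)
    head, *args = sre
    if head in ("or", "|"):
        return "(?:" + "|".join(sre_to_regex(a) for a in args) + ")"
    if head in (":", "seq"):
        return _group(args)
    if head in ("$", "submatch"):
        return "(" + _seq(args) + ")"  # capturing group
    if head in SIMPLE_REPEATS:
        return _group(args) + SIMPLE_REPEATS[head]
    if head == "=":
        return _group(args[1:]) + "{%d}" % args[0]
    if head == ">=":
        return _group(args[1:]) + "{%d,}" % args[0]
    if head == "**":
        return _group(args[2:]) + "{%d,%d}" % (args[0], args[1])
    raise ValueError("unsupported SRE operator in this sketch: %r" % (head,))
```

For instance, under these assumptions the list [":", ["+", "a"], ["?", "b"]] (standing in for (: (+ "a") (? "b"))) becomes a regex that matches one or more "a" followed by an optional "b". A string only reaches this function in an inner position, where the grammar treats it as a literal; a string at top level would instead be a PCRE.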