Implementing SRFI 115 using the PCRE library Lassi Kortela (19 Jul 2020 08:00 UTC)
Re: Implementing SRFI 115 using the PCRE library Arthur A. Gleckler (30 Nov 2020 19:41 UTC)
Re: Implementing SRFI 115 using the PCRE library Alex Shinn (02 Dec 2020 00:58 UTC)

STklos is being actively developed again at
<https://github.com/egallesio/STklos>. It uses the well-known
`libpcre` C library to implement a custom regexp API, and we're
considering whether and how to add SRFI 115.

If my reading of the SRFI is correct, the SRE pattern-matching language
is a subset of the PCRE language. If so, any SRE can be converted to a
pattern string that libpcre accepts.
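
To make the idea concrete, here is a minimal sketch of such a
conversion for a tiny fragment of SRE (strings, chars, `:`/`seq`, `or`,
`*`, `+` and `?`). It assumes an R7RS Scheme. The names `pcre-quote`,
`join`, `group` and `tiny-sre->pcre` are hypothetical, made up for
illustration; they are not part of SRFI 115, irregex or STklos, and
the sketch covers nowhere near the whole SRE language.

```scheme
(import (scheme base))

;; Hypothetical helper: backslash-escape characters that are special
;; in a PCRE pattern outside of a character class.
(define (pcre-quote str)
  (let ((out (open-output-string))
        (specials (string->list "\\^$.|?*+()[]{}")))
    (for-each
     (lambda (c)
       (if (memv c specials) (write-char #\\ out))
       (write-char c out))
     (string->list str))
    (get-output-string out)))

;; Hypothetical helper: join a list of strings with a separator.
(define (join strs sep)
  (if (null? strs)
      ""
      (let loop ((acc (car strs)) (rest (cdr strs)))
        (if (null? rest)
            acc
            (loop (string-append acc sep (car rest)) (cdr rest))))))

;; Wrap a pattern in a non-capturing group.
(define (group str)
  (string-append "(?:" str ")"))

;; Convert a small fragment of SRE to PCRE pattern text.
;; Anything outside that fragment signals an error.
(define (tiny-sre->pcre sre)
  (cond
   ((string? sre) (pcre-quote sre))
   ((char? sre) (pcre-quote (string sre)))
   ((pair? sre)
    (let ((args (map tiny-sre->pcre (cdr sre))))
      (case (car sre)
        ((: seq) (apply string-append args))
        ((or)
         (if (null? args)
             (error "empty 'or' is not handled by this sketch" sre)
             (group (join args "|"))))
        ((* + ?)
         ;; Always group the body so the repetition operator applies
         ;; to the whole sequence, not just its last character.
         (string-append (group (apply string-append args))
                        (symbol->string (car sre))))
        (else (error "unsupported SRE operator" sre)))))
   (else (error "unsupported SRE" sre))))
```

For example, `(tiny-sre->pcre '(: "a." (* (or "b" "c"))))` yields the
pattern text `a\.(?:(?:b|c))*`. The grouping is redundant in places; the
sketch favours being obviously correct over producing minimal output.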

Alex's irregex library contains a `sre->string` procedure that looks
like it does this conversion, or something quite close to it. Could
someone confirm or deny that `sre->string` is PCRE-compatible? The code
is below.

<https://github.com/ashinn/irregex/blob/master/irregex-utils.scm#L84>

(define (sre->string obj)
   (let ((out (open-output-string)))
     (let lp ((x obj))
       (cond
        ((pair? x)
         (case (car x)
           ((: seq)
            (cond
             ((and (pair? (cdr x)) (pair? (cddr x)) (not (eq? x obj)))
              (display "(?:" out) (for-each lp (cdr x)) (display ")" out))
             (else (for-each lp (cdr x)))))
           ((submatch)
            (display "(" out) (for-each lp (cdr x)) (display ")" out))
           ((submatch-named)
            (display "(?<" out) (display (cadr x) out) (display ">" out)
            (for-each lp (cddr x)) (display ")" out))
           ((or)
            (display "(?:" out)
            (lp (cadr x))
            (for-each (lambda (x) (display "|" out) (lp x)) (cddr x))
            (display ")" out))
           ((* + ? *? ??)
            (cond
             ((pair? (cddr x))
              (display "(?:" out) (for-each lp (cdr x)) (display ")" out))
             (else (lp (cadr x))))
            (display (car x) out))
           ((not)
            (cond
             ((and (pair? (cadr x)) (eq? 'cset (caadr x)))
              (display "[^" out)
              (display (cset->string (cdadr x)) out)
              (display "]" out))
             (else (error "can't represent general 'not' in strings" x))))
           ((cset)
            (display "[" out)
            (display (cset->string (cdr x)) out)
            (display "]" out))
           ((- & / ~)
            (cond
             ((or (eqv? #\~ (car x))
                  (and (eq? '- (car x)) (pair? (cdr x))
                       (eq? 'any (cadr x))))
              (display "[^" out)
              (display (cset->string
                        (if (eqv? #\~ (car x)) (cdr x) (cddr x)))
                       out)
              (display "]" out))
             (else
              (lp `(cset ,@(sre->cset x))))))
           ((w/case w/nocase)
            (display "(?" out)
            (if (eq? (car x) 'w/case) (display "-" out))
            (display ":" out)
            (for-each lp (cdr x))
            (display ")" out))
           (else
            (if (string? (car x))
                (lp `(cset ,@(string->list (car x))))
                (error "unknown sre operator" x)))))
        ((symbol? x)
         (case x
           ((bos bol) (display "^" out))
           ((eos eol) (display "$" out))
           ((any nonl) (display "." out))
           (else (error "unknown sre symbol" x))))
        ((string? x)
         (display (irregex-quote x) out))
        ((char? x)
         (display (irregex-quote (string x)) out))
        (else
         (error "unknown sre pattern" x))))
     (get-output-string out)))