Implementing SRFI 115 using the PCRE library
Lassi Kortela 19 Jul 2020 08:00 UTC
STklos is being actively developed again at
<https://github.com/egallesio/STklos>. It's using the well-known
`libpcre` C library for a custom regexp API and we're thinking about
whether and how to add SRFI 115.
If my reading of the SRFI is correct, the SRE pattern-matching language
is a subset of the PCRE language. If this is the case then any SRE can
be converted to a string for libpcre.
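To make the idea concrete, here is a minimal sketch of such a conversion for a tiny SRE subset (strings, characters, `nonl`, `:`/`seq`, `or`, `*`, `+`, `?`). The names `quote-literal` and `tiny-sre->pcre` are made up for illustration and are not part of irregex, STklos, or SRFI 115. The sketch assumes an R7RS Scheme and wraps every sequence and repetition in a non-capturing group, which is verbose but keeps operator precedence unambiguous.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical helper: backslash-escape PCRE metacharacters in a literal.
(define (quote-literal str)
  (let ((out (open-output-string)))
    (string-for-each
     (lambda (c)
       (when (memv c (string->list "\\.^$|()[]{}*+?"))
         (write-char #\\ out))
       (write-char c out))
     str)
    (get-output-string out)))

;; Hypothetical converter covering only a small part of the SRE language.
(define (tiny-sre->pcre sre)
  ;; Concatenate the converted items inside a non-capturing group.
  (define (group items)
    (string-append "(?:"
                   (apply string-append (map tiny-sre->pcre items))
                   ")"))
  (cond
   ((string? sre) (quote-literal sre))
   ((char? sre) (quote-literal (string sre)))
   ((eq? sre 'nonl) ".")                ; any character except newline
   ((pair? sre)
    (case (car sre)
      ((: seq) (group (cdr sre)))
      ((or)
       (if (null? (cdr sre))
           (error "empty 'or' is not handled in this sketch" sre)
           (let loop ((rest (cddr sre))
                      (acc (tiny-sre->pcre (cadr sre))))
             (if (null? rest)
                 (string-append "(?:" acc ")")
                 (loop (cdr rest)
                       (string-append acc "|"
                                      (tiny-sre->pcre (car rest))))))))
      ((* + ?)
       ;; Group the operands first so the operator applies to all of them.
       (string-append (group (cdr sre)) (symbol->string (car sre))))
      (else (error "unsupported SRE operator" sre))))
   (else (error "unsupported SRE" sre))))

(display (tiny-sre->pcre '(: "a.b" (or "x" "yz") (* nonl))))
(newline)
;; prints: (?:a\.b(?:x|yz)(?:.)*)
```

The resulting string would then be handed to whatever procedure compiles a libpcre pattern in the host implementation.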
Alex's irregex library contains a `sre->string` procedure that looks
like it does this conversion, or something quite close to it. Could
someone confirm or deny that `sre->string` is PCRE-compatible? The code
is below.
<https://github.com/ashinn/irregex/blob/master/irregex-utils.scm#L84>
(define (sre->string obj)
  (let ((out (open-output-string)))
    (let lp ((x obj))
      (cond
       ((pair? x)
        (case (car x)
          ((: seq)
           (cond
            ((and (pair? (cdr x)) (pair? (cddr x)) (not (eq? x obj)))
             (display "(?:" out) (for-each lp (cdr x)) (display ")" out))
            (else (for-each lp (cdr x)))))
          ((submatch)
           (display "(" out) (for-each lp (cdr x)) (display ")" out))
          ((submatch-named)
           (display "(?<" out) (display (cadr x) out) (display ">" out)
           (for-each lp (cddr x)) (display ")" out))
          ((or)
           (display "(?:" out)
           (lp (cadr x))
           (for-each (lambda (x) (display "|" out) (lp x)) (cddr x))
           (display ")" out))
          ((* + ? *? ??)
           (cond
            ((pair? (cddr x))
             (display "(?:" out) (for-each lp (cdr x)) (display ")" out))
            (else (lp (cadr x))))
           (display (car x) out))
          ((not)
           (cond
            ((and (pair? (cadr x)) (eq? 'cset (caadr x)))
             (display "[^" out)
             (display (cset->string (cdadr x)) out)
             (display "]" out))
            (else (error "can't represent general 'not' in strings" x))))
          ((cset)
           (display "[" out)
           (display (cset->string (cdr x)) out)
           (display "]" out))
          ((- & / ~)
           (cond
            ((or (eqv? #\~ (car x))
                 (and (eq? '- (car x)) (pair? (cdr x)) (eq? 'any (cadr x))))
             (display "[^" out)
             (display (cset->string (if (eqv? #\~ (car x)) (cdr x) (cddr x))) out)
             (display "]" out))
            (else
             (lp `(cset ,@(sre->cset x))))))
          ((w/case w/nocase)
           (display "(?" out)
           (if (eq? (car x) 'w/case) (display "-" out))
           (display ":" out)
           (for-each lp (cdr x))
           (display ")" out))
          (else
           (if (string? (car x))
               (lp `(cset ,@(string->list (car x))))
               (error "unknown sre operator" x)))))
       ((symbol? x)
        (case x
          ((bos bol) (display "^" out))
          ((eos eol) (display "$" out))
          ((any nonl) (display "." out))
          (else (error "unknown sre symbol" x))))
       ((string? x)
        (display (irregex-quote x) out))
       ((char? x)
        (display (irregex-quote (string x)) out))
       (else
        (error "unknown sre pattern" x))))
    (get-output-string out)))