The statement "Switching to ASCII mode can improve performance in some implementations" made me wonder whether the primary motivation for w/ascii was to improve performance.
On 10/17/2013 1:52 AM, Alex Shinn wrote:
On Thu, Oct 17, 2013 at 12:33 PM, Michael Montague <xxxxxx@gmail.com> wrote:
Why are w/ascii and w/unicode necessary? The ascii character set can be used instead.
(regexp-search `(: bos (* ,char-set:ascii) eos) "English") => #<rx-match>
(regexp-search `(: bos (* ,char-set:ascii) eos) "Ελληνική") => #f
You seem to be misunderstanding these operators. They apply to all contained patterns. The examples you are referring to are operating on the "letter" character class. You could, if you wanted, use intersection to restrict individual sets to ASCII-only:
(regexp-search `(: bos (* (& ascii letter)) eos) "English") => #<rx-match>
(regexp-search `(: bos (* (& ascii letter)) eos) "Ελληνική") => #f
(regexp-search `(: bos (* letter) eos) "Ελληνική") => #<rx-match>
However, this needs to be duplicated multiple times if there are multiple nested csets, and is in fact impossible if the nested cset is part of an external SRE, e.g. you can't do this here:
(import (only (mystuff regexp-common) rx:plurals))
(regexp-search `(w/ascii ,rx:plurals) "...")
--Alex
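
A minimal sketch of the external-SRE case discussed above, written against the finalized SRFI 115 interface rather than the draft syntax quoted in this thread. It assumes an R7RS implementation that provides (srfi 115), such as Chibi Scheme. It also assumes the class name alphabetic where the thread's examples write letter, and rx:word is a hypothetical stand-in for an SRE imported from another library, in the role rx:plurals plays above.

```scheme
;; Sketch only: assumes an R7RS Scheme providing (srfi 115), e.g. Chibi Scheme.
(import (scheme base) (scheme write) (srfi 115))

;; Hypothetical stand-in for an SRE defined in some other library.
;; Its character class is buried inside, so the caller cannot rewrite
;; it as (& ascii alphabetic) without knowing the SRE's structure.
(define rx:word '(+ alphabetic))

;; Wrapping the whole SRE in w/ascii restricts every contained
;; character class to ASCII, without touching the SRE itself.
(define ascii-word `(w/ascii ,rx:word))

;; regexp-matches? returns #t only if the entire string matches.
(display (regexp-matches? ascii-word "English"))   ; ASCII letters only: #t
(newline)
(display (regexp-matches? ascii-word "Ελληνική"))  ; Greek letters are excluded: #f
(newline)

;; Without the wrapper the result for the Greek string depends on
;; whether the implementation's character classes are Unicode-aware.
(display (regexp-matches? rx:word "Ελληνική"))
(newline)
```

The wrapper is applied at the call site, which is why it works even when the SRE comes from code the caller does not control.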