Semantics of char-set-cursor-next Bradley Lucier (10 Jul 2023 02:20 UTC)
Re: Semantics of char-set-cursor-next Marc Feeley (10 Jul 2023 07:21 UTC)
Re: Semantics of char-set-cursor-next Marc Nieper-Wißkirchen (10 Jul 2023 09:40 UTC)
Re: Semantics of char-set-cursor-next Marc Nieper-Wißkirchen (10 Jul 2023 09:41 UTC)
Re: Semantics of char-set-cursor-next Bradley Lucier (10 Jul 2023 19:34 UTC)

Re: Semantics of char-set-cursor-next Bradley Lucier 10 Jul 2023 19:34 UTC

I also found that the implementation here:

https://github.com/arcfide/chez-srfi/tree/master/%253a14

uses the reference implementation for subsets of Latin-1, and a
functional char-set-cursor-next with inversion lists for sets with
characters outside Latin-1.

I decided to do some timings to count the number of Unicode elements in
char-set:lower-case (a relatively small set) and the complement of
char-set:lower-case (a relatively large set).  BTW, I got 2544 and
1109520 for answers, which at least add up to the right number:

https://stackoverflow.com/questions/27415935/does-unicode-have-a-defined-maximum-number-of-code-points

For both char-set:lower-case and its complement I compared (cs is a
char-set):

1.  Using built-in fold on (char->list cs) (precalculated).  The list
was then garbage collected.

2.  Using char-set-fold on cs directly to do the same.

3.  Using cursors with functional char-set-cursor-next to do the same.

4.  Using cursors with non-functional char-set-cursor-next! to do the same.

The bottom line seems to be that

(a) fold on (char-set->list cs) (precalculated) seems a bit slower than
using char-set-fold on cs.

(b) functional cursors take about 3 times as long as char-set-fold.

(c) non-functional cursors (char-set-cursur-next! in the code) take
about twice as long as char-set-fold.

So functional char-set-cursor-next does fine; maybe I should add a
nonstandard char-set-cursor-next! for people who want it.

Raw data after the signature.  number-of-tests was 100 and the CPU was

model name	: Intel(R) Xeon(R) CPU E3-1271 v3 @ 3.60GHz

in case anyone wants to compare implementations.

Thanks for you help.

Brad

 > (load "cursor-time")

;;; Results for char-set:lower-case
(time (do-times number-of-tests (fold (lambda (c knil) (fx+ knil 1)) 0
lower-case-list)))
     0.003092 secs real time
     0.003081 secs cpu time (0.003081 user, 0.000000 system)
     no collections
     64 bytes allocated
     no minor faults
     no major faults
     11094660 cpu cycles
(time (do-times number-of-tests (char-set-fold (lambda (c knil) (fx+
knil 1)) 0 char-set:lower-case)))
     0.002917 secs real time
     0.002918 secs cpu time (0.002912 user, 0.000006 system)
     no collections
     4864 bytes allocated
     1 minor fault
     no major faults
     10470215 cpu cycles
(time (do-times number-of-tests (do ((cursor (char-set-cursor
char-set:lower-case) (char-set-cursor-next char-set:lower-case cursor))
(sum 0 (fx+ sum 1))) ((end-of-char-set? cursor) sum))))
     0.008956 secs real time
     0.008940 secs cpu time (0.005587 user, 0.003353 system)
     no collections
     16297664 bytes allocated
     1990 minor faults
     no major faults
     32147211 cpu cycles
(time (do-times number-of-tests (do ((cursor (char-set-cursor
char-set:lower-case) (char-set-cursor-next! char-set:lower-case cursor))
(sum 0 (fx+ sum 1))) ((end-of-char-set? cursor) sum))))
     0.005780 secs real time
     0.005780 secs cpu time (0.005774 user, 0.000006 system)
     no collections
     16064 bytes allocated
     2 minor faults
     no major faults
     20749272 cpu cycles

;;; Results for (char-set-complement char-set:lower-case)
(time (do-times number-of-tests (fold (lambda (c knil) (fx+ knil 1)) 0
not-lower-case-list)))
     1.370073 secs real time
     1.368231 secs cpu time (1.368231 user, 0.000000 system)
     no collections
     64 bytes allocated
     no minor faults
     no major faults
     4920814899 cpu cycles
(time (do-times number-of-tests (char-set-fold (lambda (c knil) (fx+
knil 1)) 0 cs-not-lower-case)))
     1.106573 secs real time
     1.105510 secs cpu time (1.105510 user, 0.000000 system)
     no collections
     4864 bytes allocated
     no minor faults
     no major faults
     3974420259 cpu cycles
(time (do-times number-of-tests (do ((cursor (char-set-cursor
cs-not-lower-case) (char-set-cursor-next cs-not-lower-case cursor)) (sum
0 (fx+ sum 1))) ((end-of-char-set? cursor) sum))))
     4.420373 secs real time
     4.414719 secs cpu time (4.414709 user, 0.000010 system)
     1757 collections accounting for 1.866148 secs real time (1.863200
user, 0.000005 system)
     7100944272 bytes allocated
     no minor faults
     no major faults
     15876488694 cpu cycles
(time (do-times number-of-tests (do ((cursor (char-set-cursor
cs-not-lower-case) (char-set-cursor-next! cs-not-lower-case cursor))
(sum 0 (fx+ sum 1))) ((end-of-char-set? cursor) sum))))
     2.343621 secs real time
     2.342618 secs cpu time (2.342615 user, 0.000003 system)
     no collections
     16064 bytes allocated
     no minor faults
     no major faults
     8417478495 cpu cycles
"/home/lucier/lang/scheme/srfi-14/cursor-time.o16"