|
Semantics of char-set-cursor-next
Bradley Lucier
(10 Jul 2023 02:20 UTC)
|
|
Re: Semantics of char-set-cursor-next
Marc Feeley
(10 Jul 2023 07:21 UTC)
|
|
Re: Semantics of char-set-cursor-next
Marc Nieper-Wißkirchen
(10 Jul 2023 09:40 UTC)
|
|
Re: Semantics of char-set-cursor-next
Marc Nieper-Wißkirchen
(10 Jul 2023 09:41 UTC)
|
|
Re: Semantics of char-set-cursor-next Bradley Lucier (10 Jul 2023 19:34 UTC)
|
I also found that the implementation here:
https://github.com/arcfide/chez-srfi/tree/master/%253a14
uses the reference implementation for subsets of Latin-1, and a
functional char-set-cursor-next with inversion lists for sets with
characters outside Latin-1.
I decided to do some timings to count the number of Unicode elements in
char-set:lower-case (a relatively small set) and the complement of
char-set:lower-case (a relatively large set). BTW, I got 2544 and
1109520 for answers, which at least add up to the right number:
https://stackoverflow.com/questions/27415935/does-unicode-have-a-defined-maximum-number-of-code-points
For both char-set:lower-case and its complement I compared (cs is a
char-set):
1. Using built-in fold on (char->list cs) (precalculated). The list
was then garbage collected.
2. Using char-set-fold on cs directly to do the same.
3. Using cursors with functional char-set-cursor-next to do the same.
4. Using cursors with non-functional char-set-cursor-next! to do the same.
The bottom line seems to be that
(a) fold on (char-set->list cs) (precalculated) seems a bit slower than
using char-set-fold on cs.
(b) functional cursors take about 3 times as long as char-set-fold.
(c) non-functional cursors (char-set-cursur-next! in the code) take
about twice as long as char-set-fold.
So functional char-set-cursor-next does fine; maybe I should add a
nonstandard char-set-cursor-next! for people who want it.
Raw data after the signature. number-of-tests was 100 and the CPU was
model name : Intel(R) Xeon(R) CPU E3-1271 v3 @ 3.60GHz
in case anyone wants to compare implementations.
Thanks for you help.
Brad
> (load "cursor-time")
;;; Results for char-set:lower-case
(time (do-times number-of-tests (fold (lambda (c knil) (fx+ knil 1)) 0
lower-case-list)))
0.003092 secs real time
0.003081 secs cpu time (0.003081 user, 0.000000 system)
no collections
64 bytes allocated
no minor faults
no major faults
11094660 cpu cycles
(time (do-times number-of-tests (char-set-fold (lambda (c knil) (fx+
knil 1)) 0 char-set:lower-case)))
0.002917 secs real time
0.002918 secs cpu time (0.002912 user, 0.000006 system)
no collections
4864 bytes allocated
1 minor fault
no major faults
10470215 cpu cycles
(time (do-times number-of-tests (do ((cursor (char-set-cursor
char-set:lower-case) (char-set-cursor-next char-set:lower-case cursor))
(sum 0 (fx+ sum 1))) ((end-of-char-set? cursor) sum))))
0.008956 secs real time
0.008940 secs cpu time (0.005587 user, 0.003353 system)
no collections
16297664 bytes allocated
1990 minor faults
no major faults
32147211 cpu cycles
(time (do-times number-of-tests (do ((cursor (char-set-cursor
char-set:lower-case) (char-set-cursor-next! char-set:lower-case cursor))
(sum 0 (fx+ sum 1))) ((end-of-char-set? cursor) sum))))
0.005780 secs real time
0.005780 secs cpu time (0.005774 user, 0.000006 system)
no collections
16064 bytes allocated
2 minor faults
no major faults
20749272 cpu cycles
;;; Results for (char-set-complement char-set:lower-case)
(time (do-times number-of-tests (fold (lambda (c knil) (fx+ knil 1)) 0
not-lower-case-list)))
1.370073 secs real time
1.368231 secs cpu time (1.368231 user, 0.000000 system)
no collections
64 bytes allocated
no minor faults
no major faults
4920814899 cpu cycles
(time (do-times number-of-tests (char-set-fold (lambda (c knil) (fx+
knil 1)) 0 cs-not-lower-case)))
1.106573 secs real time
1.105510 secs cpu time (1.105510 user, 0.000000 system)
no collections
4864 bytes allocated
no minor faults
no major faults
3974420259 cpu cycles
(time (do-times number-of-tests (do ((cursor (char-set-cursor
cs-not-lower-case) (char-set-cursor-next cs-not-lower-case cursor)) (sum
0 (fx+ sum 1))) ((end-of-char-set? cursor) sum))))
4.420373 secs real time
4.414719 secs cpu time (4.414709 user, 0.000010 system)
1757 collections accounting for 1.866148 secs real time (1.863200
user, 0.000005 system)
7100944272 bytes allocated
no minor faults
no major faults
15876488694 cpu cycles
(time (do-times number-of-tests (do ((cursor (char-set-cursor
cs-not-lower-case) (char-set-cursor-next! cs-not-lower-case cursor))
(sum 0 (fx+ sum 1))) ((end-of-char-set? cursor) sum))))
2.343621 secs real time
2.342618 secs cpu time (2.342615 user, 0.000003 system)
no collections
16064 bytes allocated
no minor faults
no major faults
8417478495 cpu cycles
"/home/lucier/lang/scheme/srfi-14/cursor-time.o16"