Semantics of char-set-cursor-next
Bradley Lucier
(10 Jul 2023 02:20 UTC)
|
Re: Semantics of char-set-cursor-next
Marc Feeley
(10 Jul 2023 07:21 UTC)
|
Re: Semantics of char-set-cursor-next
Marc Nieper-Wißkirchen
(10 Jul 2023 09:40 UTC)
|
Re: Semantics of char-set-cursor-next
Marc Nieper-Wißkirchen
(10 Jul 2023 09:41 UTC)
|
Re: Semantics of char-set-cursor-next Bradley Lucier (10 Jul 2023 19:34 UTC)
|
I also found that the implementation here: https://github.com/arcfide/chez-srfi/tree/master/%253a14 uses the reference implementation for subsets of Latin-1, and a functional char-set-cursor-next with inversion lists for sets with characters outside Latin-1. I decided to do some timings to count the number of Unicode elements in char-set:lower-case (a relatively small set) and the complement of char-set:lower-case (a relatively large set). BTW, I got 2544 and 1109520 for answers, which at least add up to the right number: https://stackoverflow.com/questions/27415935/does-unicode-have-a-defined-maximum-number-of-code-points For both char-set:lower-case and its complement I compared (cs is a char-set): 1. Using built-in fold on (char->list cs) (precalculated). The list was then garbage collected. 2. Using char-set-fold on cs directly to do the same. 3. Using cursors with functional char-set-cursor-next to do the same. 4. Using cursors with non-functional char-set-cursor-next! to do the same. The bottom line seems to be that (a) fold on (char-set->list cs) (precalculated) seems a bit slower than using char-set-fold on cs. (b) functional cursors take about 3 times as long as char-set-fold. (c) non-functional cursors (char-set-cursur-next! in the code) take about twice as long as char-set-fold. So functional char-set-cursor-next does fine; maybe I should add a nonstandard char-set-cursor-next! for people who want it. Raw data after the signature. number-of-tests was 100 and the CPU was model name : Intel(R) Xeon(R) CPU E3-1271 v3 @ 3.60GHz in case anyone wants to compare implementations. Thanks for you help. Brad > (load "cursor-time") ;;; Results for char-set:lower-case (time (do-times number-of-tests (fold (lambda (c knil) (fx+ knil 1)) 0 lower-case-list))) 0.003092 secs real time 0.003081 secs cpu time (0.003081 user, 0.000000 system) no collections 64 bytes allocated no minor faults no major faults 11094660 cpu cycles (time (do-times number-of-tests (char-set-fold (lambda (c knil) (fx+ knil 1)) 0 char-set:lower-case))) 0.002917 secs real time 0.002918 secs cpu time (0.002912 user, 0.000006 system) no collections 4864 bytes allocated 1 minor fault no major faults 10470215 cpu cycles (time (do-times number-of-tests (do ((cursor (char-set-cursor char-set:lower-case) (char-set-cursor-next char-set:lower-case cursor)) (sum 0 (fx+ sum 1))) ((end-of-char-set? cursor) sum)))) 0.008956 secs real time 0.008940 secs cpu time (0.005587 user, 0.003353 system) no collections 16297664 bytes allocated 1990 minor faults no major faults 32147211 cpu cycles (time (do-times number-of-tests (do ((cursor (char-set-cursor char-set:lower-case) (char-set-cursor-next! char-set:lower-case cursor)) (sum 0 (fx+ sum 1))) ((end-of-char-set? cursor) sum)))) 0.005780 secs real time 0.005780 secs cpu time (0.005774 user, 0.000006 system) no collections 16064 bytes allocated 2 minor faults no major faults 20749272 cpu cycles ;;; Results for (char-set-complement char-set:lower-case) (time (do-times number-of-tests (fold (lambda (c knil) (fx+ knil 1)) 0 not-lower-case-list))) 1.370073 secs real time 1.368231 secs cpu time (1.368231 user, 0.000000 system) no collections 64 bytes allocated no minor faults no major faults 4920814899 cpu cycles (time (do-times number-of-tests (char-set-fold (lambda (c knil) (fx+ knil 1)) 0 cs-not-lower-case))) 1.106573 secs real time 1.105510 secs cpu time (1.105510 user, 0.000000 system) no collections 4864 bytes allocated no minor faults no major faults 3974420259 cpu cycles (time (do-times number-of-tests (do ((cursor (char-set-cursor cs-not-lower-case) (char-set-cursor-next cs-not-lower-case cursor)) (sum 0 (fx+ sum 1))) ((end-of-char-set? cursor) sum)))) 4.420373 secs real time 4.414719 secs cpu time (4.414709 user, 0.000010 system) 1757 collections accounting for 1.866148 secs real time (1.863200 user, 0.000005 system) 7100944272 bytes allocated no minor faults no major faults 15876488694 cpu cycles (time (do-times number-of-tests (do ((cursor (char-set-cursor cs-not-lower-case) (char-set-cursor-next! cs-not-lower-case cursor)) (sum 0 (fx+ sum 1))) ((end-of-char-set? cursor) sum)))) 2.343621 secs real time 2.342618 secs cpu time (2.342615 user, 0.000003 system) no collections 16064 bytes allocated no minor faults no major faults 8417478495 cpu cycles "/home/lucier/lang/scheme/srfi-14/cursor-time.o16"