| Date: Sat, 22 Oct 2005 20:39:01 -0500
| From: Alan Watson <xxxxxx@astrosmo.unam.mx>
|
| Aubrey Jaffer wrote:
| > | > Flonums often are the most difficult feature to port to new
| > | > architectures.
| > |
| > | Why do you say that?
| >
| > From the experience of porting SCM to dozens of C compilers.
|
| Okay, when you said "architecture" I thought you referred to CPU
| instructions and data formats. Yes, compilers are a pain and there
| are frequent bugs in the standard library.
Sorry for the confusion; platform would have been a better word.
| > | That is, I would mandate only unlimited size integers in the
| > | core. The rest of the tower should be moved to the library.
| ...
| I would distinguish "moving the rest of the tower to a library in
| the *language* *definition*" and "moving the rest of the tower to a
| library in an *implementation*".
|
| That is, moving all but exact integers out of the core of the
| language definition simplifies the language definition and keeps it
| grounded in things that are generally agreed to be correct.
I agree; exact integers are certainly the best basis for formal
specification.
| However, that does not prohibit implementors from including
| important aspects of other numbers in the core of their
| implementation. For example, the core of their implementation could
| have representation and garbage collection for flonums, ratnums,
| and whatever else. You would probably end up duplicating some
| arithmetic routines, but that's about it.
|
| > [In SCM] The arithmetic subrs test first for INUMs, then bignums,
| > then flonums. The type dispatch for bignums and flonums is very
| > similar. It would be good to find what causes the difference.
|
| This is my point. The branches for inums and bignums are probably
| predicted as taken. Thus, when you use these generic operators on
| flonums, you incur two mispredicted branches. flonum-specific
| operators would save those.
Point taken.
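
For readers following along, here is a rough sketch of the dispatch
order under discussion. It is not SCM's code: the number struct, the
T_* tags, and the helper names are made up for illustration, and the
bignum case is only a placeholder. It shows that a generic add fails
two type tests before it reaches the flonum case, while a
flonum-specific operator performs none.

```c
/* Hypothetical tagged number; SCM's real object layout is different. */
typedef enum { T_INUM, T_BIGNUM, T_FLONUM } num_tag;

typedef struct {
    num_tag tag;
    union {
        long inum;   /* small exact integer */
        double flo;  /* inexact real */
    } u;
} number;

static number make_inum(long n) {
    number r;
    r.tag = T_INUM;
    r.u.inum = n;
    return r;
}

static number make_flonum(double d) {
    number r;
    r.tag = T_FLONUM;
    r.u.flo = d;
    return r;
}

/* Coerce an inum or flonum to a C double. */
static double to_double(number x) {
    return x.tag == T_INUM ? (double)x.u.inum : x.u.flo;
}

/* Placeholder only: real bignum arithmetic is outside this sketch. */
static number bignum_add(number a, number b) {
    (void)b;
    return a;
}

/* Generic add: tests for inums, then bignums, then falls to flonums. */
static number generic_add(number a, number b) {
    if (a.tag == T_INUM && b.tag == T_INUM)           /* test 1 */
        return make_inum(a.u.inum + b.u.inum);        /* overflow check omitted */
    if (a.tag == T_BIGNUM || b.tag == T_BIGNUM)       /* test 2 */
        return bignum_add(a, b);
    return make_flonum(to_double(a) + to_double(b));  /* flonum case */
}

/* Flonum-specific add: the caller guarantees both arguments are
   flonums, so no type tests are executed at all. */
static number flonum_add(number a, number b) {
    return make_flonum(a.u.flo + b.u.flo);
}
```

With two flonum arguments both functions return the same sum; the
only difference is the two failed tests that generic_add performs
first.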
| > I tested SCM and SCMLIT (fixnums only), both compiled with gcc
| > -O3, computing 2000 digits of pi 4 digits at a time on a Pentium
| > 4 3.00GHz. The benchmark uses only small integers.
| >
| > SCM took 5330.ms, while SCMLIT took 3330.ms, a substantial
| > savings.
|
| However, your results suggest to me that perhaps some of the
| branches are not predicted as they should be. It might be worth
| using the "__builtin_expect" feature of GCC to hint to the compiler
| that numbers are expected to be inums.
I tried adding __builtin_expect() to several of the type-dispatch
macros. The one that gave a speedup is used mainly for testing
assertions. It reduced the times from 5330.ms to 4480.ms (SCM) and
from 3330.ms to 2970.ms (SCMLIT, integer-only).
Thanks!
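
A minimal sketch of how such a hint can be attached to a type
predicate follows. The low-bit tagging scheme and the names obj,
IS_INUM, MAKE_INUM, INUM_VALUE, add and slow_add are assumptions for
this example, not SCM's actual macros; only __builtin_expect itself is
the real GCC builtin.

```c
/* Use the GCC builtin when available; otherwise fall back to the
   plain expression so the code still compiles elsewhere. */
#if defined(__GNUC__)
#  define LIKELY(x) __builtin_expect(!!(x), 1)
#else
#  define LIKELY(x) (x)
#endif

/* Hypothetical representation: an inum n is stored as 2n+1, so the
   low bit distinguishes inums from everything else. */
typedef long obj;
#define MAKE_INUM(n)  ((obj)((n) * 2 + 1))
#define INUM_VALUE(x) (((x) - 1) / 2)

/* The predicate carries the hint: "expect this to be an inum". */
#define IS_INUM(x) LIKELY(((x) & 1) != 0)

/* Placeholder for the bignum/flonum path; returns 0 in this sketch. */
static obj slow_add(obj a, obj b) {
    (void)a;
    (void)b;
    return 0;
}

static obj add(obj a, obj b) {
    if (IS_INUM(a) && IS_INUM(b))
        return a + b - 1;      /* (2m+1) + (2n+1) - 1 = 2(m+n) + 1 */
    return slow_add(a, b);     /* hinted as the unlikely path */
}
```

The hint changes only how the compiler lays out the branches, never
the result, so it is safe to experiment with one macro at a time and
measure.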
| Of course, type-specific operators are "just" a performance hack to
| get around a lack of type analysis in many implementations. Hats
| off to stalin.
There are some other possible uses. Algebraic Petrofsky points out
that a real-only EXPT can do (expt -27 1/3) ==> -3, which R5RS EXPT
won't.