| Date: Thu, 19 May 2005 07:45:15 -0700 (PDT)
| From: Noel Welsh <xxxxxx@yahoo.com>
|
| Aubrey wrote:
|
| > Reducing the precision of scalar operands nets no speed
| > increase from floating-point hardware.
|
| This is not the case in general. The action in floating point
| calculations is currently in the vector units (SSE2, AltiVec) found
| in modern processors. Indeed the Pentium 4's scalar floating point
| unit is dismal, often making the vector unit the preferred path for
| even scalar floating point code! Vector units generally have
| fixed-size registers (e.g. 128 bits), meaning you can achieve either
| a 4x speedup on single floats (32 bits) or 2x on doubles (64 bits).
| What I've read of the Cell processor suggests it may only have
| vectorised FP units.
The crucial word in my claim is *scalar*. The big speedups for vector
calculations come from pipelined and parallel execution. Operand size
has only a minor effect on the latencies that throttle speed when the
vector units are data-starved.
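
To make the distinction concrete, here is a rough sketch of the lane
arithmetic behind the 4x and 2x figures quoted above. It assumes a
128-bit register as in the quoted text; the names REGISTER_BITS and
lanes are made up for this illustration and model no real hardware
API. It counts operands per register only and says nothing about
actual timings.

```python
# Sketch only: counts how many operands fit in one fixed-width vector
# register. REGISTER_BITS and lanes() are hypothetical names.
REGISTER_BITS = 128  # assumed width, e.g. an SSE2 or AltiVec register


def lanes(operand_bits, register_bits=REGISTER_BITS):
    """Return how many operands of operand_bits fit in one register."""
    if operand_bits <= 0 or operand_bits > register_bits:
        raise ValueError("operand width must be in 1..register_bits")
    return register_bits // operand_bits


# A vector operation touches every lane at once, so the upper bound on
# speedup over one-at-a-time processing is the lane count.
single_lanes = lanes(32)  # single floats
double_lanes = lanes(64)  # doubles

# A scalar operation fills one lane whatever the operand width, so
# narrowing the operand buys no extra parallelism.
scalar_lanes_used = 1

print(single_lanes, double_lanes, scalar_lanes_used)
```

The lane count grows only when operations are issued as vectors; for
a lone scalar operand it stays at one, which is the case my claim was
about.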
| Hence allowing the user to specify precision seems like a good
| move.
The homogeneous arrays of SRFI-63 do that.