Re: Floating-point formats and standards Bradd W. Szonye 06 Jan 2005 00:49 UTC

Bradd wrote (edited for clarity):
>> New name    Sig   Exp   Old name   Currently implemented by
>>
>> binary16     10     5
>> binary32     23     8   single     all systems (hardware)
>> binary64     52    11   double     all systems (hardware)
>> binaryx      64    15   extended   all x86-based systems (hardware)
>> binary128   112    15   quad       most RISC systems (software)
>>
>> instead of "binary80." Implementations are supposed to provide at
>> least one high-precision format for intermediate calculations ....
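
For illustration, here is a rough Python sketch that unpacks the sign, exponent, and stored fraction fields of the three formats Python's struct module can pack directly (codes 'e', 'f', 'd'). The table and function names are hypothetical. For these formats the stored fraction is one bit narrower than the precision because the leading significand bit is implicit (binary16 stores 10 fraction bits for 11 bits of precision). x86 binaryx stores its leading bit explicitly, and the sketch does not cover binaryx or binary128.

```python
import struct

# Hypothetical table: struct format code, exponent bits, stored fraction bits.
# Every format also has one sign bit.
LAYOUTS = {
    "binary16": ("e", 5, 10),
    "binary32": ("f", 8, 23),
    "binary64": ("d", 11, 52),
}

def fields(fmt_name, x):
    """Return (sign, biased exponent, stored fraction) of x encoded as fmt_name."""
    code, exp_bits, frac_bits = LAYOUTS[fmt_name]
    raw = struct.pack(">" + code, x)  # big-endian bytes of the encoding
    if len(raw) * 8 != 1 + exp_bits + frac_bits:
        raise ValueError("field widths do not add up to the format width")
    bits = int.from_bytes(raw, "big")
    fraction = bits & ((1 << frac_bits) - 1)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    sign = bits >> (frac_bits + exp_bits)
    return sign, exponent, fraction

for name in LAYOUTS:
    print(name, fields(name, -1.5))
```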

Aubrey Jaffer wrote:
> Can these intermediate formats be stored in memory?

All current general-purpose systems permit storage of binary32,
binary64, and at least one of (1) x86 binaryx, (2) binary128, or
(3) "double-double," an implementation-defined 128-bit type made of a
pair of binary64 values.
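
A double-double holds a value as the unevaluated sum hi + lo of two binary64 numbers. The usual building block is an error-free transformation such as Knuth's TwoSum. Here is a minimal Python sketch of it, assuming round-to-nearest and no overflow. The function name is illustrative, real double-double libraries differ in detail, and fractions.Fraction is used only to check exactness.

```python
from fractions import Fraction

def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) where s is the rounded binary64 sum
    and e is its rounding error, so that s + e == a + b exactly."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e

hi, lo = two_sum(1.0, 1e-20)
print(hi, lo)  # prints: 1.0 1e-20  (the tiny addend survives in the low word)
exact = Fraction(hi) + Fraction(lo) == Fraction(1.0) + Fraction(1e-20)
print(exact)   # prints: True
```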

They also support calculations using the same types. (The x87
floating-point unit in the x86 family uses binaryx internally for all
calculations, but it's possible to simulate binary32 and binary64 by
rounding after each operation.)
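
Here is a minimal Python sketch of the rounding trick. Python floats are binary64, and packing through struct's 'f' code rounds to the nearest binary32 value. The helper names are hypothetical. Rounding binary64 results to binary32 after each +, -, *, / gives exactly the binary32 result, because binary64 carries enough extra precision (53 >= 2*24 + 2 bits). Going from binaryx to binary64 the same way is not always exact, because of double rounding and the wider exponent range.

```python
import struct

def to_binary32(x):
    """Round a binary64 float to the nearest binary32 value (raises OverflowError
    if x is finite but too large for binary32)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(values, round_result):
    """Sum values left to right, rounding the inputs and every partial sum."""
    total = 0.0
    for v in values:
        total = round_result(total + round_result(v))
    return total

def keep_binary64(x):
    return x  # no extra rounding: plain binary64 arithmetic

tenths = [0.1] * 10
sum64 = accumulate(tenths, keep_binary64)
sum32 = accumulate(tenths, to_binary32)
print(sum64, sum32)  # the two accumulations land on different values
```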

In practice, most systems use only binary64 for storage and calculation.
Language support is by far strongest for the binary64 "double" type,
even in C (which nominally supports single, double, and extended
formats). Many programmers avoid "compressed" types like binary32, and
128-bit types commonly suffer from implementation defects.

> Can vectored instructions read and store the intermediate formats?

That's implementation-dependent. IIRC, neither IEEE 754 nor the draft
754R specifies vector operations. On x86, the MMX vector ops use
integer types, while the SSE and SSE2 extensions operate on packed
binary32 and binary64 values. As far as I know, none of them load,
store, or compute in binaryx.

> While reading through 6.2 Numbers:
>
>   Machine representations such as fixed point and floating point are
>   referred to by names such as fixnum and flonum.
>
> So here is a possible naming based on that ...

Summary: The type encoding is A:hm-w, where

    h is the basic hardware numeric type
        fix = fixnum
        flo = flonum
    m is the abstract mathematical type
        c = complex
        r = real
        q = rational
        i = integer
        n = non-negative integer ("natural")
    w is the encoding width in bits
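
To make the encoding concrete, here is a small Python sketch that builds A:hm-w names from their parts and rejects unknown codes. The tables and function names are hypothetical and only restate the list above.

```python
# Hypothetical code tables restating the A:hm-w components.
HARDWARE = {"fix": "fixnum", "flo": "flonum"}
MATH = {"c": "complex", "r": "real", "q": "rational",
        "i": "integer", "n": "non-negative integer"}

def type_name(h, m, w):
    """Compose a name such as 'A:flor-64' from hardware type, math type, and width."""
    if h not in HARDWARE:
        raise ValueError(f"unknown hardware type {h!r}")
    if m not in MATH:
        raise ValueError(f"unknown mathematical type {m!r}")
    if not (isinstance(w, int) and w > 0):
        raise ValueError(f"width must be a positive integer, got {w!r}")
    return f"A:{h}{m}-{w}"

def describe(h, m, w):
    """Spell out the components in words."""
    return f"{w}-bit {HARDWARE[h]} ({MATH[m]})"

print(type_name("flo", "r", 64), "=", describe("flo", "r", 64))
```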

I think this encoding is a good "Schemey" way to describe hardware
types, even if it uses "integers" instead of "signeds."

> These abbreviations are pronounceable.

Yes, that's good.

> The fourth letter of the type name is C for complex, R for real, (Q
> for rational?,) I for integer, and N for nonnegative integer or
> natural number.

You may want to consider Z for integer. All of the other letters are the
traditional names for the mathematical sets!

    C is the set of complex numbers
    R is the set of real numbers
    Q is the set of rational numbers
    Z is the set of integers
    N is the set of natural numbers

Then again, I is more mnemonic for anybody without a math or CS degree.
Of course, all Scheme programmers have some CS training, by definition.

> The "-" between the type name and precision could be removed.

I tried reading your table below without the hyphens. I think I like
it better without them, but I don't feel strongly about it.

> Are fixnums and flonums necessarily binary?  Adding in a radix
> indicator would gum up the works.

Yes, you should probably worry about radix. Although the IEEE 754R draft
probably won't be finalized and implemented for 5-10 years, many systems
(notably x86 and IBM big iron) already provide packed-decimal fixnums. It
needn't complicate the syntax, though:

    A:hm-w[x]

    x is an optional radix for the format
        b = binary, the default
        d = decimal

That echoes Scheme's radix prefixes (#b, #o, #d, #x) for numeric literals.
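
The extended syntax stays easy to parse. Here is a rough Python sketch using a regular expression. The pattern and function name are assumptions. It accepts names with or without the hyphen, treats a missing radix as binary, and matches lowercase codes only.

```python
import re

# Hypothetical grammar: "A:" + hardware code + math code + optional "-" +
# width in bits + optional radix letter (b = binary, d = decimal).
TYPE_NAME = re.compile(r"A:(fix|flo)([crqin])-?([0-9]+)([bd]?)")

def parse_type_name(name):
    """Split a name like 'A:flor-128d' into (hardware, math, width, radix)."""
    match = TYPE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not an A:hm-w[x] type name: {name!r}")
    hardware, math_type, width, radix = match.groups()
    return hardware, math_type, int(width), radix or "b"

for name in ("A:fixn-16", "A:flor-128d", "A:flor128d"):
    print(name, parse_type_name(name))
```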

For example, the 754R "decimal128" type for high-precision financial
calculations would be A:flor-128d (or A:flor128d, without the hyphen).
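
Python's decimal module can emulate decimal128 arithmetic, though not its bit encoding. It does this with a context of 34 significant digits and an exponent range of -6143 to 6144, which are the decimal128 parameters. Here is a rough sketch; the context name is hypothetical.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Arithmetic parameters of decimal128: 34 significant digits,
# Emin = -6143, Emax = 6144; clamp=1 mimics the fixed-width interchange format.
DECIMAL128 = Context(prec=34, rounding=ROUND_HALF_EVEN,
                     Emin=-6143, Emax=6144, clamp=1)

total = DECIMAL128.add(Decimal("0.10"), Decimal("0.20"))
third = DECIMAL128.divide(Decimal(1), Decimal(3))

print(total)             # prints: 0.30
print(0.1 + 0.2 == 0.3)  # prints: False (binary64 cannot represent 0.1 exactly)
print(third)             # 34 significant digits of 1/3
```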

Speaking of numeric literals, don't forget to specify that the f (single)
exponent marker corresponds to A:floX-32b and the d (double) marker to
A:floX-64b, where X is r or c.
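
Here is a minimal Python sketch of that correspondence for real literals. The table, regular expression, and function name are hypothetical, and markers other than f and d are left out. Reading the decimal text as binary64 first and then rounding to binary32 can, in rare cases, differ from rounding the decimal text directly to binary32, so a real reader should round only once.

```python
import re
import struct

# Hypothetical mapping for real literals; complex literals would use A:floc-...
MARKER_TYPES = {"f": "A:flor-32b", "d": "A:flor-64b"}
LITERAL = re.compile(r"([+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+))([fd])([+-]?[0-9]+)")

def read_marked_literal(text):
    """Read e.g. '0.1f0' and return (type name, value rounded to that format)."""
    match = LITERAL.fullmatch(text.lower())
    if match is None:
        raise ValueError(f"no f/d exponent marker in {text!r}")
    mantissa, marker, exponent = match.groups()
    value = float(f"{mantissa}e{exponent}")  # binary64 first (the caveat above)
    if marker == "f":
        value = struct.unpack("f", struct.pack("f", value))[0]  # round to binary32
    return MARKER_TYPES[marker], value

print(read_marked_literal("0.1f0"))
print(read_marked_literal("0.1d0"))
```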

> prototype   exact-                                      prefix
> procedure   ness    element type                        (rank = n)
> =========   =====   ============                        ==========
> vector      any     #nA
> A:floc-64   inexact IEEE 64.bit binary flonum complex   #nA:floc-64
> A:floc-32   inexact IEEE 32.bit binary flonum complex   #nA:floc-32
> A:flor-64   inexact IEEE 64.bit binary flonum real      #nA:flor-64
> A:flor-32   inexact IEEE 32.bit binary flonum real      #nA:flor-32
> A:fixi-64   exact   64.bit binary fixnum                #nA:fixi-64
> A:fixi-32   exact   32.bit binary fixnum                #nA:fixi-32
> A:fixi-16   exact   16.bit binary fixnum                #nA:fixi-16
> A:fixi-8    exact   8.bit binary fixnum                 #nA:fixi-8
> A:fixn-64   exact   64.bit nonnegative binary fixnum    #nA:fixn-64
> A:fixn-32   exact   32.bit nonnegative binary fixnum    #nA:fixn-32
> A:fixn-16   exact   16.bit nonnegative binary fixnum    #nA:fixn-16
> A:fixn-8    exact   8.bit nonnegative binary fixnum     #nA:fixn-8
> A:boolean           boolean                             #nA:boolean

I like this, except for the part where you're still calling the IEEE 754
types by the wrong names!
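
Every row in the table has a fixed machine width, so each prototype maps onto a standard struct format code. Here is a rough Python sketch. The mapping and function are hypothetical. Complex prototypes store (real, imaginary) pairs, and A:boolean is left out because its packing is an implementation choice.

```python
import struct

# Hypothetical mapping from prototype names to struct codes (standard sizes).
ELEMENT_FORMAT = {
    "A:floc-64": "dd", "A:floc-32": "ff",
    "A:flor-64": "d",  "A:flor-32": "f",
    "A:fixi-64": "q",  "A:fixi-32": "i", "A:fixi-16": "h", "A:fixi-8": "b",
    "A:fixn-64": "Q",  "A:fixn-32": "I", "A:fixn-16": "H", "A:fixn-8": "B",
}

def pack_elements(prototype, elements):
    """Pack a flat, row-major list of elements as the prototype's machine type
    (little-endian). struct raises struct.error for out-of-range fixnums."""
    code = ELEMENT_FORMAT[prototype]
    flat = []
    for x in elements:
        if len(code) == 2:  # complex: store real and imaginary parts
            z = complex(x)
            flat.extend((z.real, z.imag))
        else:
            flat.append(x)
    return struct.pack("<" + code * len(elements), *flat)

print(len(pack_elements("A:fixn-16", [0, 1, 2, 3, 5, 4])))  # bytes used
```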

You might want to at least note the existence of x86 binaryx (which uses
80 bits but is usually padded to 96 or 128 bits in memory), binary128,
and double-double.

I would prefer "bit" instead of Boolean, but I don't feel strongly about it.

What happened to character arrays?

> A two-by-three array of nonnegative 16.bit integers is written:
>
> #2A:fixn-16((0 1 2) (3 5 4))

I didn't convince you to switch from rank to shape, eh?

> Rank 0 arrays:
>
> #0a sym
> #0A:flor-32 237.0
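
Here is a rough Python sketch of a printer for this literal syntax. It takes a prototype suffix, the dimensions, and a flat row-major element list. The function name and argument layout are hypothetical. A rank-0 array prints its single element without parentheses.

```python
def array_literal(prototype, dims, elements):
    """Write a row-major array as a '#nA...' literal; prototype is e.g. 'A:fixn-16'
    or plain 'A' for the untyped vector prototype."""
    rank = len(dims)
    tag = f"#{rank}{prototype}"
    if rank == 0:
        # A rank-0 array holds exactly one element and has no parentheses.
        return f"{tag} {elements[0]}"

    def nest(offset, level):
        if level == rank - 1:
            items = elements[offset:offset + dims[level]]
            return "(" + " ".join(str(x) for x in items) + ")"
        stride = 1
        for d in dims[level + 1:]:
            stride *= d  # elements per sub-array one level down
        parts = (nest(offset + i * stride, level + 1) for i in range(dims[level]))
        return "(" + " ".join(parts) + ")"

    return tag + nest(0, 0)

print(array_literal("A:fixn-16", (2, 3), [0, 1, 2, 3, 5, 4]))
print(array_literal("A:flor-32", (), [237.0]))
```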

You should probably also update the parameter lists for SRFI 47
procedures that take dimension bounds or indexes (e.g., make-array and
array-ref). The current version does not permit rank-0 arrays.
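
Rank 0 falls out naturally once the procedures accept an empty list of bounds and indexes. The product of no dimensions is 1, so the store has exactly one element at offset 0. Here is a rough Python sketch. The names echo make-array and array-ref, but the signatures are assumptions, not SRFI 47's.

```python
from math import prod

def row_major_offset(dims, indices):
    """Map an index tuple to a flat offset; rank 0 uses dims == () and indices == ()."""
    if len(indices) != len(dims):
        raise IndexError(f"expected {len(dims)} indices, got {len(indices)}")
    offset = 0
    for d, i in zip(dims, indices):
        if not 0 <= i < d:
            raise IndexError(f"index {i} out of range for dimension {d}")
        offset = offset * d + i
    return offset

def make_array(fill, *dims):
    """Hypothetical make-array analogue: a flat store of prod(dims) elements."""
    return {"dims": dims, "store": [fill] * prod(dims)}

def array_ref(array, *indices):
    """Hypothetical array-ref analogue."""
    return array["store"][row_major_offset(array["dims"], indices)]

scalar = make_array(237.0)  # rank 0: no bounds at all
print(array_ref(scalar))    # no indexes needed
```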

> The following equivalences will be defined to alias SRFI-47 names to
> the new ones. SRFI-47 should be amended or replaced to make these be
> the array-prototype-procedures:
>
> (define A:floc-64 ac64) ....

Perhaps you should fix the names here, and add a note to SRFI 47 that
they've been superseded? That would eliminate the need for another round
of reviews.
--
Bradd W. Szonye
http://www.szonye.com/bradd