Re: Floating-point formats and standards Aubrey Jaffer 17 Jan 2005 05:13 UTC

 | Date: Fri, 7 Jan 2005 04:51:09 -0800
 | From: "Bradd W. Szonye" <xxxxxx@szonye.com>
 |
 | Brief summary: More praise, a few minor suggestions, and one major
 | issue.
 |
 | I thought of two more items for the syntax examples.
 |
 | 1. They should include examples of how to write arrays with empty
 |    dimensions. (I hope I got all these right.)
 |
 |    #A0*2()
 |    #A2*0(() ())
 |    #A2*0*3(() ())
 |    #A2*3*0((() () ()) (() () ()))

Yes; I added these.

 | 2. It should be an error to write any array with inconsistent or
 |    ambiguous rank or dimension specifiers.
 |
 |    #3A()           array shape is ambiguous
 |    #2A((1 2) (1))  column widths are inconsistent
 |    #3A1*1((1))     rank is inconsistent with dimension specifier

I have added a sentence to that effect.
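
The consistency checks discussed above can be sketched roughly as
follows.  This is an illustrative Python model, not part of the SRFI:
nested Python lists stand in for the parenthesized array contents, and
the names infer_dims, check_shape, and validate are hypothetical.  It
assumes array elements are never lists themselves.

```python
def infer_dims(rank, data):
    """Infer dimensions from nested lists when only a rank is given."""
    dims = []
    level = data
    for _ in range(rank):
        if not isinstance(level, list):
            raise ValueError("data is nested less deeply than the rank")
        dims.append(len(level))
        if len(level) == 0:
            # An empty list tells us nothing about the remaining dimensions.
            if len(dims) < rank:
                raise ValueError("array shape is ambiguous")
            break
        level = level[0]
    return dims


def check_shape(dims, data):
    """Return True if the nested lists in `data` match `dims` exactly."""
    if not dims:
        # Past the last dimension we expect an element, not a list.
        return not isinstance(data, list)
    if not isinstance(data, list) or len(data) != dims[0]:
        return False
    return all(check_shape(dims[1:], row) for row in data)


def validate(data, rank=None, dims=None):
    """Return the dimensions of `data`, or raise ValueError if inconsistent."""
    if dims is None:
        if rank is None:
            raise ValueError("need a rank or a dimension specifier")
        dims = infer_dims(rank, data)
    elif rank is not None and rank != len(dims):
        raise ValueError("rank is inconsistent with dimension specifier")
    if not check_shape(dims, data):
        raise ValueError("contents are inconsistent with the dimensions")
    return dims
```

Under this model, validate([[], []], dims=[2, 0, 3]) accepts the
#A2*0*3(() ()) example, while validate([], rank=3) rejects #3A() as
ambiguous.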

 | >> ... but it also unifies the dimension syntax with the #n(...)
 | >> vector syntax of Common Lisp and PLT Scheme.
 |
 | > Can you describe the PLT compatibility?
 |
 | PLT Scheme implements the Common Lisp #n(...) syntax with two extensions.
 |
 | In Common Lisp, #n(v.1 v.2 ... v.k) creates a vector of size N, with the
 | final value, v.k, repeated (N - K) times. It's an error if (K > N) or if
 | (K = 0 and N > 0). Examples:
 |
 |     #5(1 2) ==> #(1 2 2 2 2)
 |     #2(1 2) ==> #(1 2)
 |     #0()    ==> #()
 |     #1(1 2) ==> error: too many values
 |     #5()    ==> error: no values for non-empty array
 |
 | PLT Scheme adds two extensions: If (K = 0), the vector is filled with 0,
 | and if (K > N) the reader raises a specific exception. If "it is an
 | error" means the same thing in Common Lisp as it does in Scheme, these
 | are pure extensions of the CL feature.

Thanks; I have added your descriptions.
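
For readers who want the fill rule spelled out, here is a minimal
Python sketch of the behavior as Bradd describes it.  The function
name read_sized_vector and the plt_extensions flag are hypothetical,
and both "it is an error" cases are modeled as a ValueError purely for
illustration.

```python
def read_sized_vector(n, values, plt_extensions=False):
    """Model #n(v1 ... vk): a vector of size n filled from k values."""
    k = len(values)
    if k > n:
        # An error in Common Lisp; PLT Scheme raises a specific exception.
        raise ValueError("too many values")
    if k == 0:
        if n == 0:
            return []
        if plt_extensions:
            # PLT extension as described above: fill with 0.
            return [0] * n
        raise ValueError("no values for non-empty vector")
    # Repeat the final value (n - k) times.
    return list(values) + [values[-1]] * (n - k)
```

With this sketch, read_sized_vector(5, [1, 2]) gives [1, 2, 2, 2, 2],
matching the #5(1 2) example.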

 | With the "A" optional, the array literal syntax is consistent with
 | CL/PLT: #3(1 2 3) == #A3(1 2 3) == #1A3(1 2 3) == #1A(1 2 3).
 |
 | >> Don't forget to give semantics and examples for the shaped-array
 | >> syntax.  I don't particularly care whether you define the repeat-fill
 | >> rule; feel free to leave it for a later SRFI if you don't want to
 | >> write it up.

SRFI-58 will pass on the repeat-fill rule.

 | >> That covers all of the basic types.  Only binaryx [x86] is missing ....
 |
 | > I think you will find that vectorized instructions don't pack
 | > non-power-of-2 bit widths.  Breaking operands over cache-line
 | > boundaries is a huge hassle for hardware.
 | >
 | > So the 80-bit and 96-bit are likely stored into 128 bits.  We might as
 | > well call them 128b.
 |
 | Ah, OK.  That's good enough for now, since I know of no
 | implementations with both x86 and quad flonums.  I suppose it could
 | happen in the future, if Intel keeps binaryx and adds binary128,

If a CPU had both 96.bit and 128.bit formats, which would both occupy
128.bits in memory, why would one want to use the 96.bit format?

 | but that's probably 5 to 10 years off, if it happens at all.  If it
 | ever becomes a problem, it wouldn't be hard to add flox96b
 | specifiers.
 |
 | > This text I added describes how sizes are chosen:
 | >
 | >   Implementations are required to accept all of the type denotations.
 | >   Uniform types of matching sizes which the platform supports will be
 | >   used; the others will be represented as the next larger format of
 | >   the same type implemented.  If there is no larger format of the same
 | >   type and there is a bignum format for that element type, then the
 | >   array format defaults to vector; otherwise the largest uniform
 | >   format of that type is used.
 |
 | I'll try an example to make sure I understand the intent correctly.
 | Suppose that I'm targeting all the machines in an HP data center
 | (a mix of x86, Itanium, and PA-RISC).
 |
 |     flor128b => x86 extended (binaryx) on x86 and Itanium
 |     flor16b  => IEEE single (binary32) on all three systems
 |     fixz64b  => bignum on x86, or signed32 if there are no bignums
 |
 | I have one major issue with the rule as written above: It's not clear
 | what to do if the system lacks a flonum format altogether.

Implementations which don't have inexacts can't do inexact
computations.  SRFI-58 doesn't try to remedy that.

 | For example, if you have no decimal flonums at all, you should
 | probably use rational bignums instead (because binary flonums are
 | unacceptable for the major decnum applications).  However, if
 | rational bignums are allowed as a "flonum" format, then the rule
 | above would prefer them over binaryx.

Bignums are not fixed size.  Uniform arrays are for fixed-size number
formats only.

 | The essential problem here is that flonum, decnum, and fixnum
 | typically have different requirements.  In general, precision and
 | range are non-negotiable for fixnums and decnums: Users only
 | specify the large types when the data requires it.  However, speed
 | is usually more important than precision for binary flonums.
 |
 | It might be best to specify each major element type separately.
 | Here are my recommendations for defaults when there's no format large
 | enough.  Each offers a list of acceptable possibilities, roughly in
 | order from most to least desirable.  An implementation should
 | document its behavior (and should allow users to reconfigure that
 | behavior).
 |
 | Binary flonum (floTWb)
 | a. largest available binary flonum
 | b. arbitrary-precision floating-point number (a "floating-point bignum")
 | c. rational bignum
 | d. none (signal an error)
 |
 | Decimal flonum (floTWd)
 | a. arbitrary-precision floating-point number
 | b. rational bignum
 | c. binary flonum with at least 2W precision
 | d. none (signal an error)
 | e. largest available decimal flonum
 |
 | Fixnum (fixTWb)
 | a. flonum with at least W bits precision in the significand
 | b. bignum
 | c. none (signal an error)

SRFI-47 is about fixed-size number representations, not about type
declaration.  Here are the defaulting rules from SRFI-47's replacement
(submitted today).  An assumption of this SRFI is that no uniform
types are larger than the Scheme implementation supports as numbers.
The decimal rules probably need further development, which is
difficult without experience using decimal formats.

     Prototype Procedures

 Implementations are required to define all of the prototype procedures.
 Uniform types of matching format and sizes which the platform supports
 will be used; the others will be represented as follows:

 For inexact flonum complex arrays:

     * the next larger complex format is used;
     * if there is no larger format,
	   o then if the implementation supports complex floating-point
	     numbers of unbounded precision,
		 + then a heterogeneous array;
		 + else the largest inexact flonum complex array.

 For inexact flonum real arrays:

     * the next larger real format is used;
     * if there is no larger real format, then the next larger complex
       format is used.
     * If there is no larger complex format,
	   o then if the implementation supports floating-point real
	     numbers of unbounded precision,
		 + then a heterogeneous array;
		 + else the largest inexact flonum real or complex array.

 For exact decimal flonum arrays:

     * the next larger decimal flonum format array is used;
     * If there is no larger decimal flonum format, then a heterogeneous
       array is used.

 For exact bipolar fixnum arrays:

     * the next larger bipolar fixnum format array is used;
     * If there is no larger bipolar fixnum format,
	   o then if the implementation supports exact integers of
	     unbounded precision,
		 + then a heterogeneous array;
		 + else the largest bipolar fixnum array.

 For exact nonnegative fixnum arrays:

     * the next larger nonnegative fixnum format array is used;
     * If there is no larger nonnegative fixnum format,
	   o then the next larger bipolar fixnum format is used.
	   o If there is no larger bipolar fixnum format,
		 + then if the implementation supports exact integers of
		   unbounded precision,
		       # then a heterogeneous array;
		       # else the largest nonnegative or bipolar fixnum
			 array.

 With this arrangement, platforms which support uniform array types
 employ them, and less capable platforms use vectors; all work
 compatibly from the same source code.
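
The two fixnum rules above can be modeled roughly as follows.  This is
a Python sketch under stated assumptions, not the SRFI's reference
code: the function names and the list-of-widths description of a
platform are hypothetical, "the largest nonnegative or bipolar fixnum
array" is read as the widest format of either kind (nonnegative wins
ties), and a nonnegative request falls back only to a strictly wider
bipolar format.

```python
HETEROGENEOUS = "heterogeneous"


def smallest_at_least(widths, width, strict=False):
    """Smallest supported width that is >= width (or > width if strict)."""
    candidates = [w for w in widths if (w > width if strict else w >= width)]
    return min(candidates) if candidates else None


def choose_fixnum_format(width, signed, signed_widths, unsigned_widths,
                         has_bignums):
    """Pick the array representation for a requested fixnum width."""
    if not signed:
        # Matching or next larger nonnegative format.
        w = smallest_at_least(unsigned_widths, width)
        if w is not None:
            return ("unsigned", w)
        # Assumption: only a strictly wider bipolar format can hold
        # every nonnegative value of the requested width.
        w = smallest_at_least(signed_widths, width, strict=True)
    else:
        # Matching or next larger bipolar format.
        w = smallest_at_least(signed_widths, width)
    if w is not None:
        return ("signed", w)
    if has_bignums:
        return HETEROGENEOUS
    # No bignums: fall back to the widest uniform format available.
    formats = [("signed", s) for s in signed_widths]
    if not signed:
        formats = [("unsigned", u) for u in unsigned_widths] + formats
    if not formats:
        return HETEROGENEOUS  # assumption: nothing uniform to fall back to
    return max(formats, key=lambda f: f[1])
```

For a platform with 8, 16, and 32 bit formats of both kinds, a 64 bit
bipolar request yields a heterogeneous array when bignums exist and
the 32 bit bipolar format otherwise.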

 | >> No fixq arrays for rational numbers?  C'mon, you know you want to!
 |
 | > Although the syntax could easily specify arbitrary precisions, the
 | > prototype functions would need to take extra arguments.  Should the
 | > precision be specified as total bits and fractional bits; or integral
 | > bits and fractional bits?
 |
 | Oh, I figured they'd work like complex numbers, e.g., a fixq32b would
 | have a 32-bit numerator and a 32-bit denominator. (That is how complex
 | numbers work, right?)

I think rationals composed of fixed-width integers don't behave well
with regard to error bands and rounding.  They certainly waste a large
portion of the integer pairs.
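
The waste can be illustrated with a small count.  The Python sketch
below (the name count_distinct is hypothetical) enumerates positive
numerator/denominator pairs up to a bound and counts how many distinct
rational values they represent.  It ignores signs and zero, so it
illustrates the redundancy rather than measuring any particular
format.

```python
from fractions import Fraction


def count_distinct(max_value):
    """Return (distinct rationals, total pairs) for 1 <= n, d <= max_value."""
    pairs = 0
    distinct = set()
    for n in range(1, max_value + 1):
        for d in range(1, max_value + 1):
            pairs += 1
            # Fraction reduces to lowest terms, so 1/2 and 2/4 coincide.
            distinct.add(Fraction(n, d))
    return len(distinct), pairs
```

For example, count_distinct(4) returns (11, 16): only 11 of the 16
pairs denote different values.  As the bound grows, the share of pairs
in lowest terms approaches 6/pi^2, about 61%.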