Dear Will,
thank you for your most recent post; I found it very helpful. For the possible benefit of other readers as well, let me rephrase what you call the "double rounding problem", which does not need large numbers to be explained:
Let M and N be two sets of (real) numbers in which the result of rounding a real number x is to lie. For example, M and N can be the sets of floating-point numbers with mantissa widths m and n. Let us assume that M is a subset of N. If x = y - eps, where y is in N and eps > 0 is sufficiently small, rounding x to N will yield y. Assuming that y is not in M but y - d and y + d are, a rounding algorithm from N to M has to decide, with no further information, whether it maps y to y + d or to y - d, both being equidistant from y. Now, the direct rounding of x to M is y - d. The two-step rounding, first to N and then to M, achieves this only if the rounding algorithm from N to M maps y to y - d. This cannot hold universally, however: if instead x = y + eps, the algorithm would have to map y to y + d.
The simplest example is probably obtained when M and N are the sets of floating-point numbers with mantissa widths 1 and 2. Let x = 1011 in binary notation. Rounding to N gives y = 1100, and we have d = 100. So, rounding y to M gives either 10000 (round to even or round towards infinity) or 1000 (round towards zero). A direct rounding of x = 1011 to M, however, gives 1000. Finally, play the same game with x = 1101.
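The tie-dependence above can be checked mechanically. Here is a small sketch in Python using exact rational arithmetic; the function round_sig_bits is my own illustration (assuming positive inputs and round-half-to-even as the tie rule), not taken from any library:

```python
from fractions import Fraction

def round_sig_bits(x, bits):
    """Round a positive rational x to `bits` significant bits,
    resolving ties towards the even mantissa (round-half-to-even)."""
    x = Fraction(x)
    # Find the exponent e with 2^(bits-1) <= x / 2^e < 2^bits.
    e = x.numerator.bit_length() - x.denominator.bit_length() - bits
    while x / Fraction(2) ** e >= 2 ** bits:
        e += 1
    while x / Fraction(2) ** e < 2 ** (bits - 1):
        e -= 1
    scaled = x / Fraction(2) ** e
    m = int(scaled)                 # floor of the scaled mantissa
    frac = scaled - m
    if frac > Fraction(1, 2) or (frac == Fraction(1, 2) and m % 2 == 1):
        m += 1                      # round up; on a tie, make the mantissa even
    return m * Fraction(2) ** e

# x = 1011 (binary) = 11: double rounding via 2 bits overshoots ...
print(round_sig_bits(11, 2))                     # 12 (= 1100)
print(round_sig_bits(round_sig_bits(11, 2), 1))  # 16 (= 10000)
print(round_sig_bits(11, 1))                     # 8  (= 1000), the direct rounding
# ... while for x = 1101 = 13 the same tie rule happens to agree:
print(round_sig_bits(round_sig_bits(13, 2), 1))  # 16
print(round_sig_bits(13, 1))                     # 16
```

With round-half-towards-zero instead, the roles of 1011 and 1101 would swap, which is exactly the point: no fixed tie rule makes the two-step rounding agree with the direct one in both cases.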
While the "double-rounding" problem is, according to the specialists, the main reason that converting decimal scientific notation to binary floating-point is non-trivial, it is noteworthy that the problem itself is independent of any decimal (or other) printing format.
One might want to explain the "double-rounding" problem by the discontinuity of the rounding function. In fact, when the numbers in M and N are sufficiently evenly spaced, remembering a sign, i.e. writing y + 0 or y - 0 with a signed zero instead of y, is enough to make the double rounding equivalent to the single rounding; I make this more precise below. Incidentally, one can see this as a hint that the IEEE floating-point format is unbalanced in a certain sense: as branch cuts, for example, can appear anywhere in the complex plane, not only 0 but all floating-point numbers should remember whether they were constructed as a limit from below or from above.
Now let's make the previous remark more precise with the following proposition concerned with binary floating-point approximations.
Proposition: Let x be a real number. Let m < n be natural numbers. Let y be a rounding of x to n significant bits. Then there exists a rounding z of y to m significant bits such that z is also a rounding of x to m significant bits.
Proof: Let M and N be the sets of numbers with m and n significant bits, respectively. There exist t0 and t1 in M such that M does not intersect the open interval ]t0, t1[ and such that x is in [t0, t1]. We distinguish two cases: In the first case, N does not intersect ]t0, t1[. Then any rounding of x to n significant bits is a rounding of x to m significant bits and vice versa (and equals t0 or t1). In the second case, N does intersect ]t0, t1[. Then t = (t0 + t1)/2 is in N. If x is in [t0, t], we therefore have y in [t0, t]. From that it follows that t0, which is a rounding of x to m significant bits, is also a rounding of y to m significant bits. If x is in [t, t1], we argue similarly.
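The Proposition can also be checked empirically. In the following Python sketch (my own illustration; the name nearest_set is not from any library), nearest_set(x, bits) returns the set of nearest bits-bit values, with two elements exactly when there is a tie, so that "z is a rounding of both y and x" becomes a non-empty set intersection:

```python
import random
from fractions import Fraction

def nearest_set(x, bits):
    """Return the set of values with `bits` significant bits nearest to
    the positive rational x: one element, or two in case of a tie."""
    x = Fraction(x)
    # Find the exponent e with 2^(bits-1) <= x / 2^e < 2^bits.
    e = x.numerator.bit_length() - x.denominator.bit_length() - bits
    while x / Fraction(2) ** e >= 2 ** bits:
        e += 1
    while x / Fraction(2) ** e < 2 ** (bits - 1):
        e -= 1
    m = int(x / Fraction(2) ** e)      # floor of the scaled mantissa
    lo = m * Fraction(2) ** e          # largest bits-bit value <= x
    hi = (m + 1) * Fraction(2) ** e    # smallest bits-bit value > x
    if x - lo < hi - x:
        return {lo}
    if hi - x < x - lo:
        return {hi}
    return {lo, hi}                    # equidistant: both are roundings

random.seed(0)
m, n = 3, 6                            # m < n as in the Proposition
for _ in range(1000):
    x = Fraction(random.randint(1, 10**6), random.randint(1, 10**6))
    for y in nearest_set(x, n):        # y is a rounding of x to n bits
        # Some rounding of y to m bits must also be a rounding of x to m bits:
        assert nearest_set(y, m) & nearest_set(x, m)
print("Proposition holds on all samples")
```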
After these preliminary remarks, let me finally come to the two claims in Will's previous email.
Claim 1 is about how SRFI 77/R6RS is to be read. Unless I misinterpret anything in Will's text, Claim 1 is more or less literally what's in the two documents, is consistent with Mike's paper, from which the x|p idea came, and is also how Mike, as the author of SRFI 77 who was responsible for the x|p notation, reads the text. I, therefore, stand by Claim 1.
I would also like to add that the term "double rounding" might be slightly misleading in this context. In the notation 1.1|54, the first rounding, from the mathematical number 1.1 to the actual mathematical number represented by 1.1|54, is of a slightly different semantic quality than the rounding of the mathematical number 1.1|54 to an IEEE double. The precise decimal number 1.1 plays no role in the lexical notation as long as, for any other decimal number x with the same rounding, the mathematical number x|54 is the same as 1.1|54.
Claim 2, unlike Claim 1, is ultimately not about how to read SRFI 77/R6RS. Whether Claim 2 holds or not does not touch the original topic of this thread. The original context of Claim 2 was a discussion of ways to implement the semantics of Claim 1 efficiently. If Claim 2 does not have to be retracted, Will's method cited in Mike's paper seems to be an efficient way to implement the semantics of Claim 1. If Claim 2 has to be retracted, reading numbers with an explicitly given mantissa width may not be as efficient as reading numbers without one, but this would not make explicit mantissa widths unviable, because (1) some important implementations like Chez Scheme have always used bignum arithmetic, so there would be no difference in efficiency, and (2) the vast majority of number literals in practice do not have an explicitly given mantissa width and can be read as efficiently as before.
I made Claim 2 in response to an earlier post by Will, at a point when I did not understand his argument that arbitrary precision would be needed to process the x|p notation. If Claim 2 has to be retracted, it will mean that I have now understood his argument. (Given your helpful examples, I think I have already understood your line of thinking, Will.)
That said, I don't think that Claim 2, in the way I intended it, has to be retracted. The reason is the Proposition formulated above. The point is that R6RS does not prescribe how ties are to be resolved when a number has to be approximated by a number object with limited precision (i.e., no rounding mode is specified). In other words, a binary approximation of x|53 in the IEEE floating-point format is a suitable binary approximation of x|p for any p > 53. The only thing we lose in an implementation that makes use of the statement of Claim 2 is that the expression (inexact #ex|p) does not necessarily evaluate to the same number object as x|p, but I can't find a statement that it must. (That said, if I were an implementer, I would choose the slower method using bignums so that the invariant still holds.)
I can illustrate this point with the example 99999999999999983222783|54. By Claim 1, this number literal describes the mathematical number 99999999999999983222784, which now has to be represented by an IEEE double. The two methods ("double rounding" versus "single rounding") give, as shown in Will's post, the approximations 99999999999999991611392 and 99999999999999974834176, respectively. The point is now that both approximations differ from the mathematical number 99999999999999983222784 by exactly the same absolute amount, namely 8388608 = 2^23, so both are equally good approximations and both methods give results allowed by R6RS.
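This equidistance is easy to verify with exact integer arithmetic. A quick check in Python (my own sketch, relying on the fact that CPython's int-to-float conversion is correctly rounded to the nearest double, ties to even):

```python
x = 99999999999999983222783       # the mathematical number in the literal x|54
x54 = 99999999999999983222784     # its rounding to 54 significant bits (Claim 1)

# x54 is an odd multiple of 2^23 with a 54-bit mantissa:
k = x54 >> 23
assert x54 == k << 23 and k % 2 == 1 and 2**53 <= k < 2**54

# Its two neighbours among the 53-bit (double) values are equidistant from it:
down, up = x54 - 2**23, x54 + 2**23
assert up - x54 == x54 - down == 8388608   # 8388608 = 2^23

# These are exactly Will's two results:
assert up == 99999999999999991611392       # via "double rounding"
assert down == 99999999999999974834176     # via "single rounding"

# Direct (single) rounding of x to a double picks down, since x < x54:
assert int(float(x)) == down
print("both candidates are equally distant from x|54")
```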
Let me conclude by saying that we haven't touched on whether the x|p notation is particularly helpful or useful. That is outside the scope of this thread. The only thing we can conclude from this thread is that the notation does not touch the efficiency of the rest of the number system, in particular the reading and writing of numbers without an explicitly given mantissa width.
Marc