I implemented flonum-specific arithmetic primitives
(FFI dispatch, not new opcodes) assuming a fixed
number of args and ran some quick benchmarks on
(fibfp 37) [1]. The results (mean of 5 runs, discarding
the fastest and slowest):

(scheme base):                       17487ms
(scheme base) w/ immediate flonums:   8114ms (-54%)
(srfi 144) w/ immediate flonums:      8036ms (-55%)
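
For anyone without [1] at hand, the two source-level variants being compared look roughly like the sketch below. It assumes fibfp is the usual doubly recursive floating-point Fibonacci; the names fib-generic and fib-flonum are made up here for illustration, and the actual benchmark source in [1] is authoritative. The only difference is whether the generic (scheme base) operators or the flonum-specific (srfi 144) ones are called, always with exactly two arguments.

```scheme
(import (scheme base) (scheme write) (srfi 144))

;; Generic variant: +, - and < from (scheme base) accept any
;; numeric type and any number of arguments.
(define (fib-generic n)
  (if (< n 2.)
      n
      (+ (fib-generic (- n 1.))
         (fib-generic (- n 2.)))))

;; Flonum-specific variant: fl+, fl- and fl<? from (srfi 144),
;; each called with exactly two flonum arguments.
(define (fib-flonum n)
  (if (fl<? n 2.)
      n
      (fl+ (fib-flonum (fl- n 1.))
           (fib-flonum (fl- n 2.)))))

;; Small input so this finishes quickly; the runs above used 37.
(display (fib-generic 20.)) (newline)
(display (fib-flonum 20.)) (newline)
```

Both procedures should return the same flonum for the same input, so any timing difference comes from the arithmetic dispatch rather than from the algorithm.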

Without immediate flonums the time is dominated
by GC, and it was impossible to measure a clear
improvement from the specialization. With immediate
flonums the specialization does seem to be
consistently faster, though only by about 1%.

So for the sake of current chibi, a fixed number of
args doesn't seem to make enough of a difference.
For other (notably more optimizing) implementations
this may still be worthwhile though.

Alex