I implemented flonum-specific arithmetic primitives
(FFI dispatch, not new opcodes) assuming a fixed
number of args and ran some quick benchmarks on
(fibfp 37) [1]. The results (mean of 5 runs discarding
the fastest and slowest):
  (scheme base):                       17487ms
  (scheme base) w/ immediate flonums:   8114ms (-54%)
  (srfi 144) w/ immediate flonums:      8036ms (-54%)
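
For anyone wanting to redo the arithmetic, here is a minimal
sketch of how a trimmed mean and the relative changes can be
computed. The procedure names trimmed-mean and percent-change
are made up for this illustration and are not part of chibi
or any SRFI, and the list passed to trimmed-mean is toy data,
not the actual run timings.

```scheme
(import (scheme base) (scheme write))

;; Mean of the timings after dropping one fastest and one
;; slowest run.  Assumes at least three timings.
(define (trimmed-mean times)
  (/ (- (apply + times) (apply min times) (apply max times))
     (- (length times) 2)))

;; Relative change of `new` against `baseline`, in percent.
;; Negative means `new` is faster.
(define (percent-change baseline new)
  (* 100. (/ (- new baseline) baseline)))

;; Toy input only: drops 10 and 50, averages 30 20 40.
(display (trimmed-mean '(10 30 20 50 40))) ; => 30
(newline)

;; The reported means, in ms.
(display (percent-change 17487 8114)) (newline)
(display (percent-change 17487 8036)) (newline)
;; (srfi 144) relative to (scheme base), both with
;; immediate flonums.
(display (percent-change 8114 8036)) (newline)
```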
Without immediate flonums the time is dominated
by GC, and it was impossible to measure a clear
improvement from the specialized primitives. With
immediate flonums the specialization does seem to
be consistently faster, though only by about 1%
(8036ms vs. 8114ms).
So for the sake of current chibi, a fixed number of
args doesn't seem to make enough of a difference.
For other (notably more optimizing) implementations
this may still be worthwhile though.
--
Alex