I implemented flonum-specific arithmetic primitives
(FFI dispatch, not new opcodes) assuming a fixed
number of args and ran some quick benchmarks on
(fibfp 37) [1].  The results (mean of 5 runs after
discarding the fastest and slowest):

  (scheme base): 17487ms
  (scheme base) w/ immediate flonums: 8114ms (-54%)
  (srfi 144) w/ immediate flonums: 8036ms (-54%)
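
A minimal sketch of the arithmetic behind these figures,
assuming the mean is taken over the three runs left after
dropping the extremes.  trimmed-mean and percent-change
are hypothetical helper names, and the sample timings in
the last call are made up for illustration:

```scheme
(import (scheme base) (scheme write))

;; Mean of the runs left after dropping one fastest
;; and one slowest run (assumes at least 3 timings).
(define (trimmed-mean times)
  (/ (- (apply + times)
        (apply min times)
        (apply max times))
     (- (length times) 2)))

;; Change of `new` relative to `base`, in percent.
(define (percent-change base new)
  (* 100. (/ (- new base) base)))

;; Reported means from the table above.
(display (percent-change 17487 8114)) (newline) ; about -53.6
(display (percent-change 17487 8036)) (newline) ; about -54.0
(display (percent-change 8114 8036)) (newline)  ; about -0.96

;; Hypothetical raw timings in ms, not measured data.
(display (trimmed-mean '(8100 8120 8110 8300 7900)))
(newline)
```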

Without immediate flonums the time is dominated
by GC, and it was impossible to measure a clear
improvement from the specialization.  With immediate
flonums the specialized primitives do seem to be
consistently faster, though only by about 1%.
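
For illustration, the two variants being compared might
look roughly like this, assuming the benchmark's doubly
recursive definition and the SRFI 144 names fl<?, fl+
and fl-.  fibfp/fl is a hypothetical name, and this is
a sketch rather than the exact code that was timed:

```scheme
(import (scheme base) (scheme write) (srfi 144))

;; Generic arithmetic from (scheme base).
(define (fibfp n)
  (if (< n 2.)
      n
      (+ (fibfp (- n 1.))
         (fibfp (- n 2.)))))

;; Flonum-specific arithmetic from (srfi 144),
;; always called with exactly two arguments.
(define (fibfp/fl n)
  (if (fl<? n 2.)
      n
      (fl+ (fibfp/fl (fl- n 1.))
           (fibfp/fl (fl- n 2.)))))

;; Small input so this runs quickly; both calls
;; compute the 10th Fibonacci number as a flonum.
(display (fibfp 10.)) (newline)
(display (fibfp/fl 10.)) (newline)
```

Every call site here passes exactly two arguments, which
is the case the fixed-arity primitives target.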

So for the sake of current chibi, a fixed number of
args doesn't seem to make enough of a difference.
For other (notably more optimizing) implementations
this may still be worthwhile though.

-- 
Alex

[1] https://github.com/ecraven/r7rs-benchmarks/blob/master/src/fibfp.scm