Fwd: Suggested change to the sample implementation, & other comments

Show/hide message thread

Suggested change to the sample implementation, & other comments Göran Weinholt (21 Oct 2022 18:08 UTC)
(missing)
Fwd: Suggested change to the sample implementation, & other comments Marc Nieper-Wißkirchen (23 Oct 2022 11:34 UTC)
Re: Suggested change to the sample implementation, & other comments Marc Nieper-Wißkirchen (23 Oct 2022 11:36 UTC)
Re: Suggested change to the sample implementation, & other comments Lassi Kortela (21 Oct 2022 22:04 UTC)
Re: Suggested change to the sample implementation, & other comments Marc Nieper-Wißkirchen (22 Oct 2022 08:17 UTC)
Re: Suggested change to the sample implementation, & other comments Marc Nieper-Wißkirchen (22 Oct 2022 14:17 UTC)

Fwd: Suggested change to the sample implementation, & other comments Marc Nieper-WiÃkirchen 23 Oct 2022 11:34 UTC

Following a well-established tradition, I forgot to reply to the list as well.

---------- Forwarded message ---------
Von: Marc Nieper-Wißkirchen <xxxxxx@gmail.com>
Date: Fr., 21. Okt. 2022 um 20:27 Uhr
Subject: Re: Suggested change to the sample implementation, & other comments
To: G?ran Weinholt <xxxxxx@weinholt.se>

Hello Göran,

thanks for your comments.  (Strangely enough, although in my native
language there is also the letter "ö" my computer displays your email
address's name incorrectly.)

Am Fr., 21. Okt. 2022 um 20:09 Uhr schrieb Göran Weinholt <xxxxxx@weinholt.se>:
>
> Hello Marc,
>
> Thanks for another interesting SRFI. I noticed that the sample
> implementation uses a construction that neither Chez nor Loko optimizes:
>
>   (expand/optimize
>     '(define set-car+cdr!
>        (lambda (p x y)
>          (assert (pair? p))
>          (perform
>            (set-car! p x)
>            (set-cdr! p y)))))
>   =>
>   (begin
>     (set! set-car+cdr!
>       (lambda (p x y)
>         (let ([t (pair? p)])
>           (if t
>               (void)
>               (assertion-violation #f "failed assertion (pair? p)")))
>         (call-with-values
>           (lambda () (set-car! p x))
>           (lambda g2
>             (call-with-values
>               (lambda () (set-cdr! p y))
>               (lambda g3 (values)))))))
>     (void))

Thank you very much for making this test.  I would have thought that
Chez uses the ideas of Ashley/Dybvig's paper "Implementation of
Multiple Return Values in Scheme" to eliminate the `call-with-values`
calls.  (As mentioned in that paper, `call-with-values` it is not a
particularly good primitive because of the unnecessary closure
generation).

> This code allocates closures and is less efficient than just using
> begin. The sample implementation also has an unreachable syntax-case
> clause.

You are correct about the unreachable clause.  It is a left over from
some previous experiments with syntax; it will be removed.

> The problem can be fixed by not using let-values:
>
>   (define-syntax perform
>     (lambda (stx)
>       (syntax-case stx ()
>         [(_ expr ...)
>          (with-syntax ([(tmp ...) (generate-temporaries #'(expr ...))])
>            #'(let ([tmp (begin expr #f)] ...) (values)))])))
>
> The new code:
>
>   (expand/optimize
>     '(define set-car+cdr!
>        (lambda (p x y)
>          (assert (pair? p))
>          (perform
>            (set-car! p x)
>            (set-cdr! p y)))))
>   =>
>   (begin
>     (set! set-car+cdr!
>       (lambda (p x y)
>         (let ([t (pair? p)])
>           (if t
>               (void)
>               (assertion-violation #f "failed assertion (pair? p)")))
>         (set-cdr! p y)
>         (set-car! p x)
>         (values)))
>     (void))

Just as a note: this is similar to how program bodies are explained in R6RS.

> Now I'd like to address the usefulness of the perform construct with a
> real world example. The <https://gitlab.com/weinholt/struct-pack/>
> library has a pack form that expands to several bytevector-...-set!
> calls where the order doesn't matter. An example:
>
>   (expand/optimize
>     '(define (shm:put-image c drawable gc
>                             total-width total-height
>                             src-x src-y src-width src-height
>                             dst-x dst-y depth format send-event
>                             shmseg offset)
>        (send-request c #f extension-name XCB_SHM_PUT_IMAGE
>          (pack "LL6S2sCCCxLL" (xid-id drawable) (xid-id gc)
>                 total-width total-height src-x src-y
>                 src-width src-height
>                 dst-x dst-y depth format
>                 (if send-event 1 0) (xid-id shmseg)
>                 offset))))
>   =>
>   (begin
>     (set! shm:put-image
>       (lambda (c drawable gc total-width total-height src-x src-y src-width
>                src-height dst-x dst-y depth format send-event shmseg
>                offset)
>         (send-request c #f extension-name XCB_SHM_PUT_IMAGE
>           (let ([bv (make-bytevector 36)])
>             (bytevector-u32-native-set! bv 32 offset)
>             (bytevector-u32-native-set! bv 28 (xid-id shmseg))
>             (bytevector-u8-set! bv 27 0)
>             (bytevector-u8-set! bv 26 (if send-event 1 0))
>             (bytevector-u8-set! bv 25 format)
>             (bytevector-u8-set! bv 24 depth)
>             (bytevector-s16-native-set! bv 22 dst-y)
>             (bytevector-s16-native-set! bv 20 dst-x)
>             (bytevector-u16-native-set! bv 18 src-height)
>             (bytevector-u16-native-set! bv 16 src-width)
>             (bytevector-u16-native-set! bv 14 src-y)
>             (bytevector-u16-native-set! bv 12 src-x)
>             (bytevector-u16-native-set! bv 10 total-height)
>             (bytevector-u16-native-set! bv 8 total-width)
>             (bytevector-u32-native-set! bv 4 (xid-id gc))
>             (bytevector-u32-native-set! bv 0 (xid-id drawable))
>             bv))))
>     (void))
>
> As can be seen, the expanded code is an overspecification of the
> original pack form. If the ...-set! calls were wrapped in perform it
> would allow the compiler to perform several optimizations:
>
>  - It would be possible to do evaluation order determination on the
>    whole block. In this example it would allow the compiler to first
>    emit the code for writing the constants and the variables, and last
>    emit the calls to xid-id when the register pressure is lower.
>
>  - The compiler can also emit a single range check for the whole block,
>    but still be free to order the writes any way it likes.
>
>  - The compiler can emit a single type check for the bytevector. But I
>    don't think perform really helps the compiler's analysis here. I'm
>    thinking about the algorithm in [1] and I'm not sure it could be
>    usefully combined with perform.

Within a "perform" form, one can move type checks to the top,
generating more information for the rest.  For example, in a silly
example like

(lambda (x)
  (write x port1)
  (write (car x) port2))

the first write has to be a generic one; although the implementation
will raise an assertion violation at the second expression, it cannot
do so before writing the value of X.  On the other hand, given

(lambda (x)
  (perform
    (write x port1)
    (write (car x) port2)))

the compiler can hoist the second expression.  There will be one type
check at the top, and one of the writes is no longer generic but can
be specialized to a write for pairs.

> Of course the ability to just say "the order doesn't matter here" is
> useful in itself.
>
> BTW, I think this SRFI might be misinterpreted as allowing for parallel
> evaluation order. You could use language similar to this from R6RS:
>
>   Although the order of evaluation is otherwise unspecified, the effect
>   of any concurrent evaluation of the operator and operand expressions
>   is constrained to be consistent with some sequential order of
>   evaluation. The order of evaluation may be chosen differently for each
>   procedure call.

Good idea.  I'll incorporate it.

Kind regards,

Marc

>
> [1] Michael D. Adams, Andrew W. Keep, Jan Midtgaard, Matthew Might, Arun
>     Chauhan, and R. Kent Dybvig. Flow-sensitive type recovery in
>     linear-log time. In Proceedings of the 2011 ACM International
>     Conference on Object Oriented Programming Systems Languages and
>     Applications, OOPSLA ’11, pages 483–498. ACM, New York, NY, USA,
>     October 2011. ISBN 978-1-4503-0940-0. doi: 10.1145/2048066.2048105.
>
> Best regards,
>
> --
> Göran Weinholt   | https://weinholt.se/
> Debian Developer | 73 de SA6CJK