Re: Rationale of uninterned symbols, and why not identifiers

Rationale of uninterned symbols, and why not identifiers Daphne Preston-Kendal (26 Jan 2025 10:50 UTC)
Re: Rationale of uninterned symbols, and why not identifiers Marc Nieper-Wißkirchen (26 Jan 2025 13:25 UTC)
Re: Rationale of uninterned symbols, and why not identifiers Wolfgang Corcoran-Mathe (26 Jan 2025 18:45 UTC)
Re: Rationale of uninterned symbols, and why not identifiers Marc Nieper-Wißkirchen 26 Jan 2025 13:24 UTC
Thank you for this write-up. As it turns out, uninterned symbols are
not necessary for this use case.

I posted the following code yesterday on the IRC channel:

(import (chezscheme))

(let-syntax ([%secret-symbol
               (let ([g (datum->syntax #'* (gensym))])
                 (lambda (stx)
                   #`'#,g))])

  (define-syntax store
    (lambda (stx)
      (syntax-case stx ()
        [(k e)
         (with-syntax ([x (datum->syntax #'k %secret-symbol)])
           #'(define x e))])))

  (define-syntax retrieve
    (lambda (stx)
      (syntax-case stx ()
        [(k)
         (identifier? #'k)
         (with-syntax ([x (datum->syntax #'k %secret-symbol)])
           #'x)]))))

(let ()
  (store "Hello, World!\n")
  (display-string (retrieve)))

The macros store and retrieve are a pair of communicating macros, as
you described. The code relies on `(gensym)` returning an unguessable
symbol. There is no need for the symbol to be uninterned. The code
also shows that "unguessable" symbol names do not have to be embedded
in the program text.

One can define helper macros that make unguessable symbols easy to handle:

(define-syntax define-gensym
  (lambda (stx)
    (syntax-case stx ()
      [(_ id)
       (identifier? #'id)
       #'(define-syntax id
           (let ((g (datum->syntax #'* (gensym))))
             (lambda (stx)
               (syntax-case stx ()
                 [x (identifier? #'x) #`'#,g]))))])))

;; Use as follows:

(define-gensym g1)
(define-gensym g2)

(eq? g1 g1) ; => #t
(eq? g1 g2) ; => #f

If one really wants to have `with-ellipsis` in the language, it can
similarly be implemented without uninterned symbols.

Uninterned symbols, let alone special lexical syntax for them,
therefore still lack a convincing reason for inclusion in the
language. With uninterned symbols, a particularly convincing reason is
needed, in my opinion, because their presence messes up the otherwise
clear semantics of symbols.

Note that an implementation does not have to generate a UUID or some
other unique string when the above code is run and expanded. The
procedure `gensym` does not have to create a name for the symbol; it
can leave the name empty. Only when `symbol->string` is called does a
name have to be generated, lazily, and the symbol interned. In use
cases such as those discussed, `symbol->string` is never called.
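
The lazy-naming idea can be illustrated with a toy model. This is only
a sketch under stated assumptions, not how Chez Scheme or any other
implementation represents gensyms: the record type `lazy-sym`, the
procedures `fresh-lazy-sym` and `lazy-sym->string`, and the counter-based
names are all made up for the illustration.

```scheme
(import (chezscheme))

;; Hypothetical model of a symbol whose name is created on demand.
;; The name field starts out as #f, meaning "no name generated yet".
(define-record-type lazy-sym
  (fields (mutable name)))

;; Counter standing in for whatever unique-name source a real
;; implementation would use (an assumption of this sketch).
(define lazy-sym-counter 0)

;; Creating a fresh object does no string work at all.
(define (fresh-lazy-sym)
  (make-lazy-sym #f))

;; The name is generated the first time it is requested and then cached,
;; so later requests return the same string.
(define (lazy-sym->string s)
  (or (lazy-sym-name s)
      (begin
        (set! lazy-sym-counter (+ lazy-sym-counter 1))
        (let ([name (string-append "g" (number->string lazy-sym-counter))])
          (lazy-sym-name-set! s name)
          name))))
```

In this model, identity comparisons with `eq?` work on the records
themselves, so code that never asks for the name never pays for
generating it.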

Let me remark - outside of the discussion of SRFI 258 - that the
semantics of a pair of communicating macros like store/retrieve are
highly unhygienic because they lack referential transparency. It is
true that in some cases, you want to hide implementation details, such
as in `slowcoach`. However, slowcoach may also just be a convenience
macro to save typing; referential transparency says that an expression
should be replaceable by a macro use that generates the expression.
The problem can only be solved by making the communication explicit
through a named identifier. In the store/retrieve example, the obvious
identifier to choose is retrieve (named retrieve2 below):

(define-syntax store2
  (lambda (stx)
    (syntax-case stx ()
      [(k e)
       (with-implicit (k retrieve2)
         #'(begin
             (define x e)
             (define-syntax retrieve2
               (lambda (stx)
                 (syntax-case stx ()
                   [(_) #'x])))))])))

(let ()
  (store2 "Hi!\n")
  (display-string (retrieve2)))

(let-syntax ([foo (syntax-rules ()
                    [(_ e1 e2 ...)
                     (begin
                       (store2 "foo\n")
                       (display-string (retrieve2))
                       e1 e2 ...)])])
  (store2 "bar\n")
  (foo (display-string (retrieve2))))

(let ()
  (store2 "Hallo, Welt!")
  (let-syntax ([println (syntax-rules ()
                          [(_)
                           (begin
                             (display-string (retrieve2))
                             (newline))])])
    (println)))

Here, `println` is a convenience macro of the kind I mentioned above.
Note that everything works as expected when store2 is used. `foo` and
`bar` are also shielded from one another. (Applied to `with-ellipsis`,
this means that it would bind both `syntax-case` and `syntax`.)

Cheers,

Marc

On Sun, 26 Jan 2025 at 11:50, Daphne Preston-Kendal
<xxxxxx@nonceword.org> wrote:
>
> The rationale contains this sentence:
> > [Uninterned symbols] can be used as unique keys shared between communicating macros or procedures, for example, since there is no possibility of collision between uninterned and user-created symbols.
> which seems to be causing some confusion.
>
> I think the words ‘or procedures’ should probably be removed, as this is really about communicating macros.
>
> To wit:
> The essence of hygienic macros is that identifiers’ symbolic names are only one part of whether they have ‘the same’ name. The other part is the set of marks/time stamps/colours the identifier carries. An identifier’s name is the same as another, in almost all contexts, if the symbolic name *and* the set of marks is the same.
>
> The purpose of uninterned symbols is to allow communicating macros to use the binding system of Scheme to establish a communication channel which is effectively based on a set of marks only, and thus allow two different expansions to pass information between one another which will be visible only when the two expansions share the same original hygienic context. This works by using datum->syntax to create an identifier whose symbolic name is uninterned, but which has the set of marks which belongs to a certain macro use. Because the symbolic name is uninterned, there is still no risk of clobbering a variable name the macro user is actually intending to use. It has to be a symbol because only a symbol in a wrap is an identifier which can be used for a binding; no other type is.
>
> Let’s say I have two macros, weed and bill-and-ben. bill-and-ben can work on its own, but if it appears inside the lexical context of a use of weed, it can change its behaviour according to some configuration given to it by weed.
>
> So this is fine:
> (bill-and-ben (flubalub))
>
> And this is fine, and causes bill-and-ben to react to weed saying 'weeeeeed:
> (weed 'weeeeeed
>   (bill-and-ben (flubalub)))
>
>
> But what about this?
> (define-syntax slowcoach
>   (syntax-rules ()
>     ((_ body_0 body_1 ...) (bill-and-ben body_0 body_1 ...))))
>
> Here bill-and-ben is an implementation detail of the macro slowcoach; for slowcoach to correctly hide its implementation details from the macro user, the following should act as if weed had not been used at all:
>
> (weed 'weeeeeed
>   (slowcoach (display "hello world\n")))
>
> while a nested use of bill-and-ben within slowcoach *should* pick up the use of weed from the same expansion context:
>
> (weed 'weeeeeed
>   (slowcoach
>     (bill-and-ben (flubalub))))
>
> And when slowcoach2 comes along and does want to use the combination, the two uses of weed should not clobber one another:
>
> (define-syntax slowcoach2
>   (syntax-rules ()
>     ((_ body_0 body_1 ...)
>      (weed 'good-morning
>        (bill-and-ben body_0 body_1 ...)))))
>
> (weed 'weeeeeed
>   (slowcoach2
>     (bill-and-ben (flubalub))))
>
> Here slowcoach2’s internal use of bill-and-ben should pick up the fact that weed is saying 'good-morning, while the main one outside of the macro use should still think weed is saying 'weeeeeed.
>
> The way to do all this is for the definitions of weed and bill-and-ben to know in common about an uninterned symbol. weed establishes a binding for that uninterned symbol with the set of marks corresponding to its own use and, e.g., attaches some identifier properties (SRFI 213) to that binding. bill-and-ben, assuming it is used with the same set of marks – which is the case in all these examples, for the described/desired ‘correct’ behaviour – can then recreate that identifier using its own set of marks and the uninterned symbol which it also knows about and look up the identifier property.
>
> For R7RS large there are a few small advantages to having uninterned symbols:
> - we can probably get rid of the ugly custom-ellipsis clause of syntax-case and go back to the more aesthetically-pleasing with-ellipsis (which can use this trick to allow safe communication of what the ellipsis in the current lexical context is)
> - the generate-identifier procedure’s specification can change from ‘should’ to ‘must’ its guarantee that every invocation returns an identifier with a unique symbolic name if no explicit symbolic name is given
>
> Even if this SRFI doesn’t make R7RS large, it is still valuable on its own terms. Being able to import (srfi :258) and have a standard interface to the uninterned symbols of multiple implementations is a worthwhile goal.
>
> I hope to post a follow-up email some time in the week which will discuss potential alternatives to uninterned symbols for this same purpose.
>
>
> Daphne
>