Rationale of uninterned symbols, and why not identifiers

Rationale of uninterned symbols, and why not identifiers Daphne Preston-Kendal 26 Jan 2025 10:50 UTC
The rationale contains this sentence:
> [Uninterned symbols] can be used as unique keys shared between communicating macros or procedures, for example, since there is no possibility of collision between uninterned and user-created symbols.
which seems to be causing some confusion.

I think the words ‘or procedures’ should probably be removed, as this is really about communicating macros.

To wit:
The essence of hygienic macros is that identifiers’ symbolic names are only one part of whether they have ‘the same’ name. The other part is the set of marks/time stamps/colours the identifier carries. Two identifiers have the same name, in almost all contexts, if their symbolic names *and* their sets of marks are the same.
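
To make the two-part notion of a name concrete, here is a minimal sketch, assuming an R6RS implementation with syntax-case. The macro names same-name? and introduced-vs-user are invented for illustration. bound-identifier=? compares the symbolic name together with the marks, so an x introduced by a macro template is not ‘the same’ as an x written by the macro user:

```scheme
(import (rnrs))

;; Expands to #t if the two identifiers have the same symbolic name
;; and the same set of marks, and to #f otherwise.
(define-syntax same-name?
  (lambda (stx)
    (syntax-case stx ()
      ((_ a b)
       (and (identifier? #'a) (identifier? #'b))
       (if (bound-identifier=? #'a #'b) #'#t #'#f)))))

;; The x in this template is introduced by the macro, so it carries
;; a mark that the user's identifier does not.
(define-syntax introduced-vs-user
  (syntax-rules ()
    ((_ user-id) (same-name? x user-id))))

(display (same-name? x x))        ; displays #t: same symbol, same marks
(newline)
(display (introduced-vs-user x))  ; displays #f: same symbol, different marks
(newline)
```

Neither x is ever evaluated; the identifiers are only inspected at expansion time, so no binding for x is needed.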

The purpose of uninterned symbols is to let communicating macros use the binding system of Scheme to establish a communication channel which is effectively based on a set of marks only. Two different expansions can then pass information between one another, and that information will be visible only when the two expansions share the same original hygienic context. This works by using datum->syntax to create an identifier whose symbolic name is uninterned, but which has the set of marks which belongs to a certain macro use. Because the symbolic name is uninterned, there is still no risk of clobbering a variable name the macro user is actually intending to use. It has to be a symbol because only a symbol in a wrap is an identifier which can be used for a binding; no other type is.
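
A rough sketch of that construction follows. It assumes an implementation that provides string->uninterned-symbol under the library name (srfi :258), and that allows syntax templates at run time, as R6RS does. The names channel-key, channel-identifier and here are hypothetical, chosen only for this illustration:

```scheme
(import (rnrs) (srfi :258))

;; Hypothetical key known to both communicating macros. It is spelled
;; like weed-channel but is not the interned symbol of that name.
(define channel-key (string->uninterned-symbol "weed-channel"))

;; Wrap the key with the marks (lexical context) of context-id.
(define (channel-identifier context-id)
  (datum->syntax context-id channel-key))

;; `here' is an arbitrary identifier standing in for a macro use.
(define channel-id (channel-identifier #'here))

(display (identifier? channel-id))         ; should be #t: a wrapped symbol
(newline)
(display (eq? channel-key 'weed-channel))  ; should be #f: no collision
(newline)
(display (identifier? (datum->syntax #'here "weed-channel")))
                                           ; should be #f: a wrapped string
(newline)
```

In a real macro, context-id would be the macro keyword taken from the use being expanded (for example the k in a pattern such as (k body ...)), so that the resulting identifier carries that use’s marks.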

Let’s say I have two macros, weed and bill-and-ben. bill-and-ben can work on its own, but if it appears inside the lexical context of a use of weed, it can change its behaviour according to some configuration given to it by weed.

So this is fine:
(bill-and-ben (flubalub))

And this is fine, and causes bill-and-ben to react to weed saying 'weeeeeed:
(weed 'weeeeeed
  (bill-and-ben (flubalub)))

But what about this?
(define-syntax slowcoach
  (syntax-rules ()
    ((_ body_0 body_1 ...) (bill-and-ben body_0 body_1 ...))))

Here bill-and-ben is an implementation detail of the macro slowcoach; for slowcoach to correctly hide its implementation details from the macro user, the following should act as if weed had not been used at all:

(weed 'weeeeeed
  (slowcoach (display "hello world\n")))

while a nested use of bill-and-ben within slowcoach *should* pick up the use of weed from the same expansion context:

(weed 'weeeeeed
  (slowcoach
    (bill-and-ben (flubalub))))

And when slowcoach2 comes along and does want to use the combination, the two uses of weed should not clobber one another:

(define-syntax slowcoach2
  (syntax-rules ()
    ((_ body_0 body_1 ...)
     (weed 'good-morning
       (bill-and-ben body_0 body_1 ...)))))

(weed 'weeeeeed
  (slowcoach2
    (bill-and-ben (flubalub))))

Here slowcoach2’s internal use of bill-and-ben should pick up the fact that weed is saying 'good-morning, while the main one outside of the macro use should still think weed is saying 'weeeeeed.

The way to do all this is for the definitions of weed and bill-and-ben to share knowledge of one uninterned symbol. weed establishes a binding for that uninterned symbol with the set of marks corresponding to its own use and, e.g., attaches some identifier properties (SRFI 213) to that binding. bill-and-ben, assuming it is used with the same set of marks – which is the case in all these examples, for the described/desired ‘correct’ behaviour – can then recreate that identifier from its own set of marks and the uninterned symbol, and look up the identifier property.

For R7RS large there are a few small advantages to having uninterned symbols:
- we can probably get rid of the ugly custom-ellipsis clause of syntax-case and go back to the more aesthetically-pleasing with-ellipsis (which can use this trick to allow safe communication of what the ellipsis in the current lexical context is)
- the specification of the generate-identifier procedure can strengthen its guarantee from ‘should’ to ‘must’: every invocation returns an identifier with a unique symbolic name if no explicit symbolic name is given

Even if this SRFI doesn’t make it into R7RS large, it is still valuable on its own terms. Being able to import (srfi :258) and have a standard interface to the uninterned symbols of multiple implementations is a worthwhile goal.

I hope to post a follow-up email some time in the week which will discuss potential alternatives to uninterned symbols for this same purpose.

Daphne