Simplifying SRFI 109, part 1: entities John Cowan 10 Feb 2013 08:04 UTC

This is the first of two posts proposing simplifications (reductions in
scope) for SRFI 109.  The idea is that by removing variable elements,
this SRFI (unlike SRFIs 107 and 108) becomes purely lexical in scope:
a SRFI-109-capable reader returns the same thing for a SRFI 109 string
literal as for a regular string literal, namely an immutable Scheme
string object.

In this first post, I argue against the provision of user-defined
entity names.  Currently, when an entity reference appears in a SRFI
109 string literal, it is expanded into the identifier $entity:<name>$,
where <name> is the entity referred to.  Thus &{Rom&acirc;nia} expands
to ($string$ "Rom" $entity:acirc$ "nia").  In principle, this permits a
user to rebind $entity:acirc$ to something else.  However, there seems no
reason why this should be allowed; it is only productive of confusion.
Such entity references should just expand directly to the character, so
that &{Rom&acirc;nia} becomes ($string$ "România"), or just "România".
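To make the proposed behaviour concrete, here is a minimal sketch in R7RS
Scheme of direct expansion, applied to the text inside a literal such as
&{Rom&acirc;nia}.  The names entity-table, lookup-entity and
expand-entities are hypothetical.  The three-entry table is only a
stand-in for the full W3C list.  The sketch also ignores the other
constructs SRFI 109 allows inside a literal.  It shows only that each
&name; can be replaced by its character at read time, so no
$entity:name$ binding is ever consulted.

```scheme
(import (scheme base))

;; Hypothetical, tiny stand-in for the W3C entity list; a real reader
;; would carry the whole table.
(define entity-table
  '(("acirc"  . #\xe2)    ; LATIN SMALL LETTER A WITH CIRCUMFLEX
    ("eacute" . #\xe9)    ; LATIN SMALL LETTER E WITH ACUTE
    ("amp"    . #\&)))

;; Map an entity name to its character, or signal an error.
(define (lookup-entity name)
  (let ((entry (assoc name entity-table)))
    (if entry
        (cdr entry)
        (error "unknown character entity" name))))

;; Copy STR, replacing every &name; reference with its character.
(define (expand-entities str)
  (let ((out (open-output-string))
        (len (string-length str)))
    (let loop ((i 0))
      (cond ((= i len) (get-output-string out))
            ((char=? (string-ref str i) #\&)
             ;; Scan forward for the terminating semicolon.
             (let scan ((j (+ i 1)))
               (cond ((= j len)
                      (error "unterminated entity reference" str))
                     ((char=? (string-ref str j) #\;)
                      (write-char (lookup-entity (substring str (+ i 1) j))
                                  out)
                      (loop (+ j 1)))
                     (else (scan (+ j 1))))))
            (else
             (write-char (string-ref str i) out)
             (loop (+ i 1)))))))

(expand-entities "Rom&acirc;nia")  ; => "România"
```

A reader that expands entities this way can hand back a plain string
rather than a ($string$ ...) form.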

Nor is it likely that anyone will need character entities beyond the 2237
already provided by the standard W3C list.  It is already a requirement
that systems not add names that conflict with any of these.  True, you
cannot write (say) Hindi in the Devanagari script using character entity
references only.  But if you are going to do that, you will probably
want to use a UTF-8-capable editor with appropriate fonts.

I therefore believe that character entities should be expanded directly
into characters by the implementation.  This eliminates one of the
use cases for requiring SRFI 109 string literals to expand into calls
on $string$.  I would also strengthen, from a MAY to a SHOULD, the
recommendation to implement the whole standard list.

John Cowan

Man has no body distinct from his soul, for that called body is a portion
of the soul discerned by the five senses, the chief inlets of the soul in
this age.  --William Blake