finishing SRFI-107 - representation of namespace declarations

finishing SRFI-107 - representation of namespace declarations Per Bothner 27 Oct 2013 03:12 UTC

SRFI-107 has been languishing for a while, but I'm going
to try to finish it up.  My latest edit is at
http://per.bothner.com/tmp/srfi-107/srfi-107.html
(NOTE: This isn't quite ready yet for review: I
think the syntax and translations (into core S-expressions)
are close to done, but the over-all structure needs more work.)

First, I'd like to nail down the translation of namespace declarations.
The current translation (as in the above URL, which is
different from the older version at srfi.schemers.org) is:

<prefix2:a xmlns:prefix1="URI1"
    xmlns:prefix2="URI&foo;2"
    xmlns="DURI">...</prefix2:a>
==>
($xml-element$ ((prefix1 "URI1")
                 (prefix2 "URI" $entity$:foo "2")
                 (|| "DURI"))
                ($resolve-qname$ a prefix2) ...)

I.e.:

- The set of namespace declarations is translated to a list,
one element for each declaration.
- Each namespace declaration is a (sub-)list that starts with the prefix
being defined, and continues with the URI being bound to the prefix.
- Each prefix is a symbol. A default (element) namespace declaration
is represented with an empty symbol, or equivalently the "reserved"
prefix $default-element-namespace$.
- Normally the namespace URI is a literal string (possibly with
entity references), but the format allows for evaluated expressions,
using the same format as attributes.
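
As a rough illustration of the list shape described above, the declarations can be walked as plain quoted data. This is only a sketch: declarations->alist and lookup-prefix are hypothetical helpers, not part of SRFI-107, and the entity reference is replaced by a literal "-" string so that every URI part is a string.

```scheme
;; Sketch only: treats the namespace-declaration list as quoted data.
;; declarations->alist and lookup-prefix are hypothetical helper names.
(define (declarations->alist decls)
  ;; Each declaration is (prefix uri-part ...); join the parts into one URI.
  (map (lambda (decl)
         (cons (car decl) (apply string-append (cdr decl))))
       decls))

(define example-bindings
  (declarations->alist
   '((prefix1 "URI1")
     (prefix2 "URI" "-" "2")                 ; "-" stands in for the entity
     ($default-element-namespace$ "DURI")))) ; the default element namespace

;; Return the URI bound to a prefix, or #f when the prefix is unbound.
(define (lookup-prefix prefix bindings)
  (let ((entry (assq prefix bindings)))
    (and entry (cdr entry))))
```

In the actual mapping the URI parts are expressions rather than data, which is why the text below treats the declaration list as a binding form.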

Note that the prefix names in both $resolve-qname$ and the namespace-
declaration list are unquoted symbols.  These forms are conceptually
a kind of variable reference and variable declaration, respectively,
in a kind of lexical scoping.  Therefore, using either strings
or quoted symbols would be IMO wrong.

The implication is that both $xml-element$ and $resolve-qname$
cannot be bound to functions - they must be syntax.  John Cowan
has expressed a preference that it be possible that these be
functions.  I don't think that is feasible while also supporting
namespaces, unless you use a rather clumsy encoding.  Specifically,
evaluating the tag-expression and the child-expressions must be
done *after* the binding is created, so they're evaluated in the
"scope" of the namespace declaration.  This can be done by wrapping
the sub-expressions as lambda-expressions, and resolving prefixes
using a hash-table.  However, the use of lambda expressions makes
for an unacceptably verbose and ugly encoding.
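
To make the cost of that function-only encoding concrete, here is a minimal sketch. All names in it (current-namespaces, resolve-qname, xml-element-fn) are hypothetical, an association list held in an R7RS parameter stands in for the hash table, and the (uri . local) pair used for a resolved name is just an assumed representation.

```scheme
;; Sketch of the function-only encoding; every name here is hypothetical.
;; An association list in a parameter stands in for the hash table.
(define current-namespaces (make-parameter '()))

(define (resolve-qname local prefix)
  (let ((entry (assq prefix (current-namespaces))))
    (if entry
        (cons (cdr entry) local)   ; assumed QName representation: (uri . local)
        (error "unbound namespace prefix" prefix))))

(define (xml-element-fn bindings tag-thunk . child-thunks)
  ;; Install the bindings first, then force the thunks inside their extent.
  ;; Note this gives dynamic rather than lexical scoping.
  (parameterize ((current-namespaces (append bindings (current-namespaces))))
    (let* ((tag (tag-thunk))
           (children (map (lambda (thunk) (thunk)) child-thunks)))
      (cons tag children))))

;; Every sub-expression has to be wrapped in a lambda by the reader mapping.
(define example-element
  (xml-element-fn (list (cons 'prefix2 "URI2"))
                  (lambda () (resolve-qname 'a 'prefix2))
                  (lambda () "text")))
```

The extra (lambda () ...) around the tag and around each child is the verbosity being objected to.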

It is easy for $xml-element$ to be a function if namespace
support is not needed.  But maybe we can tweak the encoding so
that $xml-element$ can be a function when namespaces aren't needed,
while still supporting namespaces (requiring $xml-element$ to be a macro)?

The first problem is that the above encoding translates an
empty namespace-declaration list to an empty list - which is
not self-evaluating (at least in portable Scheme).  We can
fix this by using a vector instead of a list:

<a xmlns:prefix1="URI1">...</>
==>
($xml-element$ #((prefix1 "URI1")) ...)

The advantage is that vectors are self-evaluating, and specifically
an empty vector is.  A modest disadvantage is that empty vectors
aren't necessarily shared.
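
A minimal sketch of what the vector form buys, assuming an implementation without namespace support, a list-based element representation, and a tag that has already been reduced to a quoted symbol (none of which the SRFI mandates):

```scheme
;; Sketch: without namespace support, $xml-element$ can be a procedure.
;; The list-based result representation is an assumption of this example.
(define ($xml-element$ namespaces tag . children)
  ;; namespaces is the vector of declarations; it is ignored here.
  (cons tag children))

;; #() is a self-evaluating literal in R7RS, so this is an ordinary call.
;; With the list encoding the first operand would be the unquoted (),
;; which is not a valid expression in portable Scheme.
(define no-namespace-element
  ($xml-element$ #() 'a "text"))
```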

The other problem is that even a prefix-less element tag a is
translated to ($resolve-qname$ a).  This is needed in case there
is a default namespace declaration.  A non-namespace-supporting
implementation can easily define a macro that translates
($resolve-qname$ a) to (quote a).  If it is important that
$resolve-qname$ be implementable as a function, we can change
the reader-mapping to ($resolve-qname$ 'a), but that makes for a
less efficient mapping.  The original namespace-using
example would be:

<prefix2:a xmlns:prefix1="URI1"
    xmlns:prefix2="URI&foo;2"
    xmlns="DURI">...</prefix2:a>
==>
($xml-element$ #((prefix1 "URI1")
                 (prefix2 "URI" $entity$:foo "2")
                 (|| "DURI"))
                ($resolve-qname$ (quote a) prefix2) ...)

That's not terrible, but it is clumsier and less efficient
than the original mapping.  Personally, I don't see the value of
supporting a function-only implementation.  What is the use case?
Therefore, my preference is the mapping at the start of this
message, but if there is a strongly expressed preference for the
mapping just above, that is acceptable too.
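
For reference, here is how the two treatments of a prefix-less tag might look in an implementation without namespace support. This is a sketch only: the name $resolve-qname-fn$ is hypothetical, and it is used here just so both variants can sit side by side.

```scheme
;; First mapping: the reader emits ($resolve-qname$ a) with an unquoted
;; symbol, so a non-namespace implementation needs a macro.
(define-syntax $resolve-qname$
  (syntax-rules ()
    ((_ local) (quote local))))

;; Alternative mapping: the reader emits ($resolve-qname$ 'a), so an
;; ordinary procedure is enough. $resolve-qname-fn$ is a hypothetical name.
(define ($resolve-qname-fn$ local)
  local)

(define tag-from-macro ($resolve-qname$ a))
(define tag-from-procedure ($resolve-qname-fn$ 'a))
```

Both definitions yield the symbol a; only the two-operand form with an unquoted prefix still requires syntax.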
--
	--Per Bothner
xxxxxx@bothner.com   http://per.bothner.com/