Re: Overuse of strings

Per Bothner 25 Jan 2006 01:57 UTC
[I apologize - this message is somewhat off topic.]

Lauri Alanko wrote:
> On Tue, Jan 24, 2006 at 11:51:34AM -0800, Per Bothner wrote:
>
>>What would using symbols and s-exp gain?  What kind of
>>operations would it make easier?
>
>
> There are two different issues here: how should paths or URIs be
> represented at run-time, and what kind of notation should be used for
> giving literal values for them in code. As you are speaking about
> "operations", I assume you mean the former here.
>
> To me it is obvious: _all_ common operations on URIs are easier if you
> have a structured representation instead of a flat string. Maybe the
> most common operation is resolving a relative URI against a base URI. A
> purely string-based implementation is a huge mess that involves
> searching for slashes from right to left (but remembering that
> consequent slashes count as a single one),

Actually, two slashes define the "authority" part.
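A small sketch of that point, using Python's standard `urllib.parse` purely as an illustration (the example URIs are made up): a leading "//" introduces the authority, while a "//" later in the path is left alone by the splitter.

```python
from urllib.parse import urlsplit

# "//" right after the scheme (or at the very start) introduces the authority.
full = urlsplit("http://example.org/a/b")
network_path = urlsplit("//example.org/a/b")

# A "//" further along is just part of the path and is kept as written.
inner = urlsplit("/a//b")

print(full.netloc, full.path)
print(network_path.netloc, network_path.path)
print(repr(inner.netloc), inner.path)
```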

> detecting ".." and "."
> -segments and whatnot... it's the sort of thing you expect to see only
> in C code.

It's not *that* complicated.  And note that the specification is in
terms of string operations, so making sure that a "structured"
implementation gives the correct results may actually be more
difficult.
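To make the "correct results" concrete, here is a sketch of a few reference-resolution cases taken from the examples in RFC 3986, checked against Python's `urllib.parse.urljoin`. The choice of Python and the names `BASE` and `CASES` are assumptions for illustration only; a structured implementation would have to reproduce the same answers, including the trailing slash for "." and "..".

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d;p?q"

# relative reference -> expected result of resolving it against BASE
CASES = {
    "g": "http://a/b/c/g",
    "./g": "http://a/b/c/g",
    "../g": "http://a/b/g",
    ".": "http://a/b/c/",    # note the trailing slash
    "..": "http://a/b/",
    "//g": "http://g",       # "//" starts a new authority
}

for reference, expected in CASES.items():
    resolved = urljoin(BASE, reference)
    print(f"{reference!r:8} -> {resolved}")
    assert resolved == expected
```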

> Any sane implementation will first parse the URI into its constituents
> and form a list of path segments, and then operate on that list. It
> would be just silly to constantly parse and unparse the URIs at every
> operation, so it's better to have a distinct internal representation for
> them. And indeed, this is why many languages do have special types or
> classes for representing URIs.

I don't disagree.  Though "parsing and unparsing for every operation"
is unlikely to be performance critical.  Moreover, working on the string
may actually be faster on modern computers, because a string is more
compact and has better locality.
(Remember that to a first approximation on modern computers
instructions take no time - it is cache misses that are expensive.)

>>What about "path names" (as used in file operations): Should they be
>>structured objects or strings?
>
>
> Definitely objects. Nowadays PLT Scheme has built-in support for path
> objects, but before that I used to use a simple library:
> ...
> Here relative-path calculates the relative path from "from" to "to".
> Would you like to do this kind of stuff using _strings_?

No - I want this to be hidden in my implementation, using appropriate
library procedures.

My actual preference is an abstract opaque "path" type with operations
that can map to and from URI strings.  So whether the internal
representation uses URI strings or lists should be an implementation
issue.
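A minimal sketch of what such an opaque type could look like, written in Python for illustration. The class `UriPath` and its methods are hypothetical names, not an existing API; this version happens to store a tuple of segments, but callers only see the URI-string conversions and the operations.

```python
from urllib.parse import quote, unquote

class UriPath:
    """Hypothetical opaque path type; callers never see the representation."""

    def __init__(self, segments, absolute):
        self._segments = tuple(segments)   # implementation detail
        self._absolute = absolute

    @classmethod
    def from_uri(cls, text):
        # Simplification: empty segments (e.g. a trailing "/") are dropped.
        parts = [unquote(p) for p in text.split("/") if p]
        return cls(parts, text.startswith("/"))

    def to_uri(self):
        body = "/".join(quote(s, safe="") for s in self._segments)
        return ("/" if self._absolute else "") + body

    def child(self, name):
        return UriPath(self._segments + (name,), self._absolute)

    def parent(self):
        return UriPath(self._segments[:-1], self._absolute)

p = UriPath.from_uri("/home/per/notes")
print(p.child("to do.txt").to_uri())   # /home/per/notes/to%20do.txt
print(p.parent().to_uri())             # /home/per
```

Swapping the tuple for a single stored string would change none of the calls above, which is the sense in which the representation is an implementation issue.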

> I just find it sad that underneath all these high-level conveniences,
> the operating system still uses strings for paths in the system call
> interface. As a result, '/' is an utterly magical character that cannot
> appear in any file's name.

I agree.  Though I'm not sure how one would fix that, given that one
does want a displayable and printable external representation.  The
RFC solution allows you to escape special characters, which means
you've just traded a reserved '/' for a reserved '%'.
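The trade-off can be seen in a short sketch using Python's `urllib.parse.quote` and `unquote` (the file name is invented for the example): a '/' inside a name can be written as an escape, but then a literal '%' has to be escaped too.

```python
from urllib.parse import quote, unquote

name = "50%/50% split"             # a name containing both '/' and '%'
escaped = quote(name, safe="")     # safe="" so that '/' is escaped as well
print(escaped)                     # 50%25%2F50%25%20split

assert "/" not in escaped          # '/' no longer appears literally...
assert unquote(escaped) == name    # ...and the original name is recoverable
```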

>>There are good reasons to prefer strings (standard, universal, and
>>familiar, as listed above). At least it makes sense to read and print
>>pathnames using URI syntax.
>
> Certainly it should be possible, but hardly the default.

Ignoring path-name literals (which I think are less frequent), you
still have to get pathnames from the user or the system.  S-expressions
as external syntax would still have to be validated, plus I don't
think they would be the natural choice for user interfaces.

> XML's surface
> syntax is also standard, universal and familiar. Would you suggest that
> XML data in Scheme code be therefore expressed with strings:
> "<foo>bar<baz/></foo>" instead of, say, Xexprs: (foo "bar" (baz))?

The latter, with one caveat: In Kawa, XML data are represented with
special types, and I think this is needed to best match the XML data
model.  (Namespaces are one factor.)  What happens in Kawa is that:
   (foo "bar" (baz))
*evaluates* to XML data, but it isn't XML data in itself.
(It depends what you're trying to do whether this distinction is
worthwhile, of course.)
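
As a rough analogy only (this is Python with `xml.etree.ElementTree`, not Kawa, and `evaluate` is a hypothetical helper), the distinction is between a plain nested data structure and the XML node object that evaluating it produces:

```python
import xml.etree.ElementTree as ET

def evaluate(xexpr):
    """Turn a nested (tag, child, ...) tuple into an ElementTree element."""
    tag, *children = xexpr
    element = ET.Element(tag)
    last = None                      # most recent child element, if any
    for child in children:
        if isinstance(child, str):
            if last is None:         # text before the first child element
                element.text = (element.text or "") + child
            else:                    # text following a child element
                last.tail = (last.tail or "") + child
        else:
            last = evaluate(child)
            element.append(last)
    return element

xexpr = ("foo", "bar", ("baz",))     # plain data, not yet XML
doc = evaluate(xexpr)                # an XML element object
print(ET.tostring(doc, encoding="unicode"))   # <foo>bar<baz /></foo>
```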
--
	--Per Bothner
xxxxxx@bothner.com   http://per.bothner.com/