Re: Overuse of strings

Re: Overuse of strings Lauri Alanko 25 Jan 2006 18:19 UTC

On Tue, Jan 24, 2006 at 11:51:34AM -0800, Per Bothner wrote:
> What would using symbols and s-exp gain?  What kind of
> operations would it make easier?

There are two different issues here: how should paths or URIs be
represented at run-time, and what kind of notation should be used for
giving literal values for them in code. As you are speaking about
"operations", I assume you mean the former here.

To me it is obvious: _all_ common operations on URIs are easier if you
have a structured representation instead of a flat string. Maybe the
most common operation is resolving a relative URI against a base URI. A
purely string-based implementation is a huge mess that involves
searching for slashes from right to left (but remembering that
consecutive slashes count as a single one), detecting ".." and "."
segments and whatnot... it's the sort of thing you expect to see only
in C code.

Any sane implementation will first parse the URI into its constituents
and form a list of path segments, and then operate on that list. It
would be just silly to constantly parse and unparse the URIs at every
operation, so it's better to have a distinct internal representation for
them. And indeed, this is why many languages do have special types or
classes for representing URIs.
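
To make the point concrete, here is a minimal sketch of resolving a
relative path once both sides are already lists of segments. The name
resolve-segments is made up for this message, segments are plain
strings, and the base is assumed to name a directory (so its last
segment is kept). It is an illustration of the idea, not a complete
URI resolver:

```scheme
;; Sketch only: resolve-segments is a hypothetical name, not from any library.
;; base-dir and rel are proper lists of segment strings.
(define (resolve-segments base-dir rel)
  (let loop ((stack (reverse base-dir))   ; innermost directory first
             (rest rel))
    (cond ((null? rest) (reverse stack))
          ((string=? (car rest) ".")      ; "." leaves the directory unchanged
           (loop stack (cdr rest)))
          ((string=? (car rest) "..")     ; ".." pops one level, if there is one
           (loop (if (null? stack) stack (cdr stack)) (cdr rest)))
          (else                           ; an ordinary segment is pushed
           (loop (cons (car rest) stack) (cdr rest))))))

(resolve-segments '("a" "b" "c") '(".." "." "d" "e"))
;; => ("a" "b" "d" "e")
```

No slash hunting is needed: the dot segments are handled by pushing
and popping list elements.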

My original point, though, was not about this but about the syntax of
the module language. And there, indeed, you are right:

> Your argument is an aethetic one - which is certainly valid.

Essentially, yes. I very much like the minimalism and cleanliness of
Scheme's syntax: there's just names, spaces and parentheses, and no icky
$-prefixes for identifiers, nor quotation marks anywhere except in plain
text literals. I just find it distasteful if a _string_ is being used as
an identifier.

You are right that URIs are a standard way of identifying resources, and
as such they are a fine choice for a library path. However, to my mind a
URI is essentially an abstract object that has various attributes
(scheme, username, hostname, port, path, query string) that may be
present or not. The URIs have a standardized written representation that
is intended to be useful in all kinds of contexts (e.g. an unescaped
space is forbidden in URI syntax since URIs are often inserted into
plain text as such). However, Scheme source code is a special context
because in this context there is already a conventional way of
representing all kinds of structured data: s-expressions. Having structured
data in some other format looks just plain weird.

For an analogy, consider regular expressions: POSIX-style regexps are
compact, but quite un-schemish to my mind. And to others' minds too.
Hence we have SRE notation, which represents the structure of a regexp
exactly the same way as all structures are represented in Scheme: with
s-expressions. URIs should be no different.
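
As a rough illustration of what I mean by a URI being an object with
optional attributes, one could write it as an association list. The
notation and the names example-uri and uri-ref are invented for this
message only and are not taken from any existing library or proposal:

```scheme
;; Hypothetical notation: a URI as an association list of its attributes.
;; Attributes that are absent are simply left out.
(define example-uri
  '((scheme . "http")
    (host   . "example.org")
    (path   "lib" "srfi" "1")))   ; the path is a list of segments

;; Look up an attribute, returning default when it is absent.
(define (uri-ref uri key default)
  (let ((entry (assq key uri)))
    (if entry (cdr entry) default)))

(uri-ref example-uri 'host #f)    ; => "example.org"
(uri-ref example-uri 'port 80)    ; => 80 (no port given, so the default)
(uri-ref example-uri 'path '())   ; => ("lib" "srfi" "1")
```

Each attribute is reachable with ordinary list operations, and nothing
has to be escaped or parsed.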

> What about "path names" (as used in file operations): Should they be
> structured objects or strings?

Definitely objects. Nowadays PLT Scheme has built-in support for path
objects, but before that I used to use a simple library:

  ;; A path is (dir subdir subsubdir . file), or the file may be missing,
  ;; designating the actual directory.
  ;; If the path starts with /, it's absolute

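  ;; One '.. for each directory component of path; a trailing file,
  ;; if any, is ignored.
  (define (reverse-path path)
    (if (pair? path)
        (cons '.. (reverse-path (cdr path)))
        '()))

  ;; Strip the leading components that from and to have in common,
  ;; then climb out of the rest of from and descend into the rest of to.
  ;; An absolute to is returned as-is when from does not start with / too.
  (define (relative-path from to)
    (or (and (pair? to)
             (let ((h (car to)))
               (or (and (pair? from) (eq? (car from) h)
                        (relative-path (cdr from) (cdr to)))
                   (and (eq? h '/) to))))
        (append (reverse-path from) to)))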

Here relative-path calculates the relative path from "from" to "to".
Would you like to do this kind of stuff using _strings_?
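
A few calls, assuming the two definitions above are loaded and that
your reader accepts .. as a symbol (it is not strictly an R5RS
identifier, but most implementations allow it). The path components
are made up for the example:

```scheme
;; Sibling directories under a common parent.
(relative-path '(usr lib . foo) '(usr share . bar))
;; => (.. share . bar)

;; A relative "from" and an absolute "to": the target is returned unchanged.
(relative-path '(a b) '(/ etc . passwd))
;; => (/ etc . passwd)

;; The target is a file in the very directory we start from.
(relative-path '(/ a b) '(/ a b . f))
;; => f
```

Note that the results can be improper lists, because a trailing file
sits in the final cdr of a path.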

I just find it sad that underneath all these high-level conveniences,
the operating system still uses strings for paths in the system call
interface. As a result, '/' is an utterly magical character that cannot
appear in any file's name.

> There are good reasons to prefer strings (standard, universal, and
> familiar, as listed above). At least it makes sense to read and print
> pathnames using URI syntax.

Certainly it should be possible, but hardly the default. XML's surface
syntax is also standard, universal and familiar. Would you suggest that
XML data in Scheme code be therefore expressed with strings:
"<foo>bar<baz/></foo>" instead of, say, Xexprs: (foo "bar" (baz))?

Lauri