Re: text = symbol? | Simplelists

Re: text = symbol? William D Clinger 14 Jun 2016 12:58 UTC
Marc Feeley wrote:

> I haven’t followed all of the discussion so perhaps this comment
> has been made, but it strikes me as a bit odd to introduce a new
> type for immutable strings, “text”, when Scheme already has the
> “symbol” type for this concept.  Scheme currently has the following
> types for data related to textual information:
>
> - character
> - string
> - symbol
>
> I would prefer unifying those types rather than adding other ones.

Unification of those types could have been addressed by the Scheme
Language Steering Committee and made part of the mission for the two
working groups it created.  That was not done.  It could still be
done by the SLSC or WG2, of course, but unification of existing types
is not something that can be accomplished by a SRFI.

> I would be happy with a Scheme with only a single immutable text type
> (what is now known as a symbol); after all, characters are just symbols
> of length 1, and fixed length mutable strings are a rather low-level
> concept (they are more an artifact of how text has been historically
> represented).  In fact some languages [have immutable strings and no
> character type].

> Backward compatibility is a strong argument for not removing characters
> and mutable strings, and I can live with that.  However, adding a new
> type of immutable text strikes me as redundant and confusing.

The goals of SRFI 135 are:

    * to achieve O(1) per-character sequential processing of texts
    * to achieve reasonably compact representations of long texts
    * to achieve both via portable sample implementations

Those goals became important because several major implementations of
Scheme have implemented strings in a way that preserves O(1) string-ref
at the expense of using four bytes per character, while others have
implemented compact strings (often using UTF-8 or UTF-16) in a way that
causes string-ref to require time linear in the length of a string.
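
To make the tradeoff concrete, here is a minimal sketch in Python (chosen only for illustration; it is not how any Scheme implementation is written). The function names are hypothetical, and the UTF-8 input is assumed to be valid. With four bytes per character the offset of character i is simply 4*i, whereas with UTF-8 the only way to find character i is to walk over the i characters before it.

```python
def utf8_width(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with byte `lead`."""
    if lead < 0x80:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4


def utf32_char_at(data: bytes, index: int) -> str:
    """O(1): every character occupies exactly four bytes."""
    start = 4 * index
    if index < 0 or start + 4 > len(data):
        raise IndexError(index)
    return data[start:start + 4].decode("utf-32-le")


def utf8_char_at(data: bytes, index: int) -> str:
    """O(index): variable-width characters force a scan from the start."""
    if index < 0:
        raise IndexError(index)
    pos = 0
    remaining = index
    while pos < len(data):
        width = utf8_width(data[pos])
        if remaining == 0:
            return data[pos:pos + width].decode("utf-8")
        remaining -= 1
        pos += width
    raise IndexError(index)
```

Both functions return the same character for the same index; they differ only in how much work they do and how much space the underlying bytes take.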

The widely differing performance characteristics of Scheme strings have
made it difficult to write portable code that makes heavy use of strings.

SRFI 130 did not solve that problem.  Although cursors are trivial to
implement in systems where string-ref is already O(1), cursors do nothing
to improve the compactness of strings in those systems.  Although cursors
could be implemented in such a way as to achieve O(1) sequential access
in systems that have compact strings, they cannot achieve O(1) access in
any portable implementation, which means SRFI 130 will not achieve O(1)
sequential access in portable code until all major implementations of
Scheme implement SRFI 130 efficiently.  That seems unlikely to happen in
the near future.  After all, some major implementations still haven't
implemented R7RS-small.
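
The cursor idea can be sketched as follows, again in Python and again with hypothetical names that are not the SRFI 130 API. The assumption here is that a cursor is just a byte offset into valid UTF-8 data, so advancing by one character is O(1) even though indexing by character position would not be. A portable library cannot do this, because it has no access to the native string's internal bytes.

```python
def utf8_width(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with byte `lead`."""
    if lead < 0x80:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4


def cursor_ref(data: bytes, cursor: int) -> str:
    """Decode the single character that starts at byte offset `cursor`."""
    return data[cursor:cursor + utf8_width(data[cursor])].decode("utf-8")


def cursor_next(data: bytes, cursor: int) -> int:
    """Advance to the next character in O(1) by skipping one sequence."""
    return cursor + utf8_width(data[cursor])


def chars(data: bytes):
    """Visit every character once; total work is linear in the byte length."""
    cursor = 0
    while cursor < len(data):
        yield cursor_ref(data, cursor)
        cursor = cursor_next(data, cursor)
```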

Symbols cannot solve the problem, because there is no portable way to
write a symbol-ref procedure that delivers O(1) sequential access to the
individual characters of a symbol.  As Per Bothner observed, the fact
that symbols are interned represents another major cost for symbols that
immutable texts should not incur.

SRFI 135 solves the problem, and solves it today.  SRFI 135 comes with
portable sample implementations that deliver O(1) sequential and random
access everywhere while approaching the space efficiency of UTF-8 or
UTF-16 on long texts.
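
One way such a combination of compactness and O(1) access can be obtained is to split a text into chunks holding a bounded number of characters each. The Python sketch below illustrates that general technique only; it is not the SRFI 135 sample implementation, and the class name, the chunk size of 128, and the ASCII flag are all assumptions made for the example. Finding the right chunk is a division, and any scan inside a chunk is bounded by the constant chunk size.

```python
CHUNK = 128  # characters per chunk (an assumed, illustrative constant)


def utf8_width(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with byte `lead`."""
    if lead < 0x80:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4


class ChunkedText:
    """Immutable text stored as UTF-8 chunks of at most CHUNK characters."""

    def __init__(self, s: str):
        self._length = len(s)
        self._chunks = []
        for start in range(0, len(s), CHUNK):
            piece = s[start:start + CHUNK]
            encoded = piece.encode("utf-8")
            # A chunk is pure ASCII exactly when it uses one byte per character.
            self._chunks.append((len(encoded) == len(piece), encoded))

    def __len__(self) -> int:
        return self._length

    def ref(self, index: int) -> str:
        if not 0 <= index < self._length:
            raise IndexError(index)
        is_ascii, data = self._chunks[index // CHUNK]
        offset = index % CHUNK
        if is_ascii:
            return chr(data[offset])  # direct byte index
        pos = 0
        for _ in range(offset):  # at most CHUNK - 1 steps, so bounded
            pos += utf8_width(data[pos])
        return data[pos:pos + utf8_width(data[pos])].decode("utf-8")
```

For a long, mostly ASCII text this stores close to one byte per character, while `ref` never scans more than one chunk regardless of the text's length.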

Will