extending the discussion

extending the discussion Tom Lord 15 Dec 1999 19:57 UTC

I propose that the discussion period for SRFI-13 be extended and that
the additional time be used to consider some substantial changes.

The primary issue is the treatment of shared substrings and string
indexes.

	* The convention of permitting string indexes everywhere
	  makes it more difficult to implement extensions of the
	  proposed library, and complicates the interfaces of all
	  conforming extensions.  This is because extensions ought
	  also to obey the same convention.

	* Permitting string indexes everywhere precludes conformance
	  to another, more established convention: that comparison
	  functions should permit more than two arguments.

	* Permitting string indexes everywhere encourages programmers
	  seeking to write portable code to use string indexes when
	  shared substrings would be simpler and less error prone.
	  This makes it awkward to incorporate the code such
	  programmers write into systems which support shared
	  substrings.  Moreover, manipulating string indexes is
	  notoriously error prone and so should not be a prominent
	  feature of portable Scheme style.

	* A reasonable alternative is to have two SRFIs: one, a
	  string library without ubiquitous index parameters and
	  without "substring/shared"; the other, a SRFI for
	  "substring/shared" in which that procedure is guaranteed to
	  return a value which shares state with its primary string
	  argument.

	  Those who use only the first SRFI are thereby discouraged
	  from trying to use Scheme for hyper-efficient string
	  processing.  Those who want to use Scheme for
	  hyper-efficient string processing are thereby encouraged to
	  choose an implementation or environment which supports
	  shared substrings.  Those who want to write maximally
	  portable code are discouraged from counting on
	  hyper-efficient string processing.  All of the standardized
	  interfaces are kept simple and clean.  Existing conventions
	  (such as N-ary comparison operators) are preserved.

	  Future SRFIs which define additional string manipulation
	  functions for use in applications which expect
	  hyper-efficient string processing should assume that
	  "substring/shared" is present.  SRFIs which define
	  additional string manipulation procedures just for
	  convenience, where efficiency is not a concern, can avoid
	  shared substrings.

	* In a SRFI which defines "substring/shared", it should be
	  mandatory that the string returned from that procedure share
	  state with the primary string argument.  Two additional
	  procedures are desirable:

		shared-substring? obj => boolean

	  which tells whether a particular string is a shared
	  substring, and

		containing-string string => string start end

	  which converts a shared substring to its parent string and
	  indexes, and an ordinary string to itself, 0, and its
	  length.

	  It is important that "substring/shared" return a truly
	  shared substring so that side effects on its result are
	  reflected in its argument.  That propagation of side effects
	  is an essential part of using shared substrings to write
	  code which manipulates strings efficiently.
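
As a rough illustration of the shared-substring interface described above,
here is a minimal sketch in portable Scheme.  It is only a model: standard
strings cannot share storage, so a shared substring is represented as a
record holding a parent string and two indexes.  The record type and the
helpers view-set! and view->string are hypothetical names invented for this
sketch; only substring/shared, shared-substring? and containing-string are
names from the proposal.  It assumes define-record-type (R7RS or SRFI 9).

```scheme
;; Hypothetical model: a shared substring is a view (parent, start, end)
;; onto a mutable parent string.
(define-record-type view
  (make-view parent start end)
  shared-substring?            ; predicate proposed in the text
  (parent view-parent)
  (start view-start)
  (end view-end))

;; Return a view that shares state with S.  A view of a view is
;; flattened so that the parent is always an ordinary string.
(define (substring/shared s start end)
  (if (shared-substring? s)
      (make-view (view-parent s)
                 (+ (view-start s) start)
                 (+ (view-start s) end))
      (make-view s start end)))

;; Shared substring => parent string and indexes;
;; ordinary string  => itself, 0, and its length.
(define (containing-string s)
  (if (shared-substring? s)
      (values (view-parent s) (view-start s) (view-end s))
      (values s 0 (string-length s))))

;; Hypothetical helper: a side effect on the view is a side effect
;; on the parent string.
(define (view-set! v k ch)
  (string-set! (view-parent v) (+ (view-start v) k) ch))

;; Hypothetical helper: copy the viewed characters into a fresh string.
(define (view->string v)
  (substring (view-parent v) (view-start v) (view-end v)))

(define parent (string-copy "hello world"))
(define v (substring/shared parent 6 11))
(view-set! v 0 #\W)
;; parent is now "hello World"; (view->string v) is "World"
```

In a real implementation the sharing would be done by the string
representation itself, so that every string procedure accepts a shared
substring directly and no index parameters are needed.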

A secondary issue is the convention of accepting a parameter which
may be a character, character set, or predicate.

	* The CHAR/CSET/PREDICATE convention complicates the
	  implementation of every procedure which uses it.  Future
	  extensions to the library are similarly complicated.

	* The addition of a single procedure to the character set
	  library could simplify the convention:

		(char-set-membership cset) => predicate

	  where

		(predicate c) => #t  ; if c is in cset
				 #f  ; otherwise

	  With `char-set-membership', the convention should be
	  simplified to CHAR/PREDICATE.

	* I would not suggest further simplifying the convention
	  by defining:

		(char-equality character) => predicate

	  since passing a character for one of these parameters is
	  presumably, by far, the common case.  Also, passing a
	  character for one of these parameters is supported by the
	  traditional standards (R^nRS and every implementation).

	* There is then a choice between two conventions represented
	  by this example:

		string-index char/predicate string

	  or

		string-index char string
		string-predicate-index predicate string

	  I prefer the latter, though not for any particularly strong
	  reason.
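
To make the two conventions concrete, here is a minimal sketch.  It assumes,
purely for illustration, that a character set is a plain list of characters;
a real character set library would use its own representation.  The argument
order follows the examples above.

```scheme
;; Assumption: a character set is modelled as a list of characters.
(define (char-set-membership cset)
  (lambda (c) (if (memv c cset) #t #f)))

;; Predicate-only search: index of the first character of S that
;; satisfies PREDICATE, or #f if there is none.
(define (string-predicate-index predicate s)
  (let ((len (string-length s)))
    (let loop ((i 0))
      (cond ((= i len) #f)
            ((predicate (string-ref s i)) i)
            (else (loop (+ i 1)))))))

;; CHAR/PREDICATE convention: one dispatch on the argument type.
;; Under the second convention this procedure would accept only a
;; character and the dispatch would disappear.
(define (string-index char/predicate s)
  (string-predicate-index
   (if (char? char/predicate)
       (lambda (c) (char=? c char/predicate))
       char/predicate)
   s))

(string-index #\o "hello")                                    ; => 4
(string-index (char-set-membership (list #\l #\o)) "hello")   ; => 2
(string-index #\z "hello")                                    ; => #f
```

The character set case needs no support inside string-index at all; the
caller converts the set to a predicate once, at the call site.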

Another secondary issue is whether symbols should be acceptable as
arguments to procedures that expect strings but that do not modify
those strings.  The print name of the symbol would be used as the
string value of a symbol.  I have found this convention natural, easy
to implement, and useful.
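
A sketch of how little this convention costs to implement follows.  The
names string-value and text-length are hypothetical and exist only for this
example.

```scheme
;; Hypothetical helper: coerce a read-only string argument.
;; A symbol stands for its print name; a string stands for itself.
(define (string-value x)
  (cond ((string? x) x)
        ((symbol? x) (symbol->string x))
        (else (error "expected a string or symbol:" x))))

;; Hypothetical non-mutating procedure written against the convention.
(define (text-length x)
  (string-length (string-value x)))

(text-length "hello")   ; => 5
(text-length 'hello)    ; => 5
```

Procedures that modify their argument would not call string-value, so they
would continue to reject symbols.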

A final secondary issue is whether procedures that construct strings
from individual elements should permit the use of strings (and
symbols) as elements.  SRFI-13 says they do not, but I have found this
feature to be natural, easy to implement, and useful.
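
For example, a constructor in this style might look like the following
sketch.  The names element->string and string* are hypothetical and are not
part of SRFI-13.

```scheme
;; Hypothetical helper: turn one element into a string.
(define (element->string x)
  (cond ((char? x) (string x))
        ((string? x) x)
        ((symbol? x) (symbol->string x))
        (else (error "expected a character, string, or symbol:" x))))

;; Hypothetical constructor: like `string', but each element may be
;; a character, a string, or a symbol.
(define (string* . elements)
  (apply string-append (map element->string elements)))

(string* #\a "bc" 'def)   ; => "abcdef"
```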

Tom Lord