New drafts of SRFIs 13 & 14 available

New drafts of SRFIs 13 & 14 available shivers@xxxxxx 17 Dec 2000 22:29 UTC
I have finally completed the html-markup of SRFIs 13 & 14. The relevant
URLs are at
    ftp://ftp.ai.mit.edu/people/shivers/srfi/13/
    ftp://ftp.ai.mit.edu/people/shivers/srfi/13/string-lib.txt
    ftp://ftp.ai.mit.edu/people/shivers/srfi/13/string-lib.html
    ftp://ftp.ai.mit.edu/people/shivers/srfi/13/string-lib.scm
    ftp://ftp.ai.mit.edu/people/shivers/srfi/13/string-package.scm

    ftp://ftp.ai.mit.edu/people/shivers/srfi/14/
    ftp://ftp.ai.mit.edu/people/shivers/srfi/14/cset-lib.txt
    ftp://ftp.ai.mit.edu/people/shivers/srfi/14/cset-lib.html
    ftp://ftp.ai.mit.edu/people/shivers/srfi/14/cset-lib.scm
    ftp://ftp.ai.mit.edu/people/shivers/srfi/14/cset-package.scm
    ftp://ftp.ai.mit.edu/people/shivers/srfi/14/cset-tests.scm

May I add that the HTML typesetting was an incredibly tedious and painful job.

They will be moved over to the SRFI sites at
    http://srfi.schemers.org/srfi-13/
    http://srfi.schemers.org/srfi-14/
imminently; Mike Sperber will announce when that has happened.

Like SRFI-1, the HTML for these two SRFIs has internal links from a
procedure/variable index that appears at the top of the document to every
definition inside the document, so they should work well as tools for quickly
looking up a procedure. (Unfortunately, there's a bug in Netscape that renders
the internal links found in the first line of text in a section or paragraph
inoperative. Bizarre. But typical of Netscape. This shuts down some of the
links. Perhaps it will be fixed in Netscape 6.)

I've checked the layout of the HTML on both netscape & Internet Explorer.
(It looks better on IE, which does a better job of implementing style
sheets. Netscape is, again, pretty broken.) I've also run them through
the W3's html validator to check them against the HTML 4 spec.

I've folded in updates to the reference implementation due to Brad Lucier's
review.

Even at this late date, I have made two alterations to the content of the
SRFI: an addition and a small change.

The addition: I've added one more facility to the character-set library (SRFI
14). I realised that there is no primitive facility allowing other iteration
systems such as loop macros to loop over the elements of a character set. So
I've added a simple cursor interface to provide this. Here is the
documentation:
------
char-set-cursor cset -> cursor
char-set-index cset cursor -> char
char-set-cursor-next cset cursor -> cursor
end-of-char-set? cursor -> boolean
    Cursors are a low-level facility for iterating over the characters in a
    set. A cursor is a value that indexes a character in a char set.
    CHAR-SET-CURSOR produces a new cursor for a given char set. The set
    element indexed to by the cursor is fetched with CHAR-SET-INDEX. A cursor
    index is incremented with CHAR-SET-CURSOR-NEXT; in this way, code can step
    through every character in a char set. Stepping a cursor "past the end" of
    a char set produces a cursor that answers true to END-OF-CHAR-SET?. It is
    an error to pass such a cursor to CHAR-SET-INDEX or to
    CHAR-SET-CURSOR-NEXT.

    A cursor value may not be used in conjunction with a different character
    set; if it is passed to CHAR-SET-INDEX or CHAR-SET-CURSOR-NEXT with
    a character set other than the one used to create it, the results and
    effects are undefined.

    Cursor values are *not* necessarily distinct from other types. They may be
    integers, linked lists, records, procedures or other values. This license
    is granted to allow cursors to be very "lightweight" values suitable for
    tight iteration, even in fairly simple implementations.

    Note that these primitives are necessary to export an iteration facility
    for char sets to loop macros.

    Example:

	(define cs (char-set #\G #\a #\T #\e #\c #\h))

	;; Collect elts of CS into a list.
	(let lp ((cur (char-set-cursor cs)) (ans '()))
	  (if (end-of-char-set? cur) ans
	      (lp (char-set-cursor-next cs cur)
	          (cons (char-set-index cs cur) ans))))
	  => (#\G #\T #\a #\c #\e #\h)

	;; Equivalently, using a list unfold (from SRFI 1):
	(unfold-right end-of-char-set?
	              (curry char-set-index cs)
		      (curry char-set-cursor-next cs)
		      (char-set-cursor cs))
	  => (#\G #\T #\a #\c #\e #\h)

    Rationale: Note that the cursor API's four functions "fit" the functional
    protocol used by the unfolders provided by the list, string and char-set
    SRFIs (see the example above). By way of contrast, here is a simpler,
    two-function API that was rejected for failing this criterion. Besides
    CHAR-SET-CURSOR, it provided a single function that mapped a cursor and a
    character set to two values, the indexed character and the next cursor. If
    the cursor had exhausted the character set, then this function returned
    false instead of the character value, and another end-of-char-set cursor.
    In this way, the other three functions of the current API were combined
    together.
------
And here is the reference implementation; it's only about 14 lines of code.
------
    ;;; Cursors
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;;; Simple implementation. A cursors is an integer index into the
    ;;; mark vector, and -1 for the end-of-char-set cursor.

    (define (char-set-cursor cset)
      (%char-set-cursor-next cset 256 char-set-cursor))

    (define (end-of-char-set? cursor) (< cursor 0))

    (define (char-set-index cset cursor) (%latin1->char cursor))

    (define (char-set-cursor-next cset cursor)
      (check-arg (lambda (i) (and (integer? i) (exact? i) (<= 0 i 255))) cursor
		 char-set-cursor-next)
      (%char-set-cursor-next cset cursor char-set-cursor-next))

    (define (%char-set-cursor-next cset cursor proc)	; Internal
      (let ((s (%char-set:s/check cset proc)))
	(let lp ((cur cursor))
	  (let ((cur (- cur 1)))
	    (if (or (< cur 0) (si=1? s cur)) cur
		(lp cur))))))
------

I'm sorry for the late addition, but it is basic and important. Loop macros are
going to happen. Some facility must be exported from all container and
sequence types to allow iteration over them.

The small change: I originally spec'd the char-set and string hash functions
as taking an optional BOUND value, which is either a non-negative integer or
false (meaning the default). This was a small error -- I have just realised
that BOUND's valid numeric range isn't *non-negative* but *positive*. That is,
zero is not a valid bound.

So I have coopted zero to serve the purpose formerly filled by the false value
-- it means the default, implementation-specific "large" modulus. Not only
does this simplify the type of the hash functions, since this parameter is now
simply an exact integer, but using 0 as the "effectively infinite" modulus
makes a certain minor sort of mathematical sense -- it is still true that
    modulus * quotient + remainder = dividend
(that is, remainder is defined when modulus is zero, even though the quotient
is not).

So I have changed the text of the spec and the implementation's arg checking
for hash functions simply to require BOUND to be a non-negative value. No more
#f. Now it is the case that
    (string-hash s 0) = (string-hash s)

It's important to clean up even a small detail like this *now*, because hash
functions are going to occur in many future libraries. We want a convention
that will hold across all of these future SRFIs.

The SRFIs are typeset and ready to go. I'm setting a 48-hour bound on
discussion of both the hash bound value and the character-set iteration cursor
facility. Then this thing is done.
    -Olin