Read-time _apply_: an external representation for any value oleg@xxxxxx 11 Apr 1999 20:55 UTC
This article was inspired by an exchange of messages between Dr. Clinger and Dr. Feeley about external representation of uniform vectors, which are proposed in SRFI-4. This post attempts to make a rather sweeping generalization of Dr. Clinger's suggestion. It proposes an extensible, new old way for external representation of Scheme values: read-time _application_. Motivation Introduction Proposed syntax Examples Representation of uniform vectors and matrices Comparisons Comparisons with CL's #. reader-macro How to extend the reader while it scans the code Going ahead Motivation The exchange of messages mentioned above stemmed from a minor incompatibility between SRFI-4 and some Scheme implementations: SRFI-4 says: > [T]he external representation of instances of the datatype TAGvector is > #TAG(... elements ...). ... Note that the syntax for float vectors > conflicts with Standard Scheme which parses #f32() as 3 objects: #f, 32, > and (). For this reason, conformance to this SRFI implies this minor > nonconformance to Standard Scheme. In a message <http://srfi.schemers.org/srfi-4/mail-archive/msg00002.html> Dr. Clinger proposed a general workaround for this incompatibility problem: > why not consume just one more octathorpe [that is, #] > character (say 'v' for vector) as follows: > #vs8(...) #vu8(...) [elided] #vf64(...) > Then some future SRFI for vectors of vectors of floating point numbers > (or whatever) will have a notation that can be extended for the purpose, > and we won't have to have this same discussion all over again for the > second time, redundantly. I'd like to discuss an even more generic, universal way of representing an arbitrary Scheme data type. It is similar to Smalltalk's object serialization and CL's #. reader-macro. Introduction In Smalltalk, any object that defines a 'writeOn:' method may be asked to externalize itself. The result is a string that spells out a particular invocation of object's constructor. Evaluation of this string by an interpreter recreates the original object. LISP X3J13 document <http://www.harlequin.com/education/books/HyperSpec/Body/sec_2-4-8-6.html> expresses the goal of a read-time _evaluation_ with utmost clarity: "For an object that does not have a convenient printed representation, a form that computes the object can be given using the #. notation." It appears however that a weaker alternative -- a read-time _application_ -- may suffice. Most of the Scheme values -- with an exceptions of ports, structures, wills, semaphores and other exotic beasts -- have an external representation. The representations of booleans, numbers, lists, vectors, strings, etc. are codified in RnRS. Every Scheme reader has an in-born knowledge how to parse the corresponding strings and build Scheme values they represent. The set of these built-in constructors is however fixed, and not amenable. I'd like to propose to lift this limitation. Proposed syntax of a read-time application #`(tag arg1 ...) A 'tag' must be an (external representation of an) identifier, and 'arg1' etc. are external representations of some values, including other read-time applications. #@ (rather than #`) seems to be another good way to denote read-time applications. Upon encountering an #` external form, the read procedure should locate a read-constructor associated with the 'tag', read the arguments 'arg1'... and apply the constructor to the arguments. The result of the application is taken to be the value that corresponds to the external form under consideration There must be a way to declare an association between a tag and the corresponding constructor-procedure. Regular 'define' introduces an association between an identifier and a procedure applied at run time; 'define-macro' or 'define-syntax' introduce bindings for procedures applicable at compile time. Thus 'define-reader-ctor' literally suggest itself as a form to introduce a constructor to apply at read time. Examples: If (define-reader-ctor 'vector vector) then (with-input-from-string "#`(vector 1 2 3)" read) ==> '#(1 2 3) (equal? '#(1 2 3) '#`(vector 1 2 3)) ==> #t Thus #`(vector 1 2 3) becomes another external representation for a vector. With suitably defined reader-constructors, all standard Scheme data types may be represented in an external form of read-applications: #`(list 1 2 3) #`(list #`(string #\1 #\a) 1 2 #`(vector #f #f)) Furthermore, structures (records) and ports gain a printed form and can be read in: #`(make-structure point (x 3) (y 5) (color read)) #`(open-input-file "/tmp/a") For example, an expression (with-input-from-string "#`(open-input-file \"/tmp/a\")" (lambda () (read-char (read)))) will return the first character of the file "/tmp/a" Representation of uniform vectors and matrices The proposed read-time application gives uniform vectors the following generic external form: #`(vector-f32 1 2 3) #`(vector-u8 1 2 3), etc. This notation also appears to help with a problem of representing a 2D matrix with a zero dimension. This problem was mentioned by Dr. Feeley in <http://srfi.schemers.org/srfi-4/mail-archive/msg00003.html>: > The only problem with such a [SRFI-4] representation is that it is > not possible to distinguish a 0 by 4 float vector from an empty > one dimensional float vector, but a special notation could be used > for this very unusual case, > #f32((0.0 0.0 0.0 0.0) ^ 0) ; a 0 by 4 float vector > #f32((0.0 ^ 4) ^ 0) ; the same 0 by 4 float vector > #f32(() ^ 0) ; a 0 by 0 float vector The notation proposed in the present article can easily handle even the unusual cases above. Assuming we have defined a procedure (define (build-matrix-f32 dims values) ...) and an association (define-reader-ctor 'matrix-f32 build-matrix-f32), we can write #`(matrix-f32 (2 3) (10. 1.0) (20. 2.0) (30. 4.0)) #`(matrix-f32 (0 4)) ; a 0 by 4 float vector #`(matrix-f32 (4 0)) ; a 4 by 0 float vector #`(matrix-f32 (0 0)) ; a 0 by 0 float vector #`(matrix-f32 (0)) ; a 0 element one-dimensional float vector, which is the same as #`(vector-f32) Comparisons In Metcast Request Language <http://pobox.com/~oleg/ftp/Scheme/Request-Language.html>, which is used to describe an order for a great variety of weather-related data, a form "(tag a b ...)" is interpreted as (apply OBJ-loader:tag a b ...) where given 'tag', a procedure OBJ-loader:tag is looked up in the current dynamic "environment". Guile allows a user to declare his own #-symbol dispatch. In this proposal, the user is not limited to one letter after #. Read-time application mechanism defined in this article is rather close to Lisp reader's macro functions. Unlike CL, however, a reader-constructor is not allowed to read from the input stream on its own. It may only build values from other values, which must have already been read and internalized. A reader-constructor must always return one value; it may not return "nothing". It may however throw an exception or simply return an "inappropriate" value such as #f, which will be caught later. Unlike compile-time function applications (that is, macro-expansions), a read-time application has no "second pass". Common Lisp defines an external form #.obj, which instructs the Lisp reader to evaluate 'obj' right after the reader parsed it. While #. is a general-purpose read-evaluator: #.obj === (eval obj) #` is merely an application: #`obj === (apply (lookup (car obj)) (cdr obj)) and obj must be an external representation of a list. Read-time applications are further restricted to only those procedures that have been specifically declared for that purpose (via define-reader-ctor). The user thus has a fine-grained control over which functions are being applied at read time. The following examples will show the difference between #. and #`. The examples are hypothetical - I don't have a CL system handy to verify the Lisp reader returns what I think it does, and #` is still a proposal. (read-from-string "#.(+ 1 2)") ==> 3 (define-reader-ctor '+ +) (with-input-from-string "#`(+ 1 2)" read) ==> 3 (read-from-string "#.(+ 1 (+ 2 3))") ==> 6 (define-reader-ctor '+ +) (with-input-from-string "#`(+ 1 (+ 2 3))" read) ==> error: can't add a number 1 and a list '(+ 2 3) (with-input-from-string "#`(+ 1 #`(+ 2 3))" read) ==> 6 (define-reader-ctor 'my-vector (lambda x (apply vector (cons 'my-vector-tag x)))) (with-input-from-string "#`(my-vector (my-vector 1 2))" read) ==> a vector whose second element is a list of a symbol my-vector, number 1, and number 2. (with-input-from-string "#`(my-vector #`(my-vector 1 2))" read) ==> a vector whose second element is a my-vector constructed from numbers 1 and 2. How to extend the reader while it scans the code The remaining question is how and _when_ to define reader-constructors. If define-reader-ctor is a regular built-in function, declaration of a new reader-constructor will affect the Scheme reader only when the code has been completely read, (byte)compiled and is being executed. We would like to be able to extend the Scheme reader while it still reads the code. For example, we want to declare (define-reader-ctor 'vector-f32 <constructor>) and be able to write (let ((v1 #`(vector-f32 1 2 3))) ...) further down the code. Fortunately, there are several ways code can affect a compiler/interpreter while it scans the code. For example, define-macro or define-syntax "extend" a Scheme compiler at compile time. So do command-line switches and pragmas, which are present in many systems. In Gambit, pragmas are called "declare forms". In addition, Gambit allows customization via profile forms, which may also be specified on command line: gsi -e "(set-case-conversion! #f)" /tmp/case-sensitive-code.scm Going ahead Implementation of the proposed reader-application scheme is relatively straightforward, especially in Gambit. A Gambit reader is actually quite close to the Lisp reader -- readtables and such. Introducing of #` will require only few minor extensions. As Gambit reader is written in Scheme, reader-constructors could immediately be ported to other implementations. However, since the Gambit reader is written and copyrighted by Marc Feeley, and is a big chunk of gambc30/lib/_io.scm file, it is wise to seek Dr. Feeley's permission first. I can implement reader-constructors relatively quickly -- provided it is worth doing at all.