Title

wisp: simpler indentation-sensitive scheme

Author

Arne Babenhauserheide

Status

This SRFI is currently in ``draft'' status. To see an explanation of each status that a SRFI can hold, see here. To provide input on this SRFI, please mail to <srfi minus 119 at srfi dot schemers dot org>. See instructions here to subscribe to the list. You can access previous messages via the archive of the mailing list.

Received: 2015/01/25
Draft: 2015/02/03-2015/04/03

Acknowledgments

Thanks for many constructive discussions go to Alan Manuel K. Gloria and David A. Wheeler.
Also thanks to Mark Weaver for his help with the wisp parser and the guile integration - including a 20x speedup.

Abstract

This SRFI describes a simple syntax which allows making scheme easier to read for newcomers while keeping the simplicity, generality and elegance of s-expressions. Similar to SRFI 110, SRFI 49 and Python it uses indentation to group expressions. Like SRFI 110 wisp is general and homoiconic.

Different from its predecessors, wisp only uses the absolute minimum of additional syntax-elements which are required for writing and exchanging arbitrary code-structures. As syntax elements it only uses a colon surrounded by whitespace, the period followed by whitespace as first code-character on the line and optional underscores followed by whitespace at the beginning of the line.

It resolves a limitation of SRFI 110 and SRFI 49, both of which force the programmer to use a single argument per line if the arguments to a procedure need to be continued after a procedure-call.

Wisp expressions can include any s-expressions and as such provide backwards compatibility.

wisp:

define : hello who
  format #t "Hello ~A!\n"
    . who
hello "Wisp"

s-exp:

(define (hello who)
  (format #t "Hello ~A!\n"
    who))
(hello "S-exp")

Issues

wisp-scheme: REPL: sometimes the output of a command is only shown after typing the next non-empty line.

Rationale

A big strength of Scheme and other lisp-like languages is their minimalistic syntax. By using only the most common characters like the period, the comma, the quote and quasiquote, the hash, the semicolon and the parens for the syntax (.,"'`#;()), they are very close to natural language.⁽¹⁾ Along with the minimal list-structure of the code, this gives these languages a timeless elegance.

But as SRFI 110 explains very thoroughly (which we need not repeat here), the parentheses at the beginning of lines hurt readability and scare away newcomers. Additionally, using indentation to mark the structure of the code follows naturally from the observation that most programmers use indentation, with many programmers letting their editor indent code automatically to fit the structure. Indentation is an important way in which programmers understand code, and using it directly to define the structure avoids errors due to mismatches between indentation and actual meaning.

As a solution to this, SRFI 49 and SRFI 110 provide a way to write whitespace sensitive scheme, but both have their share of issues.

As noted in SRFI 110, there are a number of implementation-problems in SRFI 49, as well as specification shortcomings like choosing the name “group” for the construct which is necessary to represent double parentheses. In addition to the problems named in SRFI 110, SRFI 49 is not able to continue the arguments to a procedure on one line, if a prior argument was a procedure call. The following example shows the difference between wisp and SRFI 49 for a very simple code snippet:

wisp:

* 5
  + 4 3
  . 2 1

SRFI 49:

* 5
  + 4 3
  2
  1

Here wisp uses the leading period to mark a line as continuing the argument list.⁽²⁾

SRFI 110 improves a lot over SRFI 49. It resolves the group-naming and reduces the need to continue the argument-list by introducing 3 different grouping syntax forms ($, \\ and <* *>). These additional syntax-elements however hurt readability for newcomers (obviously the authors of SRFI 110 disagree with this assertion. Their view is discussed in SRFI 110 in the section about wisp). The additional syntax elements lead to structures like the following (taken from examples from the readable project):

SRFI 110 / readable

myprocedure
  x: \\ original-x
  y: \\ calculate-y original-y

a b $ c d e $ f g

let <* x getx() \\ y gety() *>
! {{x * x} + {y * y}}

This is not only hard to read, but also makes it harder to work with the code, because the programmer has to learn these additional syntax elements and keep them in mind before being able to understand the code.

Like SRFI 49, SRFI 110 also cannot continue the argument-list without resorting to single-element lines, though it reduces this problem by the above grouping syntax forms and by advertising the use of neoteric expressions from SRFI 105.

Wisp example

Since an example speaks more than a hundred explanations, the following shows wisp exploiting all its features - including curly-infix from SRFI 105:

define : factorial n
__  if : zero? n
____   . 1
____   * n : factorial {n - 1}

display : factorial 5 
newline
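
For comparison, the following is a sketch of the parenthesized Scheme that the example above corresponds to. It assumes that the curly-infix expression {n - 1} is read as (- n 1), as SRFI 105 specifies, and that the line starting with a period contributes the plain value 1 instead of a procedure call.

```scheme
;; Parenthesized counterpart of the wisp factorial example.
;; {n - 1} is SRFI 105 curly-infix notation for (- n 1).
(define (factorial n)
  (if (zero? n)
      1                              ; from the line ". 1": a value, not a call
      (* n (factorial (- n 1)))))    ; from "* n : factorial {n - 1}"

(display (factorial 5)) ; prints 120
(newline)
```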

Advantages of Wisp

Wisp draws on the strength of SRFI 110 but avoids its complexities. It was conceived and improved in the discussions within the readable-project which preceded SRFI 110, and there is a comparison between readable and wisp in SRFI 110.

Like SRFI 110, wisp is general and homoiconic and interacts nicely with SRFI 105 (neoteric expressions and curly infix). Like SRFI 110, the expressions are the same in the REPL and in code-files. Like SRFI 110, wisp has been used for implementing multiple smaller programs, though the biggest program in wisp is still its implementations (written in wisp and bootstrapped via a simpler wisp preprocessor).

But unlike SRFI 110, wisp only uses the minimum of additional syntax-elements which are necessary to support arbitrary code-structures with indentation-sensitive code which is intended to be shared over the internet. To realize these syntax-elements, it generalizes existing syntax and draws on the most common non-letter non-math characters in prose. This allows keeping the actual representation of the code elegant and inviting to newcomers.

Wisp expressions are not as sweet as readable, but they KISS.

Disadvantages of Wisp

Using the colon as syntax element keeps the code very close to written prose, but it can interfere with type definitions as for example used in Typed Racket.⁽³⁾ This can be mitigated in let- and lambda-forms by using the parenthesized form. When doing so, wisp avoids the double-paren for type-declarations and as such makes them easier to catch by eye. For procedure definitions (the only define call where type declarations are needed in typed-racket), a declare macro directly before the define should work well.

Using the period to continue the argument list is unusual compared to other languages and as such can lead to errors when trying to return a variable from a procedure and forgetting the period.

Related SRFIs

SRFI 49 (Indentation-sensitive syntax): superseded by this SRFI,
SRFI 110 (Sweet-expressions (t-expressions)): alternative to this SRFI,
SRFI 105 (neoteric expressions and curly infix): supported in this SRFI by treating curly braces like brackets and parentheses. Curly infix is required by the implementation and the testsuite.
SRFI 30 (Nested Multi-line comments): complex interaction. Should be avoided at the beginning of lines, because it can make the indentation hard to distinguish for humans. SRFI 110 includes them, so there might be value in adding them. The wisp reference implementation does not treat them specially, though, which might create arbitrary complications.

Footnotes

⁽¹⁾ The most common non-letter, non-math characters in prose are .,":'_#?!;, in the given order as derived from newspapers and other sources (for the ngram assembling scripts, see the evolve keyboard layout project).
⁽²⁾ Conceptually, continuing the argument list with a period uses syntax to mark the rare case of not calling a procedure as opposed to marking the common case of calling a procedure. To back the claim that calling a procedure is actually the common case in Scheme code: grepping the modules in the Guile source code shows over 27000 code-lines which start with a paren and only slightly above 10000 code-lines which start with a non-paren, non-comment character. Since wisp-syntax mostly follows the regular scheme indentation guidelines (as realized for example by Emacs), the whitespace in front of lines does not need to change.
⁽³⁾ Typed Racket uses calls of the form (: x Number) to declare types. These forms can still be used directly in parenthesized form, but in wisp-form the colon has to be replaced with \:. In most cases type-declarations are not needed in typed racket, since the type can be inferred. See When do you need type annotations?

Specification

The specification is separated into four parts: A general overview of the syntax, a more detailed description, justifications for each added syntax element and clarifications for technical details.

Overview

The basics of wisp syntax can be defined in 4 rules, each of which emerges directly from a requirement:

Wisp syntax 1/4: procedure calls

Indentation:

display
  + 3 4 5
newline

becomes

(display
  (+ 3 4 5))
(newline)

requirement: call procedure without parenthesis.

Wisp syntax 2/4: Continue Argument list

The period:

+ 5
  * 4 3
  . 2 1

becomes

(+ 5
  (* 4 3)
  2 1)

This also works with just one argument after the period. To start a line without a procedure call, you have to prefix it with a period followed by whitespace.

requirement: continue the argument list of a procedure after an intermediate call to another procedure.

Wisp syntax 3/4: Double Parens

The colon:

let
  : x 1
    y 2
    z 3
  body

becomes

(let
  ((x 1)
   (y 2)
   (z 3))
  (body))

requirement: represent code with two adjacent blocks in double-parentheses.

Wisp syntax 4/4: Resilient Indentation

The underscore (optional):

let
_ : x 1
__  y 2
__  z 3
_ body

becomes

(let
  ((x 1)
   (y 2)
   (z 3))
  (body))

requirement: share code in environments which do not preserve whitespace.

Summary

The syntax shown here is the minimal syntax required for the goal of wisp: indentation-based, general lisp with a simple preprocessor, and code which can be shared easily on the internet:

. to continue the argument list
: for double parens
_ to survive HTML

More detailed: Wisp syntax rules

Unindented line

A line without indentation is a procedure call, just as if it started with a parenthesis.

display "Hello World!"              ;    (display "Hello World!")

Sibling line

A line which is more indented than the previous line is a sibling to that line: It opens a new parenthesis.

display                             ;    (display
  string-append "Hello " "World!"   ;      (string-append "Hello " "World!"))

Closing line

A line which is not more indented than previous line(s) closes the parentheses of all previous lines which have higher or equal indentation. You should only reduce the indentation to indentation levels which were already used by parent lines, else the behaviour is undefined.

display                             ;    (display
  string-append "Hello " "World!"   ;      (string-append "Hello " "World!"))
display "Hello Again!"              ;    (display "Hello Again!")
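
The closing rule can be sketched as a small procedure over a stack of indentation levels. This is only an illustration, not the reference implementation: the name closed-by and the representation of open lines as a list of indentation widths (innermost first) are assumptions made for this sketch, and it does not detect the undefined case of dedenting to a level no parent line used.

```scheme
;; levels: indentation widths of the lines whose parentheses are still
;; open, innermost first. For the example above, after the second line
;; this is (2 0).
;; Returns how many parentheses a new line with indentation `indent`
;; closes: one for every open line with higher or equal indentation.
(define (closed-by levels indent)
  (let loop ((rest levels) (count 0))
    (if (and (pair? rest) (>= (car rest) indent))
        (loop (cdr rest) (+ count 1))
        count)))

(display (closed-by '(2 0) 0)) ; prints 2: closes string-append and display
(newline)
(display (closed-by '(2 0) 4)) ; prints 0: a more indented line closes nothing
(newline)
```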

Prefixed line

To add any of ' , ` #' #, #` or #@, to the first parenthesis on a line, just prefix the line with that symbol followed by at least one space. Implementations are free to add more prefix symbols.

' "Hello World!"                    ;    '("Hello World!")

Continuing line

A line whose first non-whitespace characters are a dot followed by a space (". ") does not open a new parenthesis: it is treated as simple continuation of the first less indented previous line. In the first line this means that this line does not start with a parenthesis and does not end with a parenthesis, just as if you had directly written it in lisp without the leading ". ".

string-append "Hello"               ;    (string-append "Hello"
  string-append " " "World"         ;      (string-append " " "World")
  . "!"                             ;      "!")
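
A parser has to recognize such lines before deciding whether to open a parenthesis. The following sketch shows one way to test for the leading ". ". The procedure names are hypothetical, and the sketch assumes that indentation consists of spaces only and that initial underscores have already been converted.

```scheme
;; Index of the first character that is not a space.
(define (skip-indent line)
  (let loop ((i 0))
    (if (and (< i (string-length line))
             (char=? (string-ref line i) #\space))
        (loop (+ i 1))
        i)))

;; #t if the first non-space characters are a dot followed by a space.
;; A lone dot or a token like ".5" does not count.
(define (continuation-line? line)
  (let ((i (skip-indent line))
        (n (string-length line)))
    (and (< (+ i 1) n)
         (char=? (string-ref line i) #\.)
         (char=? (string-ref line (+ i 1)) #\space))))

(display (continuation-line? "  . \"!\""))            ; prints #t
(newline)
(display (continuation-line? "  string-append \" \"")) ; prints #f
(newline)
```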

Empty indentation level

A line which contains only whitespace and a colon (":") defines an indentation level at the indentation of the colon. It opens a parenthesis which gets closed by the next line which has less or equal indentation. If you need to use a colon by itself, you can escape it as "\:".

let                                 ;    (let
  :                                 ;      (
    msg "Hello World!"              ;        (msg "Hello World!"))
  display msg                       ;      (display msg))

Inline Colon

A colon surrounded by whitespace (" : ") starts a parenthesis which gets closed at the end of the line.

define : hello who                  ;    (define (hello who)
  display                           ;      (display 
    string-append "Hello " who "!"  ;        (string-append "Hello " who "!")))

If the colon starts a line which also contains other non-whitespace characters, it starts a parenthesis which gets closed at the end of the line and defines an indentation level at the position of the colon.

If the colon is the last non-whitespace character on a line, it represents an empty pair of parentheses:

let :                               ;    (let ()
    display "Hello"                 ;         (display "Hello"))
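
The inline-colon rules can be illustrated on a single line that has already been split into tokens. This is a sketch under assumptions: tokens are represented as strings, the name inline-colon is made up for this example, and the result is the list of elements of the line (the parenthesis opened by the line itself is represented by the returned list).

```scheme
;; Everything after a ":" token becomes a nested list that extends to
;; the end of the line. A trailing ":" therefore yields an empty list.
;; An escaped colon arrives as the token "\\:" and is left untouched.
(define (inline-colon tokens)
  (cond ((null? tokens) '())
        ((string=? (car tokens) ":")
         (list (inline-colon (cdr tokens))))
        (else
         (cons (car tokens) (inline-colon (cdr tokens))))))

(write (inline-colon '("define" ":" "hello" "who")))
;; writes ("define" ("hello" "who"))
(newline)
(write (inline-colon '("let" ":")))
;; writes ("let" ())
(newline)
```

Applied to a line that starts with a colon, such as ": x 1", the same procedure returns (("x" "1")), which matches the double parenthesis the colon is meant to represent.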

Initial Underscores

You can replace any number of consecutive initial spaces by underscores, as long as at least one whitespace character is left between the underscores and any following character. You can escape initial underscores by prefixing the first one with \ ("\___ a" → "(___ a)"), if you have to use them as procedure names.

define : hello who                  ;    (define (hello who)
_ display                           ;      (display 
___ string-append "Hello " who "!"  ;        (string-append "Hello " who "!")))
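
One way a preprocessor could handle initial underscores is to turn them back into spaces before measuring indentation. The sketch below is an assumption about how this might be done, not the reference implementation; the name underscores->spaces is hypothetical. Lines with escaped underscores begin with a backslash and are therefore left unchanged here.

```scheme
;; Replace a run of underscores at the very start of the line by the
;; same number of spaces, but only if the run is followed by a space.
(define (underscores->spaces line)
  (let* ((n (string-length line))
         (k (let loop ((i 0))
              (if (and (< i n) (char=? (string-ref line i) #\_))
                  (loop (+ i 1))
                  i))))
    (if (and (> k 0) (< k n) (char=? (string-ref line k) #\space))
        (string-append (make-string k #\space) (substring line k n))
        line)))

(write (underscores->spaces "_ display"))  ; writes "  display"
(newline)
(write (underscores->spaces "__init__ x")) ; writes "__init__ x" (unchanged)
(newline)
```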

Parens and Strings

Linebreaks inside parentheses and strings are not considered linebreaks for parsing indentation. To use parentheses at the beginning of a line without getting double parens, prefix the line with a period.

define : stringy s 
         string-append s " reversed and capitalized:
 " ; linebreaks in strings do not affect wisp parsing
           . (string-capitalize ; same for linebreaks in parentheses
             (string-reverse s))

Effectively, code in parentheses and strings is interpreted directly as Scheme. This way you can simply copy a chunk of Scheme into wisp. The following is valid wisp:

define foo (+ 1
  (* 2 3)) ; defines foo as 7
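
To decide whether a linebreak counts for indentation, a parser needs to know whether it is currently inside parentheses. The following is a simplified sketch of that bookkeeping for a single line. The name open-depth is hypothetical, and the sketch deliberately ignores comments, character literals such as #\( and strings that span several lines, all of which a real parser has to handle.

```scheme
;; Number of brackets opened minus brackets closed on this line,
;; not counting brackets inside double-quoted strings.
(define (open-depth line)
  (let loop ((chars (string->list line)) (depth 0) (in-string? #f))
    (cond ((null? chars) depth)
          (in-string?
           (cond ((char=? (car chars) #\\)   ; skip the escaped character
                  (loop (if (pair? (cdr chars)) (cddr chars) '()) depth #t))
                 ((char=? (car chars) #\")   ; closing quote
                  (loop (cdr chars) depth #f))
                 (else (loop (cdr chars) depth #t))))
          ((char=? (car chars) #\") (loop (cdr chars) depth #t))
          ((memv (car chars) '(#\( #\[ #\{))
           (loop (cdr chars) (+ depth 1) #f))
          ((memv (car chars) '(#\) #\] #\}))
           (loop (cdr chars) (- depth 1) #f))
          (else (loop (cdr chars) depth #f)))))

(display (open-depth "define foo (+ 1")) ; prints 1: the next linebreak is inside parens
(newline)
(display (open-depth "  (* 2 3))"))      ; prints -1: closes the paren left open above
(newline)
```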

Clarifications

Code-blocks end after 2 empty lines followed by a newline. Indented non-empty lines after 2 empty lines should be treated as error. A line is empty if it only contains whitespace. A line with a comment is never empty.
Inside parentheses, wisp parsing is disabled. Consequently linebreaks inside parentheses are not considered linebreaks for wisp-parsing. For the parser everything which happens inside parentheses is treated as a black box.
Square brackets and curly braces should be treated the same way as parentheses: They stop the indentation processing until they are closed.
Likewise linebreaks inside strings are not considered linebreaks for wisp-parsing.
A colon (:) at the beginning of a line adds an extra open parenthesis that gets closed at end-of-line and defines an indentation level.
Using a quote to escape a symbol separated from it by whitespace is forbidden. This would make the meaning of quoted lines ambiguous.
Curly braces should be treated as curly-infix following SRFI 105. This makes most math look natural to newcomers.
Neoteric expressions from SRFI 105 are not required because they create multiple ways to represent the same code. In wisp they offer far fewer advantages than in sweet expressions from SRFI 110, because wisp can continue the arguments to a procedure after a procedure call (with the leading period) and the inline colon provides most of the benefits neoteric expressions give to sweet. However, implementations providing wisp should give users the option to activate neoteric expressions as by SRFI 105 to allow experimentation and evolution (discussion).
It is possible to write code which is at the same time valid wisp and sweet. The readable mailing list contains details.
The suggested suffix for files using wisp-syntax is .w.
To represent tail notation like (define (foo . args)), either avoid a linebreak before the dot as in define : foo . args or use a double dot to start the line: . . args. The first dot marks the line as continuation, the second enters the scheme code.
A dot as symbol at the end of a line is reserved for potential future use. It should be a syntax error if the next non-empty line starts with non-zero indentation. A lone dot at the end of a line calls for hard to catch errors.
A dot as only symbol in a line has no useful meaning: the line is by definition empty. As such, a dot as only symbol on a line is also reserved for future use and should be treated as a syntax error to avoid locking out future possibilities.
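
The first clarification (empty lines and the end of code-blocks) can be approximated as follows. This is a rough sketch, not the reference behaviour: the names blank-line? and split-blocks are hypothetical, only spaces and tabs are treated as whitespace, and the sketch does not report indented lines after two empty lines as errors.

```scheme
;; A line is empty if it contains only whitespace. A comment line is
;; never empty, because ";" is not whitespace.
(define (blank-line? line)
  (let loop ((i 0))
    (cond ((= i (string-length line)) #t)
          ((memv (string-ref line i) '(#\space #\tab)) (loop (+ i 1)))
          (else #f))))

;; Split a list of lines into code-blocks, ending a block at the
;; second empty line in a row. Empty lines are not kept in the blocks.
(define (split-blocks lines)
  (let loop ((lines lines) (blanks 0) (block '()) (blocks '()))
    (cond ((null? lines)
           (reverse (if (null? block)
                        blocks
                        (cons (reverse block) blocks))))
          ((blank-line? (car lines))
           (if (and (= blanks 1) (pair? block))
               (loop (cdr lines) 2 '() (cons (reverse block) blocks))
               (loop (cdr lines) (+ blanks 1) block blocks)))
          (else
           (loop (cdr lines) 0 (cons (car lines) block) blocks)))))

(write (split-blocks '("a" "  b" "" "" "c")))
;; writes (("a" "  b") ("c"))
(newline)
```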

Syntax justification

I do not like adding any unnecessary syntax element to lisp. So I want to show explicitly why the syntax elements are required.

. (the dot)

To represent general code trees, we have to be able to represent continuation of the arguments of a procedure with an intermediate call to another (or the same) procedure.

The dot at the beginning of the line as marker of the continuation of a variable list is a generalization of using the dot as identity procedure - which is an implementation detail in many lisps.

(. a) is just a

So for the single variable case, this would not even need additional parsing: wisp could just parse . a to (. a) and produce the correct result in most lisps. But forcing programmers to always use separate lines for each parameter would be very inconvenient, so the definition of the dot at the beginning of the line is extended to mean “take every element in this line as parameter to the parent procedure”.

(. a) → a is generalized to (. a b c) → a b c.

At its core, this dot-rule means that we mark variables in the code instead of procedure calls. We do so, because variables at the beginning of a line are much rarer in Scheme than in other programming languages.
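
The generalization from (. a) to (. a b c) can be pictured as the difference between appending a child line as one nested list and splicing its elements into the parent. The sketch below uses a hypothetical helper add-child on already parsed lines; it is meant as an illustration of the idea only.

```scheme
;; parent: the list built so far for the parent line.
;; items:  the elements of the child line (without a leading dot).
;; A normal child line is added as one nested call; a continuation
;; line has its elements spliced directly into the parent.
(define (add-child parent items continuation?)
  (append parent (if continuation? items (list items))))

;; + 5
;;   * 4 3
;;   . 2 1
(define step1 (add-child '(+ 5) '(* 4 3) #f))
(define step2 (add-child step1 '(2 1) #t))

(write step2) ; writes (+ 5 (* 4 3) 2 1)
(newline)
```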

: (the colon)

For double parentheses and for some other cases we must have a way to mark indentation levels which do not contain code. Wisp uses the colon, because it is the most common non-alpha-numeric character in normal prose which is not already reserved as syntax by Scheme when it is surrounded by whitespace, and because it already gets used without surrounding whitespace for marking keyword arguments to procedures in Emacs Lisp and Common Lisp, so it does not add completely alien concepts.

The inline procedure call via inline " : " is a limited generalization of using the colon to mark an indentation level: If we add a syntax-element, we should use it as widely as possible to justify adding syntax overhead.

But if you need to use : as variable or procedure name, you can still do so by escaping it with a backslash (\:), so this does not forbid using the character.

For simple cases, the colon could be replaced by clever whitespace parsing, but there are complex cases which make this impossible. The minimal example is a theoretical doublelet which does not require a body. The example uses a double let without action as example for the colon-syntax, even though that does nothing, because that makes it impossible to use later indentation to mark an intermediate indentation-level. Another reason why I would not use later indentation to define whether something earlier is a single or double indent is that this would call for subtle and really hard to find errors.

(doublelet
  ((foo bar))
  ((bla foo)))

The wisp version of this is

doublelet
  :
    foo bar
  : ; <- this empty back step is the real issue
    bla foo

or shorter with inline colon (which you can use only if you don’t need further indentation-syntax inside the assignment).

doublelet
  : foo bar
  : bla foo

The need to be able to represent arbitrary syntax trees which can contain expressions like this is the real reason why the colon exists. The inline and start-of-line use is only a generalization of that principle (we add a syntax-element, so we should see how far we can push it to reduce the effective cost of introducing the additional syntax).

Clever whitespace-parsing which would not work

There are two alternative ways to tackle this issue: deferred level-definition and fixed-width indentation.

Defining intermediate indentation-levels by later elements (deferred definition) would be a problem, because it would create code which is really hard to understand. An example is the following:

define (flubb)
    nubb
    hubb
    subb
   gam

would become

(define (flubb)
   ((nubb))
   ((hubb))
   ((subb))
  (gam))

while

define (flubb)
    nubb
    hubb
    subb

would become

(define (flubb)
   (nubb)
   (hubb)
   (subb))

Knowledge of later parts of the code would be necessary to understand the parts a programmer is working on at the moment. This would call for subtle errors which would be hard to track down, because the effect of a change in code would not be localized at the point where the change is done but could propagate backwards.

Fixed indentation width (alternative option to inferring it from later lines) would make it really hard to write readable code. Stuff like this would not be possible:

when
    equal? wrong
           isright? stuff
    fixstuff

_ (the underscore)

For Python, whitespace-hostile HTML already presents problems with sharing code, for example in email list archives and forums. But Python programmers can mostly infer the indentation by looking at the previous line: If that ends with a colon, the next line must be more indented (there is nothing to clearly mark reduced indentation, though). In wisp we do not have this support, so we need a way to survive in the hostile environment of today's web.

The underscore is commonly used to denote a space in URLs, where spaces are inconvenient, but it is rarely used in Scheme (where the dash ("-") is mostly used instead), so it seems like a natural choice.

You can still use underscores anywhere but at the beginning of the line, and even at the beginning of the line you simply need to escape it by prefixing the first underscore with a backslash ("\____").

Implementation

The reference implementation realizes a specialized parser for Scheme. It uses GNU Guile and can also be used at the REPL.

The wisp code also contains a general wisp-preprocessor which can be used for any lisp-like language and can be used as an external program which gets called on reading. It does not actually have to understand the code itself.

To allow for easy re-implementation, the chapter after the implementation itself contains a test-suite with commonly used wisp constructs and parenthesized counterparts.

The wisp preprocessor implementation can be found in the wisp code repository. Both implementations are explicitly licensed to allow inclusion in an SRFI.

The reference implementation linked below generates a syntax tree from wisp which can be executed. It is written in indentation-based wisp-syntax and converted with the preprocessor from the code repository (wisp-guile.w) to parenthesized scheme syntax.

Source for the reference implementation.
Basic Testsuite for wisp implementations.
(a more exhaustive testsuite is available in the wisp code repository)

Copyright

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Michael Sperber

Last modified: Tue Mar 11 21:25:26 MET 2015