Re: Type strategy for Scheme

Show/hide message thread

Type strategy for Scheme Lassi Kortela (09 Nov 2022 22:54 UTC)

Re: Type strategy for Scheme Marc Nieper-Wißkirchen (10 Nov 2022 07:07 UTC)

Re: Type strategy for Scheme Marc Nieper-Wißkirchen (10 Nov 2022 07:42 UTC)

Re: Type strategy for Scheme Peter Bex (10 Nov 2022 08:05 UTC)

Re: Type strategy for Scheme Philip McGrath (10 Nov 2022 08:54 UTC)

Re: Type strategy for Scheme Lassi Kortela (10 Nov 2022 09:49 UTC)

Re: Type strategy for Scheme Lassi Kortela (10 Nov 2022 09:08 UTC)

Re: Type strategy for Scheme Marc Feeley (10 Nov 2022 23:34 UTC)

Re: Type strategy for Scheme Lassi Kortela (11 Nov 2022 19:17 UTC)

Re: Type strategy for Scheme Marc Feeley (11 Nov 2022 21:34 UTC)

Re: Type strategy for Scheme Lassi Kortela (12 Nov 2022 21:58 UTC)

Re: Type strategy for Scheme Marc Nieper-Wißkirchen (12 Nov 2022 22:52 UTC)

Re: Type strategy for Scheme Lassi Kortela (12 Nov 2022 23:14 UTC)

Re: Type strategy for Scheme Marc Nieper-Wißkirchen (13 Nov 2022 09:41 UTC)

Re: Type strategy for Scheme Lassi Kortela (13 Nov 2022 11:59 UTC)

Re: Type strategy for Scheme John Cowan (13 Nov 2022 19:52 UTC)

Re: Type strategy for Scheme Lassi Kortela (13 Nov 2022 20:22 UTC)

Re: Type strategy for Scheme Marc Nieper-Wißkirchen (13 Nov 2022 20:35 UTC)

Re: Type strategy for Scheme Lassi Kortela (13 Nov 2022 21:41 UTC)

Re: Type strategy for Scheme Marc Feeley (14 Nov 2022 00:03 UTC)

Re: Type strategy for Scheme Lassi Kortela (14 Nov 2022 09:36 UTC)

Re: Type strategy for Scheme Marc Feeley (14 Nov 2022 16:27 UTC)

Re: Type strategy for Scheme Lassi Kortela (14 Nov 2022 19:58 UTC)

Re: Type strategy for Scheme John Cowan (13 Nov 2022 20:40 UTC)

Re: Type strategy for Scheme Marc Nieper-Wißkirchen (13 Nov 2022 20:42 UTC)

Re: Type strategy for Scheme Panicz Maciej Godek (20 Nov 2022 21:59 UTC)

Re: Type strategy for Scheme Per Bothner (20 Nov 2022 22:58 UTC)

Re: Type strategy for Scheme Lassi Kortela 14 Nov 2022 09:36 UTC

> The thing is that “static typing” is not a “feature” that is orthogonal to other features. When static typing is added to an otherwise dynamically typed language it “cripples” the language… some things can’t be expressed anymore because they violate the rules of the typing discipline (mind you it cripples the expressiveness but does give some nice properties in return, so it is not black and white). For example, the Y combinator works fine in Scheme, but many static typing disciplines will not accept it. So adding static typing puts constraints on the other features, making some of them unusable.

Yes. But it helps to make a distinction between features that are
incompatible at the source code level, and those with incompatible
execution models.

The encouraging thing is that it's hard to think of examples of the
latter. Probably some flavors of concurrency cannot be mixed with
tractable semantics. Synchronous and asynchronous code are probably hard
to interface due to a different model of time, but synchronous
programming is rare outside embedded systems.

> My understanding of your position is that you work your way around this issue by creating different “worlds” for example parts of the program that are statically typed and others that are dynamically typed.

Exactly!

The words "language" and "program" occlude thinking in CS. Lisp and
Smalltalk show that you can just have a big world. Code enters and exits
this world as time goes on. You can divide it mentally into different
"programs" as a convention, but you don't have to, and often shouldn't.

ST and Lisp systems have one world with the same language semantics
across all of it. That's a clean design, but doesn't scale into large
systems where human nature and organizational inertia dictate that
different languages have to coexist. Conventional operating systems
solve the problem by putting each language in its own walled garden
("process"). That achieves clean coexistence of languages but throws
away the benefits of Lisp/ST where parts of the world can access all
other parts at a fine granularity, using a common conduit. (In Lisp,
classically the procedure call - in ST, the message send.)

We often implement one language on top of another. But there tends to be
a mismatch since the host language was designed as an opinionated
language that's good for writing applications and not as an
un-opinionated toolkit of language constructs. Lisp is a much better
foundation for language implementation than most languages since it's
dynamic and doesn't stray far from the lambda calculus (this is much of
the point of Racket). But we can do better. The recent work on types has
demonstrated that we can think of the fabric of a language as a
structure with mathematical properties, and strive to re-define that
structure in ways that give nicer properties for reasoning while
supporting similar code as before. This mindset should be extended
beyond the use of types in source code, and there is indeed a long
tradition of formalizing runtime semantics. It's just not unified yet.

Type theorists are currently making formal models for individual
languages. But assume a system like the JVM or Racket that mixes a dozen
languages. In this case, a unified mathematical structure is much more
tractable for machine processing (not to mention, simpler to implement
with less code) than a dozen separate structures with "ad hoc,
informally specified, bug-ridden" connections between them. That's a
quote from Greenspun's Tenth Rule. At the moment, we are all
greenspunning ad hoc FFI's between languages, just as people were and
are greenspunning ad hoc interpreters.

Take a look at the current and past projects in our field which do
multi-language virtual machines, or machine translation of code between
languages. They demonstrate that these problems are intractable to solve
cleanly without a unified model. The pile of special cases is too big to
keep under intellectual control.

The status quo is that each language is a semantic island. It would be a
huge improvement to have e.g. five competing unified models of varying
sophistication and comprehensiveness. From this competition, a standard
model should eventually emerge, as it has in other scientific fields.
There will always be some cutting-edge things that don't fit into the
model, but the 80% case is solved handily.

A standard model of programming languages is a giant effort that takes
maybe 100 committed researchers or more. (IIRC the GraalVM or Racket
team is about 50, give or take, with some people more committed than
others.) But here, as with other problems, Lisp's expressiveness gives
us an advantage, and we should do our part.

So the Racket language toolkit approach is right. But they're saying:
"Hey, you can come in and make your own new language!" Instead they
should be saying, "We made your existing languages run better!" At the
moment they can't say that, because they concentrate so much on
individual languages as silos instead of knitting them out of the same
fabric. An obvious starting point would be, how to make Racket a great
implementation of Common Lisp or Standard ML. Those languages are close
to Scheme and are wonderfully well specified. If they can do that, they
can branch out to more exotic or more poorly specified languages.

Once a few existing languages run well using a common framework, the
obvious realization falls out of the design that you can make new
languages by taking an existing one and tweaking a few things. E.g. take
call/cc out of Scheme, or add it to Clojure. It becomes apparent that a
language is just a gatekeeper to an ocean of computational features.
Maybe your favorite language is a wise gatekeeper, maybe not. In any
case, languages become an issue of aesthetics, not theory. Programmers
who are oriented toward abstract math should make the theory solid, and
programmers with a literary bent should make pretty conventions
("languages") out of the theorists' work.

I suspect that today's "languages" will be seen as clumsy and
inconvenient instruments in the future. This can already be seen in the
way Lisp coders look at Java. Lisp has macros, so you can configure the
syntax, and Lisp coders go "You can't even add a new kind of loop to
Java!" There's no reason this malleability has to be restricted to
syntax. It should concern the semantics too. In a big program, it pays
to use different computational and typing features to express different
parts. Ada already lets you restrict parts of a program to a tailor made
subset of the language, and people say this is a great help. PLT has
isolated teaching subsets out of Scheme, also to good effect. This is
the future.

> But then isn’t that just saying that you accept the existence of different languages (i.e. these worlds) and what you are looking for is an “as seamless as possible” FFI for these languages to talk to each other, like nodes of a distributed system communicating with each other?>

Exactly. The FFI and distributed metaphors are good ones.

Concretely, gambit.h has the following:

/*
* Subtype tag assignment.
*
* These tags are stored in the head of memory allocated objects
* (including pairs).
*
* ___SB = number of subtype tag bits
* ___sVECTOR = tag for vectors
* ___sPAIR = tag for pairs
* ___sRATNUM = tag for ratnums
* ___sCPXNUM = tag for cpxnums
* ___sSTRUCTURE = tag for structures
* ___sBOXVALUES = tag for box and multiple-values objects
* ___sMEROON = tag for Meroon objects
* ___sJAZZ = tag for Jazz objects
* ___sSYMBOL = tag for symbols
* ___sKEYWORD = tag for keywords
* ___sFRAME = tag for continuation frames
* ___sCONTINUATION = tag for continuations
* ___sPROMISE = tag for promises
* ___sWEAK = tag for weak objects (wills and GC hash tables)
* ___sPROCEDURE = tag for procedures
* ___sRETURN = tag for returns
* ___sFOREIGN = tag for foreign data
* ___sSTRING = tag for strings
* ___sS8VECTOR = tag for 8-bit signed integer vectors
* ___sU8VECTOR = tag for 8-bit unsigned integer vectors
* ___sS16VECTOR = tag for 16-bit signed integer vectors
* ___sU16VECTOR = tag for 16-bit unsigned integer vectors
* ___sS32VECTOR = tag for 32-bit signed integer vectors
* ___sU32VECTOR = tag for 32-bit unsigned integer vectors
* ___sF32VECTOR = tag for 32-bit floating point number vectors
* ___sS64VECTOR = tag for 64-bit signed integer vectors
* ___sU64VECTOR = tag for 64-bit unsigned integer vectors
* ___sF64VECTOR = tag for 64-bit floating point number vectors
* ___sFLONUM = tag for flonums
* ___sBIGNUM = tag for bignums

So in addition to standard Scheme, you already have hooks in the runtime
for Meroon and JazzScheme objects. It would be natural to add more.

But what we should look for is FFI between computational features, not
between entire languages.

> I think it will be hard to be seamless and this will cause users to pick a world and do their programming there, which is what is being done now.

Sure. Programmers will always opt to choose a pre-made world as a
starting point. (Advanced programmers will perhaps modify it to suit.
E.g. on a team of coders, the team lead could configure the language.)

Making things seamless is indeed extremely hard. This is to be expected
when any field makes a "standard model" out of its theories. Concretely,
you'd e.g. out the many idiosyncratic behaviors of Perl or JavaScript
and replace them with something coherent. Perhaps you'd make "adapters"
so that the old quirks are emulated faithfully, and people with legacy
codebases would opt to use this "quirks mode".

But most code doesn't really depend on a specific language and its
quirks. It expresses an idea which requires a few computational
concepts, and can be ported to a cleaner version of the same concepts.

> I can’t help but make the link with my work on different FFIs between Gambit Scheme and other languages, namely JavaScript[1] and Python[2]. In those FFIs we have chosen to embed the JavaScript and Python syntax in Scheme so that it is as seamless as possible to use those languages from Scheme (note that all three are dynamically typed which simplifies things). I reject the idea that expressing everything with parentheses is “better” even though I am a great fan of s-expressions for all the reasons we know. Sometimes a different syntax is useful to indicate which of these worlds is running the code so that our mind doesn’t get confused. Of course, in our work, there is a mapping from infix expressions to s-expressions that simplifies processing the foreign expressions, possibly using them in pattern matching or macros, but the programmer is not working with this directly. A quick example:

The future of S-expressions is to serve as an on-disk storage format for
code and data. The user interface should be fully themeable. A subset of
programmers are good at aesthetics and enjoy making things like UI
themes. They should be given a way to make window dressing for
S-expressions so that we can have nicer ways to read and edit code
without over-burdening the textual syntax. (Observe the effort to add
new syntax to languages like C++, Perl, and Haskell. It's an obvious
dead end because they insist on using text.)