Re: Unreadable Objects: current status and where to go

Re: Unreadable Objects: current status and where to go Lassi Kortela 09 Dec 2022 19:13 UTC

>> It might make sense to specify a procedure,
>>
>> (write-unreadable-object object)
>>
>> that writes `object` using the customary notation of the Scheme
>> implementation.

> So OBJECT here is a Scheme datum value that must not be confused with
> the unwritable object itself, right?

OBJECT is written as a stand-in for the original object. The
`write-unreadable-object` procedure has no knowledge of the original object.

E.g. (write-unreadable-object '(output port string))

It would probably be better to call it just `write-unreadable`.
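A minimal sketch of what such a procedure could look like, assuming the `#?` prefix discussed below and an optional port argument in the style of `write`. The name `write-unreadable` is the one suggested above; the helper `unreadable->string` is hypothetical and exists only to show the output.

```scheme
(import (scheme base) (scheme write))

;; Sketch only: writes "#?" followed by the stand-in datum.
;; The procedure never sees the original unwritable object.
(define (write-unreadable datum . maybe-port)
  (let ((port (if (null? maybe-port)
                  (current-output-port)
                  (car maybe-port))))
    (write-string "#?" port)
    (write datum port)))

;; Hypothetical helper: capture the notation in a string.
(define (unreadable->string datum)
  (let ((port (open-output-string)))
    (write-unreadable datum port)
    (get-output-string port)))
```

With this sketch, (unreadable->string '(output port string)) gives "#?(output port string)".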

> I am wondering what the actual meaning of the OBJECT as a Scheme datum would
> be.  For example, in Chez, (open-output-string) prints as #<output
> port string>, which would probably become #?(output port string) in
> your proposal.  But interpreting the string "output port string" as a
> list of three symbols is not meaningful, is it?  Or should it be
> #?"output port string"?

It could be written as one of

#?(output port string)

#?"output port string"

I would discourage the latter since, as you say, it has no structure.

I think the reasonable formats are:

#?type-identifier
#?(type-identifier more-stuff ...)

where known type-identifiers are tracked in a table at
https://registry.scheme.org/, and #?type-identifier is equivalent to
#?(type-identifier). The table could also say how to parse the
more-stuff that goes with each type-identifier.
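To illustrate the equivalence of the two formats, here is a rough sketch of how a tool might normalize the stand-in datum before looking up its type-identifier. The names `normalize-unreadable`, `unreadable-type` and `unreadable-details` are hypothetical and not part of any proposal.

```scheme
(import (scheme base))

;; Sketch only: turn #?type-identifier into the list form
;; #?(type-identifier), and leave the list form unchanged.
(define (normalize-unreadable datum)
  (cond ((symbol? datum) (list datum))
        ((and (pair? datum) (symbol? (car datum))) datum)
        (else (error "not a recognized unreadable-object datum" datum))))

;; The type-identifier is always the first element.
(define (unreadable-type datum)
  (car (normalize-unreadable datum)))

;; The more-stuff, to be interpreted according to the type-identifier.
(define (unreadable-details datum)
  (cdr (normalize-unreadable datum)))
```

For example, (unreadable-type 'eof) and (unreadable-type '(eof)) both give eof, and (unreadable-details '(port output string)) gives (output string).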

The prior art in the Examples section of SRFI 243 makes it clear that
the usual format is #<identifier (followed by other stuff)>. This is
straightforward to change to #?(identifier (followed by other stuff)).
After all, the old #<...> syntax wasn't readable, so no information is
lost by changing it.

In the Chez case you ask about, they could change it to something like:

#?(port output string)
#?(output-port string)
#?(string-output-port)

It's probably futile to try to figure out whether something like the
existing #<output port string> should be interpreted as symbols or a
string, since code that writes things like this is likely to use an ad
hoc mix of `display` and `write` with little forethought.

>>> If you move to #?, the focus of SRFI 243 seems to change considerably.
>>> Could you write down the intended normative part in a few sentences of
>>> such a version of SRFI 243?

>> Change the RnRS grammar so <compound datum> includes <unreadable>
>>
>> where <unreadable> == "#?" <datum>

> You cannot change <compound datum> because this can be a <datum>, and
> this is what the read procedure successfully parses (see 7.1.2 of the
> R7RS).

That would require adding an exception to the relevant passage in RnRS.

The #? syntax should introduce something that (read) can successfully
parse into an object, but (read) should not return that object.

>> When `(read)` reads <unreadable>, it raises an error (we could specify a
>> standard subtype of read-error for this purpose, call it e.g.
>> unreadable-object-error).

> If this is all that we want, then it will be enough to specify a new
> token #? whose semantics is the reader must signal an error when it
> reads it.  Or do you want that #?(foo bar) signals an error, but
> #?(foo . . baz) not necessarily?

If #?(foo . . baz) is allowed then we don't gain anything over #<...>.
The point of #?datum is that the datum is structured, so you can analyze
or skip it.

Conceptually, the error signaled by #?(foo bar) is finer-grained than
(i.e. a subtype of) the generic read-error signaled by #?(foo . . baz).
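Here is a rough sketch of that behavior in portable R7RS, working on strings for simplicity. Everything in it is an assumption for illustration: the record type `unreadable-object-error` merely borrows the name floated above, and `read-from-string/unreadable` and `classify-input` are hypothetical helpers. A real implementation would do this inside the reader and would handle #? nested in other data.

```scheme
(import (scheme base) (scheme read))

;; Hypothetical condition object that carries the stand-in datum.
(define-record-type unreadable-object-error
  (make-unreadable-object-error datum)
  unreadable-object-error?
  (datum unreadable-object-error-datum))

;; Does STR begin with the two characters "#?"?
(define (unreadable-prefix? str)
  (and (>= (string-length str) 2)
       (string=? (substring str 0 2) "#?")))

;; Sketch only: if STR starts with "#?", parse the datum that follows
;; and then raise instead of returning it. Otherwise read normally.
(define (read-from-string/unreadable str)
  (if (unreadable-prefix? str)
      (let ((datum (read (open-input-string
                          (substring str 2 (string-length str))))))
        (if (eof-object? datum)
            (error "missing datum after #?")
            (raise (make-unreadable-object-error datum))))
      (read (open-input-string str))))

;; A caller can tell the structured case apart and still inspect it.
(define (classify-input str)
  (guard (e ((unreadable-object-error? e)
             (list 'unreadable (unreadable-object-error-datum e)))
            ((read-error? e) 'read-error))
    (list 'datum (read-from-string/unreadable str))))
```

Here (classify-input "#?(foo bar)") gives (unreadable (foo bar)) and (classify-input "42") gives (datum 42). For "#?(foo . . baz)" the underlying `read` fails before any stand-in datum exists, so the caller gets only whatever generic error the implementation signals.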

> I should also mention an argument that speaks against inventing a new
> #? syntax:  The #-lexical syntax-namespace is small and precious, so
> effectively doubling the syntax from #< to #< and #? for reader errors
> may be a bit costly.

I'm very sympathetic to that argument, and often deploy it myself.

But unreadable objects are something fundamental, and the #<...>
notation is fundamentally broken (which I hadn't noticed until you
pointed it out; after looking at it for 10+ years I had grown so
accustomed to it that I never took it apart mentally!)