Re: In-source documentation pre-SRFI draft #2

Re: In-source documentation pre-SRFI draft #2 Philip McGrath 29 May 2024 19:19 UTC
Hi,

I have a few thoughts and concerns from a fairly brief reading of the
draft and discussion.

There's a lot about documentation that's a matter of taste, so a word on
where I'm coming from: I'm primarily a Racketeer, and I mostly write
Scribble documentation out-of-source, though I have also used
`scribble/srcdoc`. I write some Guile documented with its built-in
plaintext docstrings and, for Guix, Texinfo fragments embedded in
strings and a separate Texinfo manual. I've also contributed to the Chez
Scheme User's Guide, which is written in stex [1], a subset of LaTeX
with embedded Scheme that is preprocessed to plain LaTeX and rendered
either to PDF by a normal TeX implementation or to HTML by one of the
stex tools. Of these, Scribble is by far my preferred system.

[1]: https://github.com/dybvig/stex

Maybe I should also mention that I'm primarily familiar with Racket and
R6RS, and have had to check R7RS Small to see what is missing.

That perhaps leads to my broadest question about the goals of this pre-SRFI:

On 5/28/24 20:12, Antero Mejr wrote:
 > Wolfgang Corcoran-Mathe <xxxxxx@sigwinch.xyz> writes:
 >
 >> I had difficulty with the notion of “attached” documentation as
 >> you’ve described it. What does it mean for ‘make-documentation’
 >> to take “an expression” as its *content* argument? In your example,
 >> you pass it the value ‘(+ 3 2), but the idea, of course, is to attach
 >> documentation to expressions, not values. (In other words, shouldn’t
 >> ‘documentation-object’ return a syntax object?) Maybe I’m missing
 >> something about how you’re expected to invoke these procedures.
 >
 > The idea is to call (read-doc) to extract documentation from a
 > file/port, similar to how a REPL would call (read). When there are no
 > documentation comments, (read-doc) is the same as (read). When there
 > are, instead of returning an expression/object, (read-doc) returns a
 > documentation record, which wraps the expression/object along with the
 > text of the associated (or "attached") documentation comment. The
 > documentation comment must be "adjacent" to the expression/object in the
 > source code. Typically documentation comments would be placed
 > immediately before the expression, but I left the interpretation of
 > "adjacent" up to the implementation, based on feedback I received
 > earlier.
 >
 > The documentation-content field is an unevaluated expression or readable
 > Scheme object, since R7RS-small does not have #'() syntax objects.
 >

What is the line between what this SRFI wants to make portable, and what
is to be left to the implementation?

In my view, the most useful thing would be to let Schemers write
portable Scheme libraries with portable in-source documentation. But
some aspects of this proposal seem to go in sort of the opposite
direction: it specifies an API for documentation *tools* portable to
many Scheme implementations, rather than focusing on making the
*documentation* portable to many tools, including tools that might be
integrated into a Scheme implementation.

Leaving the interpretation of "adjacent" up to the implementation makes
documentation non-portable: if I write `(foo #|*doc*|# bar)`, I need to
know, portably, whether I am documenting foo or bar. More realistically,
the example:

> ```
> (define (add-one x)
>   #|* Add one to x. *|#
>   (+ x 1))
> ```

is not portable if it is treated as documentation for `add-one` by some
implementations but for `(+ x 1)` by others.
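To make the ambiguity concrete, here is a toy model of two attachment policies that both count as "adjacent". It is only a sketch: the doc comment is represented as an ordinary `(doc "...")` datum, and the names `doc-marker?`, `target/next-datum`, and `target/enclosing-definition` are hypothetical, not taken from the draft.

```scheme
;; Toy model only: the doc comment is stood in for by a (doc "...") datum
;; so the two policies can be compared with plain list operations.
(define form
  '(define (add-one x) (doc "Add one to x.") (+ x 1)))

(define (doc-marker? x)
  (and (pair? x) (eq? (car x) 'doc)))

;; Policy A: the comment documents the datum that immediately follows it.
(define (target/next-datum form)
  (let loop ((items form))
    (cond ((or (null? items) (null? (cdr items))) #f)
          ((doc-marker? (car items)) (cadr items))
          (else (loop (cdr items))))))

;; Policy B: the comment documents the name bound by the enclosing define.
(define (target/enclosing-definition form)
  (let ((head (cadr form)))
    (if (pair? head) (car head) head)))

(target/next-datum form)            ; => (+ x 1)
(target/enclosing-definition form)  ; => add-one
```

The same source text yields two different documented entities, which is the portability problem in miniature.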

There is a similar issue with the `documentation-format` parameter
(though it doesn't seem to actually be used): the format in which
documentation is written is a property of the documentation itself, not
of the tool configuration. See [2] for a real (albeit small) problem
caused by `the supplied <command> is` being parsed as plaintext in some
contexts but as markdown in others.

[2]: https://github.com/cisco/ChezScheme/pull/816

In the other direction, I see little value in standardizing reader
procedures such as `read-documentation`, especially if they must be
constrained to the lowest common denominator of R7RS Small without
syntax objects. Tools seem very likely to want to use
implementation-specific features, such as Chez Scheme's "annotations"
[3] or Racket's syntax properties [4] and `srcloc` values [5]. Many of
Scribble's features, particularly automatic cross-reference links, use
the full lexical context information in Racket syntax objects, so that,
if you write `(define one 1)` in a code fragment, the identifier
`define` can be linked to the documentation for the applicable
definition of `define`. (There are currently 32 documented definitions
of `define` in the Racket package catalog.) But library writers who want
to write portable documentation don't need to care what API implementers
of documentation tools use to process the documentation that they've
written.

[3]: https://cisco.github.io/ChezScheme/csug10.0/syntax.html#./syntax:h12
[4]: https://docs.racket-lang.org/reference/stxprops.html
[5]: https://docs.racket-lang.org/reference/exns.html#%28def._%28%28lib._racket%2Fprivate%2Fbase..rkt%29._srcloc%29%29

I also have some concerns about the parts of the draft that do deal with
writing, as opposed to processing, documentation.

I am not aware of any existing documentation system that documents
"expressions", and I am confused about the intent. (I also note that, in
a language with macros, a `read`-based approach that doesn't perform
macro expansion is really documenting source S-expressions, not
"expressions": macro expansion is needed to differentiate expressions
per se from definitions, binding occurrences of variables, uses of
syntax pattern variables with incorrect ellipsis depth, and other
non-expressions.) Most documentation systems seem to document libraries
and their exports: named functions, macros, structure types, and other
values.
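A small sketch of the `read`-only view, assuming an R7RS system with `(scheme base)` and `(scheme read)` available. `my-define` is a hypothetical user macro that would expand to `define`; the helper name `read-from-string` is also just for illustration.

```scheme
;; R7RS: (import (scheme base) (scheme read))
(define (read-from-string s)
  (read (open-input-string s)))

;; One of these is a procedure call; the other would expand to a definition.
(define call-datum (read-from-string "(display one)"))
(define defn-datum (read-from-string "(my-define one 1)"))

;; To a tool that only calls `read`, both are just lists headed by a symbol.
(and (pair? call-datum) (symbol? (car call-datum)))  ; => #t
(and (pair? defn-datum) (symbol? (car defn-datum)))  ; => #t
```

Nothing in the datum itself says which of the two is an expression; that information only exists after macro expansion.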

There is no portable way to "import" extensions to lexical syntax, so
they don't seem to me like a good solution in this case. A "magic"
comment syntax avoids some of these problems, but not all, and, if
enabled by default, can cause problems for source code that didn't
expect the "magic", but expected all comments to be pure comments. While
this is definitely a secondary concern, "magic" comments also require
tool implementers to adjust their reader, at least in some modes, to
avoid discarding comments.
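For instance, a standard reader skips block comments entirely, so the documentation text never reaches a tool built on plain `read`. A minimal sketch, assuming R7RS `(scheme base)` and `(scheme read)`; `read-from-string` is a hypothetical helper:

```scheme
;; R7RS: (import (scheme base) (scheme read))
(define (read-from-string s)
  (read (open-input-string s)))

;; The #| ... |# block comment is discarded; only the datum after it is returned.
(read-from-string "#|* Add one to x. *|# (+ x 1)")  ; => (+ x 1)
```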

As a "lowest common denominator", plain text (as comments or Scheme
strings) definitely has a role to play in documentation. However, one of
the major contributions from Scheme and the broader Lisp family of
languages is rich support for creating embedded domain-specific
languages. For documentation to *only* support character-based DSLs,
whether in strings or "magic" comments, that must be parsed by external
tools, seems to me like a weakness/restriction that ought to be removed.

The docstring approach:

On 5/25/24 07:35, Jakub T. Jankiewicz - jcubic at onet.pl (via
srfi-discuss list) wrote:
 >
 > (define (foo x y)
 >    "(foo x y)
 >
 >     x - this is a number
 >     y - this is a string"
 >     (+ x x))

avoids the problems of "magic" comments, and it has the advantage of
simplicity for plain text. I think a more promising route would be to
allow syntactic extension. For example, a simple implementation might allow:

```
(define-syntax markdown-doc
   (syntax-rules ()
     ((_ str)
      str)))
(define (add x y)
   (markdown-doc "Applies `+` to its two arguments.")
   (+ x y))
```

and get the equivalent of a docstring by expanding to a string literal,
while a different implementation might parse the markdown at compile
time and expand to implementation-defined primitive forms for
documentation literals.
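Because the documentation is then an ordinary form, even a tool built on plain `read` can see it. The following is a rough sketch of such a tool, not part of any draft: `extract-doc` is a hypothetical name, and it only recognizes the `(define (name . formals) (markdown-doc "...") ...)` shape used above.

```scheme
;; Hypothetical extractor. Returns (name markdown text) when FORM is a
;; procedure definition whose first body form is (markdown-doc "..."),
;; and #f otherwise.
(define (extract-doc form)
  (and (pair? form)
       (eq? (car form) 'define)
       (pair? (cdr form))
       (pair? (cadr form))              ; (define (name . formals) ...)
       (pair? (cddr form))
       (let ((first-body (car (cddr form))))
         (and (pair? first-body)
              (eq? (car first-body) 'markdown-doc)
              (pair? (cdr first-body))
              (string? (cadr first-body))
              (list (car (cadr form)) 'markdown (cadr first-body))))))

(extract-doc
 '(define (add x y)
    (markdown-doc "Applies `+` to its two arguments.")
    (+ x y)))
;; => (add markdown "Applies `+` to its two arguments.")
```

Note that this sketch matches on the symbol `markdown-doc` rather than on its binding, so it would be fooled by a program that rebinds that name. A tool with access to syntax objects could avoid that.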

There are two other resources I recommend to anyone thinking about this
design space.

Matthew Flatt's "Submodules in Racket: You Want it When, Again?" [6]
presents a perspective on documentation as a kind of orthogonal extra
dimension of phase levels. R6RS and R7RS have a more restricted model of
phase levels (e.g. identifiers may not have different bindings at
different phase levels in the same module), but there may be some ideas
that could enhance the R7RS Large approach contemplated here:

On 5/25/24 07:06, Daphne Preston-Kendal wrote:
 > The vague plan for R7RS Large is this:
 >
 > Documentation will be attached as an identifier property (a la SRFI
 > 213).
 > This means everything can be documented – not only procedures but
 > macros, variables, record types and associated procedures, etc. etc.
 >
 > There is an implementation to show how simple this idea could be at
 > <https://codeberg.org/scheme/r7rs/wiki/Identifier-property-use-cases>.
 > (This is only a sketch.)

[6]: https://www-old.cs.utah.edu/plt/publications/gpce13-f-color.pdf

(Less directly about documentation, for someone exploring those ideas
I'd also recommend Ballantyne, King, and Felleisen's "Macros for
Domain-Specific Languages" [7] and some of the experimental work by the
Rhombus group, e.g. binding "spaces" and "portal" syntax.)

[7]: https://dl.acm.org/doi/pdf/10.1145/3428297

Eli Barzilay's @-expressions are a very powerful extension to the
lexical syntax of S-expressions for working with textual content. They
are most prominently used by Scribble, but they are completely general:
they are just an alternative concrete syntax for S-expressions. Eli's
"The Scribble Reader: An Alternative to S-expressions for Textual
Content" [8] is the most focused on the design and motivation of the
concrete syntax, though parts of Flatt, Barzilay, and Findler's
"Scribble: Closing the Book on Ad Hoc Documentation Tools" [9] and the
reference manual [10] may also be useful. As Eli writes, "Generally
speaking, the Scribble reader does not require any features specific to
PLT Scheme. In fact, we hope that other implementors will consider doing
so, which will provide their implementations with the same benefits, as
well as benefit the whole Scheme community."

[8]: https://www2.ccs.neu.edu/racket/pubs/scheme2009-b.pdf
[9]: https://www2.ccs.neu.edu/racket/pubs/icfp09-fbf.pdf
[10]: https://docs.racket-lang.org/scribble/reader.html

Hope some of this is useful!
Philip McGrath