On Wed, Mar 6, 2019 at 1:53 PM Lassi Kortela <xxxxxx@lassi.io> wrote:
> What does everyone you think? I would prefer to try this first because
> it's an established standard with readily available tools. All the
> while being simple and familiar (they just use HTML with light
> additions, much like we've been doing so far, and specify a standard
> conversion formula to JSON). This would also settle our debate about
> how to use class attributes, since they specify that :-P
>
> For example, here's how one might mark up a procedure definition:
>
> <p class="h-proc-def">
> <b>Procedure: </b>
> <code>
> <a class="p-name" name="make-array">make-array</a>
> <var class="h-arg">interval</var>
> <var class="h-arg">getter</var>
> [ <var class="h-arg"><span
> class="p-type">optional</span>setter</var> ]
> </code>
> </p>
(Without wanting to upset anyone) I really think that this is the best
example of how to fail at this endeavor.
It is so complex:
* in order to identify that `make-array` is actually a procedure
definition, we have to look for an element that has the class
`h-proc-def`, which should contain (somewhere, not necessarily among
its direct children) an element with the class `p-name` whose `name`
attribute is the actual name of the procedure; just thinking about
expressing this in code, especially with XML libraries or XSLT, scares
me... (for a second, just try to imagine what the code to extract the
arguments would look like;)
* it adds too much overhead: there is too much duplication, as the
`make-array` token appears twice;
* it fails to capture all signature elements: what is the output of
the procedure? what are the types of various arguments?
When designing this format, think about how one could use `pup` / `jq`
to extract the data.
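To make the complexity point concrete, here is a minimal sketch of an
extractor for the quoted markup, assuming Python's standard
`html.parser`. The names `ProcDefParser` and `extract_proc_defs` are
hypothetical, and the sketch assumes every element is explicitly closed
(no void elements such as `<br>`). It is not a microformats parser, only
an illustration of the state one has to track.

```python
from html.parser import HTMLParser

# The markup from the quoted proposal, with the split <span> tag joined.
SAMPLE = """
<p class="h-proc-def">
<b>Procedure: </b>
<code>
<a class="p-name" name="make-array">make-array</a>
<var class="h-arg">interval</var>
<var class="h-arg">getter</var>
[ <var class="h-arg"><span class="p-type">optional</span>setter</var> ]
</code>
</p>
"""


class ProcDefParser(HTMLParser):
    """Collects the name and arguments of each `h-proc-def` block."""

    def __init__(self):
        super().__init__()
        self.stack = []  # class set of every currently open element
        self.defs = []   # one dict per `h-proc-def` element

    def _inside(self, cls):
        return any(cls in classes for classes in self.stack)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if "h-proc-def" in classes:
            self.defs.append({"name": None, "args": []})
        elif self._inside("h-proc-def"):
            if "p-name" in classes:
                self.defs[-1]["name"] = attrs.get("name")
            elif "h-arg" in classes:
                self.defs[-1]["args"].append({"name": "", "type": None})
        self.stack.append(classes)

    def handle_endtag(self, tag):
        # Assumes well-nested markup: every start tag has an end tag.
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        if not (self._inside("h-proc-def") and self._inside("h-arg")):
            return
        arg = self.defs[-1]["args"][-1]
        if "p-type" in self.stack[-1]:
            arg["type"] = text   # text of the nested type marker
        else:
            arg["name"] += text  # text directly inside the argument


def extract_proc_defs(html):
    parser = ProcDefParser()
    parser.feed(html)
    parser.close()
    return parser.defs


if __name__ == "__main__":
    print(extract_proc_defs(SAMPLE))
```

Even in this small form the extractor needs an element stack and three
separate class checks just to recover one name and three arguments.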
My proposal is to keep things simple:
* for indexing, just using `<a class="proc-def">make-array</a>` is enough;
* for actual signatures I think an S-expression based description is
better (however, see the paragraph below where I note that perhaps this
is too much for SRFIs); for example:
https://github.com/volution/vonuvoli-scheme/blob/development/documentation/libraries-r7rs.ss#L6373
(make-vector
  (type constructor)
  (export scheme:base)
  (signature
    ((range-length-zero) -> vector-empty)
    ((range-length-zero any) -> vector-empty)
    ((range-length-not-zero) -> vector-not-empty)
    ((range-length-not-zero any) -> vector-not-empty))
...
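As a rough illustration of why such a description is easy to consume,
the following sketch reads the excerpt above with a tiny S-expression
reader written in Python. The function names are hypothetical, the
reader only handles parentheses and bare symbols (no strings or
comments), and the sample closes the form right after the `signature`
field because the rest of the entry is elided above.

```python
# Only the fields shown in the excerpt; the form is closed early on
# purpose, since the remainder of the real entry is not reproduced.
SAMPLE = """
(make-vector
  (type constructor)
  (export scheme:base)
  (signature
    ((range-length-zero) -> vector-empty)
    ((range-length-zero any) -> vector-empty)
    ((range-length-not-zero) -> vector-not-empty)
    ((range-length-not-zero any) -> vector-not-empty)))
"""


def tokenize(text):
    """Split the text into parentheses and bare symbols."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens):
    """Consume one datum from the front of `tokens` (nested lists of str)."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # drop the closing parenthesis
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return token


def signatures(form):
    """Return (name, [(inputs, output), ...]) for one definition form."""
    name, *fields = form
    for field in fields:
        if isinstance(field, list) and field and field[0] == "signature":
            return name, [(tuple(inputs), output)
                          for inputs, _arrow, output in field[1:]]
    return name, []


if __name__ == "__main__":
    name, variants = signatures(read(tokenize(SAMPLE)))
    print(name)
    for inputs, output in variants:
        print(" ", inputs, "->", output)
```

The whole reader fits in about twenty lines, and the result is plain
nested lists that any further tooling can walk directly.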
I've thought hard about this problem (when I did my R7RS
documentation conversion) and came to the conclusion that one can't
expect to extract accurate information from "text" documents without
making a mess out of them. I then concluded that just
"back-referencing" things in the actual text, and providing "external"
structured syntax / signatures, is the best approach.
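A minimal sketch of that split, assuming Python's standard
`html.parser`: the text only carries a `proc-def` marker, and the
signature lives in a separate structure keyed by name. The
`EXTERNAL_SIGNATURES` table, the sample markup, and the function names
are all hypothetical placeholders.

```python
from html.parser import HTMLParser

# Hypothetical document text: only a light marker around the name.
SAMPLE = (
    '<p>Procedure: <code><a class="proc-def">make-array</a> '
    'interval getter [setter]</code></p>'
    '<p>See also <a href="#other">another section</a>.</p>'
)

# Hypothetical external signature data, kept outside the prose.
EXTERNAL_SIGNATURES = {
    "make-array": ["interval", "getter", "[setter]"],
}


class ProcDefIndex(HTMLParser):
    """Collects the text of every <a class="proc-def"> element."""

    def __init__(self):
        super().__init__()
        self.in_def = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "proc-def" in classes:
            self.in_def = True
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_def = False

    def handle_data(self, data):
        if self.in_def:
            self.names[-1] += data


def index_proc_defs(html):
    parser = ProcDefIndex()
    parser.feed(html)
    parser.close()
    return [name.strip() for name in parser.names]


def resolve(html, table):
    """Pair each indexed name with its external signature, if any."""
    return {name: table.get(name) for name in index_proc_defs(html)}


if __name__ == "__main__":
    print(resolve(SAMPLE, EXTERNAL_SIGNATURES))
```

The indexer needs a single flag rather than an element stack, and
everything structured is looked up by name instead of being scraped
from the prose.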
Take the example of `cond`:
https://github.com/volution/vonuvoli-scheme/blob/development/documentation/libraries-r7rs.ss#L2932
I start with the structured syntax signature, and then in the
description, formatted as CommonMark, I could use CommonMark references
to "special" tags to link to other elements. (I haven't included one
in `cond`'s description, but I have technical support for it in the
parser / formatter.)
However, this is perhaps too much for the SRFI use case. Instead I
think just having a few "markers" to allow indexing /
back-referencing, plus a simplified / standard structure (sections,
paragraphs, lists, code snippets, etc.), is enough. Based on this, one
can take the (X)HTML and "render" it as CommonMark / other formats to
be included in one's own documentation.
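A rough sketch of such a rendering step, assuming the document is
well-formed XHTML restricted to a handful of elements, and using
Python's standard `xml.etree.ElementTree`. The set of supported tags,
the sample fragment, and the function names are assumptions made for
illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment using only the simplified structure.
SAMPLE = """<div>
<h2>Procedures</h2>
<p><a class="proc-def">make-array</a> takes an <var>interval</var>
and a <var>getter</var>.</p>
<ul><li>first</li><li>second</li></ul>
<pre>(make-array interval getter)</pre>
</div>"""


def local(elem):
    """Tag name without an XML namespace prefix such as {…xhtml}."""
    return elem.tag.rsplit("}", 1)[-1]


def inline(elem):
    """Render the inline content of an element as CommonMark text."""
    parts = [elem.text or ""]
    for child in elem:
        inner = inline(child)
        tag = local(child)
        if tag in ("code", "var"):
            parts.append("`" + inner + "`")
        elif tag in ("em", "i"):
            parts.append("*" + inner + "*")
        elif tag in ("strong", "b"):
            parts.append("**" + inner + "**")
        elif tag == "a" and child.get("href"):
            parts.append("[" + inner + "](" + child.get("href") + ")")
        else:
            parts.append(inner)  # unknown inline element: keep its text
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())  # collapse whitespace


def to_commonmark(xhtml):
    """Render the top-level blocks of an XHTML fragment as CommonMark."""
    blocks = []
    for elem in ET.fromstring(xhtml):
        tag = local(elem)
        if tag in ("h1", "h2", "h3"):
            blocks.append("#" * int(tag[1]) + " " + inline(elem))
        elif tag == "p":
            blocks.append(inline(elem))
        elif tag == "ul":
            items = [li for li in elem if local(li) == "li"]
            blocks.append("\n".join("* " + inline(li) for li in items))
        elif tag == "pre":
            code = "".join(elem.itertext()).strip("\n")
            # Emit an indented code block (four leading spaces).
            blocks.append("\n".join("    " + line
                                    for line in code.splitlines()))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_commonmark(SAMPLE), end="")
```

Because the input is XML, a strict parser does all the tokenizing, which
is the technical benefit of keeping the source as XHTML. Input that is
not well-formed makes `ET.fromstring` raise an error instead of guessing.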
> > I'm still not convinced of the need to go all the way to XHTML.
>
> > If there's a reliable way to convert from HTML to XHTML, then
> > there's no need for XHTML to be the on-disk format.
>
> XHTML is definitely not necessary for indexing, but it may be
> necessary/much easier for the full-text conversions Ciprian would like
> to do. I guess the decision rests on how easy it is to automate?
I agree that XHTML is not strictly necessary, but as highlighted above
it would help us from a technical point of view, especially when
converting it to other formats for inclusion in other documentation.
Ciprian.