On Wed, Mar 6, 2019 at 1:53 PM Lassi Kortela <xxxxxx@lassi.io> wrote:
> What does everyone you think? I would prefer to try this first because
> it's an established standard with readily available tools. All the
> while being simple and familiar (they just use HTML with light
> additions, much like we've been doing so far, and specify a standard
> conversion formula to JSON). This would also settle our debate about
> how to use class attributes, since they specify that :-P
>
> For example, here's how one might mark up a procedure definition:
>
> <p class="h-proc-def">
> <b>Procedure: </b>
> <code>
> <a class="p-name" name="make-array">make-array</a>
> <var class="h-arg">interval</var>
> <var class="h-arg">getter</var>
> [ <var class="h-arg"><span class="p-type">optional</span>setter</var> ]
> </code>
> </p>

(Without meaning to upset anyone) I really think this is the best example of how to fail at this endeavor. It is far too complex:

* in order to identify that `make-array` is actually a procedure definition, we have to look for an element with the class `h-proc-def`, which should contain (somewhere, not necessarily among its direct children) an element with the class `p-name` whose `name` attribute holds the actual name of the procedure; just thinking about expressing this in code, especially with XML libraries or XSLT, scares me... (and, for a second try, just imagine what the code to extract the arguments looks like ;)
* it adds too much overhead and duplication: the `make-array` token appears twice;
* it fails to capture all the signature elements: what does the procedure return? What are the types of the various arguments?

When designing this format, think about how one could use `pup` / `jq` to extract the data.
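To make the first point concrete, here is a minimal sketch, using only Python's standard `html.parser` module, of what extracting the name and arguments from the quoted `h-proc-def` markup involves. The class names come from the example above; `ProcDefExtractor` and `extract_proc_defs` are hypothetical names for illustration, and this is not a full microformats2 parser. Because `p-name` and `h-arg` may be nested anywhere inside the definition, the sketch has to keep a stack of open elements and consult it for every piece of text.

```python
from html.parser import HTMLParser


class ProcDefExtractor(HTMLParser):
    """Hypothetical extractor for the h-proc-def / p-name / h-arg / p-type
    classes used in the quoted example (not a general microformats parser)."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open elements as (tag, set of classes)
        self.procs = []      # finished procedure records
        self.current = None  # record for the h-proc-def being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        self.stack.append((tag, classes))
        if "h-proc-def" in classes:
            self.current = {"name": None, "args": []}
        elif self.current is not None:
            if "p-name" in classes:
                # The name is taken from the `name` attribute, not the text.
                self.current["name"] = attrs.get("name")
            elif "h-arg" in classes:
                self.current["args"].append({"name": "", "type": None})

    def handle_endtag(self, tag):
        # Pop up to and including the matching open element.
        closed = set()
        while self.stack:
            open_tag, classes = self.stack.pop()
            if open_tag == tag:
                closed = classes
                break
        if "h-proc-def" in closed and self.current is not None:
            self.procs.append(self.current)
            self.current = None

    def handle_data(self, data):
        text = data.strip()
        if self.current is None or not text:
            return
        innermost = self.stack[-1][1] if self.stack else set()
        in_arg = any("h-arg" in classes for _, classes in self.stack)
        if in_arg and "p-type" in innermost:
            # e.g. the "optional" span nested inside the argument
            self.current["args"][-1]["type"] = text
        elif in_arg:
            self.current["args"][-1]["name"] += text
        # Everything else ("Procedure:", "[", "]", the link text) is ignored.


def extract_proc_defs(html):
    parser = ProcDefExtractor()
    parser.feed(html)
    parser.close()
    return parser.procs


SAMPLE = """<p class="h-proc-def">
<b>Procedure: </b>
<code>
<a class="p-name" name="make-array">make-array</a>
<var class="h-arg">interval</var>
<var class="h-arg">getter</var>
[ <var class="h-arg"><span class="p-type">optional</span>setter</var> ]
</code>
</p>"""

if __name__ == "__main__":
    for proc in extract_proc_defs(SAMPLE):
        print(proc)
```

Even this toy version must track which elements are open, and it still cannot answer the return-type question from the third point, because the markup simply does not carry that information.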
My proposal is to keep things simple:

* for indexing, just using `<a class="proc-def">make-array</a>` is enough;
* for actual signatures, I think an S-expression based description is better (however, see the paragraph below where I note that perhaps this is too much for SRFIs); for example:

  https://github.com/volution/vonuvoli-scheme/blob/development/documentation/libraries-r7rs.ss#L6373

  (make-vector
    (type constructor)
    (export scheme:base)
    (signature
      ((range-length-zero) -> vector-empty)
      ((range-length-zero any) -> vector-empty)
      ((range-length-not-zero) -> vector-not-empty)
      ((range-length-not-zero any) -> vector-not-empty))
    ...

I thought hard about this problem (when I did my R7RS documentation conversion) and concluded that one can't expect to extract accurate information from "text" documents without making a mess of them. I then concluded that the best approach is to just "back-reference" things in the actual text, and to provide the structured syntax / signatures "externally".

Take the example of `cond`:

  https://github.com/volution/vonuvoli-scheme/blob/development/documentation/libraries-r7rs.ss#L2932

I start with the structured syntax signature; then, in the description (formatted as CommonMark), I can use CommonMark references to "special" tags to link to other elements. (I haven't included one in `cond`'s description, but the parser / formatter supports it.)

However, this is perhaps too much for the SRFI use case. Instead, I think it is enough to have a few "markers" to allow indexing / back-referencing, plus a simplified / standard structure (sections, paragraphs, lists, code snippets, etc.). Based on this, one can take the (X)HTML and "render" it as CommonMark or other formats for inclusion in one's own documentation.

> > I'm still not convinced of the need to go all the way to XHTML.
> >
> > If there's a reliable way to convert from HTML to XHTML, then
> > there's no need for XHTML to be the on-disk format.
>
> XHTML is definitely not necessary for indexing, but it may be
> necessary/much easier for the full-text conversions Ciprian would like
> to do. I guess the decision rests on how easy it is to automate?

I agree that XHTML is not strictly necessary, but, as highlighted above, it would help us from a technical point of view, especially when converting the documents to other formats for inclusion in other documentation.

Ciprian.