> My proposal is to keep things simple:
>
> * for indexing just using `<a class="proc-def">make-array</a>` is enough;
> * for actual signatures I think an S-expression based description is
>   better (however see the other paragraph where I note that perhaps this
>   is too much for SRFI's); for example:
>
>   https://github.com/volution/vonuvoli-scheme/blob/development/documentation/libraries-r7rs.ss#L6373
>
>   (make-vector
>     (type constructor)
>     (export scheme:base)
>     (signature
>       ((range-length-zero) -> vector-empty)
>       ((range-length-zero any) -> vector-empty)
>       ((range-length-not-zero) -> vector-not-empty)
>       ((range-length-not-zero any) -> vector-not-empty))
>     ...

This is obviously far superior to any HTML-based approach, but it would have to be maintained in a separate file from the SRFI HTML.

I think our main point of disagreement is how much worse it is to have a separate file. I'm willing to live with clumsy HTML markup if we can have only one file per SRFI. You're willing to live with two files per SRFI if we can have great metadata syntax (S-expressions or JSON). Correct?

Could we get everyone's opinion on this issue, as it may be the biggest obstacle to reaching consensus? Who would rather have somewhat clumsier markup but need only a single HTML file, and who would rather have separate HTML and metadata files if it means the HTML markup can be somewhat cleaner?

For the purposes of this vote, metadata would include at least argument lists and one-line descriptions of all the procedures defined in the SRFI. If we are only interested in procedure names, those can easily be marked up in any number of unintrusive ways, so there wouldn't be much controversy. It's the nested data that brings the complexity.
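To make the comparison concrete: once the metadata lives in its own file, the nested data is cheap to consume. Below is a minimal Python sketch (not part of anyone's proposal) that reads the quoted `make-vector` entry, closed right after its `signature` clause because the rest is elided in the quote, and pulls out the argument/result pairs. The reader is deliberately tiny: it handles only parentheses and bare symbols (no strings, numbers or comments), and the helper names `read` and `signatures` are hypothetical.

```python
import re

# Hypothetical minimal S-expression reader: only parentheses and bare
# symbols are supported (no strings, numbers, quotes or comments).
TOKEN = re.compile(r"[()]|[^\s()]+")

def parse(tokens, pos=0):
    """Parse one datum starting at tokens[pos]; return (datum, next_pos)."""
    tok = tokens[pos]
    if tok == "(":
        items, pos = [], pos + 1
        while tokens[pos] != ")":
            item, pos = parse(tokens, pos)
            items.append(item)
        return items, pos + 1          # skip the closing ")"
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    return tok, pos + 1                # a bare symbol

def read(text):
    """Read the first datum in TEXT as nested Python lists of strings."""
    datum, _ = parse(TOKEN.findall(text))
    return datum

def signatures(entry):
    """Return (argument-types, result-type) pairs from the entry's signature clause."""
    for clause in entry[1:]:
        if isinstance(clause, list) and clause and clause[0] == "signature":
            return [(tuple(args), result)
                    for args, arrow, result in clause[1:] if arrow == "->"]
    return []

# The quoted entry, closed right after its signature clause
# (the remainder is elided in the quote).
ENTRY = """
(make-vector
  (type constructor)
  (export scheme:base)
  (signature
    ((range-length-zero) -> vector-empty)
    ((range-length-zero any) -> vector-empty)
    ((range-length-not-zero) -> vector-not-empty)
    ((range-length-not-zero any) -> vector-not-empty)))
"""

entry = read(ENTRY)
for args, result in signatures(entry):
    print(entry[0], list(args), "->", result)
```

The reader is the easy part; the open question is whether keeping such a file in sync with the SRFI HTML is worth it.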
> It is so complex:
>
> * in order to identify that `make-array` is actually a procedure
>   definition, we have to look for an element that has the class
>   `h-proc-def`, which should contain (somewhere not necessarily in
>   direct children) an element with the class `p-name` whose attribute is
>   the actual "name" of the procedure; just trying to think about
>   expressing this in code, especially with XML libraries or XSLT scares
>   me... (for a second try just try to imagine how the code to extract
>   arguments looks like;)
>
> * it provides too much overhead: it has too much duplication, the
>   `make-array` token appears twice;
>
> * it fails to capture all signature elements: what is the output of
>   the procedure? what are the types of various arguments?
>
> When designing this format think about how one could use `pup` / `jq`
> to extract the data.

I think we all agree HTML classes are not an ideal way to represent information. But from my point of view the other options are even worse :) That's why I've been advocating classes. If I've understood correctly, Arthur and Per have a similar viewpoint.

For example, I imagine most schemers would find S-expressions superior to XML in general. But we'd have to keep them in a separate file or parse them from the bodies of HTML tags. If the SRFIs were in Scribe format then clean S-expressions might be a no-brainer, but since HTML has already been established, they would add complexity.

Likewise, having a separate sub-element for the name of a definition is not ideal, but I think all other approaches are even less ideal.

> I've tried hard to think about this problem (when I did my R7RS
> documentation conversion) and came to the conclusion that one can't
> expect to extract accurate information from "text" documents without
> making a mess out of them.

That's probably true. Even the most compact HTML tags are verbose compared to ordinary uses of S-expressions or JSON.
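As for the extraction worry in the first quoted bullet (finding a `p-name` anywhere inside an `h-proc-def`), the descendant search itself is short with a standard XML library, verbose markup notwithstanding. A minimal Python sketch using `xml.etree.ElementTree` follows. The markup is a hypothetical example, the `p-arg` class for arguments is an assumption rather than part of any posted proposal, and the name is taken from the element's text rather than from an attribute. It also assumes the HTML is well-formed XML, which older SRFI documents may not be; those would need an HTML parser instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: "h-proc-def" and "p-name" follow the quoted message;
# "p-arg" for arguments is an assumption made for this sketch.
SNIPPET = """
<div>
  <div class="h-proc-def">
    <p>(<code class="p-name">make-array</code>
       <var class="p-arg">interval</var>
       <var class="p-arg">getter</var>)</p>
  </div>
</div>
"""

def has_class(elem, name):
    """True if NAME is one of the space-separated tokens in ELEM's class attribute."""
    return name in (elem.get("class") or "").split()

def proc_defs(root):
    """Yield (name, args) for every h-proc-def element, at any depth.

    Element.iter() walks all descendants, so p-name/p-arg need not be
    direct children. Nested h-proc-def elements are not handled specially.
    """
    for d in root.iter():
        if not has_class(d, "h-proc-def"):
            continue
        name = next((e.text for e in d.iter() if has_class(e, "p-name")), None)
        args = [e.text for e in d.iter() if has_class(e, "p-arg")]
        yield name, args

root = ET.fromstring(SNIPPET.strip())
for name, args in proc_defs(root):
    print(name, args)  # prints: make-array ['interval', 'getter']
```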
It's hard to do anything at all without resorting to <span class="x">y</span> or similar. Since it's almost impossible to represent any kind of nested data in an HTML attribute value without creating more problems than it solves, any layer of nesting requires new sub-tags. This is probably just something that has to be accepted if the HTML route is chosen.

But basically, you already have to write the procedure names and argument names in the HTML, and they are often written in a special font. Hence there is already a tag around them. If the markup is e.g.:

    <div>
      <b>make-array</b>
      <var>interval</var>
      <var>getter</var>
      [ <var>setter</var> ]
    </div>

then it's not a big step to add classes:

    <div class="proc def">
      <b class="name">make-array</b>
      <var class="arg">interval</var>
      <var class="arg">getter</var>
      [ <var class="opt arg">setter</var> ]
    </div>

Or the equivalent microformat classes.

I don't think there is any way to avoid verbosity, because no matter which HTML-based approach we choose, we have to write <foo class="bar"> all over the place. Personally, I would argue that HTML is already quite verbose even without any metadata, so elegant HTML is something of a lost cause to begin with.

We could also use 'id=' and HTML5 'data-*' attributes, e.g.

    <div class="proc def" id="make-array">
    <div class="proc def" data-name="make-array">

but that's not necessarily simpler or more compatible. The procedure name has to be visible in the SRFI anyway, often with special styling, so we might as well reuse the visible text for the metadata.

Also, the HTML spec says that id attributes have to be unique in the entire document. You can't have the same id on different elements even if those elements have different tags and classes. So I wouldn't use 'id' for any of our metadata.

I also think invisible tags are unavoidable if we add significant metadata to the old SRFIs.
This holds even with microformats (there was a mistake in the microformat example I posted: the <span>optional</span> should have been hidden). For new SRFIs that conform to a rigid HTML structure, invisible tags can hopefully be avoided. They might also be avoided with creative uses of data/id/rel/rev attributes, but that would probably create even more problems. Many tags can be made visible, but attributes are always invisible...

> However this is perhaps too-much for the SRFI use-case. Instead I
> think just having a few "markers" to allow indexing /
> back-referencing, then a simplified / standard structure (sections,
> paragraphs, lists, code snippets, etc.) is enough. Based on this one
> can take the (X)HTML and "render" it as CommonMark / other formats to
> be included in his own documentation.

We may also disagree on what constitutes "few" or "many" markers :)

> (Without upsetting anyone) I really think that this is the best
> example on how to fail at this endeavor.

No problem :) I'm not sensitive to criticism.

Lassi