I continue to be impressed with pup.  Per Bothner's SRFI 164 already contains a good deal of suitable markup, so I've been experimenting with querying it using pup.  For example, here I extract the names of all the procedures defined in that SRFI:

> SRFI=164; cat $ss/srfi-$SRFI/srfi-$SRFI.html | pup '[kind="Procedure"] .proc-def text{}'
->shape
shape
array-shape
array-rank
array-start
array-end
array-size
array
make-array
make-array
build-array
index-array
array-ref
array-ref
array-index-ref
array-set!
array-set!
array-copy!
array-fill!
array-transform
array-index-share
array-reshape
share-array
array-flatten
array->vector

Using pup with jq, a command-line JSON querying tool, you can extract even more information.  Here's a crude query that lists the first few procedures from the synopses, each followed by its arguments (the [0:13] slice just keeps the output short):

> SRFI=164; cat $ss/srfi-$SRFI/srfi-$SRFI.html | pup '.synopsis json{}'|jq 'map(.children[]|{name:select(.class=="function").text}[],{argument:select(.tag=="var").text})[0:13]'
[
  "(array?",
  {
    "argument": "obj"
  },
  "(range-from",
  {
    "argument": "start"
  },
  {
    "argument": "step"
  },
  "(range<",
  {
    "argument": "end"
  },
  {
    "argument": "start"
  },
  {
    "argument": "step"
  },
  "(range<=",
  {
    "argument": "end"
  },
  {
    "argument": "start"
  },
  {
    "argument": "step"
  }
]
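
The flat list above doesn't tie each argument to its procedure.  As a rough sketch of how the same convention could be read with nothing but Python's standard library, the code below groups each procedure name with its arguments.  The sample markup is my own guess at the shape implied by the queries above (a "synopsis" element containing a class="function" element and var elements); it is not copied from the SRFI, and SAMPLE, SynopsisParser, and extract_procedures are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical sample: an assumption about the markup, inferred from the
# pup/jq queries above.  The real SRFI 164 HTML may differ in detail.
SAMPLE = """
<p class="synopsis" kind="Procedure">
  <code class="function">(array?</code> <var>obj</var>)
</p>
<p class="synopsis" kind="Procedure">
  <code class="function">(range-from</code> <var>start</var> [<var>step</var>])
</p>
"""


class SynopsisParser(HTMLParser):
    """Collect procedure names and arguments found inside .synopsis elements."""

    def __init__(self):
        super().__init__()
        self.procedures = []  # list of {"name": str, "arguments": [str, ...]}
        self._depth = 0       # element nesting inside a synopsis; 0 means outside
        self._field = None    # "name" or "argument" while inside such an element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth == 0:
            if "synopsis" in classes:
                self._depth = 1
            return
        # Limitation: unclosed void elements such as <br> inside a synopsis
        # would throw off this simple depth count.
        self._depth += 1
        if "function" in classes:
            # Each class="function" element starts a new procedure record.
            self.procedures.append({"name": "", "arguments": []})
            self._field = "name"
        elif tag == "var" and self.procedures:
            # A <var> is treated as an argument of the most recent procedure.
            self.procedures[-1]["arguments"].append("")
            self._field = "argument"

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            self._field = None

    def handle_data(self, data):
        if self._field == "name":
            self.procedures[-1]["name"] += data
        elif self._field == "argument":
            self.procedures[-1]["arguments"][-1] += data


def extract_procedures(html):
    """Return [{"name": ..., "arguments": [...]}, ...] for the given HTML text."""
    parser = SynopsisParser()
    parser.feed(html)
    parser.close()
    for proc in parser.procedures:
        # Drop the leading "(" that is part of the function element's text.
        proc["name"] = proc["name"].strip().lstrip("(")
    return parser.procedures


if __name__ == "__main__":
    for proc in extract_procedures(SAMPLE):
        print(proc["name"], proc["arguments"])
```

To try it on a real document, read the SRFI's HTML file and pass its text to extract_procedures in place of SAMPLE.  Whether the result is sensible depends on how closely the actual markup matches the assumptions above.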

This is just a proof of concept.  I'm not arguing that we should use pup or jq, or that Per's specific markup is the correct one, only that even a simple convention like the one Per uses in his SRFI already makes it possible to extract useful information.  We really shouldn't have to do much to encode useful information, even in basic HTML.