On 25/09/2019 09:25, Lassi Kortela wrote:

> It might be nice if the hash syntax is always #symbol{...} which is a
> mash-up of all our suggestions so far. And if the symbol could
> optionally be reverse-DNS, so I could make #io.lassi.whizbang{...} for
> my own syntax extensions without disturbing others, and the {...}
> wrapping would guarantee an easy way for uninterested (machine and
> human) readers to skip it.

Ah, yes, I forgot to add a private extension mechanism when talking
about having an SRFI-driven registry - I'm getting sloppy!

I think if we want to allow uninterested readers to skip, we need to
decide how much to constrain the power of custom parsers for custom
types, lest finding the end of the expression turn out to be arbitrarily
complicated! This could either be:

1) Mandate that the syntax of the stuff in {} is actually the same as a
   normal sexpr list () - a space-delimited list of sexprs.

2) Make it an arbitrary string, but with rules about how embedded }s are
   represented - some combination of quoting mechanisms and saying that
   balanced pairs of {} are OK.

I prefer the former. The latter might be better for really complex
grammars, if they can be bent to work with the mandated quoting rules -
I'm struggling to think of good examples, perhaps
#xml{Hello <em>world</em>!} would be handy for people writing Web
applications. But the former can do something nearly as good:
#xml{"Hello <em>world</em>!"}.

>> As for bytevectors, I'll add them using R7RS syntax. Should we require
>> hex? It's easier to read, if wasteful. Chibi outputs every value in hex
>> except 0, which is exceedingly common.
>
> Do you mean this: #u8(1 2 3 4 5). I find it very nice. However, the bit
> density is low and as you say, decimal numbers cause concern.

I dimly recall being impressed by Erlang's binary literal syntax, let me
do some research and try to remember why...
https://forfunand.wordpress.com/2011/10/10/why-erlangs-binary-syntax-is-awesome/

...Ok, it's because the syntax allows for a pattern-matching form with
embedded variables, which is really neat for deconstructing and
constructing fixed-width binary network packets and the like. Not
relevant here!

I guess the options are:

1) Hexadecimal

2) Strings with quoting for unprintable/delimiter characters

3) base64 or similar

4) Some hybrid, where the contents of a bytevector literal are a series
   of lexically distinguishable segments that could use any of the
   above, and are concatenated together. Might even drop the quoting in
   strings and force any non-printable or delimiter characters to drop
   out of string mode and go in hex.

   #u8("This is a null-terminated string" 0)

   #u8("This is an embedded block of really random entropy: "
       :GSVGxo89Ab6QX4D8l9KWzQ== " - I hope you like it.")

   - where I have purely arbitrarily chosen ':' as a prefix to base64
   values to distinguish them from hex values.

(2) makes bits of a bytevector that happen to also be valid ASCII or
UTF-8 text "readable", but is more complicated to generate/parse and
ends up as a worse form of (1) for very unprintable stuff.

(3) is dense and simple to process for machines, but is totally
meaningless to humans.

(1) is like a watered-down form of (3), a little easier for humans to
make some sense of (as a kid hacking around under MS-DOS, I quickly
learnt to read some parts of x86 machine code in hex... the phrase
"CD 21" is forever burned into my memory).

(4) is probably optimal from a human readability perspective, IF the
encoder makes smart choices about what encodings to use where, but is
the most complicated to implement.

If I had to pick one... I'd be torn between (3), because it's simple and
compact and human readability of bytevectors isn't the biggest concern,
and the original R7RS format, because it'd be a shame to have yet
another standard!

--
Alaric Snell-Pym (M7KIT)
http://www.snell-pym.org.uk/alaric/
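
To give a feel for how much work option (4) puts on a reader, here is a
minimal Python sketch of a decoder for the hybrid form. It is only an
illustration under assumptions that the proposal above does not fix:
segments are separated by whitespace, string segments are UTF-8 with no
escape sequences at all, a ':' prefix marks base64, and every other
token is one byte written as one or two hex digits. The name decode_u8
is made up for this example.

```python
import base64
import re

# Assumed segment grammar (hypothetical, for illustration only):
#   "..."   string segment, encoded as UTF-8, no escapes supported
#   :xxxx   base64 segment (':' is the arbitrary prefix chosen above)
#   hh      a single byte as one or two hex digits
_SEGMENT = re.compile(
    r'(?:"([^"]*)"|:([A-Za-z0-9+/]*={0,2})|([0-9A-Fa-f]{1,2}))(?=\s|$)'
)


def decode_u8(literal: str) -> bytes:
    """Decode a hybrid #u8(...) literal into bytes."""
    text = literal.strip()
    if not (text.startswith("#u8(") and text.endswith(")")):
        raise ValueError("expected a literal of the form #u8(...)")
    body = text[4:-1]
    out = bytearray()
    pos = 0
    while True:
        # Skip the whitespace between segments.
        while pos < len(body) and body[pos].isspace():
            pos += 1
        if pos == len(body):
            break
        match = _SEGMENT.match(body, pos)
        if match is None:
            raise ValueError(f"unrecognised segment at offset {pos}")
        string_seg, b64_seg, hex_seg = match.groups()
        if string_seg is not None:
            out += string_seg.encode("utf-8")
        elif b64_seg is not None:
            out += base64.b64decode(b64_seg, validate=True)
        else:
            out.append(int(hex_seg, 16))
        pos = match.end()
    return bytes(out)


if __name__ == "__main__":
    print(decode_u8('#u8("This is a null-terminated string" 0)'))
    print(decode_u8("#u8(CD 21)").hex())
```

Decoding stays fairly small because each segment type is recognisable
from its first character. The harder half is the encoder, which has to
decide where to switch between string, hex and base64 segments.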