Re: Unicode lambda
Lassi Kortela (14 May 2019 08:25 UTC)
> I agree with the peril of feature bloat.
> On the other hand, the problem of using S-expression declare-file form
> is that it conflates meta-information into contents. Suppose you have a
> config file whose format is a sequence of arbitrary S-expressions. How do
> you attach encoding declaration to it? Note that the config directive
> may happen to begin with a symbol 'declare-file', in application-specific
> semantics.

You are absolutely correct. Starting a file with a particular Lisp form means that the form becomes part of the "schema" of any file format it touches, and the schema parser has to take it into account (even if only to ignore it). (In Scheme/Lisp code, the "schema" would be extended by defining a macro.) That's the main drawback of this approach, and that's why it should be optional (with the default encoding being UTF-8 or whatever the implementation default is).

> READ should read arbitrary (syntactically valid) S-expressions, without
> interpreting its semantics, because interpretation is up to the caller of READ.

Isn't reading an entire source file different from reading one S-expression, though? When you read a source file you have to support shebang lines (#!) and encoding declarations. When you call READ, you can assume that the textual port already has the correct encoding, and you don't need to worry about a shebang line. So reading a source file = parse the shebang line and encoding from a binary port + convert it to a textual port + call the normal READ in a loop until EOF.

> The encoding information, however, is trickier than
> #!case-fold or #,(constructor arg ...), since you can't read it as a text
> until you know the encoding. So even if we adopt one of the existing reader
> syntaxes, encoding recognition will likely be implemented separately from
> the existing reader syntax handling mechanism.

Exactly. It's a chicken-and-egg problem :) S-expressions look and feel different from magic comments, but in this sense they are not.
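The two-stage procedure described above (work on raw bytes first, then hand a textual port to the normal READ) can be sketched roughly as follows. This is a minimal Python illustration, not any implementation's actual code. Assumptions made only for the example: the declaration is written as a hypothetical first form (declare-file (encoding NAME)) in plain ASCII; a shebang line is recognized by "#!/" or "#! " so it is not confused with directives such as #!fold-case; and the toy reader knows only parentheses and bare atoms.

```python
import re

# Assumed shebang rule for this sketch: "#!/" or "#! " starts a shebang line;
# anything else after "#!" (e.g. #!fold-case) is left for the reader.
_SHEBANG = (b"#!/", b"#! ")

# Hypothetical declaration syntax, matched on raw bytes before decoding, so it
# must be written in ASCII: (declare-file (encoding NAME)) as the first form.
_DECL = re.compile(rb"\s*\(declare-file\s+\(encoding\s+([A-Za-z0-9_-]+)\)\)")

# Toy tokenizer: parentheses or bare atoms only (no strings, comments, quote).
_TOKEN = re.compile(r"[()]|[^\s()]+")


def split_header(data: bytes, default: str = "utf-8"):
    """Binary stage: drop a shebang line, then look for an encoding declaration."""
    if data.startswith(_SHEBANG):
        newline = data.find(b"\n")
        data = b"" if newline == -1 else data[newline + 1:]
    m = _DECL.match(data)
    return (m.group(1).decode("ascii") if m else default), data


def read_all(text: str):
    """Textual stage: call a toy READ in a loop until EOF."""
    stack = [[]]  # stack[0] collects the top-level forms
    for tok in _TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise SyntaxError("unbalanced ')'")
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise SyntaxError("unexpected EOF inside a list")
    return stack[0]


def read_source_file(data: bytes):
    encoding, body = split_header(data)
    text = body.decode(encoding)  # binary port -> textual port
    # The declare-file form stays in the results: it is an ordinary datum.
    return encoding, read_all(text)
```

Because the declaration is matched on raw bytes, it only works for encodings that leave ASCII bytes unchanged (such as UTF-8 and EUC-JP), not for UTF-16; that is the chicken-and-egg constraint in concrete form.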
We could put an encoding tag in S-expressions, XML (as John had in that blog post; in fact, XML already has a standard way to write an encoding attribute in the XML declaration itself), JSON, CSV or anything else, and we'd have the same basic problem we have with parsing comments: no better, no worse. The point of putting it in S-expressions is that since we use them for code anyway, we might as well extend what we already have. Similarly, if we were already using JSON we could declare the encoding in the JSON object (in fact we would have to, since JSON doesn't support comments).

I think magic comments would be the better choice (for compatibility reasons) if there were a push to standardize the format of magic comments across popular languages so they could be parsed robustly, but alas there is not. With S-expressions we at least have something principled.

> The #!-identifier, and srfi-10 #,ctor syntax, specifically exist to
> communicate with READ out-of-band from the S-expressions. If we want to
> piggy-back on an existing mechanism, we can use either one of them.

This is true. Given Marc's comment about float precision elsewhere in the thread, it might be good if #! were allowed to take a list instead of a symbol. In that case we could have #!(encoding euc-jp).

The main problem with #! is that it can occur anywhere in the file, so if the encoding comes from #! then it can change in the middle of the file. (This probably doesn't make sense for any practical purpose, so the reader could raise an error when it gets a second encoding tag in the same file.) I would perhaps avoid putting the encoding (and other things that are meant to affect an entire stream, never only one part of it) in #!, because it "gives the wrong signal" to users about what is possible to do with it. With #!(encoding ...) one might also change the encoding in the middle of a REPL session, but I don't know if that makes sense either. A terminal is supposed to keep the same encoding throughout.
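The "error on a second encoding tag" rule could be enforced inside a reader along these lines. Again this is only a Python sketch under stated assumptions: #!(encoding NAME) is the list form proposed in this message, not existing syntax; the reader works on text that has already been decoded, so it illustrates the one-encoding-per-file policy rather than byte-level detection; and only top-level directives are recognized.

```python
import re

# Toy tokenizer: the "#!" prefix, parentheses, or bare atoms (no strings or
# comments). "#!" is listed first so that "#!(" splits into "#!" and "(".
_TOKEN = re.compile(r"#!|[()]|[^\s()]+")


class DirectiveError(Exception):
    """Raised when a file tries to set its encoding more than once."""


def read_with_directives(text: str):
    """Read top-level data; #!... directives update reader state instead of
    being returned. Only top-level directives are handled in this sketch."""
    tokens = _TOKEN.findall(text)
    pos = 0
    state = {"encoding": None}
    data = []

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def read_datum():
        tok = next_token()
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read_datum())
            next_token()  # consume ")"; raises if the list is unterminated
            return items
        if tok == ")":
            raise SyntaxError("unbalanced ')'")
        return tok

    while pos < len(tokens):
        if tokens[pos] == "#!":
            next_token()
            directive = read_datum()  # e.g. ["encoding", "euc-jp"]
            is_encoding = (isinstance(directive, list) and len(directive) == 2
                           and directive[0] == "encoding")
            if is_encoding:
                if state["encoding"] is not None:
                    raise DirectiveError("second encoding directive in one file")
                state["encoding"] = directive[1]
            continue  # other directives are ignored in this sketch
        data.append(read_datum())
    return state, data
```

Raising an error, rather than letting the later tag win, keeps the encoding a property of the whole file, which is the point made above.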
> Well, if there's not so much agreement on this, I don't see its worth to
> standardize; we'll probably be able to stick with utf-8.

Yes. With the spread of UTF-8, I have a vague feeling that this flexibility would create more problems and unnecessary complexity for future implementors :) Maybe it's best to keep doing what we are doing now, not write any encoding SRFI, and wait a few years until UTF-8 has completely taken over and R8RS can mandate UTF-8 source code. In practice, most code and terminals already use UTF-8 (or plain ASCII). I'll drop my suggestion.

> communicate with READ out-of-band from the S-expressions.

Just to clarify this point: I had thought of the declare-file form mainly for other purposes; the encoding is just one little thing that it could carry. I probably presented my thoughts in a confusing way because the emphasis in this discussion has been on encodings. I would not specify a declare-file form if the _only_ thing it did was to give the encoding. Rather, it would be a mechanism for specifying many different kinds of things about a file (many of which we cannot yet anticipate, and that's the point: it would be useful to have a standard place in which to declare things at file scope, with room for arbitrary extensions, many of which could be implementation-specific or even project-specific).

Most such metadata is not really meant to be read out-of-band from the normal reader (the encoding declaration is the only such thing I can think of). A declare-file form would be a valid S-expression in Scheme's normal syntax, so the reader could just read it as normal (assuming the corresponding macro is defined). If #!(list ...) forms are allowed, then perhaps there could be an alternate version, #!(declare-file ...), for when a version that doesn't add a form to the READ results is wanted.

In fact, it may be a good idea. The more ideas we throw around about all this reader stuff, the more I grow to like the proposed #!(list ...) read syntax :) We could specify it so that #!foo is equivalent to #!(foo). I think we should permit an arbitrary form inside it, since it may be useful to have extensions and we already have a full reader at our disposal. There are probably some details that will cause problems if absolutely all Scheme syntax is permitted inside it, but we can map those out.
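Here is a sketch of how #!foo and #!(foo) could be made equivalent, with #!(declare-file ...) recording file-scope metadata in reader state instead of adding a form to READ's results. The directive datum is assumed to have been read already (atoms as Python strings, lists as Python lists). #!fold-case and #!no-fold-case are existing R7RS directives; declare-file, the handler table, and the choice to reject unknown directives are illustrative assumptions, not a proposal from any implementation.

```python
from typing import Any, Callable, Dict, List

Datum = Any  # toy representation: atoms are str, lists are Python lists


def normalize_directive(datum: Datum) -> List[Datum]:
    """#!foo means the same as #!(foo): wrap a bare name in a list."""
    return datum if isinstance(datum, list) else [datum]


def new_state() -> Dict[str, Any]:
    return {"fold_case": False, "declarations": {}}


def _fold_case(state: Dict[str, Any], args: List[Datum]) -> None:
    state["fold_case"] = True


def _no_fold_case(state: Dict[str, Any], args: List[Datum]) -> None:
    state["fold_case"] = False


def _declare_file(state: Dict[str, Any], args: List[Datum]) -> None:
    # Each clause is (key value ...); a later clause with the same key wins.
    for clause in args:
        key, *values = normalize_directive(clause)
        state["declarations"][key] = values


HANDLERS: Dict[str, Callable[[Dict[str, Any], List[Datum]], None]] = {
    "fold-case": _fold_case,
    "no-fold-case": _no_fold_case,
    "declare-file": _declare_file,  # hypothetical, from this discussion
}


def apply_directive(state: Dict[str, Any], datum: Datum) -> None:
    """Update reader state; a directive never becomes part of READ's results."""
    head, *args = normalize_directive(datum)
    handler = HANDLERS.get(head)
    if handler is None:
        raise ValueError(f"unknown reader directive: #!{head}")
    handler(state, args)
```

Rejecting unknown directives is only one option; silently ignoring them would make files more portable across implementations, at the cost of hiding typos in directive names.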