Re: Unicode lambda
Marc Nieper-Wißkirchen
(14 May 2019 08:50 UTC)
I like the idea of allowing arbitrary s-exps following "#!" (i.e. "#!(precision 25)" instead of "#!precision=25") so that "#!" can be parsed like the datum comment "#;".

Note that Scheme ports are stateful. Directives like "#!fold-case" change the port's state for subsequent read (and, in principle, also write) operations. If something like "#!(encoding ...)" is ever standardized, it should change the state of the port as well and should affect not only "read" but also "read-char", "write-char", etc.

Independently of the encoding question, we should have another SRFI that specifies how the Scheme reader handles "#!" directives. I am thinking of a parameter object bound to a procedure that is called with the port and the s-exp following the "#!" directive whenever the reader encounters a "#!" directive. The procedure should be able to tail-call a success or a failure procedure. On top of such a SRFI, we can portably build many reader extensions. This SRFI should also expose a way to inspect and modify the state of ports (in particular whether case-folding is enabled, but being able to access column and row counters also makes sense).

-- Marc

On Tue, 14 May 2019 at 10:25, Lassi Kortela <xxxxxx@lassi.io> wrote:
>
> > I agree with the peril of feature bloat.
> > On the other hand, the problem of using S-expression declare-file form
> > is that it conflates meta-information into contents. Suppose you have a
> > config file whose format is a sequence of arbitrary S-expressions. How do
> > you attach encoding declaration to it? Note that the config directive
> > may happen to begin with a symbol 'declare-file', in application-specific
> > semantics.
>
> You are absolutely correct. Starting a file with a particular Lisp form
> means that that form becomes part of the "schema" of any file format it
> touches and the schema parser has to take it into account (even if only
> to ignore it). (In Scheme/Lisp code, the "schema" would be extended by
> defining a macro.)
> That's the main drawback of this approach, and that's
> why it should be optional (with the default coding being UTF-8 or
> whatever the implementation default is).
>
> > READ should read arbitrary (syntactically valid) S-expressions, without
> > interpreting its semantics, because interpretation is up to the caller
> > of READ.
>
> Isn't reading an entire source file different from reading one
> S-expression though? When you read a source file you have to support
> shebang lines (#!) and encoding declarations. When you call READ, you
> can assume that the textual port already has the correct encoding and
> you don't need to worry about a shebang line.
>
> So reading a source file = parse the shebang line and encoding from a
> binary port + convert to a textual port + call normal READ in a loop
> until EOF.
>
> > The encoding information, however, is trickier than
> > #!fold-case or #,(constructor arg ...), since you can't read it as
> > text until you know the encoding. So even if we adopt one of the
> > existing reader syntaxes, encoding recognition will likely be
> > implemented separately from the existing reader syntax handling
> > mechanism.
>
> Exactly. It's a chicken-and-egg problem :) S-expressions look and feel
> different from magic comments but in this sense they are not. We could
> put an encoding tag in S-exprs, XML (as John had in that blog post - in
> fact, XML already has a standard way to write an encoding attribute in
> XML itself), JSON, CSV or anything else and we'd have this same basic
> problem we have with parsing comments - no better, no worse.
>
> The point of putting it in S-exprs is that since we use them for code
> anyway, we might as well extend what we already have. Similarly, if we
> were already using JSON we could declare the encoding in the JSON object
> (in fact we would have to, since JSON doesn't support comments).
>
> I think magic comments would be the better choice (for compatibility
> reasons) if there was a push to standardize the format of all magic
> comments across popular languages so they can be parsed robustly, but
> alas there is not. With S-exprs we at least have something principled.
>
> > The #!-identifier and SRFI 10 #,ctor syntax specifically exist to
> > communicate with READ out-of-band from the S-expressions. If we want
> > to piggy-back on an existing mechanism, we can use either one of them.
>
> This is true. With Marc's comment on float precision elsewhere in the
> thread, it might be good if #! was allowed to take a list instead of a
> symbol. In that case we could have #!(encoding euc-jp).
>
> The main problem with #! is that it can occur anywhere in the file, so
> if encoding comes from #! then it can change in the middle of the file.
> (This probably doesn't make sense for any practical purpose, so the
> reader could raise an error when it gets the second encoding tag in the
> same file.) I would perhaps avoid putting the encoding (and other things
> that are meant to affect an entire stream, never only one part of it) in
> #! because it "gives the wrong signal" to users about what is possible
> to do with it.
>
> With #!(encoding ...) one might also change the encoding in the middle
> of a REPL session, but I don't know if that makes sense either. A
> terminal is supposed to have the same encoding constantly.
>
> > Well, if there's not so much agreement on this, I don't see the worth
> > in standardizing it; we'll probably be able to stick with UTF-8.
>
> Yes. With the spread of UTF-8, this flexibility seems vaguely like we
> would be creating more problems and unnecessary complexity for future
> implementors :) Maybe it's best to keep doing what we are doing now, not
> write any encoding SRFI, and wait a few years until UTF-8 has completely
> taken over and R8RS can mandate UTF-8 source code.
> The practical situation already is that most code and terminals are
> UTF-8 (or plain ASCII). I'll drop my suggestion.
>
> > communicate with READ out-of-band from the S-expressions.
>
> Just to clarify this point - I had thought of the declare-file form
> mainly for other purposes; the encoding is just one little thing that it
> could have. I probably presented my thoughts in a confusing way because
> the emphasis has been on encodings in this discussion. I would not
> specify a declare-file form if the _only_ thing it did was to give the
> encoding. Rather, it would be a mechanism for specifying many different
> kinds of things about a file (many of which we cannot yet anticipate,
> and that's the point - it would be useful to have a standard place in
> which to declare things on a file scope with room for arbitrary
> extensions, many of which could be implementation-specific or even
> project-specific).
>
> Most such metadata is not really meant to be read out-of-band from the
> normal reader (the encoding declaration would be the only such thing I
> can think of). A declare-file form would be a valid S-expression in
> Scheme's normal syntax so the reader could just read it as normal
> (assuming the corresponding macro is defined). If #!(lists ...) are
> allowed then perhaps there could be an alternate version #!(declare-file
> ...) if a version that doesn't add a form to the READ results is wanted.
> In fact, it may be a good idea.
>
> The more ideas we throw around about all this reader stuff, the more I
> grow to like the proposed #!(list ...) read syntax :) We could specify
> it so that #!foo is equivalent to #!(foo). I think we should permit an
> arbitrary form inside it since it may be useful to have extensions and
> we already have a full reader at our disposal. There are probably some
> details that will cause problems if absolutely all Scheme syntax is
> permitted inside it but we can map those out.

--
Prof. Dr. Marc Nieper-Wißkirchen

Universität Augsburg
Institut für Mathematik
Universitätsstraße 14
86159 Augsburg

Tel: 0821/598-2146
Fax: 0821/598-2090

E-Mail: xxxxxx@math.uni-augsburg.de
Web: www.math.uni-augsburg.de/alg/mitarbeiter/mnieper/
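The "#!" directive hook proposed at the top of this message can be sketched outside Scheme to make the control flow concrete. The following is an illustrative Python sketch under several assumptions, not a SRFI draft: a `contextvars.ContextVar` stands in for the parameter object, `Port` is a hypothetical port holding a token stream plus mutable reader state (case folding and a settings table), and `directive_handler`, `default_handler`, and `precision_handler` are invented names. The reader skips each top-level "#!" directive like a datum comment, treats `#!foo` as `#!(foo)` as suggested in the quoted reply, and calls the current handler with the port, the directive datum, and success and failure procedures, which the handler invokes in tail position.

```python
import contextvars
import re
from dataclasses import dataclass, field

# Tokens: the "#!" prefix, parentheses, or any other run of non-space chars.
TOKEN = re.compile(r"\s*(#!|\(|\)|[^\s()]+)")


@dataclass
class Port:
    """Hypothetical textual port: a token stream plus mutable reader state."""
    tokens: list
    pos: int = 0
    fold_case: bool = False
    settings: dict = field(default_factory=dict)

    def next_token(self):
        if self.pos >= len(self.tokens):
            return None
        self.pos += 1
        return self.tokens[self.pos - 1]


def make_port(text):
    return Port(TOKEN.findall(text))


def read_datum(port, tok):
    """Read one datum (symbol, integer or list) that starts with token tok."""
    if tok == "(":
        items = []
        while (t := port.next_token()) != ")":
            if t is None:
                raise SyntaxError("unterminated list")
            items.append(read_datum(port, t))
        return items
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    if re.fullmatch(r"-?\d+", tok):
        return int(tok)
    return tok.lower() if port.fold_case else tok  # port state affects symbols


def default_handler(port, datum, succeed, fail):
    """Understands the R7RS case-folding directives; anything else fails."""
    if datum == ["fold-case"]:
        port.fold_case = True
        return succeed()
    if datum == ["no-fold-case"]:
        port.fold_case = False
        return succeed()
    return fail()


# Stand-in for the proposed parameter object.
directive_handler = contextvars.ContextVar("directive_handler", default=default_handler)


def read(port):
    """Return the next datum, or None at EOF. Top-level #! directives are
    passed to the current handler and then skipped, like datum comments."""
    while True:
        tok = port.next_token()
        if tok is None:
            return None
        if tok != "#!":
            return read_datum(port, tok)
        arg = port.next_token()
        if arg is None:
            raise SyntaxError("'#!' at end of input")
        datum = read_datum(port, arg)
        if not isinstance(datum, list):
            datum = [datum]  # #!foo is treated as #!(foo)

        def fail():
            raise SyntaxError(f"unsupported directive: {datum}")

        directive_handler.get()(port, datum, lambda: None, fail)


def precision_handler(port, datum, succeed, fail):
    """Example extension: #!(precision N) records N, else defer to default."""
    if len(datum) == 2 and datum[0] == "precision":
        port.settings["precision"] = datum[1]
        return succeed()
    return default_handler(port, datum, succeed, fail)


if __name__ == "__main__":
    port = make_port("Foo #!fold-case Bar #!(precision 25) Baz")
    token = directive_handler.set(precision_handler)
    try:
        print([read(port), read(port), read(port)], port.settings)
    finally:
        directive_handler.reset(token)
    # prints: ['Foo', 'bar', 'baz'] {'precision': 25}
```

In this sketch the reader looks up the handler each time it meets a directive, so rebinding the parameter is enough to add `#!(precision 25)` without changing the reader itself. Whether directives may also appear inside lists is a question a real SRFI would have to settle; the sketch only handles them at top level.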
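The recipe for reading a whole source file in the quoted reply (parse the shebang line and encoding from a binary port, convert to a textual port, then run the normal READ loop) can be sketched in Python as follows. The assumptions are labeled: `open_source` is a hypothetical name, the `#!(encoding NAME)` declaration on the line after an optional shebang is only the syntax floated in this thread, and a shebang is recognized only by a leading `#!/` or `#! ` so that directives such as `#!fold-case` are left for the reader. Because the declaration is matched as raw ASCII bytes before decoding, this resolves the chicken-and-egg problem only for ASCII-compatible encodings such as UTF-8 or EUC-JP, not for UTF-16.

```python
import io
import re

# Hypothetical declaration syntax taken from the thread's #!(encoding euc-jp)
# example. It is matched as raw ASCII bytes, before any decoding happens.
ENCODING_DECL = re.compile(rb"#!\(encoding\s+([A-Za-z0-9._-]+)\)[ \t]*\r?\n")


def open_source(data: bytes, default: str = "utf-8") -> io.TextIOWrapper:
    """Turn the raw bytes of a source file into a decoded textual port."""
    binary = io.BytesIO(data)

    # Step 1: skip a script header such as "#! /usr/bin/env ..." or "#!/bin/...".
    # Reader directives like "#!fold-case" are not treated as shebang lines.
    first = binary.readline()
    if not first.startswith((b"#!/", b"#! ")):
        binary.seek(0)

    # Step 2: look for an encoding declaration on the next line, still as bytes.
    start = binary.tell()
    match = ENCODING_DECL.match(binary.readline())
    if match:
        encoding = match.group(1).decode("ascii")
    else:
        encoding = default
        binary.seek(start)  # no declaration: leave the line for the reader

    # Step 3: wrap the remaining bytes in a decoder; an ordinary READ loop
    # can now run on the returned textual port until EOF.
    return io.TextIOWrapper(binary, encoding=encoding)


if __name__ == "__main__":
    source = '#! /usr/bin/env scheme\n#!(encoding euc-jp)\n(display "日本語")\n'
    port = open_source(source.encode("euc-jp"))
    assert port.read() == '(display "日本語")\n'
```

Returning a textual port, rather than a list of data, keeps encoding detection outside the reader proper, which matches the point above that encoding recognition would likely be implemented separately from the existing reader-syntax machinery.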