Re: Unicode lambda | Simplelists

Re: Unicode lambda Marc Nieper-Wißkirchen 14 May 2019 08:50 UTC

I like the idea of allowing arbitrary s-exps following "#!" (i.e.
"#!(precision 25)" instead of "#!precision=25") so that "#!" directives
can be parsed the same way as datum comments ("#;").

Note that Scheme ports are stateful. Directives like "#!fold-case"
change the port's state for subsequent read (and, in principle, also
write) operations. If something like "#!(encoding ...)" is ever
standardized, it should change the state of the port as well and should
affect not only "read" but also "read-char", "write-char", etc.

Independently of the encoding question, we should have another SRFI
that specifies how the Scheme reader handles "#!" directives. I am
thinking of a parameter object bound to a procedure that is called
with the port and the s-exp following "#!" whenever the reader
encounters a "#!" directive. The procedure should be able to
tail-call a success or a failure procedure. On top of such a SRFI, we
can portably build many reader extensions. This SRFI should also expose
a way to inspect and modify the state of ports (in particular whether
case-folding is enabled, but being able to access the column and row
counters makes sense as well).
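
To make the handler idea above a bit more concrete, here is a rough
sketch. It is written in Python rather than Scheme only so that it is
easy to run, and it is a toy model, not a proposal for the actual SRFI
interface. Every name in it (Port, directive_handler, default_handler,
precision_handler, the settings table) is hypothetical. A ContextVar
stands in for the parameter object, "return success()" stands in for the
tail call, and treating "#!foo" as "#!(foo)" is an assumption borrowed
from the quoted message below.

```python
from contextvars import ContextVar

EOF = object()  # sentinel returned at end of input

class ReadError(Exception):
    pass

class Port:
    """Toy textual input port that carries reader state (hypothetical)."""
    def __init__(self, text):
        self.text, self.pos = text, 0
        self.line, self.column = 1, 0   # row and column counters
        self.fold_case = False          # toggled by #!fold-case
        self.settings = {}              # room for extension directives

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def next(self):
        ch = self.peek()
        if ch:
            self.pos += 1
            if ch == "\n":
                self.line, self.column = self.line + 1, 0
            else:
                self.column += 1
        return ch

def default_handler(port, datum, success, failure):
    """Handle the two case-folding directives; reject everything else."""
    if datum == ["fold-case"]:
        port.fold_case = True
        return success()
    if datum == ["no-fold-case"]:
        port.fold_case = False
        return success()
    return failure(f"unknown directive {datum!r}")

# Stand-in for a Scheme parameter object holding the current handler.
directive_handler = ContextVar("directive_handler", default=default_handler)

def read(port):
    """Read one datum: a list, an integer, or a symbol (as a str)."""
    while port.peek().isspace():
        port.next()
    ch = port.peek()
    if ch == "":
        return EOF
    if port.text.startswith("#!", port.pos):
        line, column = port.line, port.column
        port.next(); port.next()
        datum = read(port)              # the s-exp following "#!"
        if datum is EOF:
            raise ReadError(f"{line}:{column}: incomplete directive")
        if not isinstance(datum, list):
            datum = [datum]             # assumption: #!foo means #!(foo)
        def failure(message):
            raise ReadError(f"{line}:{column}: {message}")
        # On success the reader simply continues with the next datum.
        return directive_handler.get()(port, datum, lambda: read(port), failure)
    if ch == ")":
        raise ReadError(f"{port.line}:{port.column}: unexpected ')'")
    if ch == "(":
        port.next()
        items = []
        while True:
            while port.peek().isspace():
                port.next()
            if port.peek() == ")":
                port.next()
                return items
            item = read(port)
            if item is EOF:
                raise ReadError("unexpected end of input in list")
            items.append(item)
    token = ""
    while port.peek() and not port.peek().isspace() and port.peek() not in "()":
        token += port.next()
    if port.fold_case:
        token = token.lower()
    try:
        return int(token)
    except ValueError:
        return token

def precision_handler(port, datum, success, failure):
    """Example extension: record #!(precision N), delegate the rest."""
    if len(datum) == 2 and datum[0] == "precision" and isinstance(datum[1], int):
        port.settings["precision"] = datum[1]
        return success()
    return default_handler(port, datum, success, failure)

# Rough analogue of (parameterize ((directive-handler precision-handler)) ...)
saved = directive_handler.set(precision_handler)
try:
    port = Port("#!(precision 25) (a 1)")
    print(read(port), port.settings)
finally:
    directive_handler.reset(saved)
```

The point of passing the port to the handler is that a directive can
change port state (here fold_case and settings) that later reads on the
same port observe, and the failure procedure can report the row and
column where the directive started.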

-- Marc

On Tue, 14 May 2019 at 10:25, Lassi Kortela <xxxxxx@lassi.io> wrote:
>
> > I agree with the peril of feature bloat.
> > On the other hand, the problem with using an S-expression declare-file
> > form is that it conflates meta-information with contents.  Suppose you
> > have a config file whose format is a sequence of arbitrary
> > S-expressions.  How do you attach an encoding declaration to it?  Note
> > that a config directive may happen to begin with the symbol
> > 'declare-file', with application-specific semantics.
>
> You are absolutely correct. Starting a file with a particular Lisp form
> means that that form becomes part of the "schema" of any file format it
> touches and the schema parser has to take it into account (even if only
> to ignore it). (In Scheme/Lisp code, the "schema" would be extended by
> defining a macro.) That's the main drawback of this approach, and that's
> why it should be optional (with the default coding being UTF-8 or
> whatever the implementation default is).
>
> > READ should read arbitrary (syntactically valid) S-expressions without
> > interpreting their semantics, because interpretation is up to the
> > caller of READ.
>
> Isn't reading an entire source file different from reading one
> S-expression though? When you read a source file you have to support
> shebang lines (#!) and encoding declarations. When you call READ, you
> can assume that the textual port already has the correct encoding and
> you don't need to worry about a shebang line.
>
> So read a source file = parse shebang line and encoding from a binary
> port + convert to textual port + call normal READ in a loop until EOF.
>
> > The encoding information, however, is trickier than
> > #!fold-case or #,(constructor arg ...), since you can't read it as
> > text until you know the encoding.  So even if we adopt one of the
> > existing reader syntaxes, encoding recognition will likely be
> > implemented separately from the existing reader syntax handling
> > mechanism.
>
> Exactly. It's a chicken-and-egg problem :) S-expressions look and feel
> different from magic comments but in this sense they are not. We could
> put an encoding tag in S-exprs, XML (as John had in that blog post - in
> fact, XML already has a standard way to write an encoding attribute in
> XML itself), JSON, CSV or anything else and we'd have this same basic
> problem we have with parsing comments - no better, no worse.
>
> The point of putting it in S-exprs is that since we use them for code
> anyway, we might as well extend what we already have. Similarly, if we
> were already using JSON we could declare the encoding in the JSON object
> (in fact we would have to, since JSON doesn't support comments).
>
> I think magic comments would be the better choice (for compatibility
> reasons) if there was a push to standardize the format of all magic
> comments across popular languages so they can be parsed robustly, but
> alas there is not. With S-exprs we at least have something principled.
>
> > The #!-identifier, and srfi-10 #,ctor syntax, specifically exist to
> > communicate with READ out-of-band from the S-expressions.  If we want
> > to piggy-back on an existing mechanism, we can use either one of them.
>
> This is true. With Marc's comment of float precision elsewhere in the
> thread, it might be good if #! was allowed to take a list instead of a
> symbol. In that case we could have #!(encoding euc-jp).
>
> The main problem with #! is that it can occur anywhere in the file, so
> if encoding comes from #! then it can change in the middle of the file.
> (This probably doesn't make sense for any practical purpose, so the
> reader could raise an error when it gets the second encoding tag in the
> same file.) I would perhaps avoid putting the encoding (and other things
> that are meant to affect an entire stream, never only one part of it) in
> #! because it "gives the wrong signal" to users about what is possible
> to do with it.
>
> With #!(encoding ...) one might also change the encoding in the middle
> of a REPL session, but I don't know if that makes sense either. A
> terminal is supposed to have the same encoding constantly.
>
> > Well, if there's not so much agreement on this, I don't see it as worth
> > standardizing; we'll probably be able to stick with utf-8.
>
> Yes. With the spread of UTF-8, this flexibility seems vaguely like we
> would be creating more problems and unnecessary complexity for future
> implementors :) Maybe it's best to keep doing what we are doing now, not
> write any encoding SRFI, and wait a few years until UTF-8 has completely
> taken over and R8RS can mandate UTF-8 source code. The practical
> situation already is that most code and terminals are UTF-8 (or plain
> ASCII). I'll drop my suggestion.
>
> > communicate with READ out-of-band from the S-expressions.
>
> Just to clarify this point - I had thought of the declare-file form
> mainly for other purposes; the encoding is just one little thing that it
> could have. I probably presented my thoughts in a confusing way because
> the emphasis has been on encodings in this discussion. I would not
> specify a declare-file form if the _only_ thing it did was to give the
> encoding. Rather, it would be a mechanism for specifying many different
> kinds of things about a file (many of which we cannot yet anticipate,
> and that's the point - it would be useful to have a standard place where
> to declare things on a file scope with room for arbitrary extensions,
> many of which could be implementation-specific or even project-specific).
>
> Most such metadata is not really meant to be read out-of-band from the
> normal reader (the encoding declaration would be the only such thing I
> can think of). A declare-file form would be a valid S-expression in
> Scheme's normal syntax so the reader could just read it as normal
> (assuming the corresponding macro is defined). If #!(lists ...) are
> allowed then perhaps there could be an alternate version #!(declare-file
> ...) if a version that doesn't add a form to the READ results is wanted.
> In fact, it may be a good idea.
>
> The more ideas we throw around about all this reader stuff, the more I
> grow to like the proposed #!(list ...) read syntax :) We could specify
> it so that #!foo is equivalent to #!(foo). I think we should permit an
> arbitrary form inside it since it may be useful to have extensions and
> we already have a full reader at our disposal. There are probably some
> details that will cause problems if absolutely all Scheme syntax is
> permitted inside it but we can map those out.

--
Prof. Dr. Marc Nieper-Wißkirchen

Universität Augsburg
Institut für Mathematik
Universitätsstraße 14
86159 Augsburg