Re: Unicode lambda Lassi Kortela 12 May 2019 12:23 UTC

> 'read' can occur strictly before interpreting any of the S-expressions, and reading in an incorrect encoding can
> cause an I/O error, so you may not have a chance to interpret those forms.

> The source file encoding should be a property of the port (as is the
> case-folding property). It could be set with a "#!" directive (at the
> top of the file).

The 'read' procedure that looks for the encoding declaration should be a
special reader that's much simpler than the normal Scheme reader and
handles encoding errors gracefully (perhaps it should simply read raw
bytes from a binary port, treat 0..127 as ASCII characters, and ignore
all other bytes or treat them as whitespace or symbol/string
constituents).
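
A rough sketch of such a pre-reader, in Python for brevity. The
"#!encoding" directive name, the two-line search window, and the
function name are all made up for illustration; nothing in this thread
fixes a syntax. It takes the "treat them as whitespace" option for
bytes outside 0..127.

```python
import re
from typing import Optional

# Hypothetical directive syntax, e.g. "#!encoding latin-1".
_DIRECTIVE = re.compile(r"#!encoding[ \t]+([A-Za-z0-9._-]+)")


def sniff_encoding(raw: bytes, max_lines: int = 2) -> Optional[str]:
    """Look for an encoding directive in the first lines of raw bytes.

    Never decodes the input, so it cannot raise an encoding error.
    """
    for line in raw.split(b"\n")[:max_lines]:
        # Bytes 0..127 are taken as ASCII; every other byte becomes a space.
        text = "".join(chr(b) if b < 128 else " " for b in line)
        match = _DIRECTIVE.match(text.strip())
        if match:
            return match.group(1).lower()
    return None
```

The caller would open the file as a binary port, run this on the first
bytes, and only then create the textual port with the encoding found
(or a default when the result is None).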

Skipping/parsing the Unix shebang line (#!) at the start of a script is
in many Schemes/Lisps a similar magic job that needs its own reader (or
a hack to the standard reader).
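
The shebang case can be handled at the same byte level. A minimal
sketch, assuming a shebang is recognized by "#!/" or "#! /" at the very
start of the file (that rule and the function name are my assumptions,
not something every implementation agrees on):

```python
def skip_shebang(raw: bytes) -> bytes:
    """Drop a leading Unix shebang line, if any, and return the rest."""
    if raw.startswith(b"#!/") or raw.startswith(b"#! /"):
        newline = raw.find(b"\n")
        # A file consisting only of the shebang line has no body left.
        return b"" if newline == -1 else raw[newline + 1:]
    return raw
```

A directive such as "#!fold-case" is left alone because it is not
followed by a slash.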

> BTW, the "magic encoding comment" is supported in a few languages:
>
> Python: https://www.python.org/dev/peps/pep-0263/
> Ruby: https://idiosyncratic-ruby.com/26-file-encoding-magic.html

The fact that it's widespread is nice, but the syntax is not really well
specified. Ruby uses this regexp to look for it:

ENCODING_SPEC_RE = %r"coding\s*[=:]\s*([[:alnum:]\-_]+)"

Python uses this regexp:

^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)
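
To illustrate how loosely the two agree, here is a small Python
comparison. RUBY_STYLE is my approximation of the Ruby regexp above
([[:alnum:]] narrowed to ASCII letters and digits); PYTHON_STYLE is the
Python regexp copied as is. The helper name is made up.

```python
import re

# Approximation of the Ruby regexp quoted above (ASCII-only alnum).
RUBY_STYLE = re.compile(r"coding\s*[=:]\s*([A-Za-z0-9\-_]+)")
# The Python regexp quoted above, unchanged.
PYTHON_STYLE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_coding(pattern, line):
    """Return the encoding name the pattern finds in line, or None."""
    match = pattern.search(line)
    return match.group(1) if match else None


samples = [
    "# -*- coding: utf-8 -*-",   # both patterns accept this
    "# coding : latin-1",        # space before the colon: Ruby-style only
    ";; encoding: utf-8",        # Scheme comment: Ruby-style only
]
for line in samples:
    print(line, find_coding(RUBY_STYLE, line), find_coding(PYTHON_STYLE, line))
```

The Ruby-style pattern tolerates whitespace before the separator and
does not care what starts the line; the Python pattern insists on a "#"
comment and no space before ":" or "=", but accepts "." in the name.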

Since the Scheme/Lisp tradition is to resist the temptation of quick
hacks and do things in a principled way, I would really like to avoid
this approach even though other languages and editors are doing it. If a
magic comment syntax is the way to go, I would at least like to have a
principled and well-specified grammar that can also be used for other
kinds of magic comments (I have been collecting samples from many
languages in the hope of specifying such a grammar, but I don't know if
anyone would adopt it).

> Technically the encoding info should be a metadata of a file, not in the content of the file, so the "coding" comment
> is certainly a kluge.  What I thought is that it might be useful to codify the current practice.

Correct, but it's certainly good to have the info somewhere since the
whole world is not Unicode yet.