Email list hosting service & mailing list manager

Unicode lambda Lassi Kortela (12 May 2019 10:19 UTC)
Re: Unicode lambda Shiro Kawai (12 May 2019 11:18 UTC)
Re: Unicode lambda Lassi Kortela (12 May 2019 11:40 UTC)
Re: Unicode lambda Lassi Kortela (12 May 2019 11:50 UTC)
Re: Unicode lambda Shiro Kawai (12 May 2019 12:06 UTC)
Re: Unicode lambda Marc Nieper-Wißkirchen (12 May 2019 12:11 UTC)
Re: Unicode lambda Lassi Kortela (12 May 2019 12:23 UTC)
Re: Unicode lambda Lassi Kortela (12 May 2019 13:23 UTC)
Re: Unicode lambda Lassi Kortela (12 May 2019 13:46 UTC)
Re: Unicode lambda John Cowan (12 May 2019 14:20 UTC)
Re: Unicode lambda Lassi Kortela (12 May 2019 14:38 UTC)
Re: Unicode lambda Lassi Kortela (12 May 2019 14:55 UTC)
Re: Unicode lambda John Cowan (12 May 2019 15:00 UTC)
Re: Unicode lambda Lassi Kortela (12 May 2019 15:20 UTC)
Re: Unicode lambda Shiro Kawai (12 May 2019 18:42 UTC)
Re: Unicode lambda Lassi Kortela (12 May 2019 19:43 UTC)
Re: Unicode lambda John Cowan (12 May 2019 22:29 UTC)
Re: Unicode lambda Shiro Kawai (13 May 2019 10:48 UTC)
Re: Unicode lambda Lassi Kortela (14 May 2019 08:25 UTC)
Re: Unicode lambda Marc Nieper-Wißkirchen (14 May 2019 08:50 UTC)
Re: Unicode lambda Lassi Kortela (14 May 2019 10:10 UTC)
Re: Unicode lambda Lassi Kortela (14 May 2019 10:59 UTC)
Re: Unicode lambda Lassi Kortela (14 May 2019 12:35 UTC)
Re: Unicode lambda Lassi Kortela (14 May 2019 13:09 UTC)
Re: Unicode lambda Lassi Kortela (14 May 2019 14:04 UTC)
Re: Unicode lambda Shiro Kawai (14 May 2019 19:18 UTC)
Re: Unicode lambda Vincent Manis (14 May 2019 22:01 UTC)
Re: Unicode lambda Lassi Kortela (20 May 2019 09:21 UTC)
Re: Unicode lambda Marc Nieper-Wißkirchen (21 Oct 2019 14:20 UTC)
Re: Unicode lambda Shiro Kawai (21 Oct 2019 17:19 UTC)
Re: Unicode lambda John Cowan (21 Oct 2019 17:39 UTC)
Re: Unicode lambda Marc Nieper-Wißkirchen (21 Oct 2019 18:43 UTC)
Re: Unicode lambda John Cowan (21 Oct 2019 23:27 UTC)
Encoding declarations Lassi Kortela (22 Oct 2019 08:39 UTC)
Re: Encoding declarations John Cowan (22 Oct 2019 20:52 UTC)
#! directives, general and specific Lassi Kortela (22 Oct 2019 09:11 UTC)
Re: #! directives, general and specific John Cowan (22 Oct 2019 20:27 UTC)
Re: #! directives, general and specific Lassi Kortela (22 Oct 2019 20:43 UTC)
Re: Unicode lambda Marc Nieper-Wißkirchen (13 May 2019 08:50 UTC)
Re: Unicode lambda Lassi Kortela (13 May 2019 10:27 UTC)
Re: Unicode lambda Per Bothner (12 May 2019 14:17 UTC)
Re: Unicode lambda Peter (12 May 2019 15:06 UTC)

Re: Unicode lambda Lassi Kortela 12 May 2019 14:38 UTC

> I did something similar (but in pseudo-code, not code) in my old blog
> post "Hello!  I am an XML encoding sniffer" at
> <http://recycledknowledge.blogspot.com/2005/07/hello-i-am-xml-encoding-sniffer.html>.

That's great :) I just changed it so it skips all bytes except 1..126.
So now it can read UTF-16, both big- and little-endian, with or without
byte order mark.

To be sure, if there's more data in it than just the encoding
declaration, that data should be able to use bytes above 126. But for
parsing only the encoding declaration this skipping is appropriate.

> As long as you are using a custom reader, however, you might as well
> make the whole declare-file form a comment so that the regular reader
> can just start over.

It's nice to be able to do that, but maybe it shouldn't be a comment.
After all, it could contain metadata that is useful to read from Lisp
and much of the point of S-expression syntax is that it's extensible
like this. If we make it a comment, it's just a glorified version of the
original magic comments hack :-/

For implementations that don't support it, skipping the thing is a
one-line macro:

;; Scheme:
(define-syntax declare-file (syntax-rules () ((_ rest ...) #f)))

;; Common Lisp:
(defmacro declare-file (&body body) (declare (ignore body)) nil)

;; Emacs Lisp:
(defmacro declare-file (&rest _rest) nil)

;; Clojure:
(defmacro declare-file [& body] nil)