I propose that the UTF-16 conversions be split into three procedures each.
utf16->text checks whether the first code unit is a BOM or a reversed
BOM, and uses that to dictate the interpretation of the rest; the BOM
is not included in the text. If the first code unit is anything else,
an implementation-defined endianness is used. (Unicode suggests
big-endian, but Windows, which is the dominant producer of UTF-16 these
days, invariably uses little-endian.)
utf16be->text and utf16le->text use the endianness their names specify
and do not treat a BOM specially.
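
To make the dispatch concrete, here is a minimal sketch in R7RS Scheme
of how utf16->text could be defined in terms of the other two,
operating on a bare bytevector (ignoring any optional arguments) and
arbitrarily picking big-endian as the implementation-defined default:

    (define (utf16->text bv)
      (cond ((< (bytevector-length bv) 2)
             ;; too short to hold a BOM: implementation default applies
             (utf16be->text bv))
            ((and (= (bytevector-u8-ref bv 0) #xFE)
                  (= (bytevector-u8-ref bv 1) #xFF))
             ;; BOM: the rest is big-endian; the BOM itself is dropped
             (utf16be->text (bytevector-copy bv 2)))
            ((and (= (bytevector-u8-ref bv 0) #xFF)
                  (= (bytevector-u8-ref bv 1) #xFE))
             ;; reversed BOM: the rest is little-endian; BOM is dropped
             (utf16le->text (bytevector-copy bv 2)))
            (else
             ;; no BOM: implementation-defined endianness (big-endian here)
             (utf16be->text bv))))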
text->utf16 always generates a BOM and employs implementation-defined
endianness.
text->utf16be and text->utf16le do not generate a BOM and use the
specified endianness.
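
The BOM-generating encoder is then just a two-byte prefix on one of the
other two; this sketch again picks big-endian as the
implementation-defined choice (a little-endian implementation would
emit #xFF #xFE and call text->utf16le instead):

    (define (text->utf16 text)
      ;; emit the BOM in the chosen byte order, then the encoded text
      (bytevector-append (bytevector #xFE #xFF)
                         (text->utf16be text)))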