I propose that the UTF-16 conversions be split into three procedures each.
utf16->text checks whether the first code unit is a BOM or a reversed
BOM, and uses that to dictate the interpretation of the rest; the BOM
is not included in the text. If the first code unit is anything else,
an implementation-defined endianness is used. (Unicode suggests
big-endian, but Windows, which is the dominant producer of UTF-16 these
days, invariably uses little-endian.)
utf16be->text and utf16le->text use the endianness their names specify
and do not treat a BOM specially.
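
To make the dispatch concrete, here is a minimal sketch in R7RS Scheme
of how utf16->text could be defined in terms of the other two,
operating on a bare bytevector (ignoring any optional arguments) and
arbitrarily picking big-endian as the implementation-defined default:

    (define (utf16->text bv)
      (cond ((< (bytevector-length bv) 2)
             ;; too short to hold a BOM: implementation default applies
             (utf16be->text bv))
            ((and (= (bytevector-u8-ref bv 0) #xFE)
                  (= (bytevector-u8-ref bv 1) #xFF))
             ;; BOM: the rest is big-endian; the BOM itself is dropped
             (utf16be->text (bytevector-copy bv 2)))
            ((and (= (bytevector-u8-ref bv 0) #xFF)
                  (= (bytevector-u8-ref bv 1) #xFE))
             ;; reversed BOM: the rest is little-endian; BOM is dropped
             (utf16le->text (bytevector-copy bv 2)))
            (else
             ;; no BOM: implementation-defined endianness (big-endian here)
             (utf16be->text bv))))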
text->utf16 always generates a BOM and employs implementation-defined
endianness.
text->utf16be and text->utf16le do not generate a BOM and use the
specified endianness.
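
The BOM-generating encoder is then just a two-byte prefix on one of the
other two; this sketch again picks big-endian as the
implementation-defined choice (a little-endian implementation would
emit #xFF #xFE and call text->utf16le instead):

    (define (text->utf16 text)
      ;; emit the BOM in the chosen byte order, then the encoded text
      (bytevector-append (bytevector #xFE #xFF)
                         (text->utf16be text)))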