Re: New draft (#5) of SRFI 152: String Library (reduced) Arthur A. Gleckler 18 Jul 2017 00:34 UTC
William D Clinger 28 Jul 2017 12:18 UTC
> Here are John's comments on the draft:
>
> This draft incorporates all of Sudarshan S Chawathe's
> corrections.
>
> The rationale for omitting UTF-32 conversions is that it
> is essentially an unused encoding. UTF-16 is used for
> Windows APIs and some files; UTF-8 is used essentially
> everywhere else, including the Web (90% of web pages are
> UTF-8). The only advantage of UTF-32-encoded bytevectors
> is that they are both fixed-width and mutable, but on
> Scheme systems that intern characters (which is most of
> them), vectors of characters serve the same purpose with
> greater ease of use. R7RS and this SRFI already provide
> string->vector and vector->string conversions.

The UTF-32 encoding does have another advantage over vectors
of characters: on 64-bit systems, an R7RS vector of characters
typically occupies twice as much space as a UTF-32 bytevector,
because each vector element takes a 64-bit word while each
UTF-32 code unit takes four bytes. Granted, that's only a
factor of two, but the main technical advantage of UTF-16
over UTF-32 is also a factor of two in space. The popularity
of UTF-16 suggests some people care about a mere factor of
two.

This wouldn't matter very much if procedures resembling R6RS
bytevector-u32-native-ref and bytevector-u32-native-set! were
available in R7RS Red Edition, because UTF-32 bytevector
encodings are trivial to implement using those two procedures
plus integer->char and char->integer. Those two procedures
can of course be defined in terms of bytevector-u8-ref and
bytevector-u8-set!, but are then likely to be about four times
as slow, since each 32-bit access becomes four byte accesses.
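
To make the point concrete, here is a rough sketch of what such
a portable fallback might look like in R7RS small. The names
u32le-ref, u32le-set!, string->utf32le and utf32le->string are
hypothetical, not part of any SRFI or of R6RS, and the sketch
fixes little-endian byte order as an assumption instead of
using native order. It also assumes the bytevector length is a
multiple of four and that every code unit is a valid scalar
value.

```scheme
(import (scheme base))

;; Hypothetical helper: read one 32-bit little-endian code unit
;; at byte offset k, using four separate byte reads.
(define (u32le-ref bv k)
  (+ (bytevector-u8-ref bv k)
     (* 256 (bytevector-u8-ref bv (+ k 1)))
     (* 65536 (bytevector-u8-ref bv (+ k 2)))
     (* 16777216 (bytevector-u8-ref bv (+ k 3)))))

;; Hypothetical helper: write n as a 32-bit little-endian code
;; unit at byte offset k, using four separate byte writes.
(define (u32le-set! bv k n)
  (bytevector-u8-set! bv k (remainder n 256))
  (bytevector-u8-set! bv (+ k 1) (remainder (quotient n 256) 256))
  (bytevector-u8-set! bv (+ k 2) (remainder (quotient n 65536) 256))
  (bytevector-u8-set! bv (+ k 3) (quotient n 16777216)))

;; Encode a string as UTF-32 (little-endian, no byte order mark).
(define (string->utf32le s)
  (let ((bv (make-bytevector (* 4 (string-length s)) 0))
        (k 0))
    (string-for-each
     (lambda (c)
       (u32le-set! bv k (char->integer c))
       (set! k (+ k 4)))
     s)
    bv))

;; Decode a UTF-32 (little-endian) bytevector into a string,
;; walking backwards so the character list ends up in order.
(define (utf32le->string bv)
  (let loop ((k (- (bytevector-length bv) 4)) (chars '()))
    (if (< k 0)
        (list->string chars)
        (loop (- k 4)
              (cons (integer->char (u32le-ref bv k)) chars)))))
```

With these definitions, (string->utf32le "A") yields the
bytevector #u8(65 0 0 0), and utf32le->string inverts it. Each
code unit costs four bytevector-u8 calls here, which is where
the slowdown relative to a single 32-bit access comes from.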

As this is more of an argument for bytevector-u32-native-ref
and bytevector-u32-native-set! than for conversions between
strings and UTF-32 bytevector encodings, I don't think SRFI
152 has to address it.

Will