Re: New draft (#5) of SRFI 152: String Library (reduced)
Arthur A. Gleckler 18 Jul 2017 00:34 UTC
William D Clinger 28 Jul 2017 12:18 UTC

> Here are John's comments on the draft:
>
>   This draft incorporates all of Sudarshan S Chawathe's
>   corrections.
>
>   The rationale for omitting UTF-32 conversions is that it
>   is essentially an unused encoding.  UTF-16 is used for
>   Windows APIs and some files; UTF-8 is used essentially
>   everywhere else, including the Web (90% of web pages are
>   UTF-8).  The only advantage of UTF-32-encoded bytevectors
>   is that they are both fixed-width and mutable, but on
>   Scheme systems that intern characters (which is most of
>   them), vectors of characters serve the same purpose with
>   greater ease of use.  R7RS and this SRFI already provide
>   string->vector and vector->string conversions.

The UTF-32 encoding does have another advantage over vectors
of characters:  On 64-bit systems, R7RS vectors of characters
occupy twice as much space as bytevectors with UTF-32.

Granted, that's only a factor of two, but the main technical
advantage of UTF-16 over UTF-32 is a factor of two.  The
popularity of UTF-16 suggests some people care about a mere
factor of two.

This wouldn't matter very much if procedures resembling R6RS
bytevector-u32-native-ref and bytevector-u32-native-set! were
available in R7RS Red Edition, because UTF-32 bytevector
encodings are trivial to implement using those two procedures
plus integer->char and char->integer.  Those two procedures
can of course be defined in terms of bytevector-u8-ref and
bytevector-u8-set!, but are then likely to be about four times
as slow.
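
For illustration, here is a minimal sketch of that fallback in R7RS
(scheme base).  All four procedure names are hypothetical, and the
sketch fixes little-endian byte order, whereas the R6RS "native"
procedures use the host's byte order.  It is not taken from any
SRFI 152 draft.

```scheme
(import (scheme base))

;; Hypothetical helper: read an unsigned 32-bit little-endian
;; integer starting at byte index k, one byte at a time.
(define (bytevector-u32-le-ref bv k)
  (+ (bytevector-u8-ref bv k)
     (* 256 (bytevector-u8-ref bv (+ k 1)))
     (* 65536 (bytevector-u8-ref bv (+ k 2)))
     (* 16777216 (bytevector-u8-ref bv (+ k 3)))))

;; Hypothetical helper: store n (0 <= n < 2^32) as four
;; little-endian bytes starting at byte index k.
(define (bytevector-u32-le-set! bv k n)
  (bytevector-u8-set! bv k (remainder n 256))
  (bytevector-u8-set! bv (+ k 1) (remainder (quotient n 256) 256))
  (bytevector-u8-set! bv (+ k 2) (remainder (quotient n 65536) 256))
  (bytevector-u8-set! bv (+ k 3) (quotient n 16777216)))

;; Encode a string as UTF-32LE: one 4-byte code unit per character.
(define (string->utf32le s)
  (let* ((n (string-length s))
         (bv (make-bytevector (* 4 n) 0)))
    (do ((i 0 (+ i 1)))
        ((= i n) bv)
      (bytevector-u32-le-set! bv (* 4 i)
                              (char->integer (string-ref s i))))))

;; Decode a UTF-32LE bytevector back into a string.  Assumes the
;; length is a multiple of 4 and every code unit is a valid scalar
;; value; no error checking is attempted.
(define (utf32le->string bv)
  (let* ((n (quotient (bytevector-length bv) 4))
         (s (make-string n)))
    (do ((i 0 (+ i 1)))
        ((= i n) s)
      (string-set! s i
                   (integer->char (bytevector-u32-le-ref bv (* 4 i)))))))
```

Each 32-bit access here costs four byte-level accesses, which is
where the rough factor of four comes from.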

As this is more of an argument for bytevector-u32-native-ref
and bytevector-u32-native-set! than for conversions between
strings and UTF-32 bytevector encodings, I don't think SRFI
152 has to address it.

Will