Per Bothner <xxxxxx@bothner.com> writes:
> On 1/15/24 18:28, John Cowan wrote:
>> On Sun, Jan 14, 2024 at 12:59 PM Per Bothner <xxxxxx@bothner.com> wrote:
>>> If you want a predicate to detect invalid code points (why? - what is the
>>> use case?)
>> Because it's not a character. (Are you sure you don't mean an unassigned code
>> point? That should not be an error.)
>
> <snip>
>
> Of course this does not preclude signalling an error if a "character constructor" (such
> as the integer->char procedure) is passed invalid arguments. However, string-ref should
> never fail if the index is in range.
Perhaps another SRFI is in order to better specify Unicode behavior?

I think for the purposes of this SRFI, changing the sample
implementation to do this:

    (define max-char (cond-expand (full-unicode #x10FFFF) (else 127)))

and then filtering out the surrogate range #xD800-#xDFFF (code points
that are not Unicode scalar values, so they are not valid characters)
should handle this issue. The full-unicode feature identifier is
specified in R7RS-small as:

"All Unicode characters present in Unicode version 6.0 are supported as
Scheme characters."

So that would mean the entire valid Unicode range, including code points
that are unassigned or reserved for private use.

Implementations that support non-Unicode characters (char values
greater than #x10FFFF) can easily update the values in the sample
implementation.
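
To make the filtering concrete, here is a rough sketch of what I have in
mind. It is only an illustration, not the sample implementation itself:
the name valid-code-point? is hypothetical, and I am assuming an R7RS
system where cond-expand can be used in expression position as above.

```scheme
;; Upper bound on code points, as proposed above.
(define max-char (cond-expand (full-unicode #x10FFFF) (else 127)))

;; Hypothetical helper (not part of the SRFI): true if n is an exact
;; integer in [0, max-char] that lies outside the surrogate range
;; #xD800-#xDFFF, i.e. a value that is safe to pass to integer->char.
(define (valid-code-point? n)
  (and (exact-integer? n)
       (<= 0 n max-char)
       (not (<= #xD800 n #xDFFF))))
```

Code that builds characters from integers could check valid-code-point?
first and skip (or regenerate) any value that fails, so integer->char is
never called on a surrogate.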