Re: Case-mapping, Unicode & internationalisation
Sergei Egorov 24 Jan 2000 17:10 UTC
I believe that UPCASE-STRING, DOWNCASE-STRING, and
TITLECASE-STRING belong to a separate domain of 'text processes'
that should be addressed in separate SRFIs. I think that the best approach
in a Unicode context is to treat Scheme strings as just arrays of characters
('code points') with no special well-formedness constraints; for example,
it should be legal to have a string consisting of combining characters
with no preceding base character, or a string with a high-half surrogate
character not followed by a low-half surrogate character.
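
To make the "array of code points" model concrete, here is a minimal sketch in Python rather than Scheme, chosen only because Python's str already behaves this way. It is an analogy under that assumption, not a proposed Scheme API; the variable names are made up for illustration.

```python
# Python's str is a sequence of code points with no well-formedness checks;
# it stands in here for the proposed Scheme string model.
lone_combining = "\u0301"   # COMBINING ACUTE ACCENT with no base character
lone_surrogate = "\ud800"   # an unpaired surrogate code point

# Both are legal one-element strings.
assert len(lone_combining) == 1
assert len(lone_surrogate) == 1

# Ordinary array-style operations work without any validation step.
mixed = lone_combining + "a" + lone_surrogate
code_points = [ord(c) for c in mixed]
```

Nothing here inspects what the code points mean; that is left to the higher-level libraries.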
A "string" library can contain relatively simple procedures that are
useful in traditional applications; it can also serve as a basis for
building the 'text processes' described in the Unicode standard.
A "char" library can contain procedures to access character properties
described in the Unicode database.
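
As a rough illustration of what such property accessors might look like, the sketch below uses Python's standard unicodedata module, which exposes fields of the Unicode character database. The helper char_properties is hypothetical and is not part of any SRFI.

```python
import unicodedata

def char_properties(ch):
    """Collect a few Unicode database properties for one code point."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),          # general category
        "combining_class": unicodedata.combining(ch),  # canonical combining class
    }

props = char_properties("\u0301")
```

A "char" library along these lines needs no notion of well-formed text; it only answers questions about individual code points.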
A "text" library can include a 'text' data type representing well-formed
character sequences and allowing effective implementation of text processes,
plus all the primitives needed to work with this data type.
A "basic text processes" library can contain the specification and
implementation of canonical and compatibility decomposition, built on the
text primitives.
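
For readers unfamiliar with the two decompositions, this short Python sketch shows the difference using the standard unicodedata.normalize function (NFD is canonical decomposition, NFKD is compatibility decomposition). It only illustrates the behaviour such a library would specify; it says nothing about how a Scheme implementation should be structured.

```python
import unicodedata

e_acute = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
fi_ligature = "\ufb01"  # LATIN SMALL LIGATURE FI

# Canonical decomposition splits the precomposed letter into base + mark.
canonical = unicodedata.normalize("NFD", e_acute)

# The ligature has only a compatibility mapping, so NFD leaves it alone
# while NFKD expands it to two plain letters.
ligature_nfd = unicodedata.normalize("NFD", fi_ligature)
ligature_nfkd = unicodedata.normalize("NFKD", fi_ligature)
```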
Other libraries can implement other text processes, including case
mapping, locating text element boundaries, and collation for different
languages.
-- Sergei