Case-mapping, Unicode & internationalisation shivers@xxxxxx (24 Jan 2000 13:37 UTC)
Re: Case-mapping, Unicode & internationalisation Sergei Egorov (24 Jan 2000 17:09 UTC)
text processes vs. string procedures shivers@xxxxxx (24 Jan 2000 21:52 UTC)
Re: text processes vs. string procedures Sergei Egorov (24 Jan 2000 22:39 UTC)
Re: text processes vs. string procedures shivers@xxxxxx (25 Jan 2000 01:19 UTC)
Re: Case-mapping, Unicode & internationalisation Sergei Egorov 24 Jan 2000 17:10 UTC
I believe that UPCASE-STRING, DOWNCASE-STRING, and TITLECASE-STRING belong to a separate domain of 'text processes' that should be addressed in separate SRFIs.

I think that the best approach in a Unicode context is to treat Scheme strings as just arrays of characters ('code points') with no special well-formedness constraints. For example, it should be legal to have a string consisting of combining characters with no preceding base character, or a string with an unpaired surrogate, such as a high-half surrogate character not followed by a low-half surrogate character.

A "string" library can contain relatively simple procedures that are useful in traditional applications; it can also serve as a basis for building the 'text processes' described in the Unicode standard. A "char" library can contain procedures to access the character properties described in the Unicode database. A "text" library can include the 'text' data type, representing well-formed character sequences and allowing effective implementation of text processes, plus all the necessary primitives to work with this data type. A "basic text processes" library can contain the specification/implementation of canonical and compatibility decomposition based on the text primitives. Other libraries can implement other text processes, including case mapping, locating text element boundaries, and collation for different languages.

-- Sergei
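The split between unconstrained strings and well-formed 'text' can be sketched roughly as follows. This is only an illustration in Python, chosen because its strings are already plain sequences of code points; it is not a proposed Scheme API. The names `is_well_formed_text` and `decompose` are hypothetical, and the well-formedness rule shown (no surrogate code points, no leading combining mark) is an assumption made for the example, not a definition taken from the Unicode standard.

```python
import unicodedata

# "string" level: any sequence of code points is legal.
lone_combining = "\u0301"   # COMBINING ACUTE ACCENT with no base character
lone_surrogate = "\ud800"   # high-half surrogate with no low-half partner

# "char" level: character properties looked up in the Unicode database.
category = unicodedata.category(lone_combining)          # general category
combining_class = unicodedata.combining(lone_combining)  # 0 means "not combining"

def is_well_formed_text(s: str) -> bool:
    """Hypothetical "text" level check (an assumption for this sketch).

    In a Python str every surrogate code point is unpaired by construction,
    so any surrogate makes the sequence ill-formed here. A leading combining
    mark is also rejected because it has no base character.
    """
    if any(0xD800 <= ord(c) <= 0xDFFF for c in s):
        return False
    if s and unicodedata.combining(s[0]) != 0:
        return False
    return True

def decompose(s: str, compatibility: bool = False) -> str:
    """ "Basic text process": canonical (NFD) or compatibility (NFKD) decomposition."""
    return unicodedata.normalize("NFKD" if compatibility else "NFD", s)
```

Both `lone_combining` and `lone_surrogate` are perfectly valid values at the string level, yet `is_well_formed_text` rejects them, while `decompose("\u00e9")` yields the two code points `e` and U+0301. Case mapping, boundary detection, and collation would sit in further layers on top of such primitives.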