Map & fold & unfold granularity John Cowan (09 Jun 2016 13:24 UTC)

I put this in briefer form at the bottom of another email, so I'm
repeating it here under its own subject line so that it doesn't get lost.

In a Unicode world, code point granularity for mapping functions will
often be too fine, but we cannot say for all processing tasks what the
correct granularity is.  Sometimes it is code points, sometimes it is
grapheme clusters (legacy or extended), sometimes it is whole words or
larger textual units.  See UAX #29 at <http://unicode.org/reports/tr29/>
for discussions of these terms, and note that this is an official part
of the Unicode Standard.
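
To make the granularity difference concrete, here is a small sketch in
Python (chosen only so that it runs standalone). The helper
crude_clusters is a hypothetical name, and it only glues combining marks
onto the preceding character; it is a rough approximation, not the
UAX #29 segmentation algorithm.

```python
import unicodedata


def crude_clusters(s):
    """Group each character with the combining marks that follow it.

    A rough stand-in for grapheme clusters, NOT the UAX #29 algorithm.
    """
    clusters = []
    for ch in s:
        # unicodedata.combining() is nonzero for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


word = "cafe\u0301"              # "cafe" followed by COMBINING ACUTE ACCENT
code_points = list(word)         # one element per code point
clusters = crude_clusters(word)  # accent stays attached to its base letter
```

Mapping over code_points would hand the accent to the mapping procedure
on its own, separated from the "e" it modifies; mapping over clusters
keeps them together.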

To generalize over all of these, I propose changing the procedures passed
to textual-map, textual-fold, etc. so that they accept a text as their
argument, namely what has yet to be processed, and return two texts: the
rest of the text as yet unprocessed, and the processed result of this call.
Thus a procedure that wants to process code point by code point uses
(text-ref t 0) to examine the first code point of its argument t, and
(subtext t 1) to get the first value to return, namely the unprocessed rest.
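
The following is a minimal sketch of how a map driver might work under
this protocol, modelled in Python with plain strings standing in for
texts. The names chunked_map, upcase_code_point, and capitalize_word are
hypothetical, and the check that each call consumes something is an
added assumption, not something the proposal specifies.

```python
def chunked_map(proc, text):
    """Call proc on the unprocessed text until nothing is left.

    proc(remaining) must return (rest, result): the still-unprocessed
    text and the processed result of this call.
    """
    results = []
    while text:
        rest, result = proc(text)
        # Assumption: each call must consume something, or we never finish.
        if len(rest) >= len(text):
            raise ValueError("proc must consume at least one code point")
        results.append(result)
        text = rest
    return "".join(results)


def upcase_code_point(t):
    # Code point granularity: like (text-ref t 0) and (subtext t 1).
    return t[1:], t[0].upper()


def capitalize_word(t):
    # Word granularity: consume up to and including the next space.
    cut = t.find(" ")
    end = len(t) if cut == -1 else cut + 1
    return t[end:], t[:end].capitalize()


by_code_point = chunked_map(upcase_code_point, "stra\u00dfe")
by_word = chunked_map(capitalize_word, "map fold unfold")
```

The same driver serves both granularities because the procedure itself
decides how much to consume. Returning a text as the result also allows
one code point to expand into several, as when "ß" uppercases to "SS".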

In the case of textual-for-each, the second value could be required but
ignored, or could just not be returned; I'm uncertain which is better.
Because of the awkwardness of handling optional multiple values in Scheme,
one or the other should be chosen.
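
As an illustration of the second option, where the procedure returns
only the unprocessed rest, a for-each driver could look like the sketch
below. It is written in Python with strings standing in for texts; the
names chunked_for_each and record_code_point are hypothetical, and the
progress check is an added assumption.

```python
def chunked_for_each(proc, text):
    """Call proc for its side effects until the text is exhausted.

    proc(remaining) returns only the still-unprocessed rest; there is
    no second value to ignore.
    """
    while text:
        rest = proc(text)
        # Assumption: each call must consume something.
        if len(rest) >= len(text):
            raise ValueError("proc must consume at least one code point")
        text = rest


seen = []


def record_code_point(t):
    seen.append(t[0])   # the side effect
    return t[1:]        # the rest, as the single return value


chunked_for_each(record_code_point, "abc")
```

Under the first option the procedure would instead return a pair and
the driver would discard the second element.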

For the unfolds, the mapper argument should return a text rather than
a character.
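
A sketch of an unfold whose mapper returns a text, again modelled in
Python with strings standing in for texts. The name text_unfold is
hypothetical, and the argument order (stop test, mapper, successor,
seed) is assumed to follow the usual unfold convention.

```python
def text_unfold(stop, mapper, successor, seed):
    """Build a text by concatenating mapper(seed) for successive seeds.

    mapper returns a text of any length rather than a single character.
    """
    parts = []
    while not stop(seed):
        parts.append(mapper(seed))
        seed = successor(seed)
    return "".join(parts)


# Each step contributes several code points at once.
counted = text_unfold(
    lambda n: n > 3,      # stop once the seed exceeds 3
    lambda n: f"{n}, ",   # mapper: a multi-code-point text per seed
    lambda n: n + 1,      # successor
    1,                    # initial seed
)
```

With a character-returning mapper, each step could contribute only a
single code point, so a base letter and its combining mark would need
two separate steps.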

--
John Cowan          http://www.ccil.org/~cowan        xxxxxx@ccil.org
May the hair on your toes never fall out!  --Thorin Oakenshield (to Bilbo)