Re: Why are byte ports "ports" as such?
Per Bothner 24 May 2006 05:07 UTC
Thomas Bushnell BSG wrote:
> I can't tell what you're arguing for.
>
> We *do* have something we can call characters: characters. You might
> find them useless, but their semantics are quite clear.
Maybe in your universe.
> Which of the following are you arguing for:
>
> 1) We should have neither code points nor characters;
> 2) We should have code points and not characters, and call code points
> something like "code-points";
> 3) We should have code points and not characters, and call code points
> something like "characters";
> 4) We should have both code points and characters, call code points
> something like "characters" and call characters something else.
>
> If you are arguing (1), then fine, let's drop both. If you are
> arguing (3) and (4), there is no defense for your position.
That's very arrogant. I'm arguing for (3). Most other programming
languages have chosen this solution, because it works. I don't know
of any that have implemented "character" (in your sense) as a primitive
data type, so it is up to you to explain how to do it.
>> What does char->integer return? How does char<? work? What is your
>> proposed implementation for a "character" in the Unicode world, given
>> that it is not a code-point? How would you store characters in a
>> string?
>
> Storage is irrelevant. An implementation would be free to store
> characters however it wished. char->integer and char<? can return
> whatever the implementation pleases. I would rather drop them, since
> they have nothing really to do with characters. They are functions on
> *code points*, which are there because the R5RS authors did not bother
> to distinguish code points from characters.
I'm asking how *you* would implement a "character" data type.
Assume you have 32-bit "scheme values". Would you make characters
immediate/unboxed values? In that case, assume you have 28 bits.
Or are characters pointers to objects in memory? If so, how are
they managed? Are equal characters eq? Suppose I have a UTF-8
input file. What does read-char do? What is a string - an array
of 32-bit Scheme values or could it be more compact?
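
For comparison, under option (3) these questions have straightforward answers. The following is only a rough sketch in Python, not taken from any existing Scheme implementation: the 4-bit tag value and the names make_char, char_to_integer, char_less and read_char are all hypothetical. It assumes a character is a Unicode code point (at most 0x10FFFF, so 21 bits) stored in the 28-bit payload of a 32-bit immediate value.

```python
import io

CHAR_TAG = 0b1010          # hypothetical 4-bit type tag, chosen only for illustration
TAG_BITS = 4
MAX_CODE_POINT = 0x10FFFF  # largest Unicode code point; fits in 21 bits


def make_char(code_point):
    """Pack a code point into a 32-bit immediate: 28 payload bits plus 4 tag bits."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    return (code_point << TAG_BITS) | CHAR_TAG


def char_to_integer(value):
    """Analogue of char->integer: drop the tag bits to recover the code point."""
    return value >> TAG_BITS


def char_less(a, b):
    """Analogue of char<?: both words carry the same tag, so word order is code point order."""
    return a < b


def read_char(port):
    """Decode one UTF-8 sequence from a binary port; return an immediate, or None at EOF."""
    first = port.read(1)
    if not first:
        return None
    lead = first[0]
    if lead < 0x80:
        extra = 0            # single-byte (ASCII) sequence
    elif lead >> 5 == 0b110:
        extra = 1            # two-byte sequence
    elif lead >> 4 == 0b1110:
        extra = 2            # three-byte sequence
    elif lead >> 3 == 0b11110:
        extra = 3            # four-byte sequence
    else:
        raise ValueError("invalid UTF-8 lead byte")
    data = first + port.read(extra)
    # The strict decoder rejects truncated or malformed sequences.
    return make_char(ord(data.decode("utf-8")))


port = io.BytesIO("a\u00e9\u20ac".encode("utf-8"))
chars = [read_char(port) for _ in range(3)]
```

In this sketch two equal characters are the same bit pattern, so they are trivially eq?, and no memory management is needed. A "character" that is not a code point has no such obvious encoding, which is the point of the questions above.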
--
--Per Bothner
xxxxxx@bothner.com http://per.bothner.com/