Re: Why are byte ports "ports" as such?

Per Bothner, 24 May 2006 05:07 UTC
Thomas Bushnell BSG wrote:
> I can't tell what you're arguing for.
>
> We *do* have something we can call characters: characters.  You might
> find them useless, but their semantics are quite clear.

Maybe in your universe.

> Which of the following are you arguing for:
>
> 1) We should have neither code points nor characters;
> 2) We should have code points and not characters, and call code points
>    something like "code-points";
> 3) We should have code points and not characters, and call code points
>    something like "characters";
> 4) We should have both code points and characters, call code points
>    something like "characters" and call characters something else.
>
> If you are arguing (1), then fine, let's drop both.  If you are
> arguing (3) or (4), there is no defense for your position.

That's very arrogant.  I'm arguing for (3).  Most other programming
languages have chosen this solution, because it works.  I don't know
of any that have implemented "character" (in your sense) as a primitive
data type, so it is up to you to explain how to do it.
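
As a rough illustration of what (3) looks like in practice, here is
how Python 3 behaves (Python is used only as an example; the variable
names are made up for this sketch).  Its strings are sequences of
code points, so a base letter followed by a combining accent counts
as two "characters":

```python
import unicodedata

# One user-perceived character, written two ways.
composed = "\u00e9"       # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# Length is counted in code points, not user-perceived characters.
print(len(composed), len(decomposed))

# The analogue of char->integer is simply the code point number.
print(ord(composed))

# Equality and ordering compare code point sequences; treating the two
# spellings as equal requires an explicit normalization step.
print(composed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)
```

Anything closer to a user-perceived character is built on top of the
code point type by library code rather than being the primitive.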

>> What does char->integer return?  How does char<? work?  What is your
>> proposed implementation for a "character" in the Unicode world, given
>> that it is not a code-point?  How would you store characters in a
>> string?
>
> Storage is irrelevant.  An implementation would be free to store
> characters however it wished.  char->integer and char<? can return
> whatever the implementation pleases.  I would rather drop them, since
> they have nothing really to do with characters.  They are functions on
> *code points*, which are there because the R5RS authors did not bother
> to distinguish code points from characters.

I'm asking how *you* would implement a "character" data type.
Assume you have 32-bit "scheme values".  Would you make characters
immediate/unboxed values?  In that case, assume you have 28 bits.
Or are characters pointers to objects in memory?  If so, how are
they managed?  Are equal characters eq?  Suppose I have a UTF-8
input file.  What does read-char do?  What is a string - an array
of 32-bit Scheme values or could it be more compact?
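
For comparison, here is a minimal sketch of how those questions are
answered when a character is just a code point.  It is written in
Python for brevity, and the 4-bit tag layout, the tag value, and all
function names are assumptions made up for this illustration, not
taken from any particular Scheme implementation.  A code point needs
at most 21 bits, so it fits in the 28-bit payload of an immediate
value, and read-char decodes exactly one UTF-8 sequence:

```python
import io

TAG_BITS = 4              # hypothetical: low 4 bits of a 32-bit word are the type tag
TAG_MASK = (1 << TAG_BITS) - 1
CHAR_TAG = 0b1010         # hypothetical tag value marking a character
MAX_CODE_POINT = 0x10FFFF

def make_char(code_point):
    """Pack a Unicode scalar value into a tagged immediate word."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are not characters")
    return (code_point << TAG_BITS) | CHAR_TAG

def char_to_integer(word):
    """Recover the code point from a tagged character word."""
    if word & TAG_MASK != CHAR_TAG:
        raise TypeError("not a character")
    return word >> TAG_BITS

def read_char(port):
    """Read one UTF-8 encoded code point from a binary port.

    Returns a tagged character word, or None at end of input.
    """
    first = port.read(1)
    if not first:
        return None
    lead = first[0]
    if lead < 0x80:
        extra = 0                     # single-byte (ASCII) sequence
    elif 0xC2 <= lead <= 0xDF:
        extra = 1
    elif 0xE0 <= lead <= 0xEF:
        extra = 2
    elif 0xF0 <= lead <= 0xF4:
        extra = 3
    else:
        raise ValueError("invalid UTF-8 lead byte")
    rest = port.read(extra)
    if len(rest) != extra:
        raise ValueError("truncated UTF-8 sequence")
    # Let the strict decoder reject overlong forms and bad continuation bytes.
    text = (first + rest).decode("utf-8")
    return make_char(ord(text))

port = io.BytesIO("a\u03bb\u20ac\U0001F600".encode("utf-8"))
chars = []
while (c := read_char(port)) is not None:
    chars.append(c)
print([hex(char_to_integer(c)) for c in chars])
```

With this representation equal characters are trivially eq?, since
they are the same machine word, and a string is free to store the
UTF-8 bytes directly instead of an array of 32-bit values.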
--
	--Per Bothner
xxxxxx@bothner.com   http://per.bothner.com/