Re: Why are byte ports "ports" as such?

Why are byte ports "ports" as such? Ben Goetter (13 Apr 2006 17:54 UTC)
Re: Why are byte ports "ports" as such? John Cowan (13 Apr 2006 18:04 UTC)
Re: Why are byte ports "ports" as such? Marc Feeley (13 Apr 2006 21:41 UTC)
Re: Why are byte ports "ports" as such? John Cowan (14 Apr 2006 12:49 UTC)
Re: Why are byte ports "ports" as such? Marc Feeley (14 Apr 2006 13:37 UTC)
Re: Why are byte ports "ports" as such? Marc Feeley (13 Apr 2006 22:03 UTC)
Re: Why are byte ports "ports" as such? Ben Goetter (14 Apr 2006 01:02 UTC)
Re: Why are byte ports "ports" as such? Marc Feeley (14 Apr 2006 01:52 UTC)
Re: Why are byte ports "ports" as such? Marcin 'Qrczak' Kowalczyk (24 May 2006 16:17 UTC)
Re: Why are byte ports "ports" as such? John Cowan (24 May 2006 16:06 UTC)
Re: Why are byte ports "ports" as such? Thomas Bushnell BSG (24 May 2006 16:26 UTC)
Re: Why are byte ports "ports" as such? John Cowan (24 May 2006 17:18 UTC)
Re: Why are byte ports "ports" as such? Marc Feeley (24 May 2006 18:11 UTC)
Re: Why are byte ports "ports" as such? John Cowan (14 Apr 2006 12:26 UTC)

Re: Why are byte ports "ports" as such? Marc Feeley 13 Apr 2006 21:41 UTC

On 13-Apr-06, at 1:54 PM, Ben Goetter wrote:

> If you separate byte ports from character ports, and separate input
> ports from output ports (at least at the API level), you get an
> easily type-checked interface.  e.g.
>
> open-input-file string [encoding keywords] -> input-character-port
> read-char input-char-port -> character
> open-input-file-raw string -> input-byte-port
> read-byte input-byte-port -> integer
>

Did you read this section of the SRFI?

Byte ports support character I/O operations because with each byte
port is attached a character encoding specifying how characters are
encoded with bytes. It is incorrect to believe however that all ports
are byte ports. For example the ``string ports'' of SRFI 6 (Basic
String Ports) have no reason to be aware of the character to byte
encoding because they only deal with sequences of characters. So they
need not be byte ports. For this reason this SRFI views byte ports as
a subtype of character ports. Character ports support character I/O
operations and byte ports support character I/O operations and byte
I/O operations. All I/O operations which are valid on a character port
are also valid on a byte port. [Although not specified in this SRFI a
further generalization is ``object ports'' which are ports whose
fundamental I/O unit is the Scheme object. Character ports are object
ports because there is a standard encoding of (most) Scheme objects
to characters.]

SRFI 91 allows character I/O and binary I/O on byte ports because
files often use a format which mixes text and byte-encoded data.
Viewing byte ports as a subtype of character ports is consistent with
current practice (i.e. "text files" are just binary files which
encode the characters with a sequence of bytes that depends on the
character encoding).
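
A minimal sketch of that subtype relation, written in Python rather
than Scheme so as not to guess at SRFI 91 procedure names.  The class
and method names (CharacterPort, StringPort, BytePort, read_char,
read_byte) are hypothetical, and the byte port assumes a single-byte
encoding (Latin-1) purely to keep the example short:

```python
import io


class CharacterPort:
    """Supports character I/O only."""

    def read_char(self):
        raise NotImplementedError


class StringPort(CharacterPort):
    """A character port that has no byte encoding at all."""

    def __init__(self, text):
        self._chars = iter(text)

    def read_char(self):
        return next(self._chars, "")  # "" signals end of input


class BytePort(CharacterPort):
    """A byte port: byte I/O, plus character I/O through its encoding."""

    def __init__(self, data, encoding="latin-1"):
        self._stream = io.BytesIO(data)
        self._encoding = encoding  # assumed to be a single-byte encoding

    def read_byte(self):
        chunk = self._stream.read(1)
        return chunk[0] if chunk else None

    def read_char(self):
        # One byte per character under the single-byte assumption.
        return self._stream.read(1).decode(self._encoding)


# Mixed text and binary data read through the same port.
port = BytePort(b"A\xe9\x07")
first = port.read_char()
second = port.read_char()
third = port.read_byte()
```

Every operation that works on a CharacterPort also works on a
BytePort, while a StringPort never needs to know about bytes.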

> For your bidi ports, perhaps
>
> open-input-output-file string [encoding keywords] ->
> input-char-port output-char-port
>
> with the two ports sharing common buffer structure in the
> implementation.
>

It is a pain to carry those two ports around in the code when the
program needs to communicate bidirectionally with some other entity
(another process, a user at a terminal, a socket, etc).  Moreover,
separating a conceptually bidirectional channel into distinct input
and output ports destroys the conceptual link between them, which
hinders program understanding.  For example, with a bidirectional
port, (close-port port) closes both sides (i.e. the link between the
input and output sides is preserved).  With two unidirectional ports
you have to duplicate some operations (closing ports, changing port
settings, ...).
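
As an analogy only (this is Python's standard socket module, not the
SRFI 91 API), a socket object is a single value that carries both
directions of a channel, so one close call covers both.  The variable
names are made up for the sketch:

```python
import socket

# A connected pair of bidirectional endpoints.
left, right = socket.socketpair()

# Each endpoint both writes and reads; no separate input/output objects.
left.sendall(b"ping")
request = right.recv(4)
right.sendall(b"pong")
reply = left.recv(4)

# One close per endpoint shuts down its input and output sides together.
left.close()
right.close()
left_closed = left.fileno() == -1
```

With a split input/output pair, each of those close calls (and any
settings change) would have to be issued twice per endpoint.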

> Often one needs to open a file or a structure initially as a byte
> port, then decode subsequent sections of the sequence as characters
> of a particular encoding.  For that, a procedure like
>
> cook-input-encoding integer input-byte-port [encoding keywords] ->
> input-char-port
>
> can return a port that promises to decode a certain number of
> octets from the backing byte port with your encoding.  It doesn't
> handle variable-length structures well, though.
>

This is possible with SRFI 91.  Just open the file (in buffered or
non-buffered mode) and read your bytes, then read your characters.
If you need to read the characters first, open the file in
non-buffered mode, read your characters, then read your bytes (after
switching back to buffered mode if you wish).
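
A rough sketch of the "characters first, then bytes" case, in Python
rather than with SRFI 91 procedures.  The helper read_char and the
sample data are hypothetical.  The point is that the reader pulls one
byte at a time, so it never consumes bytes beyond the character it
returns:

```python
import codecs
import io


def read_char(stream, encoding="utf-8"):
    """Read one character, consuming only the bytes that encode it."""
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        chunk = stream.read(1)
        if not chunk:
            # End of input: returns "" or raises if a character was cut short.
            return decoder.decode(b"", final=True)
        char = decoder.decode(chunk)
        if char:
            return char


# Two UTF-8 characters followed by two raw bytes (made-up sample data).
stream = io.BytesIO("\u04d2\u00e9".encode("utf-8") + bytes([0, 255]))
chars = read_char(stream) + read_char(stream)
raw_tail = stream.read()
```

A decoder that reads ahead in large blocks would typically swallow
some of the trailing raw bytes into its own buffer, which is
presumably why the characters-first order calls for non-buffered
mode.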

By the way, I'm tempted to add string ports to this SRFI (compatible
with SRFI 6 of course), and the analogous ports for u8vectors, i.e.
u8vector ports.  String ports are character ports (but not byte
ports) and u8vector ports are byte ports (and character ports).
Something along these lines:

(open-input-string string-or-settings)
(open-output-string [string-or-settings])
(open-string [string-or-settings])

and

(open-input-u8vector u8vector-or-settings)
(open-output-u8vector [u8vector-or-settings])
(open-u8vector [u8vector-or-settings])

These would allow a more complete set of procedures for encoding
strings into u8vectors and decoding them back.  For example:

 > (with-output-to-u8vector
     (list char-encoding: 'UTF-8)
     (lambda () (write-char (integer->char 1234))))
#u8(211 146)
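
The two bytes can be checked independently of any Scheme
implementation.  A small Python sketch (variable names are
illustrative) that encodes code point 1234 with the standard library
and also rebuilds the bytes by hand using the two-byte UTF-8 layout
110xxxxx 10xxxxxx, which applies to code points from 128 to 2047:

```python
code_point = 1234

# Library encoding of the character with that code point.
encoded = list(chr(code_point).encode("utf-8"))

# Manual two-byte UTF-8 encoding: top 5 bits, then low 6 bits.
lead_byte = 0b11000000 | (code_point >> 6)
continuation_byte = 0b10000000 | (code_point & 0b00111111)
manual = [lead_byte, continuation_byte]
```

Both lists come out as 211 followed by 146.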

I'm currently holding back to keep the SRFI lean, but I may change my
mind (or write a separate SRFI).

> I like your read-substring and write-substring.

Great.

Marc