Michael Sperber wrote:
> Per Bothner <xxxxxx@bothner.com> writes:
>
>>I don't buy this at all.  Why can't I replace:
>>  (make-simple-reader id descriptor etc ...)
>>by
>>  (make-simple-input-port id descriptor etc ...)
>>I.e. why can't I merge the functionality of readers into
>>input-ports?
>
> You could do that, but you'd have eliminated only one procedure call.
> Moreover, it's unclear to me what you've gained.

You may be missing my point.  The "make-simple-input-port" function
wouldn't call make-simple-reader because there would be no simple
reader type anymore.  The input-port would be an object with the
necessary "methods".  What we're gaining is removing an extra level
of indirection and layering for *all* operations.

> It still wouldn't make the primitive layer obsolete.

I think it would.  The point is there is no need for a separate
primitive layer.  At least I haven't seen such a need, and there are
good reasons for avoiding it, primarily simplicity, both in API and
implementation.  The latter can allow better performance.

> You've stated that you think you can get their performance, and lack
> of buffering from the ports layer, but you haven't demonstrated how to
> do it---it's certainly not straightforward, given the buffering
> inherent in things like PEEK-CHAR.

Agreed.  If character operations are performed, then we require at
least some buffering.  And we can't just use the system block-boundary
buffers, since characters can straddle block boundaries.

> It'll take a concrete proposal to convince me to change the layout.

Here are some preliminary thoughts.

In pre-JDK-1.4 Java (i.e. without direct access to the translation
API) you'd still need multiple layers (i.e. a separate Reader for
chars and an InputStream for bytes) but the layers would be driven by
implementation considerations, and not constrained by the Scheme API.
I.e. an input-port might be an object that has both an InputStream
and a Reader.
I think we should require only these modes:

* A program can arbitrarily switch back and forth between bytes and
UTF-8 or Latin-1 chars.  In that case the Scheme "ports" can do the
conversion directly without depending on external conversion APIs.

Such a port would basically be a binary port with a byte buffer
(a blob).  (An unbuffered port still has a one-byte buffer.)
The current position is a byte position, indicated by an offset
into the blob.  Reading bytes is obvious; reading chars is also
more-or-less so.

Handling peek-char is trickier if the blob only contains a partial
character.  We can implement this using an extra short buffer, which
can be represented by a fixnum field.  We can also use negative blob
indexes for when the current position is in the short buffer.

Suppose the current position is before the last byte in the buffer,
and that byte is the first byte of a multi-byte character.  peek-char
gets the byte, fills the buffer, and looks at enough bytes in the new
"block" to determine the character.  It saves the byte from the
previous block in the short buffer, and notes that the offset in the
current block is -1 - i.e. one byte *before* the start of the current
block.  A subsequent read or peek operation notes this negative
offset, and gets the data from the short buffer.

Java input streams have a read-ahead mechanism, where you can mark a
position, read ahead, and then reset back to the mark.  This is a
useful feature for lexers/parsers.  It makes sense to combine this
read-ahead support with that needed for peek-char.  I.e. instead of a
"short buffer" you have an extra "save buffer".

So I'd recommend using two buffers: the "system buffer" is block-sized,
properly aligned, etc, for actual I/O.  The system buffer is missing
(i.e. zero bytes long) for unbuffered files.  In addition, there is a
"save buffer" which is normally just a few bytes, but can grow
arbitrarily large, if we allow arbitrary look-ahead.
Conceptually, we have a single buffer, consisting of the concatenation
of the save buffer and the system buffer.  (An implementation can use
a single buffer, but that will normally be less efficient.)  The
current position is an offset, where non-negative values point into
the system buffer, and negative values point into the save buffer.
The position can always be reset to any point between the start of
the save buffer and the current position.

A peek is then a "mark current position as a save-point", read ahead,
and then revert back to the saved position.  Relatively simple,
efficient, and general.

Output ports don't have the same complications.  (They may have other
complications, such as pretty-printing and handling cycles, but mixing
binary and text in different encodings is at least conceptually
straightforward.)

* A program can switch from reading/writing bytes to reading/writing
chars in an arbitrary supported encoding, but cannot necessarily
switch back, or switch to a different encoding, unless the encoding is
UTF-8 or Latin-1.  In that case the implementation can layer an
implementation character stream on top of an implementation byte
stream.  This is fairly straightforward in Java, but some
implementations may have difficulty if there have been any byte
operations before the first char operations.  So it should be
permissible for an implementation to not support *any* byte/char
mixing (except for UTF-8 and Latin-1).
-- 
	--Per Bothner
xxxxxx@bothner.com   http://per.bothner.com/
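
To make the two-buffer scheme above concrete, here is a minimal sketch in Python rather than Scheme.  Everything in it is an assumption made for illustration, not part of any proposal text: the class name BytePort, the method names, the restriction to UTF-8, and the limit of one save-point at a time.  It only shows how a negative offset into a save buffer lets peek-char work across a block boundary.

```python
import io


class BytePort:
    """Hypothetical input port: a block-sized system buffer plus a save buffer."""

    def __init__(self, source, block_size=4096):
        self.source = source        # any object with read(n) -> bytes
        self.block_size = block_size
        self.block = b""            # system buffer: the current block
        self.save = bytearray()     # save buffer: bytes kept from earlier blocks
        self.pos = 0                # >= 0 indexes block, < 0 indexes save
        self.mark_pos = None        # save-point, in the same coordinates as pos

    def _refill(self):
        """Load the next block; keep bytes from the save-point onward."""
        if self.mark_pos is None:
            self.save = bytearray()
        else:
            combined = bytes(self.save) + self.block
            self.save = bytearray(combined[len(self.save) + self.mark_pos:])
            self.mark_pos = -len(self.save)
        self.block = self.source.read(self.block_size)
        self.pos = 0
        return len(self.block) > 0

    def read_byte(self):
        """Return the next byte as an int, or None at end of input."""
        if self.pos < 0:
            b = self.save[self.pos]     # negative offset counts back from the end of save
        else:
            if self.pos >= len(self.block) and not self._refill():
                return None
            b = self.block[self.pos]
        self.pos += 1
        return b

    def read_char(self):
        """Decode one UTF-8 character starting at the current byte position."""
        first = self.read_byte()
        if first is None:
            return None
        if first < 0x80:
            n = 1
        elif first >> 5 == 0b110:
            n = 2
        elif first >> 4 == 0b1110:
            n = 3
        elif first >> 3 == 0b11110:
            n = 4
        else:
            raise ValueError("invalid UTF-8 lead byte")
        raw = bytearray([first])
        for _ in range(n - 1):
            b = self.read_byte()
            if b is None:
                raise ValueError("truncated UTF-8 sequence")
            raw.append(b)
        return raw.decode("utf-8")

    def mark(self):
        """Record the current position as the (single) save-point."""
        self.mark_pos = self.pos

    def reset(self):
        """Revert to the save-point and clear it."""
        self.pos = self.mark_pos
        self.mark_pos = None

    def peek_char(self):
        """Peek = mark, read ahead, revert."""
        self.mark()
        try:
            return self.read_char()
        finally:
            self.reset()
```

With a 4-byte block and the UTF-8 input "abc€d", three read_byte calls followed by peek_char leave pos at -1 and a single byte (0xE2) in the save buffer, which is the straddling case described above.  A real implementation would presumably reuse a fixed save buffer instead of concatenating on every refill; the copy here just keeps the sketch short.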