Re: Specification vs. Implementation Alex Shinn 25 Aug 2005 01:33 UTC

On 8/25/05, Michael Sperber <xxxxxx@informatik.uni-tuebingen.de> wrote:
>
> I don't understand why it would be a lot of work---that's what the
> reference implementation is for.  Just drop it in.  You mainly have to
> adapt the primitive I/O layer, which is usually pretty simple.  (You
> could even build it easily on a pre-existing ports layer.)

You also have to rewrite your existing ports layer to distinguish
between the original native ports and the new stream-based ports.

SRFI-68 is also still missing a lot: it provides only a few trivial
transcoders, and in cases like C's wchar implementation there is no
direct way to transcode wstrings, so you have to use an external
library like iconv.

> The bottom two layers are there because they're useful.  If all layers
> are required, portable code could use whatever layer it wants to.  I
> don't see why the lower layers (especially the primitive layer) would
> necessarily be extremely slow---maybe you can elaborate why you think
> this is the case?

If you aren't using a systems-level language (or don't want to go through
the effort), then you have to build the lower levels on top of an existing
higher-level port layer, which means all text gets buffered twice.

I also have concerns about the stream layer in general.  I really like
the idea of being able to treat a port like a list (although given the
choice I'd rather work with car and cdr than car+cdr).  However, it
requires either large amounts of buffering or constant position checks
followed by optional seeks.  The latter again is extra overhead (and
seeks are very expensive and should be avoided whenever possible),
so the reference implementation uses the former.  But consider the
following code, which reads an index from the start of a stream
to determine the offset of a record, seeks to that record,
and then extracts it (using SRFI-71 let syntax):

(define (read-nth-record file n)
  (let* ((stream1 (open-file-input-stream file))
         (index stream2 (read-index stream1))
         (offset (nth-record-offset index n))
         (stream3 (input-stream-at-position stream2 offset))
         (result stream4 (read-record stream3)))
    (close-input-stream stream4)
    result))

If the record is the last element of a 10GB file we would need
10GB of memory to do this, since everything buffered on the way
to the record stays reachable from the older streams.  A smart
compiler and GC could determine the older streams aren't re-used
and reclaim them in this case, but what if they were re-used?  Or
what if they aren't but the compiler can't prove it?
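
To make the retention problem concrete, here is a minimal sketch in
plain Scheme (delay/force only).  It is a toy model and not the
SRFI-68 API: make-chunk-stream, stream-skip and chunks-read are
hypothetical names, and a 4-character string stands in for a buffer
filled from disk.

```scheme
;; Toy model (an assumption, not SRFI-68): a stream is a promise of
;; (chunk . next-stream), and forcing it memoizes the chunk.
(define chunks-read 0)

(define (make-chunk-stream n)          ; n = index of the next chunk
  (delay
    (begin
      (set! chunks-read (+ chunks-read 1))
      (cons (make-string 4 #\x)        ; stands in for a filled buffer
            (make-chunk-stream (+ n 1))))))

(define (stream-skip s k)              ; walk k chunks forward
  (if (= k 0)
      s
      (stream-skip (cdr (force s)) (- k 1))))

(define head (make-chunk-stream 0))
(define far  (stream-skip head 1000))  ; forces 1000 chunks

;; `head` is still live, so all 1000 forced chunks stay reachable
;; from it.  Walking from `head` again re-uses the memoized chunks
;; rather than re-reading, so chunks-read stays at 1000.
(define far-again (stream-skip head 1000))
```

As long as head is referenced, nothing between head and far can be
collected, which is the 10GB case above in miniature.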

Note I had to assume a new procedure

  input-stream-at-position

to seek within the file, which isn't provided by the current SRFI.
This is a crucial procedure, but it can't be implemented efficiently
without closing the current stream and creating a new one from
the underlying reader.  Likewise we need SET-PORT-POSITION!.
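
A rough sketch of what "create a new stream from the underlying
reader" could look like, again as a toy model.  The reader here is a
closure over an in-memory string, and make-string-reader,
reader->stream and stream-read are hypothetical names; none of this
is the SRFI-68 reader interface.

```scheme
;; Hypothetical reader: a closure with a current position that
;; understands 'read (up to N chars, "" at EOF) and 'seek.
(define (make-string-reader str)
  (let ((pos 0))
    (lambda (op . args)
      (case op
        ((read)
         (let* ((end (min (string-length str) (+ pos (car args))))
                (chunk (substring str pos end)))
           (set! pos end)
           chunk))
        ((seek) (set! pos (min (car args) (string-length str))))
        (else (error "unknown reader operation" op))))))

;; A stream is (reader . promise); the promise yields () at EOF or
;; (chunk . rest-stream) otherwise.
(define (reader->stream reader)
  (cons reader
        (delay (let ((chunk (reader 'read 4)))
                 (if (string=? chunk "")
                     '()
                     (cons chunk (reader->stream reader)))))))

(define (stream-read s) (force (cdr s)))

;; Seek by abandoning the old chain and starting a fresh one.  Any
;; unforced tail of the old stream is invalid afterwards, because it
;; shares the reader whose position has just moved.
(define (input-stream-at-position s offset)
  (let ((reader (car s)))
    (reader 'seek offset)
    (reader->stream reader)))

(define s1 (reader->stream (make-string-reader "0123456789abcdef")))
(define s2 (input-stream-at-position s1 10))
(define first-chunk (car (stream-read s2)))   ; "abcd"
```

Nothing between the old position and the new offset is buffered, but
the old stream has to be treated as closed, which is the trade-off
described above.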

Another issue is that buffering doesn't account for mutations
performed between reads, which is why libraries like C's stdio
provide explicit buffering controls, including an unbuffered mode.
The current SRFI effectively prohibits ports from having this level
of control.  Ports would in fact be better built on the readers and
writers than on streams.
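
The stale-read problem can be shown with a small sketch.  A mutable
string stands in for the file, and make-char-reader is a hypothetical
helper whose buf-size argument plays the role of a buffering control
(a size of 1 is effectively unbuffered).

```scheme
(define file (string-copy "abcdefgh"))   ; mutable stand-in for a file

;; Returns a thunk that yields one char per call, copying buf-size
;; chars out of `file` whenever its private buffer runs dry.
(define (make-char-reader buf-size)
  (let ((pos 0) (buf "") (i 0))
    (lambda ()
      (if (= i (string-length buf))       ; buffer exhausted: refill
          (let ((end (min (string-length file) (+ pos buf-size))))
            (set! buf (substring file pos end))
            (set! pos end)
            (set! i 0)))
      (if (= i (string-length buf))       ; still empty: end of file
          'eof
          (let ((c (string-ref buf i)))
            (set! i (+ i 1))
            c)))))

(define buffered   (make-char-reader 4))
(define unbuffered (make-char-reader 1))

(buffered)                        ; #\a, and "abcd" is now cached
(unbuffered)                      ; #\a
(string-set! file 1 #\X)          ; the "file" changes between reads
(define stale (buffered))         ; #\b, served from the old buffer
(define fresh (unbuffered))       ; #\X, read after the mutation
```

Only the caller knows whether a stale #\b is acceptable here, which
is why the buffer size needs to be under the port user's control.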

--
Alex