I'm assuming these lengths are purely local in scope; there's nothing that says "the complete thing I'm writing is X long, and here's part 1, part 2, etc."?

For example, I've done #2 when using a [format (insert right name since I haven't read the naming postings yet)] intended for interchange between computer systems, where the source is assumed to have all the data before it starts.  Instead, I was generating raw data on the fly from a *very* fast 600 lb/275 kg monster of a Kodak scanner, and couldn't buffer it except by writing to hard disk.

Note also that #2 won't work unless your Scheme has lseek, or its Windows equivalent (the above was on Windows 3.1), integrated into its I/O system, which is not yet standardized.  And while this is probably not going to be an issue, it also doesn't work for modern tapes, or at least LTO-style tapes, which use partly overlapping writes at the magnetic head level and, unlike old-fashioned tapes, cannot seek back and rewrite a tape sector.  The same limitation applies to modern cloud "object" storage like Amazon's AWS S3.
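Concretely, #2 needs something along these lines.  This is only a sketch, using R6RS-style port-position and set-port-position! (exactly the part R7RS-small doesn't give you); the fixed 4-byte length field and the write-u32-be helper are made up for illustration:

(define (write-u32-be port n)   ; illustrative 4-byte big-endian length
  (let ((bv (make-bytevector 4 0)))
    (bytevector-u8-set! bv 0 (quotient n #x1000000))
    (bytevector-u8-set! bv 1 (modulo (quotient n #x10000) 256))
    (bytevector-u8-set! bv 2 (modulo (quotient n 256) 256))
    (bytevector-u8-set! bv 3 (modulo n 256))
    (write-bytevector bv port)))

(define (write-with-patched-length port write-body!)
  (let ((length-pos (port-position port)))
    (write-u32-be port 0)                  ; dummy length for now
    (let ((start (port-position port)))
      (write-body! port)                   ; emit the payload
      (let ((end (port-position port)))
        (set-port-position! port length-pos)
        (write-u32-be port (- end start))  ; patch in the real length
        (set-port-position! port end)))))  ; resume writing at the end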

I suppose we're in no danger any time soon of someone starting to use Scheme to, e.g., design and verify big integrated circuits (Lisp Machine Lisp was used to design LMI's Lambda CPU), but I just want to make sure we aren't limiting ourselves too much for the future.

- Harold

----- Original message -----
From: John Cowan <xxxxxx@ccil.org>
Date: Friday, September 27, 2019 9:22 AM

I think the first method is clearly superior.  Of the usual data structures, only lists normally take O(n) time to get the length.  In any case, R7RS `write` already has to traverse the data in advance in order to know where to insert datum labels.
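For instance, the first pass can simply mirror the writer and count bytes.  A toy sketch, where the framing (a 4-byte length prefix before every object, lists framed around the concatenation of their elements) is invented purely for illustration:

(define (encoded-size x)
  ;; Size pass: walk the data and add up encoded sizes without writing.
  (cond ((bytevector? x) (+ 4 (bytevector-length x)))
        ((string? x)     (+ 4 (bytevector-length (string->utf8 x))))
        ((list? x)       (+ 4 (apply + (map encoded-size x))))
        (else (error "encoded-size: unsupported type" x))))

The actual writer is then a second, structurally identical walk that consults encoded-size wherever a length field has to be emitted.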

On Fri, Sep 27, 2019 at 9:58 AM Lassi Kortela <xxxxxx@lassi.io> wrote:
>> Is the length always the number of bytes? I was under the impression
>> it's the number of elements in case the value is a list/vector.
>
> No, it's always bytes.  See <https://en.wikipedia.org/wiki/X.690>, which is
> a handy summary, though hardly suitable as an explanation from scratch.

That troubles me a bit. It means that when writing data, you need to
know in advance how much you are about to write (even for complex
nested structures of heterogeneous data, not just for strings).
Effectively you have to do one of the following:

1. Make two passes over the data (in the first pass, count the total
length without writing anything; in the second pass, actually write).

2. Write a dummy value for the length, then seek back in the stream
and patch it once you know the true length.

3. Write objects recursively in a stack of temp buffers, each layer of
buffers being concatenated one layer up.

None of these is ideal. The length prefix does let the reader skip
past uninteresting objects very quickly, but that's not much of a
benefit nowadays.
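
For what it's worth, #3 can at least be written portably in
R7RS-small. A rough sketch, where the tagless 4-byte length framing
is invented just for illustration:

(define (u32->bytevector n)     ; 4-byte big-endian length field
  (bytevector (quotient n #x1000000)
              (modulo (quotient n #x10000) 256)
              (modulo (quotient n 256) 256)
              (modulo n 256)))

(define (frame payload)         ; prepend the now-known length
  (bytevector-append (u32->bytevector (bytevector-length payload))
                     payload))

(define (encode x)
  ;; Encode bottom-up: children go into temporary bytevectors first,
  ;; then get concatenated and framed one layer up, so every length
  ;; is known before it is written.
  (cond ((bytevector? x) (frame x))
        ((string? x)     (frame (string->utf8 x)))
        ((list? x)       (frame (apply bytevector-append (map encode x))))
        (else (error "encode: unsupported type" x))))

The price is the extra copying: every level of nesting copies its
children's bytes one more time.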