Archive file SRFIs Lassi Kortela 01 Apr 2019 08:49 UTC

Would archive files make a good topic for SRFIs?

I would like to write SRFIs that together define a generic interface for
working with archive files in different formats (zip, tar, etc.). The
implementation would provide zero or more formats, and users could add
their own by importing third-party libraries. The initial plan would be:

* SRFI x - Lossless byte codecs
* SRFI y - Stream access to archives
* SRFI z - Random access to archives

The first SRFI would specify a generic interface to single-file
compressors like gzip and bzip2 as well as binary-to-ASCII schemes like
base64 and uuencode. The idea is that if you can give it arbitrary bytes
to get some encoded bytes, then decode to get back the original bytes
(i.e. a lossless reversible encoding of all possible byte vectors) then
it's fair game. Multi-byte character encodings wouldn't be a good fit
for this API since they convert integers<->bytes rather than
bytes<->bytes. I shouldn't be trusted to design anything crypto-related,
so if someone is good at crypto it would be useful to know whether
bundling file encryption into this same API is a good idea or a bad one
(keys and other pertinent information could be passed as parameters to
the codec).
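
To make the round-trip criterion concrete, here is a rough sketch. It is
written in Python rather than Scheme only because Python's standard
library already ships gzip, bzip2 and base64. The Codec record, the
CODECS registry and the round_trips helper are made-up names for
illustration, not a proposed API.

```python
import base64
import bz2
import gzip
from typing import Callable, NamedTuple


class Codec(NamedTuple):
    """Hypothetical codec record: a named pair of bytes -> bytes procedures."""
    name: str
    encode: Callable[[bytes], bytes]
    decode: Callable[[bytes], bytes]


# Hypothetical registry; third-party libraries could add their own entries.
CODECS = {
    c.name: c
    for c in (
        Codec("gzip", gzip.compress, gzip.decompress),
        Codec("bzip2", bz2.compress, bz2.decompress),
        Codec("base64", base64.b64encode, base64.b64decode),
    )
}


def round_trips(codec: Codec, data: bytes) -> bool:
    """True if decoding the encoded bytes gives back the original bytes."""
    return codec.decode(codec.encode(data)) == data


sample = bytes(range(256)) * 4  # contains every possible byte value
for codec in CODECS.values():
    assert round_trips(codec, sample)
```

Compressors and binary-to-ASCII schemes fit the same two-procedure
shape, which is the reason for grouping them under one interface.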

The second SRFI would specify generic read-archive and write-archive
procedures to read or write an archive file in one go (calling a
procedure for each file in the archive). We can define an archive as any
file whose purpose is to store a group of other named files. Zip
archives compress the files inside them, whereas tar archives just store
the files with compression applied as a separate encoding pass around
the whole archive. The interplay of the interfaces in the first and
second SRFIs would make all combinations of archives and wrap-around
encodings work naturally. It was not simple to find abstractions that
compose elegantly in all cases but I believe I finally have them.
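
As a rough illustration of the streaming idea (again in Python, using
its tarfile and gzip modules as stand-ins), the sketch below writes a tar
archive in one pass, wraps gzip around the whole thing as a separate
encoding step, and then reads it back in a single sweep that calls a
procedure for each file. The names read_archive and write_archive come
from the text above; their signatures here are assumptions.

```python
import gzip
import io
import tarfile
from typing import Callable, Iterable, Tuple


def write_archive(files: Iterable[Tuple[str, bytes]]) -> bytes:
    """Write (name, contents) pairs into an uncompressed tar in one pass."""
    buf = io.BytesIO()
    # "w|" is tarfile's non-seeking stream mode for writing.
    with tarfile.open(fileobj=buf, mode="w|") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_archive(archive: bytes, visit: Callable[[str, bytes], None]) -> None:
    """Sweep the tar once, calling visit(name, contents) per regular file."""
    # "r|" is tarfile's non-seeking stream mode for reading.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r|") as tar:
        for member in tar:
            if member.isfile():
                visit(member.name, tar.extractfile(member).read())


files = [("a.txt", b"alpha"), ("dir/b.txt", b"beta")]
wrapped = gzip.compress(write_archive(files))  # tar first, gzip around it

seen = []
read_archive(gzip.decompress(wrapped), lambda name, data: seen.append((name, data)))
assert seen == files
```

The archive procedures never learn that gzip was involved; the wrap-around
encoding is applied and removed entirely outside them.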

The third SRFI (which I don't want to write, but I can help out if
someone else does) would be like the second, but the interface would
permit random access to the files inside an archive instead of being
restricted to a single sequential sweep through the entire archive.
Reads and writes of the same archive could also be mixed. The interface
would be based around an archive object with open-archive and
close-archive procedures. While the archive is open you could call
procedures in any order to get, add, rename and delete files in it. This
random access interface is much more complex and much less widely useful
than the streaming interfaces. I can't think of a good use for it off
the top of my head other than writing an interactive archive manager
like 7-Zip. This dearth of applications is why I'm not enthusiastic
about it. We may well do without it, perhaps writing it later if there
is demand.
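
For comparison, a toy version of the random-access shape might look like
the Python sketch below. Only the names open-archive and close-archive
come from the text above; the Archive class, its methods and the choice
of zip as the backing format are assumptions. It keeps every entry in
memory and rewrites the whole archive on close, which a real
implementation would presumably avoid by seeking within the file.

```python
import io
import zipfile


class Archive:
    """Hypothetical archive object; entries live in memory while it is open."""

    def __init__(self, data: bytes = b""):
        self.entries = {}
        if data:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                for name in zf.namelist():
                    self.entries[name] = zf.read(name)

    def get(self, name: str) -> bytes:
        return self.entries[name]

    def add(self, name: str, contents: bytes) -> None:
        self.entries[name] = contents

    def rename(self, old: str, new: str) -> None:
        self.entries[new] = self.entries.pop(old)

    def delete(self, name: str) -> None:
        del self.entries[name]

    def close(self) -> bytes:
        """Serialize the current entries as a fresh zip archive."""
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
            for name, contents in self.entries.items():
                zf.writestr(name, contents)
        return buf.getvalue()


def open_archive(data: bytes = b"") -> Archive:
    return Archive(data)


def close_archive(archive: Archive) -> bytes:
    return archive.close()


# Operations can be called in any order while the archive is open.
ar = open_archive()
ar.add("notes.txt", b"first draft")
ar.add("tmp.bin", b"\x00\x01")
ar.rename("notes.txt", "README.txt")
ar.delete("tmp.bin")
data = close_archive(ar)

reopened = open_archive(data)
assert reopened.get("README.txt") == b"first draft"
```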

Libarchive (<https://www.libarchive.org/>) and Racket's zip/tar support
have been reference points while I thought about this stuff. If these
SRFIs are accepted, I would like to do an implementation that calls
libarchive through a Scheme implementation's C FFI. I think the libarchive
designers did a great job bringing order to a convoluted problem space
and if the binding is easy to write, that would confirm the soundness of
our SRFIs -- in addition to being useful bindings in their own right. If
there's friction in writing these bindings, we probably specified
something that doesn't serve all levels of abstraction and should go
back to fix our SRFIs.

There's much more to say about many details but I'd like to find out
whether there is interest before putting in more time. If there is, I'll
upload my draft documents to GitHub for public viewing.

Finally, here is the prior art I found in the Scheme world:

* <https://github.com/weinholt/compression> seems a very nice library.
* <https://github.com/ashinn/chibi-scheme/blob/master/lib/chibi/tar.scm>
* Racket has zip/tar and unzip/untar modules in the standard library.
* Scheme Spheres can read and write tar and gzip.
* Bigloo can read tar and gzip.
* Chicken has an egg to read and write gzip.
* I didn't find archive code in scsh, Kawa, Gauche, Guile or MIT Scheme.
* I didn't find a generic archive interface in any of the above.

KR,
Lassi