Archive file SRFIs Lassi Kortela (01 Apr 2019 08:49 UTC)
Would archive files make a good topic for SRFIs?

I would like to write SRFIs that together define a generic interface for working with archive files in different formats (zip, tar, etc.). The implementation would provide zero or more formats, and users could add their own by importing third-party libraries.

The initial plan would be:

* SRFI x - Lossless byte codecs
* SRFI y - Stream access to archives
* SRFI z - Random access to archives

The first SRFI would specify a generic interface to single-file compressors like gzip and bzip2 as well as binary-to-ASCII schemes like base64 and uuencode. The idea is that if you can give it arbitrary bytes to get some encoded bytes, then decode to get back the original bytes (i.e. a lossless reversible encoding of all possible byte vectors) then it's fair game. Multi-byte character encodings wouldn't be a good fit for this API since they convert integers<->bytes rather than bytes<->bytes. I shouldn't be trusted to design anything crypto-related, so if someone is good at crypto it would be useful to know whether bundling file encryption into this same API is a good idea or a bad one (keys and other pertinent information could be passed as parameters to the codec).

The second SRFI would specify generic read-archive and write-archive procedures to read or write an archive file in one go (calling a procedure for each file in the archive). We can define an archive as any file whose purpose is to store a group of other named files. Zip archives compress the files inside them, whereas tar archives just store the files with compression applied as a separate encoding pass around the whole archive. The interplay of the interfaces in the first and second SRFIs would make all combinations of archives and wrap-around encodings work naturally. It was not simple to find abstractions that compose elegantly in all cases but I believe I finally have them.

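To make the first two proposals concrete, here is a minimal sketch of how a byte codec and a one-pass archive reader/writer could compose. It is not the proposed SRFI API: the names `Codec`, `GZIP`, `BASE64`, `write_archive` and `read_archive` are hypothetical, Python's standard library stands in for a Scheme implementation, only tar is handled, and everything is held in memory rather than truly streamed.

```python
import base64
import gzip
import io
import tarfile
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence, Tuple


@dataclass(frozen=True)
class Codec:
    """Hypothetical lossless byte codec: decode(encode(b)) == b for any bytes b."""
    name: str
    encode: Callable[[bytes], bytes]
    decode: Callable[[bytes], bytes]


# Two example codecs built from standard-library functions.
GZIP = Codec("gzip", gzip.compress, gzip.decompress)
BASE64 = Codec("base64", base64.b64encode, base64.b64decode)


def write_archive(files: Iterable[Tuple[str, bytes]],
                  codecs: Sequence[Codec] = ()) -> bytes:
    """Write an uncompressed tar archive, then apply wrap-around codecs in order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    blob = buf.getvalue()
    for codec in codecs:
        blob = codec.encode(blob)
    return blob


def read_archive(blob: bytes,
                 on_file: Callable[[str, bytes], None],
                 codecs: Sequence[Codec] = ()) -> None:
    """Undo the codecs in reverse order, then call on_file once per regular file."""
    for codec in reversed(codecs):
        blob = codec.decode(blob)
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                on_file(member.name, tar.extractfile(member).read())


# Usage: a tar archive wrapped in gzip, then in base64, and read back.
files = [("hello.txt", b"hello"), ("data/raw.bin", bytes(range(256)))]
blob = write_archive(files, codecs=(GZIP, BASE64))
seen = []
read_archive(blob, lambda name, data: seen.append((name, data)),
             codecs=(GZIP, BASE64))
assert seen == files
```

The archive code never needs to know which encodings wrap it, which is the kind of composition between the two interfaces described above. A format like zip, which compresses each member internally, would use codecs inside the archive implementation instead of around it.
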
The third SRFI (which I don't want to write, but I can help out if someone else does) would be like the second, but the interface would permit random access to the files inside an archive instead of being restricted to a single sequential sweep through the entire archive. Reads and writes of the same archive could also be mixed. The interface would be based around an archive object with open-archive and close-archive procedures. While the archive is open you could call procedures in any order to get, add, rename and delete files in it.

This random access interface is much more complex and much less widely useful than the streaming interfaces. I can't think of a good use for it off the top of my head other than writing an interactive archive manager like 7-Zip. This dearth of applications is why I'm not enthusiastic about it. We may well do without it, perhaps writing it later if there is demand.

Libarchive (<https://www.libarchive.org/>) and Racket's zip/tar support have been reference points while I thought about this stuff. If these SRFIs are accepted, I would like to do an implementation that calls libarchive through a Scheme implementation's C FFI. I think the libarchive designers did a great job bringing order to a convoluted problem space, and if the binding is easy to write, that would confirm the soundness of our SRFIs -- in addition to the bindings being useful in their own right. If there's friction in writing these bindings, we probably specified something that doesn't serve all levels of abstraction and should go back to fix our SRFIs.

There's much more to say about many details but I'd like to find out whether there is interest before putting in more time. If there is, I'll upload my draft documents to GitHub for public viewing.

Finally, here is the prior art I found in the Scheme world:

* <https://github.com/weinholt/compression> seems a very nice library.
* <https://github.com/ashinn/chibi-scheme/blob/master/lib/chibi/tar.scm>
* Racket has zip/tar and unzip/untar modules in the standard library.
* Scheme Spheres can read and write tar and gzip.
* Bigloo can read tar and gzip.
* Chicken has an egg to read and write gzip.
* I didn't find archive code in scsh, Kawa, Gauche, Guile or MIT Scheme.
* I didn't find a generic archive interface in any of the above.

KR, Lassi