.NET System.IO.Compression and zip files

DotNetZip Library

.NET 2.0 does Compression

[Update 30 October 2007]: I moved this library to a CodePlex project, called DotNetZip. See www.codeplex.com/DotNetZip. It does zip creation, extraction, passwords, ZIP64, Unicode, SFX, and more. It is open source, free Free FREE to use, has a clear license, and comes with .NET-based ZIP utilities. It works on the Compact Framework or the regular .NET Framework. It is not the same as #ziplib or SharpZipLib. DotNetZip is independent.

There's a new namespace in the .NET Framework base class library for .NET 2.0, called System.IO.Compression. It has classes called DeflateStream and GZipStream.

These classes are streams; they're useful for compressing a stream of bytes as you transfer it, for example across the network to a cooperating application (a peer, a client, whatever). The DeflateStream implements the Deflate algorithm; see the IETF's RFC 1951, "DEFLATE Compressed Data Format Specification version 1.3." The GZipStream builds on Deflate: it wraps the Deflate-compressed data in a header and a trailer, and the trailer includes a cyclic redundancy check (CRC-32). For more on GZip, see IETF RFC 1952, "GZIP file format specification version 4.3."
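To make the stream usage concrete, here is a minimal sketch of a round trip through GZipStream using in-memory buffers. The class name GZipRoundTrip and its helper methods are made up for illustration; only GZipStream, CompressionMode and MemoryStream come from the framework.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper class, for illustration only.
static class GZipRoundTrip
{
    // Compress a byte array into the GZip format (RFC 1952).
    public static byte[] Compress(byte[] input)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // Disposing the GZipStream flushes the last Deflate block and the trailer.
            using (GZipStream gz = new GZipStream(output, CompressionMode.Compress))
            {
                gz.Write(input, 0, input.Length);
            }
            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Decompress GZip data back into the original bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gz = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gz.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, n);
            }
            return output.ToArray();
        }
    }

    static void Main()
    {
        byte[] original = Encoding.UTF8.GetBytes("hello hello hello hello hello");
        byte[] packed = Compress(original);
        byte[] unpacked = Decompress(packed);

        // Every GZip stream starts with the magic bytes 0x1F 0x8B.
        Console.WriteLine("magic: {0:X2} {1:X2}", packed[0], packed[1]);
        Console.WriteLine(Encoding.UTF8.GetString(unpacked));
    }
}
```

Swapping GZipStream for DeflateStream in both methods gives the raw RFC 1951 stream without the GZip header and trailer.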

Gzip has been done

The GZip format described in RFC 1952 is also used by the popular gzip utility included in many *nix distributions. The Base Class Library team at Microsoft previously published example source code for a simple utility that behaves just like the *nix gzip, but is written in .NET and based on the GZipStream class. This simple utility interoperates with the *nix gzip: it can read and write .gz files.

What about .zip files?

As a companion to that example, enclosed here as an attachment (see the bottom of this post) is an example class that can read and write zip archives. It is packaged as a reusable library, along with a couple of companion command-line applications that use the library. The example apps are useful on their own, for example for just zipping up a directory quickly from within a script or a command prompt. But the library is useful too, for adding zip capability to arbitrary applications. For example, you could include a zip task in an MSBuild session, or in a smart-client GUI application. I've included both the binaries and source code here.

This is the class diagram for the ZipFile class, and the ZipEntry class, as generated by Visual Studio 2005. The ZipFile is the main class.

If you don't quite grok all that notation, I will point out a few highlights. The ZipFile itself supports the generic IEnumerable interface. What this means is you can enumerate the ZipEntry objects within the ZipFile using a foreach loop. Makes usage really simple. (Implementing that little trick is also dead-simple, thanks to the support for iterators that is new in C# 2.0, and the "yield return" statement.)
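For readers who haven't used C# 2.0 iterators, here is a minimal sketch of the pattern. SketchArchive and SketchEntry are hypothetical stand-ins, not the actual ZipFile and ZipEntry source; they only show how "yield return" makes a class usable in a foreach loop.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for an archive entry.
class SketchEntry
{
    public readonly string FileName;
    public SketchEntry(string fileName) { FileName = fileName; }
}

// Hypothetical stand-in for an archive that can be enumerated with foreach.
class SketchArchive : IEnumerable<SketchEntry>
{
    private readonly List<SketchEntry> entries = new List<SketchEntry>();

    public void Add(string fileName)
    {
        entries.Add(new SketchEntry(fileName));
    }

    // The compiler turns this method into an enumerator class;
    // each "yield return" hands one entry to the caller's foreach.
    public IEnumerator<SketchEntry> GetEnumerator()
    {
        foreach (SketchEntry e in entries)
        {
            yield return e;
        }
    }

    // The non-generic interface is required too; it just forwards.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

static class IteratorDemo
{
    static void Main()
    {
        SketchArchive archive = new SketchArchive();
        archive.Add("a.txt");
        archive.Add("b.txt");

        foreach (SketchEntry e in archive)
        {
            Console.WriteLine(e.FileName);
        }
    }
}
```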

Using the ZipFile class

You can extract all files from an existing .zip file by doing this:

    ZipFile zip = ZipFile.Read("MyZip.zip");
    foreach (ZipEntry e in zip)
    {
        e.Extract("NewDirectory");
    }

Of course, you don't have to extract the files; you can just inspect the properties of the ZipEntry objects in the collection. Creating a new .zip file is also simple:

    ZipFile zip = new ZipFile("MyNewZip.zip");
    zip.AddDirectory("My Pictures", true); // AddDirectory recurses subdirectories
    zip.Save();

You can add a directory at a time, as shown above, and you can add individual files as well. It seems to be pretty fast, though I haven't benchmarked it. It doesn't compress as much as WinZip: this library is at the mercy of the DeflateStream class, and that class doesn't support multiple levels of compression.

Hmmm, What About Intellectual Property?

I am no lawyer, but it seems to me the ZIP format is PKWARE's intellectual property. PKWARE has some text in their zip spec which states:

PKWARE is committed to the interoperability and advancement of the .ZIP format. PKWARE offers a free license for certain technological aspects described above under certain restrictions and conditions. However, the use or implementation in a product of certain technological aspects set forth in the current APPNOTE, including those with regard to strong encryption or patching, requires a license from PKWARE. Please contact PKWARE with regard to acquiring a license.

I checked with PKWARE for more on that. I described what I was doing with this example, and got a nice email reply from Jim Peterson at PKWARE, who wrote:

From the description of your intended need, no license would be necessary for the compression/decompression features you plan to use.

Which would mean anyone could use this example without a license. But like I said, I am no lawyer.

Later,

-Dino

[Update 11 April 2006 10:36am US Pacific time]: After a bit of testing it seems that there are some anomalies with the DeflateStream class in .NET. One of them is that it performs badly with already-compressed data. The DeflateStream in .NET can actually increase the size of the stream. The output is still a valid Deflate stream, but it isn't compressed as you'd like. The DotNetZip implementation works around this by using the STORE method rather than DEFLATE when the data size increases. But still....
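The STORE fallback can be sketched roughly as follows. This is not the DotNetZip source; the class StoreFallback and its methods are hypothetical, and the sketch only illustrates the idea of deflating the data, comparing sizes, and keeping the raw bytes when Deflate doesn't help. Random bytes stand in for already-compressed data.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, for illustration only; not the library's actual code.
static class StoreFallback
{
    // Run the bytes through DeflateStream and return the compressed result.
    public static byte[] Deflate(byte[] input)
    {
        using (MemoryStream output = new MemoryStream())
        {
            using (DeflateStream ds = new DeflateStream(output, CompressionMode.Compress))
            {
                ds.Write(input, 0, input.Length);
            }
            return output.ToArray();
        }
    }

    // Pick STORE when deflating does not shrink the data, DEFLATE otherwise.
    public static string ChooseMethod(byte[] input)
    {
        byte[] deflated = Deflate(input);
        return deflated.Length >= input.Length ? "STORE" : "DEFLATE";
    }

    static void Main()
    {
        // Random bytes behave like already-compressed data: nothing left to squeeze.
        byte[] noise = new byte[4096];
        new Random(42).NextBytes(noise);

        // A run of zeros is about as compressible as data gets.
        byte[] zeros = new byte[4096];

        Console.WriteLine("noise: {0} -> {1} bytes, method {2}",
            noise.Length, Deflate(noise).Length, ChooseMethod(noise));
        Console.WriteLine("zeros: {0} -> {1} bytes, method {2}",
            zeros.Length, Deflate(zeros).Length, ChooseMethod(zeros));
    }
}
```

The exact sizes printed depend on the framework version, since the Deflate implementation has changed over time, but the comparison itself is all the fallback needs.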

The base class library team is aware of this anomaly and is considering it. If you'd like to weigh in on this behavior, and I encourage you to do so if you value this class, use the Product Feedback Center.