.NET System.IO.Compression and zip files



DotNetZip Library


.NET 2.0 does Compression


[Update 30 October 2007]: I moved this library to a CodePlex project called DotNetZip; see www.codeplex.com/DotNetZip. It does zip creation, extraction, passwords, ZIP64, Unicode, SFX, and more. It is open source, free Free FREE to use, has a clear license, and comes with .NET-based ZIP utilities. It works on the Compact Framework as well as the regular .NET Framework. It is not the same as #ziplib or SharpZipLib; DotNetZip is independent.


There’s a new namespace in the .NET Framework base class library for .NET 2.0, called System.IO.Compression. It has classes called DeflateStream and GZipStream.


These classes are streams. They’re useful for compressing a stream of bytes as you transfer it, for example across the network to a cooperating application (a peer, a client, whatever). DeflateStream implements the Deflate algorithm described in IETF RFC 1951, “DEFLATE Compressed Data Format Specification version 1.3.” GZipStream builds on Deflate. It wraps the Deflate-compressed data in the gzip format, which adds a header and a trailer carrying a cyclic redundancy check (CRC-32) and the uncompressed length. For more on gzip, see IETF RFC 1952, “GZIP file format specification version 4.3.”
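A minimal sketch of DeflateStream on an in-memory buffer, compressing and then inflating it again. The class name DeflateRoundTrip is hypothetical. The copy loop stands in for Stream.CopyTo, which only arrived in .NET 4, so the code sticks to calls that exist in .NET 2.0.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class DeflateRoundTrip
{
    // Compress a buffer into raw Deflate data (RFC 1951: no gzip header, no CRC).
    public static byte[] Compress(byte[] data)
    {
        MemoryStream output = new MemoryStream();
        using (DeflateStream deflate = new DeflateStream(output, CompressionMode.Compress))
        {
            deflate.Write(data, 0, data.Length);
        } // Disposing the DeflateStream flushes the final block and closes output.

        // MemoryStream.ToArray still works after the stream has been closed.
        return output.ToArray();
    }

    // Inflate raw Deflate data back into the original bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (DeflateStream inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            // Read until the DeflateStream reports end of data.
            while ((read = inflate.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            return output.ToArray();
        }
    }
}
```

The same pattern works with GZipStream in place of DeflateStream. Only the framing on the wire changes.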


Gzip has been done


The gzip format described in RFC 1952 is also used by the popular gzip utility included in many *nix distributions. The Base Class Library team at Microsoft previously published example source code for a simple utility that behaves just like the *nix gzip, but is written in .NET and based on the GZipStream class. This simple utility interoperates with the *nix gzip. It can read and write .gz files.
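This is not the BCL team’s sample. It is a minimal sketch of the same idea under the assumption that you only need whole-file compress and decompress. MiniGzip and its method names are hypothetical. It writes path + ".gz", the naming convention gzip uses.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class MiniGzip
{
    // Copy one stream into another with a fixed buffer (.NET 2.0 has no Stream.CopyTo).
    private static void Pump(Stream source, Stream destination)
    {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
        }
    }

    // Compress path into path + ".gz" and return the new file name.
    public static string CompressFile(string path)
    {
        string gzPath = path + ".gz";
        using (FileStream input = File.OpenRead(path))
        using (FileStream output = File.Create(gzPath))
        using (GZipStream gz = new GZipStream(output, CompressionMode.Compress))
        {
            Pump(input, gz);
        } // gz is disposed first, which writes the gzip trailer before output closes.
        return gzPath;
    }

    // Inflate a .gz file into the given destination path.
    public static void DecompressFile(string gzPath, string destination)
    {
        using (FileStream input = File.OpenRead(gzPath))
        using (GZipStream gz = new GZipStream(input, CompressionMode.Decompress))
        using (FileStream output = File.Create(destination))
        {
            Pump(gz, output);
        }
    }
}
```

Because GZipStream emits the RFC 1952 header (magic bytes 0x1f 0x8b, method byte 8 for deflate) and trailer, a file written this way should be readable with gzip -d on a *nix box.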


What about .zip files?


As a companion to that example, enclosed here as an attachment (see the bottom of this post) is an example class that can read and write zip archives. It is packaged as a reusable library, along with a couple of example command-line applications that use it. The example apps are useful on their own, for instance for quickly zipping up a directory from within a script or a command prompt. The library is also useful for adding zip capability to arbitrary applications. For example, you could include a zip task in an MSBuild project, or in a smart-client GUI application. I’ve included both the binaries and the source code here.


This is the class diagram for the ZipFile class, and the ZipEntry class, as generated by Visual Studio 2005. The ZipFile is the main class.


If you don’t quite grok all that notation, here are a few highlights. ZipFile implements the generic IEnumerable interface, which means you can enumerate the ZipEntry objects within a ZipFile using a foreach loop. That makes usage really simple. (Implementing that little trick is also dead simple, thanks to the iterator support that is new in C# 2.0 and the “yield return” statement.)
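The iterator trick can be sketched with two hypothetical classes, MiniZipFile and MiniZipEntry. They stand in for ZipFile and ZipEntry, and this sketch makes no attempt to reproduce the real classes’ internals. The compiler turns the iterator method into the enumerator state machine for you.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for ZipEntry: just a name and a size.
public class MiniZipEntry
{
    public readonly string FileName;
    public readonly long UncompressedSize;

    public MiniZipEntry(string fileName, long uncompressedSize)
    {
        FileName = fileName;
        UncompressedSize = uncompressedSize;
    }
}

// Hypothetical stand-in for ZipFile: holds entries and supports foreach.
public class MiniZipFile : IEnumerable<MiniZipEntry>
{
    private readonly List<MiniZipEntry> entries = new List<MiniZipEntry>();

    public void Add(MiniZipEntry entry)
    {
        entries.Add(entry);
    }

    // C# 2.0 iterator: the compiler generates the IEnumerator<MiniZipEntry>.
    public IEnumerator<MiniZipEntry> GetEnumerator()
    {
        foreach (MiniZipEntry e in entries)
        {
            yield return e;
        }
    }

    // Required because IEnumerable<T> extends the non-generic IEnumerable.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

With that in place, `foreach (MiniZipEntry e in zip) { ... }` works just like the ZipFile loop in the next example.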


Using the ZipFile class


You can extract all files from an existing .zip file by doing this:



        ZipFile zip = ZipFile.Read("MyZip.zip");
        // Extract every entry in the archive into the NewDirectory folder.
        foreach (ZipEntry e in zip)
        {
            e.Extract("NewDirectory");
        }


Of course, you don’t have to extract the files. You can just fiddle with the properties of the ZipEntry objects in the collection. Creating a new .zip file is also simple:



      ZipFile zip = new ZipFile("MyNewZip.zip");
      zip.AddDirectory("My Pictures", true); // AddDirectory recurses subdirectories
      zip.Save();


You can add a directory at a time, as shown above, and you can add individual files as well. It seems to be pretty fast, though I haven’t benchmarked it. It doesn’t compress as well as WinZip. This library is at the mercy of the DeflateStream class, and that class doesn’t support multiple compression levels.
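For what it’s worth, later versions of the framework addressed the compression-level limitation. Starting with .NET Framework 4.5, DeflateStream and GZipStream have a constructor that takes a CompressionLevel (Optimal, Fastest, NoCompression). The sketch below compares output sizes across levels. It will not compile against .NET 2.0, and the LevelComparison helper is hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class LevelComparison
{
    // Requires .NET Framework 4.5 or later: the CompressionLevel overload does not exist in 2.0.
    public static int DeflatedLength(byte[] data, CompressionLevel level)
    {
        MemoryStream output = new MemoryStream();
        using (DeflateStream deflate = new DeflateStream(output, level))
        {
            deflate.Write(data, 0, data.Length);
        }
        // Length of the finished Deflate stream after the final block is flushed.
        return output.ToArray().Length;
    }
}
```

Calling DeflatedLength once per level on the same input makes it easy to see the size/speed trade-off for your own data.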


Hmmm, What About Intellectual Property?


I am no lawyer, but it seems to me the ZIP format is PKWARE’s intellectual property. PKWARE has some text in their zip spec which states:


PKWARE is committed to the interoperability and advancement of the .ZIP format. PKWARE offers a free license for certain technological aspects described above under certain restrictions and conditions. However, the use or implementation in a product of certain technological aspects set forth in the current APPNOTE, including those with regard to strong encryption or patching, requires a license from PKWARE. Please contact PKWARE with regard to acquiring a license.

I checked with PKWARE about that. I described what I was doing with this example, and got a nice email reply from Jim Peterson at PKWARE, who wrote:


From the description of your intended need, no license would be necessary for the compression/decompression features you plan to use.

That would mean anyone could use this example without a license. But like I said, I am no lawyer.


Later,


-Dino


[Update 11 April 2006 1036am US Pacific time]: After a bit of testing, it seems that there are some anomalies with the DeflateStream class in .NET. One of them is that it performs badly with already-compressed data. The DeflateStream in .NET can actually inflate the size of the stream. The output is still a valid Deflate stream, but it isn’t compressed as you’d like. The DotNetZip implementation works around this by using the STORE method rather than DEFLATE when the data size increases. But still…
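The workaround can be sketched like this: deflate the entry’s data, compare sizes, and fall back to storing the bytes uncompressed if deflate did not help. The method codes 0 (stored) and 8 (deflated) are the values the PKWARE APPNOTE assigns to those methods. The StoreFallback class and its method are hypothetical, and this is not DotNetZip’s actual code.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class StoreFallback
{
    // ZIP compression-method codes from the PKWARE APPNOTE.
    public const short MethodStored = 0;
    public const short MethodDeflated = 8;

    // Returns the bytes to write for an entry and reports which method they use.
    public static byte[] ChooseEncoding(byte[] data, out short method)
    {
        MemoryStream buffer = new MemoryStream();
        using (DeflateStream deflate = new DeflateStream(buffer, CompressionMode.Compress))
        {
            deflate.Write(data, 0, data.Length);
        }
        byte[] deflated = buffer.ToArray();

        if (deflated.Length < data.Length)
        {
            method = MethodDeflated;
            return deflated;
        }

        // Deflate made the data bigger (typical for already-compressed input): store it as-is.
        method = MethodStored;
        return data;
    }
}
```

A real zip writer would then record the chosen method code in the entry’s local and central directory headers.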

The base class library team is aware of this anomaly and is considering it. If you’d like to weigh in on this behavior (and I encourage you to do so if you value this class), use the Product Feedback Center.


 

Comments (21)

  1. Tim Heron says:

    It’s a pity that GZipStream doesn’t work with streams over 4GB in size. GNU gzip can cope with >4GB files, so why this limitation? http://www.gzip.org/#faq10

  2. CedarLogic says:

    .NET System.IO.Compression and zip files. .NET Zip Library. .NET 2.0 does Compression. There’s a new namespace…

  3. TravisOwens says:

    Don’t confuse PKZIP and GZIP. While PKZIP came about 2-3 years before GZIP, GZIP is not a *nix implementation of PKZIP, although both deflators support each other’s format.

    If .Net is using GZIP’s method (which fully works in PKZIP, WinZip, etc.), then the licensing is a non-issue anyway.

  4. Jeff Parker says:

    Ohhhh, brilliant! I was looking for something like this the other day when I was playing in the compression namespace.

  5. cheeso says:

    Travis, thanks for reminding us all that Gzip and Pkzip are different. I should have pointed that out. Both Gzip (the *nix utility) and Pkzip (the commercial tool) do standard compression (see the IETF RFCs mentioned in the original entry). But Gzip compresses a single file, while Pkzip builds compressed archives.

    I think you are jumping to conclusions when you suggest that because .NET’s compression library uses the Deflate algorithm, there are no IP issues.  PKWARE defines the format for .zip files, and that format is theirs to license.  They don’t have a license on the compression format, but on the surrounding data that describes the multi-file archive.

    I contacted PKWARE and they agreed that the usage here is covered under their "free" license terms.  But it is still PKWARE’s intellectual property, and it is still a license, though I did not pay for it.  Keep in mind, I am not a lawyer.

    -Dino

  6. cheeso says:

    Tim, I don’t know about the 4GB limit – if it is real, and if so, why it is there.  

    I would suggest posting to http://forums.microsoft.com/MSDN/showforum.aspx?forumid=39&siteid=1

    -Dino

  7. Link list 08.04.2006

    Software
    HFS – Http File Server – A small file server that requires no installation. Source code is also available. [via Portable Freeware]

    .Net
    .NET System.IO.Compression and zip files – A zip library based

  8. Colin says:

    Hi

    I have managed to implement the zipping of a file, but when I close down my Form, I get a "System.MissingMethodException". This seems to relate to the zip.Dispose method.

    Any ideas on how I can correct this?

    Many thanks

    Colin

  9. Finding a way to use system.io.compression for zip archives

  10. Jon Galloway says:

    Overview SharpZipLib provides best free .NET compression library, but what if you can’t use it due to

  11. Mohan says:

    Hai

    I have used it in my code but have a problem with it. My folder size before zipping is 599 KB and after zipping is 998 KB. How can I zip it in a way that decreases the file size?

  12. cheeso says:

    Mohan – What version of the Zip Library are you using? You will want to get the latest version of this library, from http://www.codeplex.com/DotNetZip. It corrects the problem where some files get "inflated" when they are zipped.

  13. john says:

    The 4 GB limit is probably due to the physical memory limitations of the system’s address space.

    I.e., the .NET Framework is not designed to touch the disk whilst it compresses.

    If you wanted to exceed this, then you would have to write a pagefile-like system to store processed data whilst using the 4 GB as a buffer.

  14. cheeso says:

    @john, the 4 GB limit mentioned above has nothing to do with the physical memory of the machine. It is related to the DeflateStream implementation. I haven’t explored it well, so I cannot say more than that.

    It does not have to do with whether the implementation is streaming or not (viz., "not designed to touch the disk while compressing").

  15. David Taylor says:

    Hi – Just came across your blog (a few years late) and thought I would address some people’s questions about the 4 GB limit.

    The actual IETF GZIP specification states that the length field in the trailer (ISIZE) holds the number of uncompressed bytes modulo 4 GB (the remainder after dividing by 4 GB).

    So if the uncompressed length was 5 GB, per the spec that 32-bit field should be set to 1 GB (5 GB MOD 4 GB).

    It appears the Microsoft implementation has a bug: they did not read the spec correctly and assumed the 32-bit field meant only files of 4 GB or less should be supported.

    The spec is a bit funny in that if you are encoding a file larger than 4 GB, the only way to know the true length of the uncompressed file is to actually uncompress it!

    I found this out the hard way after compressing large files a few years back and ended up reading the IETF spec. Because of Microsoft’s implementation bug, I ended up being forced to move to SharpZipLib, which does not have this bug.

    I am not sure whether Microsoft has since fixed this bug (after .NET 2), but I hope this information helps anyone reading this thread. It only impacts you if you deal with very large files.

    David

  16. feliz says:

    The Microsoft-provided Deflate implementation uses a fixed Huffman tree that is optimized for ASCII text only. For this reason, compressing anything other than text may even increase the overall file size.

  17. Martin Vobr says:

    You may also try our Rebex ZIP from http://www.rebex.net/zip.net/ . It does not depend on .NET Deflate implementation and does not have a 4GB limit.

  18. http://hsscore.codeplex.com/ also has ZIP file/folder compression based on this lib

  19. Dave says:

    I am running VB.NET Express 2008. This is a total waste of time.
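To make the arithmetic in David Taylor’s comment concrete: RFC 1952 ends each gzip member with an 8-byte trailer, a CRC-32 followed by ISIZE, the uncompressed length modulo 2^32, both little-endian. For a 5 GB input, ISIZE = 5 * 2^30 mod 2^32 = 2^30, which is 1 GB. Here is a minimal sketch that computes and reads that field. GzipTrailer and its methods are hypothetical helpers, and ReadIsize assumes a buffer holding a single gzip member.

```csharp
using System;

public static class GzipTrailer
{
    private const long TwoToThe32 = 4294967296L;

    // ISIZE as defined by RFC 1952: input length modulo 2^32.
    public static uint ExpectedIsize(long uncompressedLength)
    {
        return (uint)(uncompressedLength % TwoToThe32);
    }

    // Read ISIZE from the last four bytes of a single-member gzip buffer (little-endian).
    public static uint ReadIsize(byte[] gz)
    {
        // 10-byte header + 8-byte trailer is the bare minimum framing.
        if (gz.Length < 18)
        {
            throw new ArgumentException("Buffer is too short to be a gzip member.");
        }
        int i = gz.Length - 4;
        return (uint)gz[i]
             | ((uint)gz[i + 1] << 8)
             | ((uint)gz[i + 2] << 16)
             | ((uint)gz[i + 3] << 24);
    }
}
```

Because ISIZE wraps around, a reader cannot tell a 1 GB file from a 5 GB one by the trailer alone. That is why, as the comment says, decompressing is the only way to learn the true size.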