Another puzzle: Self-extracting EXE


The rules are the following: 
1)  Before starting the test, you choose the computer + OS you want, etc.
2) Then, I give you a certain file.
3) You need to create a self-extracting EXE for it, that is always smaller than the original file. You can use any loseless compression/decompression algorithm tailored for this specific file. The decompression code might be self-contained in this file but of course it can use routines from the OS.

Question: Is it always possible to create this self-extracting EXE for any file that I give you in step 2)?

Comments (12)

  1. Tad says:

    No. Is that all you need? Do I Win?

  2. David Puggie says:

    No if the file is random data it will not compress. The actual compressed file will actually be larger

  3. No. Simplest counterexample is a zero-byte file 🙂

    If you guarantee that the original file will be larger than a certain size (the size of a minimal exe plus a small amount), you may be able to do it for any file but you might have to use a trick and encode some of the data into the exe’s filename. Whether this is possible depends on whether the filename-length limit on your OS is as long as a minimal exe file. I don’t know Windows filename length limit or enough about the exe file format to know whether this is likely or not.

  4. Adi Oltean says:

    No tricks – let’s assume 8.3 file names for now (and no NTFS extended attributes, or anything similar of course) 🙂

    And yes, the answer is obvious for the zero original file length. How about the other lengths?

  5. Well if the file is already compressed, and cannot be compressed further then adding the extract to extract the already compressed file will end up with a bigger file.

  6. In that case the answer is no, as everyone else already pointed out. Random data in the general case simply can’t be compressed, and it’s impossible to put this into self-extracting exe format without adding length.

  7. Adi Oltean says:

    Correct.

    I was thinking to the following proof: assuming that you want to compress all files of size is N, then all the generated self-extracting EXEs wil have size 0..N-1 (assuming that a zero-size file is still a valid executable – remember that you pick your hardware + OS)

    So, there are 2^N original files.

    Also, there are maximum 1 + 2^1 + 2^2 + … + 2^(N-1) = 2^N – 1 compressed files.

    Therefore there is at least one file which cannot be compressed.

  8. Adi Oltean says:

    (clarification – by size N, I meant N bits, not bytes)

  9. Luke Stevens says:

    No. If you could guarantee that a lossless compression algorithm could always make an output smaller than its input, you could just run the algorithm repeatedly on its own output until it dwindled to nothing but an iteration count. Furthermore, for any input size n you would be mapping 2^n possible inputs uniquely to at most 2*2^m-1 possible outputs where m < n, which is impossible (the pigeonhole principle).

    So, the only way is to remove some of the input data and stuff it somewhere free, like the EXE filename as someone suggested, but I gather any such trick is what you mean to disallow.

  10. Luke Stevens says:

    Wow, we typed the same thing at the same time! 🙂

  11. Stick the file in the resource fork. "ls -l" will show you a zero byte file. You don’t even need to compress it. 🙂

  12. Adi Oltean says:

    >> Stick the file in the resource fork. "ls -l" will show you a zero byte file.

    In Windows, if you use the NTFS file system, you can use alternate file streams to keep "hidden" data.

    Here is an example straight from my CMD.EXE window:

    E:>echo XXXXXXXXXXXXXX > a.txt:hidden

    E:>echo XXXXXXXXXXXXXX > b.txt

    E:>dir *.txt

    Volume in drive E has no label.

    Volume Serial Number is 34AE-5845

    Directory of E:

    11/20/2004 12:22 AM 0 a.txt

    11/20/2004 12:23 AM 17 b.txt

    2 File(s) 17 bytes

    0 Dir(s) 8,305,999,872 bytes free

    E:>more < a.txt:hidden

    XXXXXXXXXXXXXX

    E:>more < b.txt

    XXXXXXXXXXXXXX