Friday, 22 April 2022

Why does shuffling a compressed file change its recompressed size under some compression algorithms but not others?

Below is a list of compressed file sizes. The source files all came from /dev/urandom, so their contents are pretty much pseudo-random. The yellow figure is the original size, which is the same for every one of them.

The pink figures are the compressed sizes, 30 in total. The blue figures are the compressed sizes of the pink files after their bytes were randomly shuffled with Python's random.shuffle. The compression algorithms were LZMA, BZ2 and ZLIB respectively, via Python's built-in lzma, bz2 and zlib modules.
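
For reference, the experiment was along these lines (a minimal sketch: the 1 MB source size is a stand-in, since the original size isn't stated here, and os.urandom stands in for reading files from /dev/urandom):

    import bz2
    import lzma
    import os
    import random
    import zlib

    SIZE = 1_000_000  # assumed original file size
    CODECS = {"lzma": lzma.compress, "bz2": bz2.compress, "zlib": zlib.compress}

    for name, compress in CODECS.items():
        for trial in range(30):
            original = os.urandom(SIZE)       # pseudo-random source data
            pink = compress(original)         # "pink": compressed size
            shuffled = bytearray(pink)
            random.shuffle(shuffled)          # shuffle the compressed bytes in place
            blue = compress(bytes(shuffled))  # "blue": size after shuffle + recompress
            print(name, trial, len(pink), len(blue), len(blue) - len(pink))

If the sizes in the sheet are representative, the last column should come out exactly zero for lzma and zlib, and nonzero (up to about 930 bytes here) for bz2.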

You can see three general blocks of output sizes. In two blocks the size difference is exactly zero, whilst the third block shows random-looking size differences, which is what I'd expect from the arbitrary runs and correlations that appear when a file's bytes are randomly shuffled. The BZ2 files changed by up to 930 bytes. I find it mystifying that the LZMA and ZLIB algorithms produced exactly the same file sizes; it's as if there is code inside them specifically engineered to have no effect.

Q: Why do the BZ2 files change size significantly, whilst shuffling the LZMA and ZLIB files has no effect on their compressed size?

[sheet: spreadsheet of the file sizes described above]



