8.1 Zipping files: gzip

You can use the commands gzip and gunzip to zip and unzip a file. When you zip a file using gzip, it automatically receives the extension .gz. In many terminals it also appears in a different color (often red), giving you a visual cue, although this depends on your color settings. When you unzip using gunzip, the .gz extension disappears again.

$ gzip BanthracisProteome.txt
$ ls -sh BanthracisProteome.txt.gz
3.2M BanthracisProteome.txt.gz
$ gunzip -k BanthracisProteome.txt.gz
$ ls -sh BanthracisProteome.txt
24M BanthracisProteome.txt

You can see that the zipped file BanthracisProteome.txt.gz occupies 3.2M, while the unzipped version occupies 24M. Zipping reduces the space needed by a factor of more than seven!
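If you would rather not work out the ratio by hand, gzip can report it with the -l (list) option. The following is a minimal sketch; the file name ratio_demo.txt and its contents are made up for illustration and are not part of the proteome example.

```shell
# Build a small sample file (hypothetical name and content) with repeated lines.
for i in $(seq 1 1000); do echo "MKTAYIAKQRQISFVKSHFSRQ"; done > ratio_demo.txt

# Write the zipped data to standard output and redirect it into a new file,
# so that the original file stays in place.
gzip -c ratio_demo.txt > ratio_demo.txt.gz

# List the compressed size, the uncompressed size and the compression ratio.
gzip -l ratio_demo.txt.gz
```

The exact numbers depend on the file, so none are shown here.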

Note: By default, gzip replaces the original file. Use gzip -k to keep the original file in addition to creating the zipped version (gunzip -k likewise keeps the .gz file, as in the example above). Don’t worry if you forget the -k flag: you can simply unzip your file again, since no information is lost in the process.
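If your version of gzip does not know the -k flag (it was added in gzip 1.6), the -c option offers another way to keep the original: it writes the result to standard output instead of replacing the file. The sketch below also checks that nothing is lost in the round trip. The file name sample.txt and its contents are assumptions made for this example only.

```shell
# Create a small sample file (hypothetical name, used only in this sketch).
printf 'MKTAYIAKQR\nMKTAYIAKQR\n' > sample.txt

# Zip to standard output and redirect into a new file; sample.txt is untouched.
gzip -c sample.txt > sample.txt.gz

# Unzip to standard output and compare byte for byte with the original.
# cmp prints nothing and succeeds only if both inputs are identical.
gunzip -c sample.txt.gz | cmp - sample.txt && echo "identical"
```

If the comparison succeeds, the last command prints "identical", confirming that zipping and unzipping gave back exactly the same bytes.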

The compression is most effective if the file is highly repetitive. You can test this out yourself by creating a file that simply states “Zipping is great!” 10000 times in a row.

$ for i in $(seq 1 10000); do echo 'Zipping is great!'; done > test.txt
$ ls -sh test.txt
176K test.txt
$ gzip test.txt
$ ls -sh test.txt.gz
4.0K test.txt.gz

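As you can see, here the file shrinks by a factor of more than 40! Because ls -s reports the disk space allocated to a file rather than its exact byte count, the zipped file may in fact be even smaller than the 4.0K shown.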
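To see how much the content matters, you can compare the repetitive file with a file of the same size that contains random bytes. This is a rough sketch: the file names repetitive.txt and random.bin are made up, /dev/urandom is assumed to be available (as it is on Linux and macOS), and -k requires gzip 1.6 or newer.

```shell
# Repetitive file: the same 18-byte line written 10000 times (180000 bytes).
for i in $(seq 1 10000); do echo 'Zipping is great!'; done > repetitive.txt

# Random file of the same size; it contains no patterns for gzip to exploit.
head -c 180000 /dev/urandom > random.bin

# -k keeps the originals, -f overwrites .gz files left over from an earlier run.
gzip -kf repetitive.txt random.bin

# wc -c prints exact sizes in bytes, unlike the block-based sizes from ls -s.
wc -c repetitive.txt repetitive.txt.gz random.bin random.bin.gz
```

You should find that repetitive.txt.gz is a tiny fraction of the original, while random.bin.gz is about as large as random.bin, or even slightly larger.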