8.2 Compressing multiple files: tar

The gzip command that we learned in the previous section can only compress individual files. If you want to compress more than one file, you first have to combine them into a so-called archive. An archive is a single file that contains multiple files (and also directories, if you want). To create an archive, you can use the command tar -cf. The -c option stands for create, and -f tells tar to write the archive to the file named right after it instead of to STDOUT (i.e. standard output). If you also add -z (tar -czf), the whole archive is compressed with gzip. So you don't necessarily have to compress the files when you add them to an archive, but you have the option to do so if you want.
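One way to picture what -z does is to build the plain archive first and then run gzip over it yourself. The following is a minimal sketch, assuming a Unix-like system with tar and gzip installed; the file names sample1.txt and sample2.txt are hypothetical. It creates one archive with tar -czf and one by writing a plain archive to STDOUT (using - as the file name after -f) and piping it into gzip. Both results should be gzip-compressed tar archives with the same contents.

```shell
#!/usr/bin/env bash
# Sketch: compare "tar -czf" with piping an uncompressed tar stream through gzip.
# sample1.txt and sample2.txt are hypothetical example files.
set -euo pipefail

workdir=$(mktemp -d)              # scratch directory, deleted when the script exits
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Create two small text files to archive
seq 1 1000 > sample1.txt
printf 'hello tar\n' > sample2.txt

# Option 1: let tar compress the archive with gzip itself
tar -czf with_z.tar.gz sample1.txt sample2.txt

# Option 2: write the plain archive to STDOUT ("-f -") and compress it with gzip
tar -cf - sample1.txt sample2.txt | gzip > piped.tar.gz

# gzip -t checks that each file is a valid gzip stream (it fails otherwise)
gzip -t with_z.tar.gz
gzip -t piped.tar.gz
echo "both archives are valid gzip files"
```

The two files are not guaranteed to be byte-for-byte identical (gzip headers and padding can differ), but they should decompress to archives containing the same files.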

$ tar -cf all.tar BanthracisProteome.txt BanthracisProteome.txt.gz test.txt.gz
$ ls -sh all.tar BanthracisProteome.txt BanthracisProteome.txt.gz test.txt.gz
 27M all.tar
 24M BanthracisProteome.txt
3.2M BanthracisProteome.txt.gz
4.0K test.txt.gz
$ rm all.tar
$ tar -czf all.tar BanthracisProteome.txt BanthracisProteome.txt.gz test.txt.gz
$ ls -sh all.tar BanthracisProteome.txt BanthracisProteome.txt.gz test.txt.gz
6.4M all.tar
 24M BanthracisProteome.txt
3.2M BanthracisProteome.txt.gz
4.0K test.txt.gz

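Again, you can see that the size of the archive is greatly reduced, from 27M to 6.4M, when you add the -z flag, which compresses the archive with gzip. By convention, a compressed archive like this is given the extension .tar.gz (or .tgz) rather than just .tar, so that you can tell at a glance that it is compressed.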
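If you want to try this comparison without the proteome files, the following sketch generates a repetitive text file (big.txt, a hypothetical stand-in for a large text file), archives it once with tar -cf and once with tar -czf, and prints both sizes in bytes using wc -c. The exact numbers will depend on your data and on your tar and gzip versions.

```shell
#!/usr/bin/env bash
# Sketch: measure how much -z shrinks an archive of repetitive text.
# big.txt is a hypothetical stand-in for a large text file.
set -euo pipefail

workdir=$(mktemp -d)              # scratch directory, deleted when the script exits
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Generate 50,000 similar lines of text, which gzip compresses well
seq 1 50000 | awk '{ print "protein_" $1 " MKTAYIAKQRQISFVKSHFSRQ" }' > big.txt

tar -cf plain.tar big.txt           # archive only
tar -czf compressed.tar.gz big.txt  # archive, then compress with gzip

# wc -c counts bytes; tr strips the padding spaces some wc versions add
plain_size=$(wc -c < plain.tar | tr -d ' ')
gz_size=$(wc -c < compressed.tar.gz | tr -d ' ')

echo "without -z: $plain_size bytes"
echo "with -z:    $gz_size bytes"
```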

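Now you know how to create an archive, but what if you already have an archive and want to extract the files in it? You can do this with the tar -xf command. The -x option stands for extract, and -f tells tar which archive file to read. Without -f, tar would not open all.tar at all but would read from a default archive instead, usually standard input or a tape device, depending on how tar was built. You don't need to add -z when extracting, because current versions of GNU tar and BSD tar detect gzip compression automatically when reading an archive file: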

$ mkdir -p out
$ cd out
$ tar -xf ../all.tar
$ ls
BanthracisProteome.txt
BanthracisProteome.txt.gz
test.txt.gz
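To convince yourself that extraction really restores the original files, you can compare each extracted file with its source. The sketch below repeats the steps above in a temporary directory, using hypothetical files notes.txt and numbers.txt: it builds a gzip-compressed all.tar, extracts it with tar -xf into a folder called out, and uses cmp, which prints nothing for identical files and exits with an error if two files differ.

```shell
#!/usr/bin/env bash
# Sketch: archive two files, extract them into a fresh directory, and check
# that the extracted copies are identical to the originals.
# notes.txt and numbers.txt are hypothetical example files.
set -euo pipefail

workdir=$(mktemp -d)              # scratch directory, deleted when the script exits
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'first file\n' > notes.txt
seq 1 100 > numbers.txt
tar -czf all.tar notes.txt numbers.txt   # gzip-compressed, named as in the text

mkdir -p out
cd out
tar -xf ../all.tar    # no -z needed: tar detects the gzip compression itself

# cmp exits non-zero (stopping the script under set -e) if the files differ
cmp notes.txt ../notes.txt
cmp numbers.txt ../numbers.txt
echo "extracted files match the originals"
cd ..
```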

Go back to the parent directory and remove the folder again:

$ cd ..
$ rm -r out