8.3 Looking at zipped files

When you are dealing with a zipped (gzip-compressed) file, you would usually unzip it before performing any operation on it. Depending on the size of the file, however, this can be slow, and the bigger the file, the longer it takes. Sometimes you just want to look at the content of a file without uncompressing it on disk, and the command line offers the perfect tool for this: zcat (on a Mac, use gzcat instead). It decompresses the file as a stream and writes the result to standard output, leaving the compressed file untouched.
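Because the command name differs between systems, a script that has to run on both Linux and macOS can call gzip directly: gzip -dc (decompress, write to standard output) behaves like zcat on .gz files. The sketch below is a minimal example of that approach; sample.txt and its contents are made-up placeholders created inside a temporary directory.

```shell
#!/usr/bin/env bash
set -eu

# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Placeholder data standing in for a real (large) text file.
printf 'line one\nline two\nline three\n' > sample.txt
gzip sample.txt   # replaces sample.txt with sample.txt.gz

# -d = decompress, -c = write to standard output; the .gz file stays compressed.
first_line=$(gzip -dc sample.txt.gz | head -n 1)
echo "$first_line"
```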

$ gzip -k BanthracisProteome.txt
gzip: BanthracisProteome.txt.gz already exists; do you wish to overwrite (y or n)?  not overwritten
$ zcat BanthracisProteome.txt.gz | head -n1
ID   3MGH_BACAN              Reviewed;         205 AA.
$ gunzip -c BanthracisProteome.txt.gz | head -n7 | wc -l
7

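In the first example above, we compressed BanthracisProteome.txt with gzip and used the -k flag to keep the original, uncompressed file. (gzip warns that BanthracisProteome.txt.gz already exists because the file had been compressed in an earlier run; the existing copy was not overwritten.) We then piped zcat into head to look at the first line of the file while the file itself stayed compressed. In the second example, we used gunzip -c instead, which does exactly the same thing: zcat is simply a shortcut for gunzip -c. Piping the first seven lines into wc -l confirms that seven lines came through. You can use whichever command you prefer.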
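To check for yourself that zcat and gunzip -c produce the same output, you can compare the two byte for byte with cmp. This is a sketch on a generated placeholder file (demo.txt is a hypothetical name). It also adds -f to gzip -k so that an existing .gz file is overwritten without the interactive prompt; -k needs a reasonably recent gzip (1.6 or later for GNU gzip). On a Mac, substitute gzcat for zcat.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical placeholder file: 20 numbered lines.
seq 1 20 > demo.txt

# -k keeps demo.txt; -f overwrites demo.txt.gz without asking if it already exists.
gzip -kf demo.txt

# Both commands stream the decompressed data to standard output.
zcat demo.txt.gz > via_zcat.txt
gunzip -c demo.txt.gz > via_gunzip.txt

# cmp is silent and exits 0 when the two files are identical.
cmp via_zcat.txt via_gunzip.txt && echo "identical"

# The original file is still there next to the compressed copy.
ls demo.txt demo.txt.gz
```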

Some operations require decompressing the whole file, no matter which command you use. In the example below, we run zcat twice: once to look at the head of the file and once to look at its tail. The time command reports how long the whole pipeline took to run.

$ time zcat BanthracisProteome.txt.gz | head > /dev/null

real    0m0.006s
user    0m0.001s
sys     0m0.006s
$ time zcat BanthracisProteome.txt.gz | tail > /dev/null

real    0m0.150s
user    0m0.152s
sys     0m0.012s

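You can see that looking at the tail takes much longer than looking at the head. gzip compresses a file as one continuous stream, so zcat cannot jump to the end: it has to decompress everything from the beginning before tail receives the last lines. head, on the other hand, exits after printing its first ten lines, and zcat stops as soon as it tries to write more output into the closed pipe.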
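To reproduce the effect without the proteome file, the sketch below generates a placeholder file of one million numbered lines (numbers.txt.gz is a made-up name, roughly 7 MB uncompressed) and times head and tail on it. It assumes bash, where time measures the whole pipeline; the exact timings will differ from machine to machine.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Placeholder data: one million numbered lines, compressed on the fly.
seq 1 1000000 | gzip > numbers.txt.gz

# head can stop after the first line; zcat is stopped at its next write to the closed pipe.
time zcat numbers.txt.gz | head -n 1 > /dev/null

# tail only gets the last line after everything before it has been decompressed.
time zcat numbers.txt.gz | tail -n 1 > /dev/null

# The same holds for anything that needs the whole stream, such as counting lines.
last=$(zcat numbers.txt.gz | tail -n 1)
total=$(zcat numbers.txt.gz | wc -l)
echo "last line: $last, total lines: $total"
```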

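Note: /dev/null is a special device file that discards everything written to it; writing to it always succeeds, but nothing can be read back. Redirecting the output of head or tail there hides the file content, while the timing report still appears because time writes it to standard error rather than standard output.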
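A quick way to see both properties of /dev/null for yourself: the write reports success through its exit status, and reading the device back yields no data at all. This is a minimal sketch; the message text is arbitrary.

```shell
#!/usr/bin/env bash
set -eu

# Writing to /dev/null succeeds (exit status 0), but the data is thrown away.
echo "this text disappears" > /dev/null
status=$?

# Reading from /dev/null hits end-of-file immediately, so wc counts 0 bytes.
bytes=$(wc -c < /dev/null)

echo "exit status: $status, bytes read back: $bytes"
```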