Chapter 8 File Compression
The size of text files can usually be reduced through file compression. This is useful, and sometimes even necessary, when you want to back up your data, transfer files, or simply use the space on your computer more efficiently.
Compressed files store the same information in fewer bits by removing redundancy. Most files are quite redundant, because the same pieces of information appear over and over again. A file compression program finds these recurring patterns and replaces them with shorter codes: instead of writing out a piece of information every time it occurs, the program stores that information once, in a dictionary, and then refers back to it. For a simple example, consider a text file containing the string “AAAABBBCCCCC”. A basic compression scheme, known as run-length encoding, replaces each run of identical letters by the letter followed by the number of times it is repeated: “A4B3C5”. That is half the length of the original string (6 characters instead of 12), and, importantly, we didn’t lose any information by compressing, because the original string can be reconstructed exactly. The more repetitive elements your file contains, the larger the relative gain from compression.
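The letter-and-count scheme from the example above can be sketched in a few lines of Python. This is only an illustration of the idea, not how real compression tools work internally. The function names rle_encode and rle_decode are made up for this sketch, and the decoder assumes that the original text contains no digits.

```python
import re
from itertools import groupby


def rle_encode(text):
    """Replace each run of identical characters by the character and its run length."""
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))


def rle_decode(encoded):
    """Rebuild the original text (assumes the original contained no digits)."""
    pairs = re.findall(r"(\D)(\d+)", encoded)
    return "".join(char * int(count) for char, count in pairs)


original = "AAAABBBCCCCC"
compressed = rle_encode(original)
print(compressed)                          # A4B3C5
print(rle_decode(compressed) == original)  # True: nothing was lost
print(rle_encode("ABC"))                   # A1B1C1
```

Note the last line: a string without repeats becomes longer, not shorter, which is why the gain from compression depends on how repetitive the input is.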
Note: Some examples in this section require the B. anthracis proteome, BanthracisProteome.txt. You may get the file as follows: