11.1 Exercises

Note: Some of these exercises require the B. anthracis proteome file BanthracisProteome.txt. You can get the file as follows:

$ # download the zipped file using wget
$ wget --compression=auto https://bitbucket.org/wegmannlabteaching/bash_lecture/raw/master/Files/BanthracisProteome.txt.gz
$ # unzip the file
$ gunzip BanthracisProteome.txt.gz
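
Before starting, it can be worth checking that the download and decompression worked. The following is a minimal sketch, assuming that every protein entry in the file has exactly one line starting with SQ (the exercises below rely on these lines). The function name count_proteins is hypothetical and only used for illustration.

```shell
# count_proteins FILE
# Hypothetical helper: counts the lines that start with "SQ",
# which is assumed to equal the number of protein entries.
count_proteins() {
  grep -c '^SQ' "$1"
}
```

Calling `count_proteins BanthracisProteome.txt` should print a count well above zero; a count of 0 or a "No such file" error suggests the download or gunzip step failed.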

11.1.1 R and bash

See Section 12.0.19 for solutions.

  1. Use R to print 1000 normally distributed random numbers with mean µ = 10 to STDOUT, then use awk to calculate their mean.

  2. Write a bash script that automatically downloads the current temperatures at Swiss meteo stations and plots them as a function of the altitude at which they are measured. Use wget URL to download the data from here, then use cut to extract the relevant columns and R to plot them to a PDF. The script should also print the current temperature in Fribourg to the console and delete the downloaded file as well as any intermediate files.

  3. Use awk and R to plot the length in amino acids (AA) of all proteins in BanthracisProteome.txt against their molecular weight (both values are given on lines starting with SQ).

  4. Write a bash script that uses awk to extract the protein length (number of AA) of all proteins with a particular GO term (see lines starting with DR that have "GO;" in the second column) and R to plot it as a histogram to a PDF whose file name contains the GO term. Use a for loop to create such a histogram for GO:0005886, GO:0005737, GO:0003677, GO:0005524 and GO:0016021.
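
As a starting point for the awk side of exercises 1, 3 and 4, the patterns involved can be sketched as below. This is a hint under stated assumptions, not the reference solution from Section 12.0.19. It assumes that SQ lines have the form `SQ   SEQUENCE   256 AA;  29735 MW;  ... CRC64;`, that GO annotations have the form `DR   GO; GO:0005886; ...`, and that the DR lines of an entry come before its SQ line. The function names are hypothetical, and the R plotting steps are left out.

```shell
# Exercise 1: mean of the numbers read from STDIN (one number per line).
mean_stdin() {
  awk '{ sum += $1; n++ } END { if (n > 0) print sum / n }'
}

# Exercise 3: print "length weight" for every line starting with SQ.
# Assumed layout: field 3 is the number of AA, field 5 the molecular weight.
sq_length_weight() {
  awk '/^SQ/ { print $3, $5 }' "$1"
}

# Exercise 4: print the length of every protein annotated with a GO term.
# Usage: go_lengths FILE GO:0005886
# A flag is set when a matching DR line is seen and is evaluated (and reset)
# at the SQ line of the same entry.
go_lengths() {
  awk -v term="$2;" '
    /^DR/ && $2 == "GO;" && $3 == term { hit = 1 }
    /^SQ/ { if (hit) print $3; hit = 0 }
  ' "$1"
}

# Possible ways to call these helpers (not executed here):
#   Rscript -e 'cat(rnorm(1000, mean = 10), sep = "\n")' | mean_stdin
#   sq_length_weight BanthracisProteome.txt > length_weight.txt
#   for go in GO:0005886 GO:0005737; do
#     go_lengths BanthracisProteome.txt "$go" > "lengths_$go.txt"
#   done
```

The patterns are anchored with `^` because the sequence lines that follow each SQ line start with spaces and could otherwise contain a field that happens to read SQ or DR. The output files named in the comments are placeholders; R can read them with read.table before plotting.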