9.3 sed

sed is yet another powerful stream editor for filtering and transforming text. In contrast to tr, sed processes its input line by line by default, so depending on the task at hand it might make more sense to use one or the other.

The syntax of sed can feel weird at first, but it’s such a powerful command that learning to use it will pay off very soon! sed is most often used to substitute (find and replace) characters or words in a line. The basic syntax for this is:

sed 's/from/to/' inputFileName > outputFileName or

sed 's/from/to/g' inputFileName > outputFileName or

sed 'y/from/to/' inputFileName > outputFileName

There are a couple of things going on here. First, notice that the entire command after sed is written in single quotes ('). The quotes are not part of sed’s own syntax, but they stop bash from interpreting spaces and special characters in the command before sed sees it, so it is a good habit to always include them. Then, we have either the letter ‘s’ or the letter ‘y’ at the beginning of the command. The ‘s’ stands for substitute and the ‘y’ stands for translate. What is the difference? When you use the ‘s’ option, the entire pattern that you provide will be replaced by your replacement string. Let’s look at an example:

$ for i in `seq 1 3`; do echo "hello there, line $i" >> sed.txt; done
$ sed 's/there/you/' sed.txt
hello you, line 1
hello you, line 2
hello you, line 3

You can see that sed replaced the word ‘there’ in every line with the word ‘you’, as specified in the command.

The ‘y’ option on the other hand translates individual characters (similar to tr):

$ sed 'y/eoi/XYZ/' sed.txt
hXllY thXrX, lZnX 1
hXllY thXrX, lZnX 2
hXllY thXrX, lZnX 3

Here, every ‘e’ has been translated (replaced) to an ‘X’, every ‘o’ to a ‘Y’ and every ‘i’ to a ‘Z’, so there is no need for ‘eoi’ to occur together as one word.
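To make the similarity to tr concrete, here is a minimal sketch that runs the same character mapping through both tools. The variable names are only for illustration, and the input line is copied from the sed.txt example above.

```shell
line='hello there, line 1'

# sed's y command maps e->X, o->Y, i->Z, one character at a time.
with_sed=$(printf '%s\n' "$line" | sed 'y/eoi/XYZ/')

# tr performs the same character-by-character mapping.
with_tr=$(printf '%s\n' "$line" | tr 'eoi' 'XYZ')

echo "$with_sed"
echo "$with_tr"
```

Both commands should print the same line. Note that with ‘y’ the two character lists have to be of the same length, otherwise sed reports an error.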

You probably noticed the ‘g’ at the end of one of the sed commands in the examples at the beginning of this section. The ‘g’ stands for global and you can choose to include it or not. Why would you want to include it? Since sed works line by line, it will read a line until it meets the pattern that you are looking for, replace that one occurrence and then move on to the next line. So, in case you want ALL occurrences replaced (not just the first one in the line), you have to specify the global option.

Let’s look at an example using the global option. Say we want to replace every small ‘l’ with a capital ‘L’ in our file:

$ sed 's/l/L/' sed.txt
heLlo there, line 1
heLlo there, line 2
heLlo there, line 3

As you see, if we do not provide the global option, sed only replaces the first instance in every line. Now, let’s add the global option:

$ sed 's/l/L/g' sed.txt
heLLo there, Line 1
heLLo there, Line 2
heLLo there, Line 3

Perfect, now we successfully replaced all small l’s with capital L’s.

Note: In this example, you could use ‘y’ instead of the ‘s’ option. Then there would be no need to set the global option, since ‘y’ translates every character that matches.
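As a quick check of this note, the following sketch compares the two variants on one line of the sed.txt example. The variable names are made up for illustration.

```shell
line='hello there, line 1'

# Substitute with the global flag: every l in the line becomes L.
with_s=$(printf '%s\n' "$line" | sed 's/l/L/g')

# Translate: y always acts on every matching character, no flag needed.
with_y=$(printf '%s\n' "$line" | sed 'y/l/L/')

echo "$with_s"
echo "$with_y"
```

Both variables should hold the same text. The equivalence only applies when single characters are replaced by single characters. For whole words you still need ‘s’.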

Here are some more examples of sed using the file that you are already familiar with:

Translating characters: sed 'y/from/to/'

$ head -n1 BanthracisProteome.txt
ID   3MGH_BACAN              Reviewed;         205 AA.
$ head -n1 BanthracisProteome.txt | sed 'y/AI/ai/'
iD   3MGH_BaCaN              Reviewed;         205 aa.
$ head -n2 BanthracisProteome.txt | sed 'y/\n/ /'
ID   3MGH_BACAN              Reviewed;         205 AA.
AC   Q81UJ9; Q6I2S8; Q6KWK9;

In the example above you can see that sed cannot replace newlines this way: it is applied line by line, and the newline at the end of each line is not part of the text that the command sees.
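If the goal is to join lines, tr is one option, since it does not work line by line. The sketch below uses two made-up input lines instead of the proteome file so that it can be run anywhere.

```shell
# Two made-up input lines, standing in for the first two lines of a file.
# sed leaves the line structure untouched.
unchanged=$(printf 'first line\nsecond line\n' | sed 'y/\n/ /')

# tr sees the raw character stream, so it can turn each newline into a space.
joined=$(printf 'first line\nsecond line\n' | tr '\n' ' ')

echo "$unchanged"
echo "$joined"
```

Keep in mind that tr also converts the final newline, so the joined text ends in a space.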

Replacing text: sed 's/from/to/'

$ head -n1 BanthracisProteome.txt | sed 's/BACAN/FRENCH FRIES/'
ID   3MGH_FRENCH FRIES              Reviewed;         205 AA.
$ head -n1 BanthracisProteome.txt | sed 's/A/a/'
ID   3MGH_BaCAN              Reviewed;         205 AA.
$ head -n1 BanthracisProteome.txt | sed 's/A/a/g'
ID   3MGH_BaCaN              Reviewed;         205 aa.

Note: Any character other than backslash or newline can be used instead of a slash to delimit the pattern and the replacement. Within the pattern and the replacement, the chosen delimiter itself can be used as a literal character, but you have to precede it with a backslash:

$ echo 'bununu' | sed 'saua\aag'
banana

In the example above, the letter ‘a’ was chosen as a delimiter, but at the same time we wanted to replace every ‘u’ with an ‘a’ as well, so we had to precede the replacement letter ‘a’ with a backslash, otherwise it wouldn’t work.
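A more practical use of an alternative delimiter is text that itself contains slashes, such as file paths. The following is a small sketch. The path is hypothetical and the variable names are only for illustration.

```shell
path='/usr/local/bin/tool'   # hypothetical path, for illustration only

# With / as the delimiter, every slash in pattern and replacement is escaped.
escaped=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# With | as the delimiter, the slashes can be written as they are.
plain=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt|')

echo "$escaped"
echo "$plain"
```

Both commands should give the same result. The second form is simply easier to read.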

sed also allows you to apply commands to specific lines only:

$ head -2 BanthracisProteome.txt | sed '2 s/A/a/'
ID   3MGH_BACAN              Reviewed;         205 AA.
aC   Q81UJ9; Q6I2S8; Q6KWK9;
$ head -10 BanthracisProteome.txt | sed '2,10 s/A/a/'
ID   3MGH_BACAN              Reviewed;         205 AA.
aC   Q81UJ9; Q6I2S8; Q6KWK9;
DT   26-aPR-2004, integrated into UniProtKB/Swiss-Prot.
DT   01-JUN-2003, sequence version 1.
DT   13-NOV-2013, entry version 69.
DE   RecName: Full=Putative 3-methyladenine DNa glycosylase;
DE            EC=3.2.2.-;
GN   OrderedLocusNames=Ba_0869, GBAA_0869, BAS0826;
OS   Bacillus anthracis.
OC   Bacteria; Firmicutes; Bacilli; Bacillales; Bacillaceae; Bacillus;

Note: to indicate a range that runs until the end of a file, use $ as the last line, for example: sed '10,$ s/A/a/'.
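A minimal sketch of such an open-ended range, using four made-up lines instead of the proteome file:

```shell
# Four made-up lines, each containing a capital A.
# The address 3,$ applies the substitution from line 3 to the last line.
result=$(printf 'A1\nA2\nA3\nA4\n' | sed '3,$ s/A/a/')

echo "$result"
```

Only the third and fourth lines should come out with a lowercase ‘a’. The single quotes matter here as well, since they keep bash from treating the $ as the start of a variable name.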

sed can do a lot more, such as deleting, replacing or adding lines. But all these tasks can also be achieved with awk, which has an easier syntax. Actually, for most of the things you want to achieve there are multiple ways using different commands. It is up to you to find out which ones suit you best. Some commands will be more applicable or efficient for certain tasks, but often there is no right or wrong.