9.6 Matching: join

The last command that we are going to look at in this section is join. You can use it to match lines of two files based on a common field. For this to work, both files need to share a join field (column) and must be sorted on it. By default, the join field is the first field, with fields delimited by blanks.

$ echo -e "Mickey mammal\nDonald bird\nKarlo mammal" > systematics.txt
$ echo -e "Daisy female\nDonald male\nMickey male" > characters.txt
$ join characters.txt systematics.txt
join: systematics.txt:2: is not sorted: Donald bird
Mickey male mammal
join: input is not in sorted order

This did not work because systematics.txt is not sorted on the join field (the name of the character). join expects both files to be sorted on that field; characters.txt already is. For the command to work, we have to sort the file first:

$ sort systematics.txt > systematics.sorted
$ join characters.txt systematics.sorted
Donald male bird
Mickey male mammal
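
Note that Daisy and Karlo are missing from this output, since each appears in only one of the two files. As a minimal sketch of how to keep such lines, the option -a 1 (or -a 2) asks join to also print unpairable lines from the first (or second) file. The scratch directory, the printf calls (used here instead of echo -e for portability), the LC_ALL=C setting and the names paired and all are assumptions made for this illustration only:

```shell
# Work in a scratch directory so that no existing files are overwritten.
cd "$(mktemp -d)"

printf 'Daisy female\nDonald male\nMickey male\n' > characters.txt
printf 'Mickey mammal\nDonald bird\nKarlo mammal\n' > systematics.txt

# Sort both inputs on the join field. LC_ALL=C makes sort and join
# agree on the same (byte-wise) ordering.
LC_ALL=C sort characters.txt > characters.sorted
LC_ALL=C sort systematics.txt > systematics.sorted

# Default behaviour: only lines that have a partner in the other file.
paired=$(LC_ALL=C join characters.sorted systematics.sorted)

# -a 1 and -a 2: additionally print unpairable lines from file 1 and file 2.
all=$(LC_ALL=C join -a 1 -a 2 characters.sorted systematics.sorted)

printf '%s\n' "$paired"
printf '%s\n' "---"
printf '%s\n' "$all"
```

With -a 1 -a 2, the lines for Daisy and Karlo are printed as well, but with only the fields from the file they came from.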

If the join field is not the first field, you can specify which field to use in each file with -1 and -2:

$ echo -e "fruit banana\nvegetable carrot\ngrain rice" > foodtypes.txt
$ echo -e "banana yellow\ncarrot orange\nrice white" > foodcolors.txt
$ join -1 2 -2 1 foodtypes.txt foodcolors.txt
banana fruit yellow
carrot vegetable orange
rice grain white

Here, we specified that for the first file (-1), the join column is the second one (2), and for the second file (-2), the join column is the first one (1).
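
The example above works because foodtypes.txt happens to be sorted on its second field already. If that is not the case, the file has to be sorted on the join field rather than on the first field. The following is a sketch under that assumption: the reordered foodtypes.txt, the scratch directory and the variable name joined are made up for illustration, and sort -k 2,2 restricts the sort key to the second field:

```shell
# Work in a scratch directory so that no existing files are overwritten.
cd "$(mktemp -d)"

# This version of foodtypes.txt is sorted on field 1, but not on field 2.
printf 'fruit banana\ngrain rice\nvegetable carrot\n' > foodtypes.txt
printf 'banana yellow\ncarrot orange\nrice white\n' > foodcolors.txt

# Sort the first file on its second field only (-k 2,2),
# because that is the field join will match on.
LC_ALL=C sort -k 2,2 foodtypes.txt > foodtypes.sorted

# Join field 2 of the first file with field 1 of the second file.
joined=$(LC_ALL=C join -1 2 -2 1 foodtypes.sorted foodcolors.txt)
printf '%s\n' "$joined"
```

The result should be the same three lines as before, with the join field printed first, followed by the remaining fields of the first and then the second file.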