A seminar on makefiles and pipelines of shell scripts

While I still haven’t had the time to restore the old posts on this blog, I would like to re-post my seminar on makefiles.

This slideshow differs from the others you can find on makefiles: instead of showing you how to use make to compile programs, it shows how make can be used to build pipelines of shell programs and scripts, which is very useful in bioinformatics and in other fields.

Let’s say you have a lot of scripts to analyze the results of an experiment: for example, one to launch blast, another to parse its output, others to compare it with other databases or to run command-line programs… or just a bundle of sed/grep/gawk scripts that you want to keep organized.

A makefile can be used to store complex commands like these and organize them into a pipeline: for example, the operation ‘run_blast’ consists of running blast and parsing its results, and the operation ‘analyze_results’ consists of a series of sed and gawk scripts, along with an R one. I have seen many people use shell scripts for this, but the best approach is to use a language designed to describe pipelines. Make is exactly that: one of the oldest such languages, yet still widely used, so it is a good one to start learning with.
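To make the idea of a pipeline concrete, here is a minimal Python sketch (not make itself) of how the ‘run_blast’ and ‘analyze_results’ operations could be treated as rules, each with a list of prerequisites and a list of commands. Like make, it handles the prerequisites of a rule before the rule itself and schedules each rule only once. The names Rule, PIPELINE, execution_order and dry_run are hypothetical, and the commands are placeholders rather than real blast, sed, gawk or R invocations.

```python
from typing import Dict, List, NamedTuple, Set


class Rule(NamedTuple):
    """A rule: the rules it depends on, and the commands it runs."""
    prerequisites: List[str]
    commands: List[str]


# Hypothetical pipeline: the rule names come from the text, but the
# commands are placeholders rather than real blast/sed/gawk/R calls.
PIPELINE: Dict[str, Rule] = {
    "run_blast": Rule([], ["<launch blast>", "<parse blast output>"]),
    "analyze_results": Rule(
        ["run_blast"], ["<sed script>", "<gawk script>", "<R script>"]
    ),
}


def execution_order(rules: Dict[str, Rule], goal: str) -> List[str]:
    """Return the rules needed for `goal`, prerequisites first, each only once."""
    order: List[str] = []
    in_progress: Set[str] = set()

    def visit(name: str) -> None:
        if name in order:  # already scheduled via another path
            return
        if name in in_progress:  # came back to a rule still being resolved
            raise ValueError(f"circular dependency involving {name!r}")
        in_progress.add(name)
        for prerequisite in rules[name].prerequisites:
            visit(prerequisite)
        in_progress.remove(name)
        order.append(name)

    visit(goal)
    return order


def dry_run(rules: Dict[str, Rule], goal: str) -> List[str]:
    """List the commands that would run, in order, without running them."""
    return [
        command
        for name in execution_order(rules, goal)
        for command in rules[name].commands
    ]


if __name__ == "__main__":
    for command in dry_run(PIPELINE, "analyze_results"):
        print(command)
```

Running it prints the two run_blast placeholders first, then the three analyze_results ones. With a real makefile, the equivalent would be asking for `make analyze_results`, and `make -n analyze_results` prints the commands without executing them.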

Another difference from other seminars on makefiles is that I start with a ‘reduced makefile syntax’, in which you only write the name of the rule, its prerequisites and its commands, without worrying about making the rule name match the name of an output file. If you prefer to learn the standard approach, I suggest starting with the corresponding section of Software Carpentry for bioinformatics.
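In the standard approach, the rule name is the name of the output file, and make decides whether to run a rule by comparing modification times: the commands run if the target file does not exist or is older than any of its prerequisites. Below is a minimal Python sketch of that check, not make’s actual code. The function names and the file names blast_output.txt and report.txt are hypothetical, and the sketch assumes every prerequisite file already exists (real make would first try to build a missing prerequisite, or stop with an error).

```python
import os
import tempfile
from typing import List


def needs_rebuild(target: str, prerequisites: List[str]) -> bool:
    """Make-style check: True if `target` is missing or older than any prerequisite.

    Assumes every prerequisite file exists (os.path.getmtime raises otherwise).
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def demo() -> List[bool]:
    """Check three situations using throwaway files in a temporary directory."""
    results: List[bool] = []
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical file names, standing in for real pipeline outputs.
        blast_output = os.path.join(workdir, "blast_output.txt")
        report = os.path.join(workdir, "report.txt")
        open(blast_output, "w").close()

        results.append(needs_rebuild(report, [blast_output]))  # report missing

        open(report, "w").close()
        os.utime(blast_output, (1000, 1000))  # (atime, mtime) in seconds
        os.utime(report, (2000, 2000))
        results.append(needs_rebuild(report, [blast_output]))  # report is newer

        os.utime(blast_output, (3000, 3000))
        results.append(needs_rebuild(report, [blast_output]))  # input changed
    return results


if __name__ == "__main__":
    print(demo())  # [True, False, True]
```

This timestamp check is what lets a standard makefile rerun only the steps whose inputs changed. The reduced syntax gives that up: a rule named run_blast never produces a file called run_blast, so make runs its commands every time it is asked for.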
