Favorite command of the day: parallel from moreutils

Yesterday I discovered a nice Unix tool for launching commands in parallel. It is called ‘parallel’ and it is very easy to use; I think it is the easiest way to parallelize work on a multi-core computer. On Linux you can install it from the ‘moreutils’ package. Note that GNU parallel (http://www.gnu.org/s/parallel/) is a different program with the same name and a different syntax, so the examples in this post apply to the moreutils version.

The basic usage is:

$: parallel <interpreter> <command> -- <list of arguments>

For example:

$: parallel bash -c "echo hola" -- 1 2 3
hola
hola
hola

This example launches the “echo hola” command three times in parallel, once for each argument after the ‘--’.
You can use the command “htop” to monitor CPU usage.
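
To get a feeling for what is going on, here is a rough plain-shell sketch of the same idea: start one background job per argument, then wait for all of them. It is only an illustration, not how parallel is implemented. The function name run_all is made up, and unlike parallel it puts no limit on how many jobs run at once.

```shell
# Hypothetical helper: one background job per argument, then wait for all.
run_all() {
  for arg in "$@"; do
    # The argument itself is ignored here, as in the "echo hola" example.
    echo hola &
  done
  # Block until every background job has finished.
  wait
}

run_all 1 2 3
```

With three arguments this prints “hola” three times, just like the parallel call.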

Thanks to this command, it is very easy to launch a great number of jobs in parallel. For example, if I want to run 1000 simulations:

$: parallel perl launch_a_single_simulation.pl -- {1..1000}

This will run all 1000 simulations, keeping as many of them running at once as there are processors available.
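
If you are curious how many processors that is on your machine, you can ask the system directly. I am assuming here that parallel takes its default number of simultaneous jobs from the count of online processors; the variable name cores is only for illustration.

```shell
# Ask the system how many processors are currently online.
cores=$(getconf _NPROCESSORS_ONLN)
echo "online processors: $cores"
```

On a quad-core machine without hyper-threading I would expect this to report 4, which should match the number of busy CPUs you see in htop while the jobs are running.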

Without options, each argument after the ‘--’ is appended to the end of the command. With the -i option, you can instead insert it at a position of your choice:

$: parallel -i bash -c "echo hola {}" -- Johannes Marc Pierre Manu
hola Marc
hola Johannes
hola Manu
hola Pierre

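When using the -i option, the symbol ‘{}’ is replaced by the argument. Since the jobs run at the same time, the order of the output depends on which job finishes first, so it can differ from the order of the arguments.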
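
A quick way to picture the substitution is to print the commands instead of running them. The sketch below does that in plain shell with sed. expand_template is a hypothetical helper, not part of parallel, and it assumes the arguments contain no characters that are special to sed (such as |, & or \).

```shell
# Hypothetical helper: print the command line each argument would produce.
# Usage: expand_template '<command with {}>' arg1 arg2 ...
expand_template() {
  template=$1
  shift
  for arg in "$@"; do
    # Replace every {} in the template with the current argument.
    printf '%s\n' "$template" | sed "s|{}|$arg|g"
  done
}

expand_template 'echo hola {}' Johannes Marc Pierre Manu
```

This prints one line per name, for instance “echo hola Marc”, which is the command that would be run for that argument.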

For example, if we want to run a job on all chromosomes, we can just say:

$: parallel -i python calculate_test_on_chromosome.py {} -- {1..22} X Y
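
Note that {1..22} is expanded by the shell before parallel ever sees it, so parallel simply receives 24 separate arguments. Here is a small check, calling bash explicitly because brace expansion is not available in every shell; the variable name chromosomes is only for illustration.

```shell
# Let bash expand the braces and capture the resulting argument list.
chromosomes=$(bash -c 'echo {1..22} X Y')
echo "$chromosomes"

# Count the arguments that parallel would receive.
echo "$chromosomes" | wc -w
```

The same holds for {1..1000} in the simulation example: parallel gets a thousand plain numbers.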

Or, if we want to execute a script for many genes, we can say:

$: parallel -i python get_plot_by_gene --gene {} -- ALG12 MGAT3 DOLPP1

Have fun 🙂