Python's operator module provides many commonly used functions implemented in C, which makes them fast.

Today I had to extract a very large number of positions from a long DNA sequence, and operator.itemgetter solved my problem quickly.

Imagine that you have a very long sequence:

    >seq1
    ACGACTGATCGATCGATCGATGCATCGATCGACGAT.... (up to millions of bases long)

Now you have to extract a series of specific positions, for example 4, 6, 123, 231… How would you do it in Python? The most intuitive way would be to index the sequence once for every position, like seq1[4], seq1[6], seq1[123]…

Luckily, with operator.itemgetter you can do it in a single call, and it is quite fast:

    >>> import operator
    >>> import random
    >>> sequence = "ACGACTGATCGATCGATCGATGCATCGATCGACGAT"
    >>> # range works on both Python 2 and 3 (xrange is Python 2 only)
    >>> random_positions = random.sample(range(len(sequence)), 30)
    >>> get_positions = operator.itemgetter(*random_positions)
    >>> get_positions(sequence)
    ('T', 'C', 'G', 'C', 'A', 'C', 'C', 'T', 'A', 'T', 'G', 'T', 'A', 'T', 'C', 'C', 'T', 'T', 'A', 'G', 'T', 'A', 'A', 'A', 'C', 'G', 'G', 'C', 'G', 'A')
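One detail to keep in mind: operator.itemgetter returns a tuple when it is given several indices, but the bare item when it is given only one, and it raises a TypeError when it is given none. Below is a minimal sketch of a wrapper that always returns a tuple. The name `extract_positions` is hypothetical, made up for this example, and is not part of the operator module.

```python
import operator

def extract_positions(sequence, positions):
    """Return the items of `sequence` at `positions`, always as a tuple.

    Hypothetical helper for illustration only.
    """
    positions = list(positions)
    if not positions:
        # itemgetter() with no arguments raises TypeError, so handle it here
        return ()
    if len(positions) == 1:
        # itemgetter with a single index returns the bare item, not a tuple
        return (sequence[positions[0]],)
    return operator.itemgetter(*positions)(sequence)

sequence = "ACGACTGATCGATCGATCGATGCATCGATCGACGAT"
print(extract_positions(sequence, [4, 6, 12]))  # ('C', 'G', 'T')
print(extract_positions(sequence, [0]))         # ('A',)
```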

What is amazing is that I have tried this operation on the real sequences, which is the entire human genome, and on the real positions, which numbered in the millions as well. I was able to extract a million positions from sequences of millions of bases in just a few seconds!
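If you want to check the speed on your own machine, here is a rough sketch using timeit. It compares itemgetter with plain indexing in a list comprehension. The data is synthetic and the sizes are arbitrary choices for this example, not my real data, and the timings will vary with your hardware and Python version, so treat the numbers as indicative only.

```python
import operator
import random
import timeit

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic data: sizes are illustrative assumptions, not the real genome.
sequence = "".join(random.choices("ACGT", k=1_000_000))
positions = random.sample(range(len(sequence)), 100_000)

def with_indexing():
    # One index operation per position
    return tuple([sequence[i] for i in positions])

def with_itemgetter():
    # All positions fetched in a single itemgetter call
    return operator.itemgetter(*positions)(sequence)

# Both approaches must return the same bases in the same order
assert with_indexing() == with_itemgetter()

t_index = timeit.timeit(with_indexing, number=5)
t_getter = timeit.timeit(with_itemgetter, number=5)
print("indexing:   %.3f s for 5 runs" % t_index)
print("itemgetter: %.3f s for 5 runs" % t_getter)
```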

The operator module is also great for sorting tables of (x, y) rows, like:

a.sort(key=operator.itemgetter(1))

where 1 is the index of the column to sort by (in an (x, y) row, index 0 is x and index 1 is y).
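Here is a small sketch of sorting with itemgetter as the key. The tables are made-up sample data in which every row is an (x, y) tuple, so index 0 selects x and index 1 selects y. Passing two indices gives a tuple key, which sorts by the first and breaks ties with the second.

```python
import operator

# Made-up (x, y) rows for illustration
table = [(3, 10), (1, 30), (2, 20)]

# sorted() returns a new list ordered by y (index 1)
by_y = sorted(table, key=operator.itemgetter(1))
print(by_y)   # [(3, 10), (2, 20), (1, 30)]

# list.sort() sorts in place, here by x (index 0)
table.sort(key=operator.itemgetter(0))
print(table)  # [(1, 30), (2, 20), (3, 10)]

# Two indices: sort by y first, then by x to break ties
rows = [(2, 5), (1, 5), (3, 1)]
rows.sort(key=operator.itemgetter(1, 0))
print(rows)   # [(3, 1), (1, 5), (2, 5)]
```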

Thanks!