operator.itemgetter rocks!!

The operator module in Python implements many commonly used functions in C, which makes them fast.

Today I had to extract a very large number of positions from a long DNA sequence, and operator.itemgetter solved my problem quickly.

Imagine that you have a very long sequence, seq1, and you have to extract a series of specific positions, for example 4, 6, 123, 231… How would you do it in Python? The most intuitive way would be to repeat the indexing operation for every position, like seq1[4], seq1[6], seq1[123]…

Luckily, with operator.itemgetter you can do it in a single operation, and it is quite fast:
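A minimal sketch of both approaches follows. The short sequence and the list of positions are made-up stand-ins for the real data, and the variable names other than seq1 are only illustrative.

```python
from operator import itemgetter

# A short stand-in for a very long DNA sequence (illustrative data only).
seq1 = "ACGTTGCAAGGCTTAACCGGTATCGATCGA"

# Hypothetical positions to extract (0-based indexes).
positions = [4, 6, 12, 23]

# The intuitive way: index the sequence once per position.
one_by_one = [seq1[p] for p in positions]

# With itemgetter: build the getter once, then apply it to the sequence.
get_positions = itemgetter(*positions)
all_at_once = get_positions(seq1)  # a tuple with one character per position

# Both approaches give the same bases, in the same order.
assert list(all_at_once) == one_by_one
print("".join(all_at_once))
```

Note that when itemgetter is given a single position it returns the item itself rather than a one-element tuple, so code that may receive only one position needs to handle that case.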

What is amazing is that I tried this operation on the real sequences, which make up the entire human genome, and on the real positions, of which there were millions as well. I was able to extract a million positions from sequences of millions of bases in just a few seconds!!

a seminar on makefile and pipelines of shell scripts

While I still haven’t had the time to restore the old posts in this blog, I would like to post again my seminar on makefiles.

This slideshow is different from all the others you can find on makefiles, because instead of showing you how to use it to compile programs, it shows you how make can be used to create pipelines of shell programs and scripts, which is very useful in bioinformatics and in other fields.

Let’s say you have a lot of scripts to analyze the results of an experiment: for example, one to launch blast, another to parse its output, to compare it with other databases, to run command line programs… or just to organize a bundle of sed/grep/gawk scripts.

A makefile can be used to store complex commands like that and organize them in a pipeline: for example, the operation ‘run_blast’ consists of running blast and parsing its results, and ‘analyze_results’ consists of a series of sed and gawk scripts, along with an R one. I have seen many people use shell scripts to do so, but the best approach is to use a language designed to describe pipelines. This is what Make is: one of the oldest (yet still widely used) languages for defining pipelines, so it is a good one to start learning with.

Another difference with respect to other seminars on makefiles is that I have tried to start with a ‘reduced makefile syntax’, in which you just use the name of the rule, the prerequisites, and the commands, without worrying about whether the name of the rule corresponds to the name of an output file. If you prefer to learn the standard approach, I suggest starting with the corresponding section on software carpentry for bioinformatics.

how to read and plot a CSV file with python and pylab

Today I gave a short introductory talk on how to read a CSV file and create some plots with Python and pylab.

Here are the slides:

I have uploaded them on slideshare and created a blog post here, as I used to do in my former poor and once-glorious blog 🙁
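For readers who want something to try right away, here is a minimal sketch of the general idea. It is not taken from the slides: the function names, the column names and the data are all made up for illustration, and it assumes matplotlib is installed (it uses matplotlib.pyplot, the plotting interface that pylab exposes).

```python
import csv
import io


def read_columns(handle, x_name, y_name):
    """Read two numeric columns from a CSV file that has a header row."""
    xs, ys = [], []
    for row in csv.DictReader(handle):
        xs.append(float(row[x_name]))
        ys.append(float(row[y_name]))
    return xs, ys


def plot_columns(xs, ys, x_name, y_name, out_path):
    """Draw a simple line plot of ys against xs and save it to out_path."""
    # Imported here so that the CSV-reading part works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")
    ax.set_xlabel(x_name)
    ax.set_ylabel(y_name)
    fig.savefig(out_path)
    plt.close(fig)


# Made-up data standing in for a real file opened with open(path, newline="").
example_csv = io.StringIO("time,value\n0,1.0\n1,2.5\n2,4.0\n")
xs, ys = read_columns(example_csv, "time", "value")
```

Calling plot_columns(xs, ys, "time", "value", "plot.png") would then write the figure to a file named plot.png in the current directory.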

Starting from 3

Dear readers of this blog,

I have decided to start writing new posts and articles in this blog, and to temporarily abandon the idea of restoring the old contents, which I will try to do when I have more spare time.

The responsibility for the blackout of this site is both mine, because I haven’t been very keen on fixing it, and my hosting provider’s, which changed the conditions of the contract too quickly, without giving me enough time to make a proper backup and organize a migration.

Now I have a raw SQL file containing all the data from the previous blog, but because of a lack of time I have not yet managed to use it to restore the previous articles, so for the moment I will start writing new posts and restore the old contents as soon as possible.

I am sorry to have left you, the few readers of this blog, for so long without giving any explanation or actively trying to restore it 🙂 I hope I will be able to do better in the future.

Cheers, and break a leg for your plans of conquering the world with bioinformatics 🙂

Recruiting mentors for MindTorch

I have a small announcement to make. In the last few months I became involved in MindTorch, a London-based start-up that aims at matching students and young researchers with potential career mentors. It is important for young people to have a mentor or a person of reference to advise them, telling them how to invest in their future and which mistakes to avoid. MindTorch aims at helping people find exactly that, by providing a community where everyone can find a mentor.

We are currently in the phase of recruiting mentors, that is, professionals or researchers with good experience who would be willing to dedicate some time to helping and counselling younger people. Every mentor is expected to dedicate one hour every month to their mentee, for three to six months starting from next October/November, plus some initial time to communicate with us. So, if you would like to volunteer as a mentor for MindTorch, contact me or register as a mentor on the website.

In summary:

Do I have enough experience to mentor someone? If you have a degree and job experience you can certainly be a good mentor. We will train you and help with any doubts you may have.

How much time and effort will it take? You will have to dedicate one hour every month to counsel a younger student, for three to six months.

What do I get from being a mentor? For the moment we are not planning on giving any monetary compensation to mentors, but you will get to learn a lot from the experience and have the opportunity to grow your network of contacts. We will also help you find mentors and new contacts for your own career.

How to join: register as a mentor on the website.