farewell to Barcelona

I am posting this a bit late (since I already moved 6 months ago); anyway, the news is that I left my lab in Barcelona, and moved to London!

The institute where I did my PhD: the PRBB in Barcelona. On the other side of the building there is the beach.

I am happy with my time in Barcelona, where I did my master's thesis and my PhD on network theory applied to human population genetics. However, it was time to move on and try new experiences.

a picture taken from the 4th floor of the PRBB

Apart from the change of city, I also changed my field of work, as I am now working on cancer genetics. My new group is a young group that recently moved from Italy to London. It is known for its research on the systems-level properties of cancer genes and for a database called Network of Cancer Genes, and it is involved in a consortium for the sequencing of hepatocellular carcinoma. I will keep you informed of my progress!

my first PyPI package: vcf2networks

My first Python package is on PyPI!! I guess that now I can officially call myself a Python programmer.

VCF2Networks is a Python script to calculate genotype networks from population genetics data. Genotype networks are a method used in systems biology to study the “innovability” of a given phenotype: all the genotypes associated with the phenotype are represented as a graph, and properties of this graph, such as the average path length and the average degree, are then studied. For more info, you can look at the slides of the “Origins of Evolutionary Innovations” book club in this blog. The script in VCF2Networks lets you take any dataset of genotypes stored in the VCF format and calculate many of these properties.
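To make the idea a bit more concrete, here is a minimal sketch of a genotype network in plain Python. It is not the code used in VCF2Networks (which relies on python-igraph), and it makes two assumptions of my own for illustration: genotypes are written as strings of 0s and 1s, and two genotypes are connected when they differ at exactly one position. All the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two equal-length genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def build_genotype_network(genotypes):
    """Return an adjacency dict; assumed rule: edge = exactly one difference."""
    nodes = sorted(set(genotypes))  # duplicated genotypes collapse into one node
    adjacency = {g: set() for g in nodes}
    for a, b in combinations(nodes, 2):
        if hamming(a, b) == 1:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def average_degree(adjacency):
    """Mean number of neighbours per node."""
    return sum(len(nb) for nb in adjacency.values()) / len(adjacency)


def average_path_length(adjacency):
    """Mean shortest-path length over all connected ordered pairs of nodes."""
    total = 0
    pairs = 0
    for source in adjacency:
        # Breadth-first search from each node gives shortest distances.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adjacency[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy data, made up for this example: five observations, four distinct genotypes.
observed = ["000", "001", "011", "111", "001"]
network = build_genotype_network(observed)
print(average_degree(network))
print(average_path_length(network))
```

In this toy example the four distinct genotypes form a simple chain, so the network has three edges. Real datasets would of course have many more genotypes, which is why a proper graph library is a better choice there.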

I am planning to submit an application note about the script to a bioinformatics-oriented journal. So, if you have a little time to spare and you want to test it, any feedback will be very useful to me. At the moment, the major issue is simplifying the installation, because this package depends on numpy and python-igraph, and these two modules require some terrible C libraries that must be installed separately. If you are aware of any way to distribute a binary package of a Python module that depends on C libraries, your suggestion will be really welcome.

The presentation of my PhD defence

That’s it! Last week I defended my PhD thesis!! I have gone through it, and survived to tell the tale!

I don’t feel very different from before, apart from being relieved :-). Now the future is possibly more difficult than before, because I have to look for a job and finish a lot of things.

While I was preparing the slideshow, I realized that there are not many examples of presentations for a PhD defence online. This is bad, because you need all the help you can get to prepare this presentation. The PhD defence is the last thing that you do as a PhD student, so you want to do it perfectly. It is also the moment when you describe many years of your work to your colleagues and family.

Here is the presentation that I have prepared for my defence. I hope that it will be useful to other people as an example for their defences.

I think that, for this type of presentation, the first slide to make is the “summary of the talk” slide, like the “Topics” slide I have. Usually I don’t like to have such summary slides in my presentations, but for the thesis defence it is very important, because it gives you a feeling of security when you present. Having a well-defined structure allows you to know when you can stop to drink some water or to check if everybody is following, and to know exactly what to say on each slide of the talk.

my poster featured in the “Better Posters” blog!

My ECCB2012 poster has been featured in the Better Posters blog. Check the article here: http://betterposters.blogspot.com.es/2013/10/invitating-interaction.html

I am glad because betterposters is one of my favorite blogs. It’s a blog about designing and improving posters for scientific conferences, and it contains many tips and examples of how scientific posters can be improved.

The poster featured there is the “Post-its” poster, which I briefly described in the article on “best practices”.

Here are some other comments and tips from my experience of using post-its to get feedback during a conference:


my attempt at following every possible Best Practice in Bioinformatics

I have just uploaded my first paper to arXiv. The title is “Human Genome Variation and the concept of Genotype Networks”, and it presents a first, preliminary application of the concept of Genotype Networks to human sequencing data. I know that the title may sound a bit pretentious, but we wanted to pay tribute to a great article by John Maynard Smith, which inspired the work presented.

Nevertheless, in this blog post I am not going to discuss the contents of the paper, but only how I did this work. This was a project that I did in the last year of my PhD, and I made an extra effort to follow every best practice I knew.

I started my PhD in the pre-bedtools and pre-vcftools era of bioinformatics, and I saw the evolution of this field, from a sparse group of people in nodalpoint to the rise of Biostar and Seqanswers. During this time, I have read and followed a lot of discussions about “what is the best way to do bioinformatics”, from whether to use source control, to testing, and much more. For my last project as a PhD student, I wanted to apply all the practices that I had learned, to determine if it was really worth spending time learning them.

Premise: dates and times of the project

My PhD fellowship supports a three-month stay in another laboratory in Europe. I decided to do it in Prof. Andreas Wagner’s group in Zurich.

The decision to go to Wagner’s group was motivated by a book that he had recently published, entitled “The Origins of Evolutionary Innovations”. Before the start of this project I had read some articles by Andreas Wagner, and found them very interesting, so the opportunity to stay in his lab was very exciting. However, in light of what I learned during this time, I have to admit that before December 2011, I didn’t understand most of the concepts presented in the book. Thus, we can say that for this project, I started from zero.

I started thinking about this project in December 2011. I did the first practical implementation during the three months of the stay in Zurich, from May to August 2012. The first preliminary results came in January 2013, and the first manuscript in April 2013. We submitted to arXiv in August 2013. During this period, I also worked on three other projects, wrote my thesis, and taught at the Programming for Evolutionary Biology workshop in Leipzig.

I started working on this project in December 2011, and finished in August 2013. The log only shows the activity of code changes.

 

Note: this blog article is very long, so you may want to download it as a PDF and read it more comfortably.


Thesis deposited!! Here is my preface

I have just deposited my PhD thesis! If everything goes well, I will defend it within a few months. It took me a while, but I am there at last!

I cannot post the thesis online yet, but in the meantime, I would like to at least post the preface I wrote for it.

My thesis is about the detection and characterization of signatures of selection in the human genome. Thus, the preface is about the ethical problems faced in human population genetics, and narrates a little story about a mistake made by an earlier anthropologist. I hope that you will enjoy it. As for me, I am going to take a short break to celebrate.

 

Preface

Toward the end of the 18th century, the German anthropologist Johann Friedrich Blumenbach wrote a book on the origins of mankind, with the aim of demonstrating that all humans belong to the same species. The society of the 18th century was much more segregated than our modern society, and some people, including renowned scientists, believed that blacks and American Indians did not belong to the same species as the white man. Blumenbach, who was a strong opponent of racist theories, decided to write a book to demonstrate that all humans have a common origin, and that there is no scientific basis for any discrimination.

Blumenbach succeeded in his noble intentions, but a little design mistake in his book led to a misinterpretation that he would not have desired. In his book, Blumenbach listed the human populations in the following order: American, Mongolian, Caucasian, Malaysian and African. Since he believed that the human species originated in the Caucasian region, he explicitly put the Caucasian population in the middle, as a way to remind his readers that all human beings have a common origin. Unfortunately, this tiny detail was interpreted as proof that the Caucasian was the purest of all human races. People believed that if even he, the most egalitarian scientist of the time, positioned white people at the center of the Geometry of races, it was because they had a special importance.

This error is representative of how delicate it is to work in the field of Human Population Genetics. If Blumenbach had decided to list the populations in a different order, for example, by placing Caucasians in the second position, events like the Jim Crow laws in the United States and even the Nuremberg laws in Germany would not have had the same scientific justification they had. A whole life spent demonstrating that all people are equal was forgotten because of a bad decision in listing the names of some populations. Blumenbach was a strong champion of equality, but his mistake affected the lives of innocent people.

This thesis is dedicated to Johann Friedrich Blumenbach, with the hope that learning from his mistake will protect me from making similar errors. The work presented here describes new methods to analyze human population genetics data, and specifically, to detect genes and alleles that have given a selective advantage to a human population. Nevertheless, these “selective advantages” are only relative to events to which our ancestors have been exposed in the past. The only reason why we study them is to understand how our genome works, with the aim of designing better medicines and improving our health.

The field of Human Population Genetics is in a delicate position at this moment. We live in times of cheap genome sequencing, and we can expect that, in the near future, genome sequencing will become a component of our daily lives. Moreover, the appearance of new communication media has made science more accessible to everybody, with good and bad implications. This means that the research that is being written right now by population geneticists will soon be read not only by scientists, but also by people moved by other interests. It is difficult to predict how our work will be interpreted, as it was difficult, in the 18th century, to predict that a mistake in listing populations would have such a negative impact.

I hope that those who read this thesis will do it with a positive mind. I have made every effort to avoid any concept that may be misinterpreted, but my lack of experience may not have allowed me to find all the potential flaws. I hope that the people who read this thesis will be savvy when they encounter mistakes, and that they will be stimulated to learn more about this subject. Eventually, they will discover that despite the errors that scientists can make, Blumenbach was right in his intentions: all human beings belong to the same species, and there is no scientific basis for any form of discrimination.

 

(This preface is inspired by the chapter “The Geometer of Race” in the book “I Have Landed” by S. J. Gould, 2003, and by the book “Fatal Invention” by Dorothy Roberts, 2011)

Two short “Agile Bioinformatics” talks

I have just come back from the 2013 edition of the Programming for Evolutionary Biology course in Leipzig!! The course is still going on, but unfortunately this year I could not stay for the whole three weeks, as I have stuff to do here in Barcelona.

This year, apart from the “Introduction to Linux” module, I also taught a short module on “Best Practices for programming in bioinformatics”. It was pure fun; I don’t think I have ever enjoyed giving a talk so much. I explained one part about Version Control, and another about Scrum, and people were really excited about it. To give you an idea of how much people liked this talk, consider that three people bought me a beer afterwards, which for me is the highest compliment a talk can get.

I have uploaded the two slideshows to slideshare. Unfortunately, the best part of the talk was a live demonstration of how I use these practices during my daily work, but at the moment I cannot make these examples publicly available. However, you should be able to follow the slideshows anyway.

 

Notes from a “Write it clearly” course

I recently took a course on improving English writing skills for researchers. These are my notes, organized as a series of “Do and Do not” lists, plus a separate list for each section of a research paper.

Feel free to have a look at them and make use of them. If you have any comments, you can add them here or to the table. Have a happy paper-writing day!

click to access the notes.

I wrote a videogame for the Wii

I wrote a small web game for the “Week of Science 2012” (Semana de la Ciencia), a science outreach initiative organized in Spain. I participated in it as a member of the Institut de Biologia Evolutiva of Barcelona, the institution to which I belong. The game is in Spanish, but I think anybody can understand it without translation. Click on the image to play with it:

The “Phylogenetic Tree” for the “Semana de la Ciencia” in Barcelona.

If the game is not shown correctly, click on this link: IBE phylogenetic game sc2012

In short, we had 15 minutes to explain to a class of school students (from 12 to 18 years old) how to make phylogenetic trees. This is how we organized the time:

  • In the first five minutes, we gave a short presentation explaining that we all come from a common ancestor, and that our job as evolutionary biologists is to reconstruct the tree of life. We also explained what a phylogenetic tree is, and how we reconstruct it.
  • In the next three minutes, we played the first game. This game was quite easy, and was meant to check if the students understood how phylogenetic trees are constructed. During this first game, one volunteer student had to decide where to put a mammal, a bird and a jellyfish in a phylogenetic tree.
  • In the remaining minutes, we played the second game, which was a bit of a trick. Students had to reconstruct the phylogenetic tree of four protists. Have a look at the “Juego 2: protists” to see it. This game was tricky because there is no way to come up with the correct solution just by looking at the organisms. In fact, after letting the students play for a while, we showed them that the only way to know the real phylogenetic tree was to use the DNA sequences. Then we had a few more slides explaining how mutations in DNA sequences can be used to reconstruct the history of changes in evolution.
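For readers who like code, the idea behind the second game can be sketched in a few lines of Python. This is only a toy illustration, not something we showed in class and not part of the game: the four sequences are made up, the names are hypothetical, and counting differences between aligned sequences is a crude stand-in for real phylogenetic methods.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Number of positions at which two aligned DNA sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def closest_pair(sequences):
    """Return the pair of names whose sequences differ the least.

    Ties are resolved in favour of the first pair in alphabetical order.
    """
    return min(
        combinations(sorted(sequences), 2),
        key=lambda pair: count_differences(sequences[pair[0]], sequences[pair[1]]),
    )


# Made-up sequences, for illustration only.
toy = {
    "protist_A": "ACGTACGTAC",
    "protist_B": "ACGTACGTAA",
    "protist_C": "ACTTACGGAA",
    "protist_D": "TCTTACGGAA",
}

for name_1, name_2 in combinations(sorted(toy), 2):
    print(name_1, name_2, count_differences(toy[name_1], toy[name_2]))
print(closest_pair(toy))
```

Species whose sequences differ by few mutations are likely to share a recent common ancestor, so they would be grouped first when drawing the tree. In this toy dataset, A groups with B and C groups with D, something you could not guess from the appearance of the organisms.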

To make things a bit more entertaining, we also connected a Wii remote to the computer, so the student who played the game had to use it as a mouse. This was fun to set up, and I think I will use a Wii remote in my next talk :-).

The activity was a bit condensed in 15 minutes, but I think that more or less all the students understood the basic concept. At least, some asked questions, and in general, they seemed to like the game. I hope they will at least remember that DNA can be used to study how species have evolved :-).

If you want to customize the page, the code is available on bitbucket:

This was the first time I programmed something in JavaScript, so the code is a terrible mess. There is a lot of code duplication, and a lot of patchy fixes. But as Agile Programmers say, “Code first and Refactor later”. I think I will work on cleaning this code for next year, so if you have any suggestions on how to make it better, please join the repository on bitbucket.