My attempt at following every possible Best Practice in Bioinformatics

I have just uploaded my first paper to arXiv. The title is “Human Genome Variation and the concept of Genotype Networks”, and it presents a first, preliminary application of the concept of Genotype Networks to human sequencing data. I know that the title may sound a bit pretentious, but we wanted to pay tribute to a great article by John Maynard Smith, which inspired the work presented.

Nevertheless, in this blog post I am not going to discuss the contents of the paper, but only how I did this work. This was a project that I did in the last year of my PhD, and I made an extra effort to follow every best practice I knew.

I started my PhD in the pre-bedtools and pre-vcftools era of bioinformatics, and I saw the evolution of this field, from a sparse group of people in nodalpoint to the rise of Biostar and Seqanswers. During this time, I have read and followed a lot of discussions about “what is the best way to do bioinformatics”, from whether to use source control, to testing, and much more. For my last project as a PhD student, I wanted to apply all the practices that I had learned, to determine whether it was really worth spending time learning them.

Premise: dates and times of the project

My PhD fellowship supports a three-month stay in another laboratory in Europe. I decided to do it in Prof. Andreas Wagner’s group in Zurich.

The decision to go to Wagner’s group was motivated by a book that he had recently published, entitled “The Origins of Evolutionary Innovations”. Before the start of this project I had read some articles by Andreas Wagner and found them very interesting, so the opportunity to stay in his lab was very exciting. However, in light of what I learned during this time, I have to admit that before December 2011 I didn’t understand most of the concepts presented in the book. Thus, we can say that for this project I started from zero.

I started thinking about this project in December 2011. I did the first practical implementation during the three months of my stay in Zurich, from May to August 2012. The first preliminary results came in January 2013, and the first manuscript in April 2013. We submitted to arXiv in August 2013. During this period, I also worked on three other projects, wrote my thesis, and taught at the Programming for Evolutionary Biology workshop in Leipzig.

I started working on this project in December 2011, and finished in August 2013. This figure only shows the activity of code changes.

 

Note: this blog article is very long; you may want to download it as a PDF and read it more comfortably.


Thesis deposited!! Here is my preface

I have just deposited my PhD thesis! If everything goes well, I will defend it within a few months. It took me a while, but I am there at last!

I cannot post the thesis online yet, but in the meantime, I would like to at least post the preface I wrote for it.

My thesis is dedicated to the detection and characterization of signatures of selection in the human genome. Thus, the preface is about the ethical problems faced in human population genetics, and tells a little story about a mistake made by an early anthropologist. I hope that you will enjoy it. As for me, I am going to take a short break to celebrate.

 

Preface

Toward the end of the 18th century, the German anthropologist Johann Friedrich Blumenbach wrote a book on the origins of mankind, with the aim of demonstrating that all humans belong to the same species. The society of the 18th century was much more segregated than our modern society, and some people, including renowned scientists, believed that blacks and American Indians did not belong to the same species as the white man. Blumenbach, who was a strong opponent of racist theories, decided to write a book to demonstrate that all humans have a common origin, and that there is no scientific basis for any discrimination.

Eventually, Blumenbach succeeded in his noble intentions, but a little design mistake in his book led to a misinterpretation that he would not have desired. In his book, Blumenbach listed the human populations in the following order: American, Mongolian, Caucasian, Malaysian and African. Since he believed that the human species originated in the Caucasian region, he explicitly put the Caucasian population in the middle, as a way to remind his readers that all human beings have a common origin. Unfortunately, this tiny detail was interpreted as proof that the Caucasian was the purest of all human races. People believed that if even he, the most egalitarian scientist of the time, positioned white people at the center of the Geometry of races, it was because they had a special importance.

This error is representative of how delicate it is to work in the field of Human Population Genetics. If Blumenbach had decided to list the populations in a different order, for example, by placing Caucasians in the second position, events like the Jim Crow laws in the United States and even the Nuremberg laws in Germany would not have had the same scientific justification they had. A whole life spent demonstrating that all people are equal was forgotten because of a bad decision in listing the names of some populations. Blumenbach was a strong champion of equality, but his mistake affected the lives of innocent people.

This thesis is dedicated to Johann Friedrich Blumenbach, with the hope that learning from his mistake will protect me from making similar errors. The work presented here describes new methods to analyze human population genetics data, and specifically, to detect genes and alleles that have given a selective advantage to a human population. Nevertheless, these “selective advantages” are only relative to events to which our ancestors were exposed in the past. The only reason why we study them is to understand how our genome works, with the aim of designing better medicines and improving our health.

The field of Human Population Genetics is in a delicate position at this moment. We live in times of cheap genome sequencing, and we can expect that, in the near future, genome sequencing will become a component of our daily lives. Moreover, the appearance of new communication media has made science more accessible to everybody, with good and bad implications. This means that the research that is being written right now by population geneticists will soon be read not only by scientists, but also by people moved by other interests. It is difficult to predict how our work will be interpreted, just as it was difficult, in the 18th century, to predict that a mistake in listing populations would have such a negative impact.

I hope that those who read this thesis will do so with a positive mind. I have tried with all my efforts to avoid any concept that may be misinterpreted, but my lack of experience may not have allowed me to find all the potential flaws. I hope that the people who read this thesis will be savvy when they encounter mistakes, and that they will be stimulated to learn more about this subject. Eventually, they will discover that despite the errors that scientists can make, Blumenbach was right in his intentions: all human beings belong to the same species, and there is no scientific basis for any form of discrimination.

 

(This preface is inspired by the chapter “The Geometer of Race” in the book “I Have Landed” by S. J. Gould, 2003, and by the book “Fatal Invention” by Dorothy Roberts, 2011)

I wrote a videogame for the Wii

I wrote a small web game for the “Week of Science 2012” (Semana de la Ciencia), a science outreach initiative organized in Spain. I participated as a member of the Institut de Biologia Evolutiva of Barcelona, the institution to which I belong. The game is in Spanish, but I think anybody can understand it without translation. Click on the image to play with it:

The “Phylogenetic Tree” for the “Semana de la Ciencia” in Barcelona.

If the game is not shown correctly, click on this link: IBE phylogenetic game sc2012

In short, we had 15 minutes to explain to a class of school students (from 12 to 18 years old) how to make phylogenetic trees. This is how we organized the time:

  • In the first five minutes, we gave a short presentation explaining that we all come from a common ancestor, and that our work as evolutionary biologists is to reconstruct the tree of life. We also explained what a phylogenetic tree is, and how we reconstruct it.
  • In the next three minutes, we played the first game. This game was quite easy, and was meant to check whether the students understood how phylogenetic trees are constructed. During this first game, one volunteer student had to decide where to put a mammal, a bird and a jellyfish in a phylogenetic tree.
  • In the remaining minutes, we played the second game, which was a bit of a trick. Students had to reconstruct the phylogenetic tree of four protists. Have a look at the “Juego 2: protists” to see it. This game was tricky because there is no way to come up with the correct solution. In fact, after letting the students play for a while, we showed them that the only way to know the real phylogenetic tree was to use the DNA sequences. Then we had a few more slides explaining how mutations in DNA sequences can be used to reconstruct the history of changes in evolution.
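To give a rough idea of the last point, here is a minimal Python sketch of how differences between DNA sequences can hint at which species are most closely related. It is not the code of the game, and the sequences and names (protist_A to protist_D) are made up for illustration. It only counts mismatches between already aligned sequences; real phylogenetic methods are considerably more sophisticated.

```python
from itertools import combinations

# Hypothetical, made-up aligned sequences; NOT the data used in the game.
sequences = {
    "protist_A": "ACGTACGTAC",
    "protist_B": "ACGTACGTTC",
    "protist_C": "TCGAACCTAC",
    "protist_D": "TCGAACCTAG",
}


def count_differences(seq1, seq2):
    """Count the positions at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq1, seq2) if a != b)


def pairwise_differences(seqs):
    """Return a dict mapping each pair of names to its number of differences."""
    return {
        (name1, name2): count_differences(seqs[name1], seqs[name2])
        for name1, name2 in combinations(sorted(seqs), 2)
    }


def closest_pair(seqs):
    """Return the pair of names with the fewest differences.

    Ties are resolved by taking the first pair in alphabetical order.
    """
    distances = pairwise_differences(seqs)
    return min(distances, key=distances.get)


distances = pairwise_differences(sequences)

# Print the pairs from the most similar to the most different.
for pair, n_diff in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, n_diff)

print("Closest pair:", closest_pair(sequences))
```

Pairs with few differences are likely to share a recent common ancestor, so they would be joined first when drawing the tree.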

To make things a bit more entertaining, we also connected a Wii remote to the computer, so the student who played the game had to use it as a mouse. This was fun to set up, and I think I will use a Wii remote in my next talk :-).

The activity was a bit condensed in 15 minutes, but I think that more or less all the students understood the basic concept. At least, some asked questions, and in general, they seemed to like the game. I hope they will at least remember that DNA can be used to study how species have evolved :-).

If you want to customize the page, the code is available on Bitbucket:

This was the first time I programmed something in JavaScript, so the code is a terrible mess. There is a lot of code duplication, and a lot of patchy fixes. But as Agile Programmers say, “Code first and refactor later”. I think I will work on cleaning this code for next year, so if you have any suggestions on how to make it better, please join the repository on Bitbucket.

New ways to explore your academic impact

It seems that today, by a strange series of coincidences, is a good day if you want new tools to explore your academic impact.

First, Google Scholar Citations has finally been opened to all. Everybody can now create a profile on Google Scholar, to keep track of articles and citations. I like Google Scholar because it finds articles and books that are not indexed on Scopus, but that are interesting nevertheless. Plus, it is free to use. However, our paper on Recombination Rates has recently been cited in a Nature Genetics paper, and Google Scholar didn’t pick it up.

Second, the finalists for the PLoS/Mendeley binary battle have been selected. Check the list here. The PLoS/Mendeley binary battle is an initiative proposed by these two organizations to encourage the writing of applications that make use of the PLoS and the Mendeley APIs, to retrieve information on papers and readers. So, this initiative is producing some very good web applications to explore academic impact or play with citations and papers, and here I will describe some of my favourites.

I like two tools to see the impact of research articles on the Internet: Total Impact and ReaderMeter. They both let you see how many times your articles are read on Mendeley, cited, referenced on Twitter and Facebook, bookmarked on CiteULike, and much more. The nice thing about Total Impact is that it also indexes my presentations on slideshare: for example, one of my presentations on Python is actually more popular than any of my papers. However, one of our papers is not being recognized correctly, because of a duplicated entry in Mendeley. On the other hand, ReaderMeter lets you see the geographical distribution of readers, and provides more statistics. It would be good if it were possible to embed one of these two reports in a web page, for example in the About page of a blog, or an academic home page.

My TotalImpact report. Click on it to see the full report. Check also my ReaderMeter report if you like.

Another tool I liked is PaperCritic. It is a repository of commentaries on published papers. The idea is not entirely new: PLoS and other journals already allow readers to comment on papers. Unfortunately, not all publishing houses provide this option. Moreover, having a central repository of comments on papers makes them easier to browse and select. I only wonder how much this tool overlaps with ResearchBlogging, and whether the commentaries posted on the site are communicated to the authors of the paper even if they are not signed up to PaperCritic.

So, these tools provide new ways to play with academic impact indicators, and to see whether our work is actually useful to anyone. I’ve played with them this morning, but now I had better get back to work, to improve their results 🙂

 

Recruiting mentors for MindTorch

I have a small announcement to make. In the last months I became involved in MindTorch, a London-based start-up that aims at matching students and young researchers with potential career mentors. It is important for young people to have a mentor or a person of reference to advise them, telling them how to invest in their future and which mistakes to avoid. MindTorch aims at helping people find exactly that, by providing a community where everyone can find a mentor.


We are currently in the phase of recruiting mentors, that is, professionals or researchers with good experience who would be willing to dedicate some time to help and counsel younger people. Every mentor is supposed to dedicate one hour every month to their mentee, for three to six months, starting from next October/November, plus some initial time to communicate with us. So, if you would like to volunteer as a mentor for MindTorch, contact me or register as a mentor on the website.

In summary:

Do I have enough experience to mentor someone? If you have a degree and job experience you can certainly be a good mentor. We will train you and help with any doubts you may have.

How much time and effort will it take? You will have to dedicate one hour every month to counsel a younger student, for three to six months.

What do I get from being a mentor? For the moment we are not planning on giving any monetary compensation to mentors, but you will get to learn a lot from the experience and have the opportunity to grow your network of contacts. We will also help you find mentors and new contacts for your own career.

How to join: register as a mentor on the website.