Are fitness genes more conserved across species? My 30-minute attempt

A recently published paper by Hart et al. presented a genome-wide CRISPR screen to identify fitness genes (a superset of essential genes) in five cell lines. The paper is quite impressive and shows the potential of CRISPR for generating large-scale knockouts and for characterizing the importance and function of genes under different conditions.

In the discussion, the authors propose that fitness genes are more likely to be conserved across species. However, they do not follow up on this hypothesis, probably for lack of space. They can’t be blamed, as they already present a lot of results in the paper.

Distribution of conservation scores in the phastCons100way.UCSC.hg19 track. Are essential genes more conserved than other genes?

This post presents a follow-up analysis of the hypothesis that fitness genes are more conserved than non-essential genes. I’ll take the original data from the paper, get the conservation scores from Bioconductor data packages, and run a Wilcoxon test to compare the two distributions. The full code is available in a GitHub repository; please feel free to contribute if you feel like doing some R/Bioconductor analysis in your free time.
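The comparison described above is a two-sample Wilcoxon rank-sum test (also called Mann-Whitney U). In R, the built-in `wilcox.test` does this. For readers who prefer Python, here is a minimal standard-library sketch of the same test. It is not the code from the repository. It uses the normal approximation with a tie correction and no continuity correction, so p-values are approximate for small samples. The scores in the usage lines are invented for illustration and are not results from Hart et al.

```python
import math
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Normal approximation with tie correction, no continuity correction.
    Returns (U statistic for sample x, two-sided p-value).
    """
    # Pool both samples, tagging each value with its group (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of their 1-based ranks.
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Variance of U, corrected for ties.
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Hypothetical per-gene conservation scores, for illustration only.
fitness_scores = [0.9, 0.95, 0.8, 0.99]
other_scores = [0.1, 0.2, 0.3, 0.15]
u, p = rank_sum_test(fitness_scores, other_scores)
print(f"U = {u}, two-sided p = {p:.3f}")
```

A rank-based test is a natural choice here because conservation scores are bounded and far from normally distributed. The test only asks whether one group tends to have higher scores than the other.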


Reviewed “Bioinformatics with Python Cookbook” by Tiago Antao

I’ve recently been a reviewer for the book “Bioinformatics with Python Cookbook” by Tiago Antao, one of the main contributors to Biopython. The book is published by Packt Publishing and is a collection of recipes for many bioinformatics tasks, from reading large genome files to population genetics, and more.

Bioinformatics with Python Cookbook on my desktop, together with my zombie mug.

The author’s GitHub account links to all the Python notebooks used in the book. These notebooks are freely accessible, but they contain no explanation of the code; for that, you will need to buy the book. Moreover, the book provides a link to a Docker image that can be used to install all the materials and software needed to run the examples. I think this is a smart way to provide materials for exercises, and I will copy the idea in the future.

As a reviewer, I was expected to be an expert in all the topics described in the book. However, I must admit that I learned a lot from reviewing it, and that some of the recipes surprised me. Here is a quick summary of the new things I learned:

  • How to convert many bioinformatics-related formats with pygenomics and Biopython
  • How to use the REST API to query Ensembl
  • How to run and plot a PCA of SNP data with Python and EIGENSOFT
  • simuPOP is a nice package for simulating population genetics events
  • DendroPy is a nice module for handling phylogenetic trees, like ETE
  • PDB files are going to be replaced by mmCIF files, and Biopython can read both formats
  • PyMOL and Cytoscape can be controlled from within a Python script or IPython
  • PSICQUIC is a common interface to many molecular-interaction databases
  • IPython has excellent support for parallel, multi-core execution
  • It is easy to optimize Python code with Cython and Numba, sometimes just by adding a few decorators
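The Ensembl REST API item from the list is easy to try from plain Python. Below is a minimal sketch, not the book’s recipe. It uses only the standard library and assumes the public server at https://rest.ensembl.org and its `lookup/symbol/:species/:symbol` endpoint. The function names `lookup_symbol_url` and `lookup_symbol` are my own.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the public Ensembl REST server.
ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_symbol_url(species, symbol):
    """Build the URL for Ensembl's /lookup/symbol/:species/:symbol endpoint."""
    return (f"{ENSEMBL_REST}/lookup/symbol/"
            f"{urllib.parse.quote(species)}/{urllib.parse.quote(symbol)}")


def lookup_symbol(species, symbol, timeout=30):
    """Query Ensembl and return the decoded JSON record (needs network access)."""
    request = urllib.request.Request(
        lookup_symbol_url(species, symbol),
        headers={"Content-Type": "application/json"},  # ask for JSON output
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `lookup_symbol("homo_sapiens", "BRCA2")` should return a dictionary describing the gene, with fields such as `id`, `start` and `end`. Ensembl rate-limits anonymous clients, so a real script querying many genes should pause between requests.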

If you buy the book and find an error in the code, you can blame me: I was a reviewer and didn’t catch it.

BioStar, the StackOverflow for bioinformaticians

If you are a programmer, you may already know StackOverflow, a forum-like website for programming questions with an innovative design and a very active community. It is probably the best place on the Internet to ask when you have doubts related to programming.

Thanks to a recent post on the biopython-dev mailing list, I discovered that it is possible to create websites with the same engine used by StackOverflow and customize them for a specific topic: for example, this blog and this site list all the StackExchange-like websites, on topics ranging from mathematics to electronics to business.

There is also a StackExchange website dedicated to bioinformatics, called BioStar, and lately I have been using it.

If you have a bioinformatics question, whether specific or general, you can ask it there… If we can build a community like StackOverflow for bioinformatics, it will be a very good resource for everyone.

My first GeneOntology term!!

Today the maintainers of the GeneOntology database accepted a term that I had proposed two days earlier. So, as of today, I am the daddy of ‘integral to lumenal side of endoplasmic reticulum membrane’ (GO:0071556)!!! 🙂

logo of the GeneOntology project

GeneOntology is a database of terms used to describe the functions and properties of proteins and other objects of biological interest. It is like an official, structured vocabulary that you can use when describing a protein, to be sure there will be no misunderstanding about what you mean.
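The structure of this vocabulary is visible in the plain-text OBO file in which GeneOntology is distributed (go.obo). Each term is a `[Term]` stanza of `tag: value` lines, and a term can have several `is_a` parents. Below is a minimal, simplified parser sketch to illustrate that structure. It is not an official OBO parser: it ignores escapes, qualifiers and non-`[Term]` stanzas. The `EX:` terms in the example are made up.

```python
from collections import defaultdict


def parse_obo_terms(lines):
    """Parse the [Term] stanzas of an OBO file into a list of dicts.

    Simplified: every tag maps to a list of values (tags such as is_a
    can repeat), trailing '! comment' text is dropped, escapes are ignored.
    """
    terms = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = defaultdict(list) if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue  # header lines and non-Term stanzas are skipped
        tag, value = line.split(":", 1)  # split on the first colon only
        value = value.split(" ! ", 1)[0].strip()
        current[tag.strip()].append(value)
    return [dict(term) for term in terms]


# A hypothetical stanza; the EX: identifiers are invented for illustration.
example = """format-version: 1.2

[Term]
id: EX:0000003
name: example child term
is_a: EX:0000001 ! example parent A
is_a: EX:0000002 ! example parent B
"""
terms = parse_obo_terms(example.splitlines())
print(terms[0]["id"], terms[0]["is_a"])
```

In real work, a dedicated OBO library is a safer choice than a hand-written parser.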

Proposing a term to GeneOntology is easy, and the maintainers respond quickly. All you have to do is go to their bug tracker on SourceForge and provide the few pieces of information they require (see the instructions). As I said, it took only two days for them to accept my proposal, but my term was something of a special case, and it was clear that it needed to be added.

While the process of adding a GO term is quick and efficient, I think the gene annotations could be improved a bit. In fact, many GO terms seem not to be associated with any gene, for example this one (which is in some ways similar to the term that I have just proposed).
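A rough way to look for terms that no gene points to is to compare a list of term IDs against the GO IDs used in an annotation file. In the GAF format, the GO ID is the fifth tab-separated column and header lines start with `!`. This is a sketch under simplifying assumptions. It counts only direct annotations, so a term may still be used indirectly through its more specific children. It also ignores qualifiers such as NOT. The function names and the example record are hypothetical.

```python
def annotated_go_ids(gaf_lines):
    """Collect the GO IDs used directly in GAF annotation lines (column 5)."""
    used = set()
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and '!' header/comment lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= 5:
            used.add(columns[4])
    return used


def unannotated_terms(term_ids, gaf_lines):
    """Return, sorted, the terms that no annotation line refers to directly."""
    return sorted(set(term_ids) - annotated_go_ids(gaf_lines))


# One hypothetical 17-column GAF record annotating the invented term EX:0000003.
gaf = [
    "!gaf-version: 2.1",
    "ExampleDB\tID1\tGENE1\t\tEX:0000003\tREF:1\tIEA\t\tC\t\t\tprotein"
    "\ttaxon:9606\t20240101\tExampleDB\t\t",
]
missing = unannotated_terms(["EX:0000001", "EX:0000002", "EX:0000003"], gaf)
print(missing)
```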

Moreover, the GeneOntology tree itself can be difficult to navigate: if you look again at GO:0071458 and at the tree in the lower part of the page, you will see that the term is repeated several times (because it belongs to several higher-level terms). The representation is correct, but a bit intimidating at first.
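The repetition in the tree view happens because GeneOntology is a directed acyclic graph rather than a strict tree. A term with several parents lies on several paths to the root, and a tree-style display has to draw it once per path. The sketch below enumerates those paths over a hypothetical parent map. The term names are invented, not real GO terms, and `paths_to_root` is my own helper.

```python
def paths_to_root(term, parents):
    """Enumerate every path from `term` up to a root of a DAG.

    `parents` maps each term to the list of its direct parents;
    a term with no entry (or an empty list) is treated as a root.
    The number of paths can grow quickly in deep, densely linked graphs.
    """
    direct = parents.get(term, [])
    if not direct:
        return [[term]]
    paths = []
    for parent in direct:
        for path in paths_to_root(parent, parents):
            paths.append([term] + path)
    return paths


# Hypothetical ontology fragment: "child" has two parents.
parents = {
    "child": ["parent_a", "parent_b"],
    "parent_a": ["root"],
    "parent_b": ["middle"],
    "middle": ["root"],
}
for path in paths_to_root("child", parents):
    print(" -> ".join(path))
```

Here "child" appears on two distinct paths, so a tree-style display shows it twice even though it is a single node in the graph.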

So, hurrah for GeneOntology… if you ever work with something that is ‘integral to lumenal side of endoplasmic reticulum membrane’, be it a gene or another molecule, I hope it will remind you of me… 🙂

P.S. Here is the full story of my term.