A recently published paper by Hart et al. presented a genome-wide CRISPR screen to identify fitness genes (a superset of essential genes) in five cell lines. The paper is quite impressive and shows the potential of CRISPR to generate large-scale knockouts and to characterize the importance and function of genes in different conditions.
In the discussion, the authors propose that fitness genes are more likely to be conserved across species. However, they do not follow up on this hypothesis, probably for lack of space. They can’t be blamed, as they already present a lot of results in the paper.
This post presents a follow-up analysis of the hypothesis that fitness genes are more conserved than non-essential genes. I’ll take the original data from the paper, get the conservation scores from Bioconductor data packages, and run a Wilcoxon test to compare the two distributions. The full code is available as a GitHub repository; please feel free to contribute if you want to do some free R/Bioconductor analysis.
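To give an idea of what the comparison step involves, here is a minimal sketch of a two-sided Wilcoxon rank-sum test (presumably the variant meant here, since the two gene sets are independent samples). The real analysis is done in R/Bioconductor; this is only a pure-Python illustration using the normal approximation. The function names `midranks` and `rank_sum_test` are my own, and the scores are made-up toy values, not data from the paper.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank-sum test, normal approximation, no continuity correction.

    Returns (U statistic of x, z score, approximate p-value).
    """
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(pooled)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction for the variance of U.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all values are identical; the test is undefined")
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p


# Toy conservation scores (hypothetical, for illustration only).
fitness_scores = [0.9, 0.8, 0.85, 0.7, 0.95]
other_scores = [0.3, 0.5, 0.4, 0.6, 0.2]

u, z, p = rank_sum_test(fitness_scores, other_scores)
print(f"U = {u}, z = {z:.3f}, p = {p:.4f}")
```

With real data you would rather call a tested implementation (for example `wilcox.test` in R), which can also compute exact p-values for small samples.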
I’ve recently been a reviewer for the book “Bioinformatics with Python cookbook” by Tiago Antao, one of the main authors of BioPython. The book is published by Packt Publishing, and it is a collection of recipes for several bioinformatics tasks, from reading large genome files to doing population genetics and other tasks.
The GitHub account of the author contains links to all the Python notebooks illustrated in the book. These notebooks are freely accessible, but they come without any explanation of the code; for that you will need to buy the book. Moreover, the book provides a link to a Docker image that can be used to install all the materials and software needed to run the examples. I think this is a smart way to provide materials for exercises, and I will copy the idea in the future.
Being a reviewer, I was expected to be an expert in all the topics described in the book. However, I must admit that I learned a lot from reviewing it, and that some of the recipes managed to surprise me. Here is a quick summary of the new things I learned:
If you are a programmer you may already know StackOverflow, a forum-like website dedicated to questions about computer science, with an innovative design and a very active community. It is probably the best place on the Internet to ask when you have doubts related to programming.
Thanks to a recent post on the biopython-dev mailing list, I discovered that it is possible to create websites with the same engine used by StackOverflow and personalize them for a specific topic: for example, this blog and this site list all the StackExchange-like websites, ranging from mathematics to electronics to business.
So, there is also a StackExchange website dedicated to bioinformatics, and lately I have been using it: its name is BioStar:
If you have a question on bioinformatics, whether specific or general, you may ask it there… If we can create a StackOverflow-like community for bioinformatics, it will be a very good resource for everyone.
Today the maintainers of the GeneOntology database accepted a term that I had proposed two days earlier. Therefore, as of today I am the daddy of ‘integral to lumenal side of endoplasmic reticulum membrane’ (GO:0071556)!!! 🙂
GeneOntology is a database of terms used to describe the functions and properties of proteins and other objects of biological interest. It is like an official, structured vocabulary, which you can use when you describe a protein to be sure that there will be no misunderstanding about what you are saying.
Proposing a term to GeneOntology is very easy, and the maintainers answer quite quickly. The only thing you have to do is go to their bug tracker on SourceForge and provide the little information that they require (see the instructions). As I said before, it took only 2 days for them to accept my proposal, but my term was kind of a special case and it was clear that it needed to be added.
While the process to add a GO term is quick and efficient, I think that they should improve their annotations for genes a bit. In fact, there are a lot of GO terms which do not seem to be associated with any gene, for example this one (which is in some way similar to the term that I have just proposed).
Moreover, sometimes it is difficult to navigate the GeneOntology tree itself: if you look again at GO:0071458, and look at the tree in the lower part of the page, you see that the term is repeated several times (because it belongs to several higher-level terms), and the representation is correct but a bit intimidating at first.
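The repetition makes sense once you remember that GeneOntology is a directed acyclic graph and not a simple tree: a term can have more than one parent, so there can be several distinct routes from it to the top. Here is a minimal sketch of the idea, with a tiny made-up graph (the names `term_A` to `term_D` are hypothetical placeholders, not real GO terms, and `paths_to_root` is my own helper):

```python
# Hypothetical mini-ontology: each term maps to the list of its parent terms.
parents = {
    "term_A": [],                    # the root
    "term_B": ["term_A"],
    "term_C": ["term_A"],
    "term_D": ["term_B", "term_C"],  # two parents, so two routes to the root
}


def paths_to_root(term, parents):
    """Return every path from `term` up to a root of the graph."""
    if not parents[term]:
        return [[term]]
    paths = []
    for parent in parents[term]:
        for path in paths_to_root(parent, parents):
            paths.append([term] + path)
    return paths


# A tree view has to draw the term once per path, hence the repetition.
for path in paths_to_root("term_D", parents):
    print(" -> ".join(path))
```

A tree-style display has to show the term once for each of these paths, which is why the same term shows up several times on the page.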
So, hurrá for GeneOntology… if in the future you ever work with something which is ‘integral to lumenal side of endoplasmic reticulum membrane’, be it a gene or another molecule, I hope it will remind you of me… 🙂