open position at BioDec, bioinformatics company in Bologna (Italy)

There is an open position at BioDec, a bioinformatics company based in Bologna, Italy.

People at BioDec are among the authors of Ensemble, a tool to predict the transmembrane helices of proteins. If you have ever clicked on the link ‘Third Party data’ in Uniprot (example), the predictions on transmembrane helices are provided by them. They are also involved in the implementation of other tools developed in Rita Casadio’s lab in Bologna and in other bioinformatics laboratories in Italy.

Moreover, BioDec is the company that produces Plone4Bio, a bioinformatics library for Plone. Plone is a content management system written in Python and used to build websites, so if you are a web programmer interested in the field of bioinformatics, this may be a good experience for you.

Phylo – the multiple alignment game

Phylo is a game where players are required to manually edit a multiple alignment. The player who can make the best multiple alignment, maximizing the matches and reducing gaps, gets the best score.

This is a fun and innovative idea. It is based on the principle that humans are better than computers at identifying patterns. Computing a multiple alignment is so complex that even the most advanced alignment software does not find the best solution for a large set of sequences, so some manual editing of the alignment is always required.
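To make the idea of "maximizing the matches and reducing gaps" concrete, here is a minimal sketch of a sum-of-pairs score for an alignment. This is not Phylo's actual scoring function, which is not described here: the function name `alignment_score` and the values for `match`, `mismatch` and `gap` are assumptions chosen only for illustration.

```python
from itertools import combinations


def alignment_score(rows, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score of a multiple alignment.

    `rows` is a list of equal-length strings, with '-' marking a gap.
    The scoring values are illustrative assumptions, not Phylo's own.
    """
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all aligned sequences must have the same length")

    score = 0
    # Walk the alignment column by column and score every pair of rows.
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # two gaps facing each other are ignored
            if a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score


if __name__ == "__main__":
    before = ["ACGT-", "A-GTT", "ACGTT"]
    after = ["ACGT-", "AGTT-", "ACGTT"]
    print(alignment_score(before), alignment_score(after))
```

A player "editing" the alignment is effectively moving gaps around and checking whether a score like this one goes up.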

You may have already heard about fold.it, a similar game based on protein folding: this is the answer for the guys who work in the field of multiple alignments and phylogeny. I am happy because I belong to this second group :-).

(source: http://oggiscienza.wordpress.com/2010/12/09/farmville-fatti-da-parte)

update on the collaborative article on Post-GWAS functional characterization

Here is a summary of the new paragraphs and additions that have been made to the collaborative WikiGenes paper in the past two weeks.

For those who have not been following: the manuscript is a perspective on approaches and good practices to study the function of a variant identified in a GWAS.
It happens too often that, after a GWAS is successful in identifying a relationship between a tag SNP and a disease, these results are not followed by a study on the biological mechanism behind the association, or by studies on the exact location of the causal variant.

Summary of changes (sorry if I forgot anything):

  • added references to the UK10K project, and improved the description of 1000 Genomes
  • created a chapter on computational methods to predict the function of a variant. It covers the databases that annotate SNPs and other association studies, tools like GRAIL to analyze the literature, and the usefulness of genome browsers like UCSC’s. It also cites a study describing a pipeline to predict the effect of a non-synonymous SNP on the structure of a protein (the authors of that study have been contacted and will contribute to the manuscript). We will also describe how to predict pseudogenes or functional elements, and say a bit about pathway approaches.
  • described how alternative splicing can add complexity to eQTL association studies
  • described the complexity of using RNA-Seq and microarrays (also in table 1), plus a few details on Zinc-Finger technologies
  • described why it is important to take into account the interactions between chromatin fibres when studying the effect of a SNP. Different genotypes can be associated with a different chromatin network, which adds a whole level of complexity when predicting the effect of a SNP on the phenotype.
  • differences between studying SNPs and CNVs
  • discussed the usage of BRCA1 cancers as models to validate GWAS
  • added some reasons why animal models are not perfect for reproducing the effect of a SNP in humans.

There are still 11 days left to make additions, so further contributions are welcome.

new paper from my lab: IRiS

The latest paper published by people in my lab describes a method to reconstruct past recombination events:

  • Melé M, Javed A, Pybus M, Calafell F, Parida L, Bertranpetit J, & The Genographic Consortium (2010). A New Method to Reconstruct Recombination Events at a Genomic Scale. PLoS Computational Biology, 6 (11) PMID: 21124860.

Let’s say that you have a set of genotypes obtained from a human population, like the HapMap project, the HGDP samples or a custom dataset: with this algorithm you can predict some of the recombination events that occurred in recent times.

While the most common approaches to analyzing genotype panel datasets focus on identifying footprints of positive selection, the association of a SNP with a disease, and so on, there have been few efforts to look at the history of recombination events. However, a recombination event can be as important as any mutation event. By knowing when a recombination has occurred we can infer useful information on the function and the history of the region involved.

What I like very much about this article is the impressive validation work they have done to demonstrate that their software works. I am rather fussy about bioinformatics software being tested properly, but this time they have probably spent more time testing the algorithm than developing it.

Their first approach was to carry out a lot of simulations, using the software CoSi. They simulated the ‘whole history of the human species’ thousands of times, and then applied the algorithm to the simulated data to see how it performed. Afterwards, they also used data from a sperm typing panel, which is a good dataset to study recombination events in humans.
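The general logic of testing a tool against simulated data, where the true events are known, can be sketched in a few lines. This is only an illustration of the idea, not the procedure used in the paper: the function `evaluate_predictions`, the representation of events as simple identifiers, and the two metrics are all assumptions of mine.

```python
def evaluate_predictions(true_events, predicted_events):
    """Compare predicted events with the known events of a simulation.

    Both arguments are iterables of hashable event identifiers
    (a hypothetical representation chosen for this sketch).
    Returns (sensitivity, false_discovery_rate).
    """
    true_set = set(true_events)
    predicted_set = set(predicted_events)
    true_positives = len(true_set & predicted_set)

    # Fraction of the real events that the tool recovered.
    sensitivity = true_positives / len(true_set) if true_set else 0.0
    # Fraction of the predictions that do not match any real event.
    false_discovery_rate = (
        (len(predicted_set) - true_positives) / len(predicted_set)
        if predicted_set
        else 0.0
    )
    return sensitivity, false_discovery_rate


if __name__ == "__main__":
    # Toy numbers, invented for the example: not results from the paper.
    simulated = ["ev1", "ev2", "ev3", "ev4"]
    predicted = ["ev2", "ev3", "ev5"]
    print(evaluate_predictions(simulated, predicted))
```

Repeating a comparison like this over thousands of simulated datasets is what gives an estimate of how often a tool misses real events or reports spurious ones.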

Predicting recent events of recombination from genotype data. The authors did a tremendous amount of work to demonstrate the validity of their software.

So, if you are looking for a good paper with good examples of how to test a computational tool, have a look at this one.

What is PLoS – Currents?

I was looking at the PLoS home page and I saw this message about a new PLoS Currents section about phylogeny: PLoS Currents: new section on phylogenetic analyses

What is PLoS-Currents? This is the first time I have read about it. Does anyone have experience with it? Can anybody explain to me what exactly it is?

Albert Istvan from Biostar explained to me that it is based on Google Knol, a sort of Wikipedia but with restrictions on editing. It is a place where you can submit documents in the style of reports (or scientific papers), and they will be published only if they pass a review process. It is a way to publish results quickly, which is useful for fields that need quick updates, like everything related to the Influenza virus or to the new sequencing techniques.

So far, PLoS has four Currents collections, and one of them is particularly interesting: the one called PLoS Currents: Evidence on Genomic Tests. It is a place to publish documents on genetic tests and related topics. For example, if you have bought a 23andMe kit or one from a competitor, this will be a good place to look for information about each of the tests. It may also be a good resource to include in the WikiGenes/Nature Genetics paper I was writing about earlier, but I have to take a closer look at it.

Links of the past month

Nice blog posts

Bash-tricks, programming and Linux tools:

  • a Vim plugin for R – I have been looking for this for ages. Let’s see if it is as good as the Emacs plugin for R.
  • jump, a command line tool to bookmark directories in a Unix command line and jump to them quickly, and go-tool, an alternative.
  • Synapse, the new incarnation of Gnome-do

notes on the collaborative manuscript on GWAS risk studies

Here are a few notes about contributing to the Nature Genetics manuscript that I was talking about in a previous post.

note2: I have opened a discussion on Biostar; if you are interested in contributing, look there as well.

Scope and purpose of the article

The main purpose of the article is to explain how the results from a GWAS can be functionally validated. Let’s say that a study has identified a SNP variant that is likely to be associated with a trait: the collaborative article describes the methods that can be used to demonstrate the association, from identifying eQTLs and studying them with microarrays to building animal models that simulate the effect of the variant.

In my opinion the key to understanding what the manuscript is about, and why it is being written collaboratively, lies in this recent Nature Genetics editorial:

The authors of the editorial say that most of the time, after a GWAS has found an association between a SNP variant and the risk for a trait, the result is not followed by a functional characterization of the SNP.

(edit) Moreover, you should also look at the home page of the group that has written the original draft, which is the Post Genome Wide Association Study Initiative.

It seems that the same Nature Genetics authors are sponsoring this article as a way to promote discussion about the future after GWAS. Instead of asking a few selected authors to write a review on the topic, they are calling for help from all the interested scientists on the Internet. I think it is a nice idea to promote discussion.

Ideas for new paragraphs

23andMe kits for $99 ($159) for the weekend

23andMe is offering their kits for $99 instead of $499 for the weekend. Actually, if you read the conditions carefully the real price is $159, because you have to sign up for their ‘Personal Genomics Service’ for at least one year.

If you are really interested in buying a 23andMe kit, you should know that every now and then they make such offers. The last time they did it was on April 23rd, 2010, in honor of DNA Day.

As you may already have read elsewhere, 23andMe does not sequence the whole genome but only genotypes a small subset of SNPs, and then reports the associations between your genotypes and some congenital diseases. I don’t actually believe that any of the information they give is useful to learn about your health, because no SNP association study has been able to explain more than 5% of the genetic variability. I recommend buying the kit only if you work in this field and are interested in the topic, as $159 is a good offer for it.

source of the news: reddit/bioinformatics

WikiGenes: looking for authors for a Nature Genetics paper

WikiGenes is looking for contributors to a Nature Genetics paper on Genome Wide Association Studies. If you don’t mind being an author on a Nature Genetics paper, have a look at this email I received:

Dear Giovanni Marco,

The editor of Nature Genetics has commissioned a collaborative standards paper on Genome Wide Association Studies. An editable draft of this paper is now online at WikiGenes, http://www.wikigenes.org/GWAS.html?wpc=12

I hope this is an interesting opportunity for you, because significant contributions to this draft might get you a co-authorship on the final paper in Nature Genetics.

I would also like to use this occasion to ask you a favor.

If you like WikiGenes, please tell your friends about it. We do not have the budget of big publishers, so we depend fully on word-of-mouth publicity.

Or you could also help us by linking to WikiGenes from your website. Thank you!