The true story behind the annotation of a pathway

These slides are from a talk I gave earlier this week to my lab, describing two papers we published recently:

(The slides are published on Nature Precedings: you can vote for them here.)

Bioinformaticians frequently use data and annotations from scientific databases such as KEGG or UniProt. However, it is difficult to know how accurate these data are, and to what extent they can be used for large-scale analyses.

This is what the talk is about. Suppose you dedicate six months of your PhD to carefully studying and annotating a set of genes, as I did for the N-Glycosylation pathway: how many errors or unclear annotations would you expect to find in scientific databases?

Another topic discussed in the talk is how to report an error to a database. Many databases do not have a transparent system for reporting errors, so any inconsistency is corrected behind the scenes, which creates problems for reproducibility. Moreover, reporting errors to a database is basically not acknowledged by the scientific community. This is unfortunate: if this work were more widely recognized, we could have better annotations in the databases and a more active scientific community.

References:

Dall’Olio GM, Bertranpetit J, & Laayouni H (2010). The annotation and the usage of scientific databases could be improved with public issue tracker software. Database: The Journal of Biological Databases and Curation, 2010. PMID: 21186182
Dall’Olio GM, Jassal B, Montanucci L, Gagneux P, Bertranpetit J, & Laayouni H (2011). The annotation of the Asparagine N-linked Glycosylation pathway in the Reactome Database. Glycobiology. PMID: 21199820

4 Comments

Nice slides! Using those DBs is not really part of my day-to-day job, but it is good to know that potential issues exist, especially for those who use them only infrequently (and are therefore probably just assuming they are correct).

gioby says:

January 19, 2011 at 10:21

Thank you Nico!! In fact I enjoyed the work I did, looking carefully at the annotations of a single pathway in different databases and trying to find even minor unclear points. I would recommend that any student who wants to do bioinformatics try something like this.

Hey Gio, thanks, I enjoyed your presentation a lot! I really understand the annotation issue, though we are on the other side… in our lab we are now doing the annotation ourselves: from scratch, from public data, with our own ontologies. It’s a nightmare, but hopefully it will pay off… and in a way I enjoy what I’m learning :D

gioby says:

January 19, 2011 at 19:09

Thank you Alba!! I am planning to present this talk as a technical seminar, probably in April or later. See you soon!
