The true story behind the annotation of a pathway

These slides are from a talk I gave earlier this week to my lab, describing two papers we published recently:

(The slides are published on Nature Precedings; you can vote for them here.)

Bioinformaticians frequently use data and annotations from scientific databases such as KEGG or UniProt. However, it is difficult to know how accurate these data are, and to what extent they can be used for large-scale analysis.

That is what the talk is about. Let’s say you dedicate 6 months of your PhD thesis to carefully studying and annotating a set of genes, as I did for the N-Glycosylation pathway: how many errors or unclear annotations would you expect to find in scientific databases?

Another topic discussed in the talk is how to report an error to a database. Many databases do not have a transparent system for reporting errors, so any inconsistency is corrected behind the scenes, which creates problems for reproducibility. Moreover, reporting errors to a database is basically not acknowledged by the scientific community. This is unfortunate, because if it were more widely recognized we could have better annotations in the databases and a more active scientific community.

References:

  • Dall’Olio GM, Bertranpetit J, & Laayouni H (2010). The annotation and the usage of scientific databases could be improved with public issue tracker software. Database: the journal of biological databases and curation, 2010. PMID: 21186182
  • Dall’Olio GM, Jassal B, Montanucci L, Gagneux P, Bertranpetit J, & Laayouni H (2011). The annotation of the Asparagine N-linked Glycosylation pathway in the Reactome Database. Glycobiology. PMID: 21199820

4 Comments

  1. Nice slides! Using those DB is not really part of my day-by-day job, but it is good to know that potential issues exist, especially for those who only use them infrequently (and therefore are probably just assuming they are correct).

    1. Thank you Nico!! In fact I enjoyed the work I did, looking carefully at the annotations of a single pathway in different databases and trying to find any minor unclear point. I would recommend to any student wishing to do bioinformatics to do something like this.

  2. Hey Gio, thanks I enjoyed your presentation a lot! I really understand the annotation issue, we are on the other side though… in our lab we are annotating ourselves now. From scratch, public data, with own ontologies. It’s a nightmare but hopefully it will pay back… and in a way I enjoy what I’m learning 😀

    1. Thank you Alba!! I am planning to present this talk as a technical seminar, probably in April or later. See you soon!
