Over the last year I have been part of the team that maintains and updates the Network of Cancer Genes database, also known as NCG.
The main focus of NCG is to provide a curated list of genes associated with cancer, obtained through a manual review of the literature and classified by cancer subtype. NCG also annotates several systems-level properties of cancer genes, from their protein interactions to their evolutionary age, and from the presence of paralogs in the human genome to their function.
NCG is a small database and is not supported by any big consortium, but we do our best to fill our niche :-). The sections below describe what you can get from NCG and how it can be useful to you.
A manually annotated list of genes associated with cancer
It is difficult to keep pace with the literature on cancer. New screenings of cancer samples are published every one or two months, usually describing novel mutations and new cancer driver genes. While these screenings add important knowledge about the mechanisms behind cancer, it is hard to keep track of all of them and to get a clear picture of which mutations are drivers in a given cancer type. The ICGC and TCGA consortia provide nice web interfaces to retrieve the genes recurrently mutated in a cancer type, but these are limited to the data published by the two consortia. What about all the other studies published outside of ICGC and TCGA?
In NCG we manually review all recently published studies and annotate the genes reported as “drivers” in each study. So far we have annotated about 70 papers, and we are close to uploading a batch of about 70 more. The annotation work is currently shared among three people, and each paper is checked more than once to make sure that the annotation is correct. It is hard work, but the result is a clean list of driver genes for each cancer type.
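To make the idea of a curated, double-checked list a bit more concrete, here is a minimal Python sketch of how per-paper annotations could be turned into a list of driver genes per cancer type. Everything in it is an assumption for illustration: the `Annotation` record layout, the function name `drivers_by_cancer_type`, and the rule that a gene is accepted only when at least two curators recorded it are hypothetical and are not NCG's actual schema or procedure. The gene names are placeholders, not real data.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Annotation(NamedTuple):
    """One curator's record: 'gene was reported as a driver of cancer_type in paper_id'.
    Hypothetical field names, not NCG's real schema."""
    curator: str
    paper_id: str
    cancer_type: str
    gene: str


def drivers_by_cancer_type(annotations: Iterable[Annotation],
                           min_curators: int = 2) -> dict[str, set[str]]:
    """Accept a (paper, cancer type, gene) triple only if at least `min_curators`
    distinct curators recorded it (an assumed consensus rule), then group the
    accepted genes by cancer type."""
    curators_per_triple: dict[tuple[str, str, str], set[str]] = defaultdict(set)
    for a in annotations:
        curators_per_triple[(a.paper_id, a.cancer_type, a.gene)].add(a.curator)

    drivers: dict[str, set[str]] = defaultdict(set)
    for (_paper_id, cancer_type, gene), curators in curators_per_triple.items():
        if len(curators) >= min_curators:
            drivers[cancer_type].add(gene)
    return dict(drivers)


# Placeholder records, not real annotations.
records = [
    Annotation("curator1", "paper_1", "breast", "GENE_A"),
    Annotation("curator2", "paper_1", "breast", "GENE_A"),
    Annotation("curator1", "paper_1", "breast", "GENE_B"),  # seen by one curator only
    Annotation("curator2", "paper_2", "lung", "GENE_C"),
    Annotation("curator3", "paper_2", "lung", "GENE_C"),
]
print(drivers_by_cancer_type(records))  # {'breast': {'GENE_A'}, 'lung': {'GENE_C'}}
```

Keying the check on the whole (paper, cancer type, gene) triple means that a gene confirmed in one paper is not silently accepted for a different cancer type reported in another.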
Annotation of paralogs of cancer genes
Recent estimates report that about 80% of human genes have at least one paralog (e.g. Dickerson and Robertson 2012). This figure may be a bit too high, and it may rest on an excessively broad definition of paralogy, but overall we can expect a good portion of human genes to share at least a domain or part of their sequence with other genes.
The presence of a paralog of a cancer gene is a factor to take into account, because it can complicate the development of drug strategies. In particular, it has been hypothesized that two paralogs can often exhibit functional compensation: if we inhibit the activity of one gene, its paralog can compensate for its function, reducing the impact of the inhibition. This may make a drug less effective at inhibiting an oncogene, or lead to unpredictable effects in other cases.
Annotation of gene age
It has been shown that cancer genes of different ages can have different properties. For example, most tumor suppressors tend to be old genes that originated in the last common ancestor of all eukaryotes, while most oncogenes originated in metazoans. Gene age can therefore give a hint as to whether a candidate gene is more likely to be an oncogene or a tumor suppressor.
Gene age can also help to decide which model organisms can be used to study a gene, e.g. whether the gene is present in yeast or only in more closely related species.
Role in the interactome network
Another important feature provided in NCG is the set of protein-protein interactions of each cancer gene. It has been reported that both oncogenes and tumor suppressor genes have, on average, a high number of interactions, so knowing which genes interact with a given candidate can help to understand its function and its involvement in cancer. The interaction network in NCG comes from the integration of five protein-protein interaction databases, after some cleaning steps (see An et al. 2014 for details).
In summary, NCG is a database annotating cancer genes and their systems-level properties. It is now at its fourth release, but its development is still active and we are looking for new properties to annotate. If you have any ideas or suggestions, just contact us!
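As a rough illustration of what "integrating several interaction databases" and "looking up the interactors of a candidate" involve, here is a minimal Python sketch. The source names, the function names `integrate_ppi`, `interactors` and `degree`, and the cleaning steps (normalizing gene symbols, dropping self-interactions, collapsing duplicate and reversed edges) are all assumptions made for this example; the actual sources and cleaning procedure used for NCG are described in An et al. 2014. Gene names are placeholders.

```python
from collections import defaultdict
from typing import Iterable


def integrate_ppi(sources: dict[str, Iterable[tuple[str, str]]]) -> dict[str, set[str]]:
    """Merge interaction edge lists from several (hypothetical) databases into one
    undirected network stored as adjacency sets. Illustrative cleaning only:
    symbols are stripped and upper-cased, self-interactions are dropped, and
    duplicate edges (including A-B vs B-A) collapse because neighbours are sets."""
    network: dict[str, set[str]] = defaultdict(set)
    for _db_name, edges in sources.items():
        for a, b in edges:
            a, b = a.strip().upper(), b.strip().upper()
            if a == b:  # self-interaction: skip it
                continue
            network[a].add(b)
            network[b].add(a)
    return dict(network)


def interactors(network: dict[str, set[str]], gene: str) -> list[str]:
    """Sorted list of the direct interaction partners of `gene` (empty if unknown)."""
    return sorted(network.get(gene.strip().upper(), set()))


def degree(network: dict[str, set[str]], gene: str) -> int:
    """Number of distinct interaction partners of `gene`."""
    return len(network.get(gene.strip().upper(), set()))


# Placeholder edge lists standing in for different interaction databases.
sources = {
    "db1": [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C")],
    "db2": [("gene_b", "GENE_A"), ("GENE_D", "GENE_D")],  # reversed duplicate + self-loop
    "db3": [("GENE_C", "GENE_D")],
}
net = integrate_ppi(sources)
print(interactors(net, "GENE_A"), degree(net, "GENE_A"))  # ['GENE_B', 'GENE_C'] 2
```

Storing the network as adjacency sets keeps the merge order-independent: the same interaction reported by several databases, or in either direction, is counted once when computing a gene's number of interactions.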
References (cross linked at researchblogging.org)
An, O., Pendino, V., D’Antonio, M., Ratti, E., Gentilini, M., & Ciccarelli, F. (2014). NCG 4.0: the network of cancer genes in the era of massive mutational screenings of cancer genomes. Database, 2014. DOI: 10.1093/database/bau015
Dickerson, J., & Robertson, D. (2012). On the Origins of Mendelian Disease Genes in Man: The Impact of Gene Duplication. Molecular Biology and Evolution, 29 (1), 61-69. DOI: 10.1093/molbev/msr111