Reviewed “Bioinformatics with Python Cookbook” by Tiago Antao

I recently reviewed the book “Bioinformatics with Python Cookbook” by Tiago Antao, one of the main authors of BioPython. The book, published by Packt Publishing, is a collection of recipes for several bioinformatics tasks, from reading large genome files to doing population genetics.

 

[Figure: Bioinformatics with Python Cookbook on my desktop, together with my zombie mug.]

 

The author’s GitHub account links to all the Python notebooks illustrated in the book. These notebooks are freely accessible, but they do not explain the code; for that you will need to buy the book. Moreover, the book provides a link to a Docker image that can be used to install all the materials and software needed to run the examples. I think this is a smart way to provide materials for exercises, and I will copy the idea in the future.

Being a reviewer, I was expected to be an expert in all the topics described in the book. However, I must admit that I learned a lot from reviewing it, and that some of the recipes managed to surprise me. Here is a quick summary of the new things I learned:

  • How to convert many bioinformatics-related formats with pygenomics and BioPython
  • How to use the REST APIs for querying Ensembl
  • How to do and plot a PCA of SNP data in Python and EIGENSOFT
  • SimuPOP is a nice software package for simulating population genetics events
  • DendroPy is a nice module for dealing with phylogenetic trees, like ete
  • PDB files are going to be replaced by mmCIF files, and BioPython is able to read both formats
  • PyMOL and Cytoscape can be controlled from within a Python script or IPython
  • PSICQUIC is a consistent interface to many molecular-interaction databases
  • IPython has excellent multi-core execution capabilities
  • It is easy to optimize Python code with Cython and Numba, just by adding a few decorators

If you buy the book and find any errors in the code, you can blame me, as I was a reviewer and didn’t find them.

Origins of Evolutionary Innovations, chapter 5

The fifth chapter of Prof. A. Wagner’s “Origins of Evolutionary Innovations” tries to answer the question: “Under which common principles do metabolic networks, regulatory circuits, and sequence folds evolve?” It also formalizes a framework for a theory of innovations, to study how evolution can find innovative phenotypes.

Overall, this chapter is a wonderful recapitulation of the previous four. Enjoy!


This is probably the last or second-to-last session of this book club. I will be away at the Leipzig course for the following two weeks, and then I will also be busy in May. I may make another slideshow on chapter 6 in April, but I cannot commit to it.

 

The genotype space – what does it look like?

Today, in the metro, I finally understood what a genotype space looks like.

A genotype space is a representation of all the genotypes that can possibly exist, in which two neighboring points differ by a single mutation (Hamming distance of 1).

Until now, in the book club slides, I represented it as a matrix:

[Figure: genotype space represented as a matrix]

However, this representation has many flaws… it should be at least a multi-dimensional matrix, since each node should have exactly n neighbors (where n is the length of the genotype, assuming a two-letter alphabet), while in a matrix each node can have only 4 (or 8 if you count diagonals).

So, a better representation of the genotype space is a graph, like the following:

[Figure: genotype space represented as a graph]

In this graph, the “genotype” of an organism is a chromosome composed of only 5 bases, each of which can take only two values. Each node is connected only to the nodes that differ at a single position; for example, “00000” is connected to “10000”, “01000”, “00100”, “00010” and “00001”. Thanks to “jts” from Biostar, I now also know that this is a Hamming graph H(5, 2).
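
The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the script linked at the end of this post; the function name `hamming_graph` and the dictionary-of-sets layout are my own assumptions.

```python
from itertools import product


def hamming_graph(length, alphabet="01"):
    """Return a dict mapping each genotype to the set of genotypes
    that differ from it at exactly one position (Hamming distance 1)."""
    graph = {}
    for chars in product(alphabet, repeat=length):
        node = "".join(chars)
        neighbors = set()
        for i, current in enumerate(node):
            for symbol in alphabet:
                if symbol != current:
                    # Replace position i with a different symbol
                    neighbors.add(node[:i] + symbol + node[i + 1:])
        graph[node] = neighbors
    return graph


graph = hamming_graph(5)
print(len(graph))              # 32 genotypes (2 ** 5)
print(sorted(graph["00000"]))  # the five single-mutation neighbors of "00000"
```

With a two-letter alphabet every node has exactly 5 neighbors, one per position. With a larger alphabet, such as "ACGT", each position can mutate in three ways, so the number of neighbors grows accordingly.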

Now that we have a representation of the genotype space, we can take any phenotype of interest and mark it in the genotype space. For example, imagine that all the genotypes in green correspond to individuals that suffer from a congenital disease:

[Figure: a genotype network. All the green nodes correspond to genotypes that are affected by a congenital disease (for example)]

The genotypes in green correspond to what A. Wagner calls a “genotype network”, and other authors call a “neutral network”. It is a set of genotypes that have the same phenotype and that are connected to each other through single changes.
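
As a rough sketch of this definition, the code below groups a set of genotypes sharing a phenotype into connected components, where two genotypes are linked if they differ at a single position. The set `affected` is made-up example data (it does not reproduce the figure), and the function names are hypothetical.

```python
from collections import deque


def neighbors(genotype, alphabet="01"):
    """Yield every genotype at Hamming distance 1 from the given one."""
    for i, current in enumerate(genotype):
        for symbol in alphabet:
            if symbol != current:
                yield genotype[:i] + symbol + genotype[i + 1:]


def genotype_networks(genotypes):
    """Split genotypes with the same phenotype into connected components,
    using a breadth-first search over single-mutation steps."""
    remaining = set(genotypes)
    components = []
    while remaining:
        start = remaining.pop()
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbors(node):
                if other in remaining:
                    remaining.remove(other)
                    component.add(other)
                    queue.append(other)
        components.append(component)
    return components


# Hypothetical "green" genotypes, chosen only for illustration
affected = {"00000", "10000", "11000", "11100", "00011"}
networks = sorted(genotype_networks(affected), key=len, reverse=True)
for network in networks:
    print(sorted(network))
```

In this toy example, “00011” is two mutations away from every other affected genotype, so it ends up in a component of its own, while the other four form a single connected network.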

By exploring the topology and structure of a genotype network, we may be able to make some nice observations. For example, how big is the genotype network of a congenital disease, in human populations? Or, how can a population of individuals explore a genotype network?

There are really a lot of questions that come to my mind when looking at this representation. So, it is a good time for me to search for new literature!! 🙂

Note: I wrote a small Python script to generate a Hamming graph of binary strings. Here it is: https://gist.github.com/1854319

Origins of Evolutionary Innovations, chapter 3

Here is the third chapter of “Origins of Evolutionary Innovations”! This chapter describes innovations in regulatory systems, and the evolution of networks of transcription factor sites.


The most important message of this chapter is that regulatory circuits can undergo a lot of changes, and yet remain functional. For example, some researchers have changed up to 600 transcription factors in E. coli, yet it was still able to survive. Or, as another nice example, galactose metabolism is regulated by two completely different transcription factors in S. cerevisiae and C. albicans, yet these two species are not so distant phylogenetically.

Another important message of this chapter concerns the structure of the genotype networks of regulatory circuits. I think it can be well represented by this figure taken from [1]. It shows that, in order to find new phenotypes, a genotype network must be robust to changes (all the possible genotypes must be connected), but also large, so that it is able to explore the genotype space.

[1] Ciliberti, S., Martin, O., & Wagner, A. (2007). Innovation and robustness in complex regulatory gene networks. Proceedings of the National Academy of Sciences, 104 (34), 13591-13596. DOI: 10.1073/pnas.0705396104

Origins of Evolutionary Innovations, chapter 2


In the second chapter, Wagner discusses the variability of metabolic networks. How do metabolic networks evolve? How many reactions can I remove from or add to a metabolic network without altering its phenotype? How robust is the phenotype of a metabolic network to changes?

A possible source of confusion in this chapter is the definitions used. The “metabolic network” is the set of all the reactions that an organism can catalyze, while the “genotype network” is the concept defined in the previous chapter. So, this chapter explains how “genotype networks of metabolic networks” evolve; be careful not to confuse the two terms. The following figure from [1] can clarify the definitions:

[1] Matias Rodrigues, J., & Wagner, A. (2009). Evolutionary Plasticity and Innovations in Complex Metabolic Reaction Networks. PLoS Computational Biology, 5 (12). DOI: 10.1371/journal.pcbi.1000613

book club on “Origins of Evolutionary Innovations” by A. Wagner

I am organizing a discussion club on the book “Origins of Evolutionary Innovations” by A. Wagner, for my group.


Well, I don’t promise anything, but since I will make the effort of producing some presentations anyway, I will also publish all the slides here on this blog.

This book describes how new phenotypes are discovered in evolution. In the first chapter, it starts by describing some examples of notable phenotypes that have appeared, such as the urea cycle and the ability to use glucose as a carbon source. But in general, this book is about how any novel phenotype appears in evolution.

It also explains the concepts of genotype space and genotype network, and how much variability a population of organisms can withstand without changes in a given phenotype. For example, there are far more possible mRNAs than the number of proteins observed, so it seems that any given protein can be produced by more than one mRNA. This means that an organism can withstand many changes to its DNA without suffering changes to the structure of the protein. What is the role of this variability in evolution?
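
To get a feeling for the numbers, here is a small sketch that counts how many different coding sequences can encode the same peptide, using the number of codons per amino acid in the standard genetic code. The peptide is a made-up example, the function name is hypothetical, and stop codons are ignored for simplicity.

```python
from math import prod

# Number of codons encoding each amino acid in the standard genetic code
# (one-letter amino acid codes; the 61 sense codons, stop codons excluded)
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_coding_sequences(protein):
    """Count the distinct codon sequences that translate to this protein."""
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in protein)


peptide = "MKWVTF"  # hypothetical six-residue peptide
print(count_coding_sequences(peptide))  # 1 * 2 * 1 * 4 * 4 * 2 = 64
```

Even this tiny peptide has 64 synonymous coding sequences, and the count grows multiplicatively with protein length.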

There is also a nice paper published on the topic today, in Science: Meyer JR et al, Repeatability and Contingency in the Evolution of a Key Innovation in Phage Lambda, Science 2012.

The book club will take place only in my lab, but if you are interested, you can follow the slides and comment on this blog. (Or would it be better to discuss it on Twitter? Let’s use the #evol_innov_book tag on Twitter.) Enjoy!