Reviewed “Bioinformatics with Python Cookbook” by Tiago Antao

I was recently a reviewer for the book “Bioinformatics with Python Cookbook” by Tiago Antao, one of the main authors of Biopython. The book is published by Packt Publishing, and it is a collection of recipes for several bioinformatics tasks, from reading large genome files to doing population genetics.

 

Bioinformatics with Python Cookbook on my desktop, together with my zombie mug.

 

The author's GitHub account links to all the Python notebooks shown in the book. These notebooks are freely accessible, but they come without any explanation of the code; for that you will need to buy the book. Moreover, the book provides a link to a Docker image that can be used to install all the materials and software needed to run the examples. I think this is a smart way to provide materials for exercises, and I will copy the idea in the future.

Being a reviewer, I was expected to be an expert in all the topics described in the book. However, I must admit that I learned a lot from reviewing it, and that some of the recipes managed to surprise me. Here is a quick summary of the new things I learned:

  • How to convert many bioinformatics-related formats with pygenomics and Biopython
  • How to use the REST APIs for querying Ensembl
  • How to run and plot a PCA of SNP data with Python and EIGENSOFT
  • simuPOP is a nice piece of software for simulating population genetics events
  • DendroPy is a nice module for dealing with phylogenetic trees, similar to ETE
  • PDB files are going to be replaced by mmCIF files, and Biopython is able to read both formats
  • PyMOL and Cytoscape can be controlled from within a Python script or IPython
  • PSICQUIC is a consistent interface to many molecular-interaction databases
  • IPython has excellent multi-core execution capabilities
  • It is easy to optimize Python code with Cython and Numba, just by adding a few decorators

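As an illustration of the last point, here is a minimal sketch of what "adding a decorator" can look like with Numba. It is not taken from the book: the function `count_primes` is a hypothetical example, and the fallback `njit` defined in the `except` branch is an assumption added so that the snippet still runs (uncompiled) when Numba is not installed.

```python
try:
    # Numba's njit compiles the decorated function to machine code
    # the first time it is called.
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, fall back to a
    # decorator that returns the function unchanged.
    def njit(func):
        return func


@njit
def count_primes(limit):
    """Count the primes strictly below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


if __name__ == "__main__":
    print(count_primes(100))
```

The body of the function is ordinary Python; the only change is the `@njit` line. Whether this gives a real speed-up depends on the code, so it is worth timing both versions on your own data.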
If you buy the book and find any errors in the code, you can blame me: I was a reviewer and didn’t catch them.

My first PyPI package: vcf2networks

My first Python package is in PyPI!! I guess that now I can officially call myself a Python programmer.

VCF2Networks is a Python script to calculate genotype networks from population genetics data. Genotype networks are a method used in systems biology to study the “innovability” of a given phenotype: all the genotypes associated with the phenotype are represented as a graph, and properties of this graph, such as the average path length and the average degree, are then studied. For more info, you can look at the slides of the “Origins of Evolutionary Innovations” book club on this blog. The script in VCF2Networks takes any dataset of genotypes stored in the VCF format and calculates many of these properties.
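To make the idea of a genotype network more concrete, here is a small pure-Python sketch. It is not the VCF2Networks implementation: it assumes that genotypes are encoded as equal-length strings and that two genotypes are connected when they differ at exactly one position. All function names are hypothetical and chosen only for this example.

```python
from collections import deque
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def build_network(genotypes):
    """Map each distinct genotype to the set of its one-mutation neighbours."""
    nodes = sorted(set(genotypes))
    neighbours = {g: set() for g in nodes}
    for a, b in combinations(nodes, 2):
        if hamming(a, b) == 1:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def average_degree(neighbours):
    """Mean number of neighbours per genotype."""
    return sum(len(n) for n in neighbours.values()) / len(neighbours)


def average_path_length(neighbours):
    """Mean shortest-path length over all connected pairs of genotypes."""
    total = 0
    pairs = 0
    for source in neighbours:
        # Breadth-first search from each node gives shortest distances.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    network = build_network(["000", "001", "011", "111", "100"])
    print(average_degree(network), average_path_length(network))
```

In this toy dataset the five genotypes form a single chain (100, 000, 001, 011, 111), so the average degree is 1.6 and the average path length is 2.0. For real datasets a graph library such as python-igraph, which the package depends on, would be the more practical choice.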

I am planning to submit an application note about the script to a bioinformatics-oriented journal. So, if you have a little time to spare and want to test it, any feedback will be very useful to me. At the moment, the major issue is simplifying the installation, because this package depends on numpy and python-igraph, and these two modules require some terrible C libraries that must be installed separately. If you know of any way to distribute a binary package of a Python module that depends on C libraries, your suggestions will be really welcome.