Updates on docker and bioinformatics

My previous post on docker and bioinformatics received some good attention on Twitter. It’s nice because it means that this technology is getting the right attention in the bioinformatics community.

Here are a few resources and articles I’ve found thanks to the conversations in Twitter.

  • Performances of Docker on a HPC cluster – a nice article showing that running a NGS pipeline in a docker container costs about 4% of the performances. It’s up to you to decide whether this is a big or a small price to pay.
  • biodocker is a project by Hexabio aiming at providing many containers with bioinformatics application. For example, you can get a virtual machine with biopython or samtools installed in a few minutes. Update: this may have been merged with bioboxes (see discussion)
  • oswitch is a nice implementation of docker from the Queen Mary University of London, which allows to quickly switch between docker images. I like the examples in which they run a command from a virtual image and then return directly to another environment.
  • ngeasy, a Next Generation Sequencing pipeline implemented on Docker, by a group from the King’s College of London (I work in the same institute but I didn’t know them!).
  • a nice discussion on Biostar on how a reproducibility problem could be solved with Docker.
  • a Docker symposium planned for the end of 2015 here at King’s.
  • BioPython containers by Tiago Antao, including some ipython tutorials

Docker is another innovation for data analysis introduced in 2014. I am surprised by how many good things were released last year, including docker and the whole dplyr/tidyr bundle. Let’s see what 2015 will bring!