A tutorial on organizing and plotting data with R

Every year I teach in the Programming for Evolutionary Biology course held in Leipzig. It’s an intensive three weeks course, in which we take 25 evolutionary biologists with no prior knowledge of programming, we lock them in a room together with some very good teaching assistants, and we keep explaining them how to program until they learn or manage to escape.

Jokes aside, the course is a very nice experience, and people have a lot of fun, as the three weeks are full of discussion about evolutionary biology and about bioinformatics. The nice things is that this year two brilliant former students (Karl and Michiel) organized a whole reunion conference, which has been called the PEB conference and is taking place this week at the CIBIO near Porto. This reunion conference is also an opportunity to follow-up the students, and I have been in charge of organizing a couple of workshops, one about installing software on linux, and one about advanced R programming.

The tutorial for advanced R programming is available on github and below. I think it may be of interest for anybody with some knowledge of R, but wishing to learn some new tricks. In particular the tutorial is about good ways to organize a dataframe, and I tried to cover a few beginner mistakes about data analysis that I saw in Biostar. It describes the differences between a dataframe in a wide or a long format, how to convert one to the other, and what are the advantages of doing that. It also teaches how to calculate group-based summaries with dplyr, and how to plot them with ggplot2.

DIfference between a dataset in a wide or a long format
DIfference between a dataset in a wide or a long format

Continue reading