N-Glycosylation – one pathway, two distinct selective constraints

Our group just published a new paper in BMC Systems Biology. The title is “Distribution of events of positive selection and population differentiation in a metabolic pathway: the case of asparagine N-glycosylation”. It is already on the journal’s web page.

The N-glycosylation pathway can be conceptually split into two separate parts, one upstream and one downstream of a process known as the Calnexin/Calreticulin Cycle, in which an intermediate product of the pathway is involved. Given their functions, we can hypothesize that the two parts of the pathway are exposed to different selective constraints, and evolve at different paces among human populations.

The biology and function of the two parts of the pathway are explained in detail in the article, but I will try to summarize them here. The upstream part of the pathway is required for the Calnexin/Calreticulin Cycle, a mechanism of folding quality control, so we can expect all of its genes to be conserved among populations. The downstream part of the pathway, on the other hand, is involved in host-pathogen interactions, and can be expected to be more variable when comparing populations that adapted to different environments. In the article we show that signatures of population differentiation are indeed more abundant in the downstream part of the pathway.

Unfortunately I don’t have much time to prepare a good presentation to illustrate the paper, but I have uploaded a short summary to SlideShare. Have a look at it if you are interested.

You can also check this previous post, where I briefly explained that the main theme of the work done in our lab is to study how selective constraints are distributed along the genes of a pathway.

Introduction to Unix systems for Evolutionary Biologists – slides online

Here are the slides of the “Introduction to Unix-like systems” lecture I gave last Saturday at the “Programming for Evolutionary Biology” workshop in Leipzig.

In these slides, I did my best to communicate to the students the philosophy behind Unix systems and why they have been so important in the past. The Unix philosophy is really an approach to data analysis and programming. I will be happy if I have convinced the students that, by studying how the first programmers approached the problem of data analysis, they can learn good programming practices and avoid mistakes that were already solved many years ago.

I would like to thank my colleague Brandon Invergo and my supervisor Hafid Laayouni for suggestions on how to improve the slides. Enjoy!


Planning an 8-hour “Introduction to Linux” course with trello

Next week I am going to give an 8-hour “Introduction to Linux” course at the “Programming for Evolutionary Biology” workshop in Leipzig. In this post, I will describe how I used a nice planning tool called “trello” to make the schedule of the course.

You should know that I am a big fan of using small paper cards to organize things. I started by using CRC cards from the Extreme Programming techniques, and now the way I organize my time is similar to the Kanban technique, although I more or less arrived at it independently. In simpler words, I have the habit of cutting A4 sheets into 8 smaller “A6” sheets, the size of a post-it, and using them to take notes and plan my projects. If you visit my office, it is full of collections of “A6” papers everywhere 🙂

One day I may prepare a blog post about how I organize my projects with A6 papers. For now, just consider that trello basically lets me do on a web page what I usually do on paper. Trello also makes it possible to share workflows with other people on the Internet. For example, I can show you the schedule of the Linux course that I have made:

my trello board for the "Introduction to Linux" course. Click to see it!

So, I used trello to make 5 distinct lists of cards, one for each of the 5 parts that make up the course. In each of these lists, I filled in some cards to describe the most important topics that I wanted to talk about in that part of the course. I used a red label to highlight the most important message to get across in each part of the course, the “Take-Home” message.


Origins of Evolutionary Innovations, chapter 5

The fifth chapter of Prof. A. Wagner’s “Origins of Evolutionary Innovations” tries to answer the question: “Under which common principles do metabolic networks, regulatory circuits, and sequence folds evolve?”. It also formalizes a framework for a theory of innovations, to study how innovative phenotypes can be found by evolution.

Altogether, this chapter is a wonderful recapitulation of the previous four. Enjoy!

This is probably the last or the second-to-last session for this book club. I will be away at the Leipzig course for the following two weeks, and then I will also be busy in May. I may make another slideshow on chapter 6 in April, but I cannot commit to it.


The genotype space – what does it look like?

Today, in the metro, I finally understood what a genotype space looks like.

A genotype space is a representation of all the genotypes that can possibly exist, in which two neighboring points differ by exactly one mutation (their Hamming distance is 1).

Until now, in the book club slides, I represented it as a matrix:

genotype space represented as a matrix

However, this representation has many flaws… At the very least it should be a multi-dimensional matrix, since each node should have exactly n neighbors (where n is the length of the genotype, assuming each position can take only two values), while in a two-dimensional matrix a cell can have only 4 (or 8 if you count diagonals).

So, a better representation of the genotype space is a graph, like the following:

genotype space represented as a graph

In this graph, the “genotype” of an organism is a chromosome composed of only 5 bases, each of which can take only two values. Each node is connected only to the nodes that differ from it at a single position; for example, “00000” is connected to “10000”, “01000”, “00100”, “00010” and “00001”. Thanks to “jts” from Biostar, I now also know that this is a Hamming graph H(5, 2).
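To make the construction concrete, here is a minimal sketch in Python of how such a graph could be built. It is only an illustration written for this explanation, not the script from the gist mentioned in the note at the end of this post; the function name `hamming_graph` and the adjacency-dictionary representation are my own assumptions.

```python
from itertools import product


def hamming_graph(n, alphabet="01"):
    """Build the Hamming graph H(n, len(alphabet)) as an adjacency dict.

    Keys are genotypes (strings of length n); values are lists of the
    genotypes that differ from the key at exactly one position.
    """
    graph = {}
    for genotype in map("".join, product(alphabet, repeat=n)):
        neighbors = []
        for i, base in enumerate(genotype):
            for other in alphabet:
                if other != base:
                    # Replace the base at position i with a different one
                    neighbors.append(genotype[:i] + other + genotype[i + 1:])
        graph[genotype] = neighbors
    return graph


graph = hamming_graph(5)
print(len(graph))      # 32
print(graph["00000"])  # ['10000', '01000', '00100', '00010', '00001']
```

With 5 binary positions there are 2^5 = 32 genotypes, and each one has exactly 5 neighbors, which is why a flat matrix cannot represent the space faithfully.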

Now that we have a representation of the genotype space, we can take any phenotype of our interest, and mark it in the genotype space. For example, imagine that all the genotypes in green correspond to individuals that suffer a congenital disease:

a genotype network. All the green nodes correspond to genotypes that are affected by a congenital disease (for example)

The genotypes in green correspond to what A. Wagner calls a “genotype network”, and other authors call a “neutral network”. It is a set of genotypes that have the same phenotype, and in which each genotype can be reached from the others through a series of single changes.
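As a rough sketch of this definition, the following Python snippet groups a set of genotypes sharing a phenotype into connected sets, where two genotypes are linked if they differ at one position. The genotypes in `affected` are invented for the example, the function names are hypothetical, and the code assumes binary genotypes as in the graph above.

```python
from collections import deque


def neighbors(genotype):
    """All binary strings at Hamming distance 1 from `genotype`."""
    flip = {"0": "1", "1": "0"}
    return [genotype[:i] + flip[b] + genotype[i + 1:]
            for i, b in enumerate(genotype)]


def genotype_networks(genotypes):
    """Split same-phenotype genotypes into connected components."""
    remaining = set(genotypes)
    components = []
    while remaining:
        start = remaining.pop()
        component = {start}
        queue = deque([start])
        # Breadth-first search restricted to genotypes with the phenotype
        while queue:
            current = queue.popleft()
            for nb in neighbors(current):
                if nb in remaining:
                    remaining.remove(nb)
                    component.add(nb)
                    queue.append(nb)
        components.append(component)
    return components


# Hypothetical set of genotypes showing the same phenotype
affected = {"00000", "10000", "11000", "00111"}
networks = genotype_networks(affected)
print(sorted(len(c) for c in networks))  # [1, 3]
```

In this made-up example, “00000”, “10000” and “11000” form one genotype network, while “00111” has the same phenotype but is isolated, because none of its single-mutation neighbors is in the set.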

By exploring the topology and structure of a genotype network, we may be able to make some nice observations. For example, how big is the genotype network of a congenital disease in human populations? Or, how can a population of individuals explore a genotype network?

A lot of questions come to my mind when looking at this representation. So, it is a good time for me to search for new literature!! 🙂

Note: I wrote a small Python script to generate a Hamming graph of binary strings. Here it is: https://gist.github.com/1854319

“Programming for Evolutionary Biology” course – suggestions for the applicants that have not been accepted

The selection phase for the participants to the “Programming for Evolutionary Biology” course in Leipzig has finished. Congratulations to all the applicants accepted!

I am very sorry for the people who have not been accepted, but we received many more applications than there were places available, and the selection process had to be very strict. As Katja Nowick, the organizer of the course, said, this is a sign of how much introductory programming courses for researchers are needed. Hopefully we will be able to repeat the course, or other people will organize similar courses in the future.

In case you have not been selected, I would like to give you a few suggestions on how to start learning Unix/R/Perl skills.

– Are there any other courses for learning programming aimed at biologists?

I think that the “Unix and Perl Primer for Biologists” course is a very good resource for researchers wishing to learn the basics of the Bash shell and Perl. The material is easy to read, and explains everything step by step. The authors also wrote a book (which I haven’t read yet), and released some good material on this website.

Another good course that should not be missed is “Software Carpentry for Biologists”. This course covers a wider range of topics than the other and, more importantly, puts a good deal of effort into explaining what the “good practices” for a bioinformatician should be. The contents are maybe a bit more advanced than the “Unix and Perl Primer”, although there are classes on the shell. In any case, once you feel a bit confident in your programming skills, you should definitely read all the materials on Software Carpentry, and make sure you have understood everything before starting a research project. There is also an “Advanced Software Carpentry for Bioinformaticians”, by Titus Brown, focused on Python programming.

– Are there other courses on Programming for Evolutionary Biologists, or on Next Generation Sequencing?

Thanks to reddit/bioinformatics, I have found two other courses similar to ours: a “Workshop on Molecular Evolution” from the University of Texas, and a “Computational molecular evolution” course from EMBO in Greece. Both courses seem very good, although I don’t have any direct experience with them.

A good course on Next Generation Sequencing is the Angus course, by Titus Brown. Titus Brown is a skilled bioinformatician and programmer, who developed, among other things, libraries such as Pygr and parts of nosetests. The website of the course is full of good documentation and examples, so it should be a good place to start.

– Where can I get help?

The Internet is a good place to ask for help on programming-related questions. The StackOverflow network is the most active community for anything related to programming and Unix in general. For next generation sequencing analysis, a good place is SeqAnswers. And, for general bioinformatics questions, Biostar is of course a nice resource 🙂

Origins of Evolutionary Innovations, chapter 3

Here is the third chapter of “Origins of Evolutionary Innovations”! This chapter describes innovations in regulatory systems, and the evolution of networks of transcription factor sites.

The most important message of this chapter is that regulatory circuits can undergo a lot of changes, and yet remain functional. For example, some researchers have changed up to 600 transcription factors in E. coli, yet it was still able to survive. Or, as another nice example, galactose metabolism is regulated by two completely different transcription factors in S. cerevisiae and C. albicans, yet these two species are not so distant phylogenetically.

Another important message of this chapter concerns the structure of the genotype networks of regulatory circuits. I think it can be well represented by this figure taken from [1]. It shows that, in order to find new phenotypes, a genotype network must be robust to changes (all the possible genotypes must be connected), but also large, so that it is able to explore the genotype space.

[1] Ciliberti, S., Martin, O., & Wagner, A. (2007). Innovation and robustness in complex regulatory gene networks. Proceedings of the National Academy of Sciences, 104 (34), 13591-13596. DOI: 10.1073/pnas.0705396104

Origins of Evolutionary Innovations, chapter 2

In the second chapter, Wagner discusses the variability of metabolic networks. How do metabolic networks evolve? How many reactions can I remove from or add to a metabolic network without altering its phenotype? How robust is the phenotype of a metabolic network to changes?

A possible source of confusion in this chapter is the definitions used. The “metabolic network” is the set of all the reactions that an organism can catalyze, while the “genotype network” is the concept defined in the previous chapter. So, this chapter explains how “genotype networks of metabolic networks” evolve; be careful not to confuse the two terms. The following figure from [1] can clarify the definitions:

[1] Matias Rodrigues, J., & Wagner, A. (2009). Evolutionary Plasticity and Innovations in Complex Metabolic Reaction Networks. PLoS Computational Biology, 5 (12). DOI: 10.1371/journal.pcbi.1000613